[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363347#comment-16363347
 ] 

Aaron T. Myers commented on HDFS-12051:
---

Thanks for the followup, [~mi...@cloudera.com]. A few responses:

bq. It won't, but I can add this functionality.

Great, thanks.

bq. As you can see, this class is not really a static singleton. Its public API 
is indeed a single static put() method, but inside there is a singleton 
instance of NameCache, with its instance methods. Initially I didn't have this 
singleton at all, and it indeed was an instance variable of FSNamesystem. But 
later I found that there are several other places in the code where duplicate 
byte[] arrays are generated, and where it would be very hard to pass this 
instance variable. So I ended up with this static API, which makes it easier to 
use NameCache anywhere in the code. But the ability to test it is not compromised.

Sorry, I shouldn't have said the class was a singleton, but I think the point 
remains. Especially in the context of tests, wherein we have potentially 
several HA or federated NNs running within a single process, I worry that using 
a singleton instance will cause some odd behavior. Passing it around may be 
difficult, but do all the places in the code where you're adding calls to 
{{NameCache}} perhaps have a reference to the {{FSNamesystem}}? If so, making 
the {{NameCache}} a member of the {{FSNamesystem}} may make that not so hard to 
deal with.
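
To make the trade-off concrete, here is a minimal sketch of the pattern under 
discussion: a static {{put()}} facade over a single internal instance. The class 
name, the use of a map instead of the patch's fixed-size open-addressing array, 
and the FSNamesystem wiring shown in the trailing comment are illustrative 
assumptions, not the code under review.

{code:java}
// Illustrative only -- not the implementation in the patch.
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

public final class ByteArrayInterner {
  // The pattern described above: one process-wide instance behind a static API.
  private static final ByteArrayInterner INSTANCE = new ByteArrayInterner();

  public static byte[] put(byte[] name) {
    return INSTANCE.intern(name);
  }

  private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();

  private byte[] intern(byte[] name) {
    // Key on the contents so equal arrays collapse to one canonical instance.
    String key = new String(name, StandardCharsets.ISO_8859_1);
    byte[] canonical = cache.putIfAbsent(key, name);
    return canonical != null ? canonical : name;
  }

  // The alternative suggested above would instead scope the interner per namesystem:
  //   class FSNamesystem { private final ByteArrayInterner nameInterner = ...; }
  // so that several NNs in one JVM (HA/federation tests) never share state.
}
{code}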

bq. Well, I can try that, but honestly, how paranoid should we be? In my 
opinion, this code is simple enough to verify with a combination of unit tests 
and some runs on a cluster.

I think we need to be diligent in confirming the correctness of this change, or 
any change like this, as the ramifications of a bug here are both potentially 
subtle and severe.

bq. The single findbugs issue has already been explained. It's legitimate, but 
we intentionally do something that wouldn't be good in general (use a volatile 
field and increment it without synchronization) just to expose some information 
for testing without degrading performance in production. As for the unit tests: 
a different test fails every time, which makes me think that they are flaky (I 
had the same experience in the past with my other changes in HDFS). I looked at 
them but couldn't see any obvious signs that the problems are related to my 
code. There are timeouts and similar things that tend to happen in flaky tests. 
Here I think I really need help from someone else on the HDFS team.

I think you're probably right that the failures are due to flaky tests - I just 
wanted to make sure you or someone had taken a look at them and confirmed that.

bq. I don't think there is any problem here. We use the same formula to get the 
next slot, and it wraps around the array boundary correctly. Take a look at the 
test program below that uses the same formula, and its output:

Gotcha, makes sense. This behavior would be a great thing to cover in a unit 
test.
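
(The test program referenced in that comment is not reproduced in this digest. 
For readers following along, a wrap-around probe of the kind being described 
typically looks like the sketch below; the constants and the exact formula are 
illustrative assumptions, not necessarily what the patch uses.)

{code:java}
public class WrapAroundProbeDemo {
  public static void main(String[] args) {
    int capacity = 16;     // cache array length (a power of two in this sketch)
    int startSlot = 14;    // a slot within maxChain of the end of the array
    int maxChain = 4;      // analogous to MAX_COLLISION_CHAIN_LEN
    for (int i = 0; i < maxChain; i++) {
      int slot = (startSlot + i) & (capacity - 1);  // wraps past the boundary
      System.out.println(slot);                     // prints 14, 15, 0, 1
    }
  }
}
{code}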

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, 

[jira] [Commented] (HDFS-12051) Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly those denoting file/directory names) to save memory

2018-02-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16356542#comment-16356542
 ] 

Aaron T. Myers commented on HDFS-12051:
---

I've taken a quick look at the patch and have some questions:

# Is there a way to disable the cache entirely, if we find that there's some 
bug in the implementation? e.g. if you set the ratio to 0, does everything 
behave correctly?
# How hard would it be to not make this class a static singleton, and instead 
have a single instance of it in the NN that can be referenced, perhaps as an 
instance variable of the {{FSNamesystem}}? That seems a bit less fragile if 
it's possible, and could allow for the class to be more easily tested.
# Have you done any verification of the correctness of this cache in any of 
your benchmarks? e.g. something that walks the file system tree to ensure that 
the names are identical with and without this cache would, I think, help allay 
correctness concerns.
# I'd really like to see some more tests of the actual cache implementation 
itself, e.g. in the presence of hash collisions, behavior at the boundaries of 
the main cache array, overlap of slots probed in the open addressing search, 
and other edge cases (see the sketch after this list for the flavor of test I 
mean).
# I see that precommit raised some findbugs warnings and had some failed unit 
tests. Can we please address the findbugs warnings, and also confirm that those 
unit test failures are unrelated?
# Seems like this cache will have a somewhat odd behavior if an item hashes to 
a slot that's within {{MAX_COLLISION_CHAIN_LEN}} slots of the end of the array, 
in that it looks like we'll just probe the same slot over and over again up to 
{{MAX_COLLISION_CHAIN_LEN}} times. Is this to be expected?
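
For item 4, a minimal sketch of the kind of test I mean; it assumes a 
hypothetical static {{NameCache.put(byte[])}} that returns the interned array 
(the real API in the patch may differ):

{code:java}
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertSame;

import java.nio.charset.StandardCharsets;
import org.junit.Test;

public class TestNameCacheInterning {
  @Test
  public void testEqualNamesInternToOneArray() {
    byte[] first = NameCache.put("part-m-000".getBytes(StandardCharsets.UTF_8));
    byte[] second = NameCache.put("part-m-000".getBytes(StandardCharsets.UTF_8));
    assertSame("equal names should intern to one array", first, second);
  }

  @Test
  public void testManyDistinctNamesKeepTheirContents() {
    // Exercise collisions/evictions: whatever put() returns must still equal the input.
    for (int i = 0; i < 100_000; i++) {
      byte[] name = ("file-" + i).getBytes(StandardCharsets.UTF_8);
      assertArrayEquals(name, NameCache.put(name));
    }
  }
}
{code}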

[~mi...@cloudera.com] - in general I share [~szetszwo]'s concern that we just 
need to be very careful with changes to this sort of code in the NN, because 
even a small bug could subtly result in very severe consequences. I realize 
that the length of time that this patch has been up is frustrating for you, but 
please understand that the concerns being raised are in good faith, and are 
just focused on trying to ensure that file system data is never put at risk. 
The more tests you can include in the patch, and the more correctness testing 
you can report having done on it, the more comfortable and confident all 
reviewers will feel in committing this very valuable change. The recent 
reviews that this patch has received demonstrate to me that we're moving in a 
good direction toward getting this JIRA resolved.

I'd also like to ping [~daryn] and [~kihwal] to see if they have time to review 
this change, as I bet they'll be keenly interested in this improvement.

> Reimplement NameCache in NameNode: Intern duplicate byte[] arrays (mainly 
> those denoting file/directory names) to save memory
> -
>
> Key: HDFS-12051
> URL: https://issues.apache.org/jira/browse/HDFS-12051
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: HDFS-12051-NameCache-Rewrite.pdf, HDFS-12051.01.patch, 
> HDFS-12051.02.patch, HDFS-12051.03.patch, HDFS-12051.04.patch, 
> HDFS-12051.05.patch, HDFS-12051.06.patch, HDFS-12051.07.patch, 
> HDFS-12051.08.patch, HDFS-12051.09.patch, HDFS-12051.10.patch, 
> HDFS-12051.11.patch
>
>
> When a snapshot diff operation is performed in a NameNode that manages several 
> million HDFS files/directories, the NN needs a lot of memory. Analyzing one heap 
> dump with jxray (www.jxray.com), we observed that duplicate byte[] arrays 
> result in 6.5% memory overhead, and most of these arrays are referenced by 
> {{org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy.name}}
>  and {{org.apache.hadoop.hdfs.server.namenode.INodeFile.name}}:
> {code:java}
> 19. DUPLICATE PRIMITIVE ARRAYS
> Types of duplicate objects:
>  Ovhd Num objs  Num unique objs   Class name
> 3,220,272K (6.5%)   104749528  25760871 byte[]
> 
>   1,841,485K (3.7%), 53194037 dup arrays (13158094 unique)
> 3510556 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 2228255 
> of byte[8](48, 48, 48, 48, 48, 48, 95, 48), 357439 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 237395 of byte[8](48, 48, 48, 48, 48, 49, 
> 95, 48), 227853 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 
> 179193 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...), 169487 
> of byte[8](48, 48, 48, 48, 48, 50, 95, 48), 145055 of byte[17](112, 97, 114, 
> 116, 45, 109, 45, 48, 48, 48, ...), 128134 of byte[8](48, 48, 48, 48, 48, 51, 
> 95, 48), 108265 of byte[17](112, 97, 114, 116, 45, 109, 45, 48, 48, 48, ...)
> ... 

[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328056#comment-16328056
 ] 

Aaron T. Myers commented on HDFS-12990:
---

{quote}I also don't like the port number, but I don't like the risk that 
someone might be testing the Hadoop 3.0.0 release, then decide to put 3.0.1 
into production at some future time, only to find that we made an incompatible 
change to the NN RPC port that we could not have predicted.
{quote}
This is a very pragmatic concern, but I'm hoping you'll agree that there are 
*vastly* more users on 2.x versions who will eventually upgrade to some future 
3.x version. Those 2.x users are also surely not expecting the default NN RPC 
port to change for no good reason at all. Not fixing this issue now will create 
a headache for all of us, and likely for close to 100% of our users eventually, 
for years to come. Changing this back now is certainly not ideal, but from a 
purely pragmatic perspective it is obviously the right choice.

 

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all the 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also moved the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include a port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change, it would be a much better experience to leave the NN port unchanged. 
> This will benefit Hadoop 3 adoption and avoid unnecessary upgrade burdens.
> It is of course incompatible, but given that 3.0.0 has only just been 
> released, IMO it is worth switching the port back.
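
(For context on why the default matters so much downstream, here is a minimal 
sketch of how a client ends up depending on the NN RPC port; the host name is 
hypothetical and the code is illustrative, not from any patch.)

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DefaultFsPortExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical host; the port is what this JIRA changes back to 8020.
    conf.set("fs.defaultFS", "hdfs://nn.example.com:8020");
    FileSystem fs = FileSystem.get(conf);
    // Fully qualified URIs (e.g. Hive table locations) bake the port in directly,
    // which is why a changed default ripples through so many downstream systems.
    Path table = new Path("hdfs://nn.example.com:8020/warehouse/db/t1");
    System.out.println(fs.getUri() + " " + table);
  }
}
{code}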






[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-08 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317507#comment-16317507
 ] 

Aaron T. Myers commented on HDFS-12990:
---

bq. No incompatible changes are allowed between 3.0.0 and 3.0.1. Dot releases 
only allow bug fixes. Won't you agree?

In general I agree that we should avoid making incompatible changes in dot or 
minor releases, but I also believe that we have historically evaluated any 
change like this on a case-by-case basis. I'm sure I can find some examples 
where we have made incompatible changes in minor or dot releases because we 
concluded that the benefits outweighed the costs. For example, I believe we 
have upgraded a dependency to address a security issue.

Regardless, I'm happy to discuss this more broadly and see how others view the 
compatibility guidelines in this respect. I'll start a thread on common-dev@ to 
discuss this and we can go from there.

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all the 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also moved the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include a port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change, it would be a much better experience to leave the NN port unchanged. 
> This will benefit Hadoop 3 adoption and avoid unnecessary upgrade burdens.
> It is of course incompatible, but given that 3.0.0 has only just been 
> released, IMO it is worth switching the port back.






[jira] [Updated] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-08 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-12990:
--
Release Note: 
The HDFS NameNode default RPC port has been changed back to 8020. The only 
official release with a different default is 3.0.0, which used 9820.

It is recommended that 2.x users upgrade to 3.0.1+, to avoid the burden of 
changing the default NN RPC port.

  was:
HDFS NameNode default RPC port is changed back to 8020. The only official 
release that has this differently is 3.0.0, which used 9820.

It is recommended from 2.x users to upgrade to 3.0.1+, to reduce the burden on 
changing default NN RPC port.


> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all the 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also moved the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include a port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change, it would be a much better experience to leave the NN port unchanged. 
> This will benefit Hadoop 3 adoption and avoid unnecessary upgrade burdens.
> It is of course incompatible, but given that 3.0.0 has only just been 
> released, IMO it is worth switching the port back.






[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-08 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316786#comment-16316786
 ] 

Aaron T. Myers commented on HDFS-12990:
---

Hey Nicholas,

bq. We should first discuss changing the current compatibility policy before 
this JIRA since we have to change the policy before committing this.

Maybe I'm missing something, but what part of the existing compatibility policy 
would we have to change? There's no question that this would be an incompatible 
change, and I don't think any of us are suggesting that the policy should be 
changed to consider a change like this to be compatible. Rather, I and others 
believe that this is an incompatible change worth making from a pragmatic 
perspective.

My understanding of our compatibility policy is that it aims to describe all of 
what should be considered incompatible, but that we can still make incompatible 
changes as long as we mark them as such, call them out in the release notes, 
and, as with any change, no committer objects.

{quote}
It is not a good idea to discuss the policy here, since many people will miss 
the discussion given the summary of this JIRA.

I believe this won't be the only JIRA that wants to break compatibility.
{quote}

Totally agree, but as described above I don't think we need to change the 
policy in order to make this incompatible change.

Nicholas, to be totally explicit, are you opposed to making this change given 
the circumstances? Or do you just think we need to discuss it further and more 
broadly?

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all the 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also moved the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include a port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change, it would be a much better experience to leave the NN port unchanged. 
> This will benefit Hadoop 3 adoption and avoid unnecessary upgrade burdens.
> It is of course incompatible, but given that 3.0.0 has only just been 
> released, IMO it is worth switching the port back.






[jira] [Commented] (HDFS-12990) Change default NameNode RPC port back to 8020

2018-01-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313755#comment-16313755
 ] 

Aaron T. Myers commented on HDFS-12990:
---

bq. Until this point, probably very few people/clusters have upgraded to 3.0.0, 
so it's likely that this will not negatively impact anyone. But if we do not 
make this change, eventually all existing deployments will be affected.

To me, this is the most compelling point here. Yes, this is definitely an 
incompatible change, but if we make it now I think we are far more likely to 
help more users and downstream applications with their upgrades from 2.x 
releases than we are to hurt the *very* few folks who have tried out 3.0.0. 
Given how recently 3.0.0 was released, it seems likely to me that no 
substantial deployments have moved to this release yet.

[~anu] - what are your thoughts? From a pragmatic standpoint, I think this is 
*definitely* the right trade-off to make.

> Change default NameNode RPC port back to 8020
> -
>
> Key: HDFS-12990
> URL: https://issues.apache.org/jira/browse/HDFS-12990
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Critical
> Attachments: HDFS-12990.01.patch
>
>
> In HDFS-9427 (HDFS should not default to ephemeral ports), we moved all the 
> default ports out of the ephemeral range, which is much appreciated by 
> admins. As part of that change, we also moved the NN RPC port from the 
> famous 8020 to 9820, to be closer to the other ports changed there.
> With more integration going on, it appears that all the other port changes 
> are fine, but the NN RPC port change is painful for downstream projects 
> migrating to Hadoop 3. Some examples include:
> # Hive table locations pointing to hdfs://nn:port/dir
> # Downstream minicluster unit tests that assumed 8020
> # Oozie workflows / downstream scripts that used 8020
> This isn't a problem for HA URLs, since those do not include a port number. 
> But considering the downstream impact, instead of requiring all of them to 
> change, it would be a much better experience to leave the NN port unchanged. 
> This will benefit Hadoop 3 adoption and avoid unnecessary upgrade burdens.
> It is of course incompatible, but given that 3.0.0 has only just been 
> released, IMO it is worth switching the port back.






[jira] [Commented] (HDFS-11096) Support rolling upgrade between 2.x and 3.x

2017-09-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186201#comment-16186201
 ] 

Aaron T. Myers commented on HDFS-11096:
---

bq. Currently YARN rolling upgrades are actually failing because the Hadoop 3 
ResourceManager is using URIs like http://ns1:8020/... (the logical namespace 
name used as though it's a hostname). I'm not sure where that's coming from yet 
and need to dig into it, but I doubt the fix there will invalidate much review 
of the current code.

I investigated this the other day and came to the conclusion that this is due 
to a bug introduced by YARN-6457. See [this 
comment|https://issues.apache.org/jira/browse/YARN-6457?focusedCommentId=16184700=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16184700]
 in particular for a description of the issue.

> Support rolling upgrade between 2.x and 3.x
> ---
>
> Key: HDFS-11096
> URL: https://issues.apache.org/jira/browse/HDFS-11096
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rolling upgrades
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Sean Mackrory
>Priority: Blocker
> Attachments: HDFS-11096.001.patch, HDFS-11096.002.patch, 
> HDFS-11096.003.patch, HDFS-11096.004.patch
>
>
> trunk has a minimum software version of 3.0.0-alpha1. This means we can't 
> do a rolling upgrade between branch-2 and trunk.
> This is a showstopper for large deployments. Unless there are very compelling 
> reasons to break compatibility, let's restore the ability to perform rolling 
> upgrades to 3.x releases.






[jira] [Commented] (HDFS-12357) Let NameNode to bypass external attribute provider for special user

2017-08-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146393#comment-16146393
 ] 

Aaron T. Myers commented on HDFS-12357:
---

I took a quick, not thorough, look at the patch. One thing jumped out at me on a 
cursory inspection: why make {{usersToBypassExtAttrProvider}} static? It probably 
won't cause any problems, but it also doesn't seem necessary, and it could 
potentially confuse the situation, e.g. in a unit test with multiple NNs.
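
For illustration, the non-static shape could look roughly like the sketch 
below; the class name, config key, and method names are made up for this 
example and are not from the patch.

{code:java}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative only: decide whether to bypass the external attribute provider.
class AttrProviderBypassCheck {
  private final Set<String> usersToBypass;   // instance state, not a static field

  AttrProviderBypassCheck(Configuration conf) {
    // Hypothetical config key naming the special users.
    usersToBypass = new HashSet<>(Arrays.asList(
        conf.getTrimmedStrings("dfs.namenode.attr-provider.bypass.users")));
  }

  boolean shouldBypass(UserGroupInformation caller) {
    return usersToBypass.contains(caller.getShortUserName());
  }
}
{code}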

> Let NameNode to bypass external attribute provider for special user
> ---
>
> Key: HDFS-12357
> URL: https://issues.apache.org/jira/browse/HDFS-12357
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-12357.001.patch
>
>
> This is a third proposal to solve the problem described in HDFS-12202.
> The problem is that when we do a distcp from one cluster to another (or 
> within the same cluster), in addition to copying file data, we copy the 
> metadata from source to target. If an external attribute provider is enabled, 
> the metadata may be read from the provider, so provider data read from the 
> source may be saved to the target HDFS. 
> We want to avoid saving metadata from the external provider to HDFS, so we 
> want to bypass the external provider when doing the distcp (or hadoop fs -cp) 
> operation.
> Two alternative approaches were proposed earlier, one in HDFS-12202, the 
> other in HDFS-12294. The proposal here is the third one.
> The idea is to introduce a new config that specifies a special user (or a 
> list of users), and have the NN bypass the external provider when the current 
> user is one of these special users.
> If applications that need data from the external attribute provider are run 
> as the special user, they won't work. So the constraint on this approach is 
> that the special users should not run applications that need data from the 
> external provider.
> Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn], 
> [~manojg] for the discussions in the other JIRAs. 
> I'm creating this one to discuss it further.






[jira] [Commented] (HDFS-11660) TestFsDatasetCache#testPageRounder fails intermittently with AssertionError

2017-04-19 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975502#comment-15975502
 ] 

Aaron T. Myers commented on HDFS-11660:
---

+1, latest patch looks good to me. Good improvements all around.

Thanks, [~andrew.wang].

> TestFsDatasetCache#testPageRounder fails intermittently with AssertionError
> ---
>
> Key: HDFS-11660
> URL: https://issues.apache.org/jira/browse/HDFS-11660
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.5
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-11660.001.patch, HDFS-11660.002.patch, 
> HDFS-11660.003.patch
>
>
> We've seen this test fail occasionally with an error like the following:
> {noformat}
> java.lang.AssertionError
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:510)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:695)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This assertion fires when the heartbeat response is null.






[jira] [Updated] (HDFS-11531) Expose hedged read metrics via libHDFS API

2017-03-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11531:
--
Target Version/s: 3.0.0-alpha3
  Status: Patch Available  (was: Open)

> Expose hedged read metrics via libHDFS API
> --
>
> Key: HDFS-11531
> URL: https://issues.apache.org/jira/browse/HDFS-11531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.6.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
> Attachments: HDFS-11531.000.patch
>
>
> It would be good to expose the DFSHedgedReadMetrics via a libHDFS API for 
> applications to retrieve.






[jira] [Assigned] (HDFS-11531) Expose hedged read metrics via libHDFS API

2017-03-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-11531:
-

Assignee: Sailesh Mukil

> Expose hedged read metrics via libHDFS API
> --
>
> Key: HDFS-11531
> URL: https://issues.apache.org/jira/browse/HDFS-11531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs
>Affects Versions: 2.6.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
> Attachments: HDFS-11531.000.patch
>
>
> It would be good to expose the DFSHedgedReadMetrics via a libHDFS API for 
> applications to retrieve.






[jira] [Updated] (HDFS-11441) Add escaping to error message in web KMS web UI

2017-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Status: Patch Available  (was: Open)

> Add escaping to error message in web KMS web UI
> ---
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441-branch-2.6.patch, HDFS-11441.patch, 
> HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-11441) Add escaping to error message in web KMS web UI

2017-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Summary: Add escaping to error message in web KMS web UI  (was: Add 
escaping to error messages in web UIs)

> Add escaping to error message in web KMS web UI
> ---
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441-branch-2.6.patch, HDFS-11441.patch, 
> HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-11441) Add escaping to error messages in web UIs

2017-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Attachment: HDFS-11441.patch

Thanks a lot for the review, [~andrew.wang]. You're quite right - it turns out 
there's a quoting input filter that handles escaping for anything using 
{{HttpServer}} or {{HttpServer2}}. I manually checked every instance I was 
trying to fix in this patch, and the only thing that isn't covered is the 
{{KMSAuthenticationFilter}}, which runs under Tomcat and so, I believe, doesn't 
have any quoting at this point.

Attaching a new patch which just covers that spot.

> Add escaping to error messages in web UIs
> -
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441-branch-2.6.patch, HDFS-11441.patch, 
> HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-11441) Add escaping to error messages in web UIs

2017-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Status: Open  (was: Patch Available)

> Add escaping to error messages in web UIs
> -
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441-branch-2.6.patch, HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-11441) Add escaping to error messages in web UIs

2017-03-02 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Attachment: HDFS-11441-branch-2.6.patch

Thanks a lot for the review, [~andrew.wang]. Would you mind verifying that the 
failed tests are unrelated to this patch? I'm having some trouble with my dev 
environment right now.

Also attaching a patch which does the same thing but for branch-2.6 as well, 
since that branch includes the old JSPs which also need some fixups.

No tests are included since this just changes the rendering of some HTML on the 
Web UIs.

> Add escaping to error messages in web UIs
> -
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441-branch-2.6.patch, HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-10899) Add functionality to re-encrypt EDEKs

2017-02-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-10899:
--
Summary: Add functionality to re-encrypt EDEKs  (was: Add functionality to 
re-encrypt EDEKs.)

> Add functionality to re-encrypt EDEKs
> -
>
> Key: HDFS-10899
> URL: https://issues.apache.org/jira/browse/HDFS-10899
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: encryption, kms
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, 
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, 
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, 
> HDFS-10899.09.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt 
> edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.






[jira] [Updated] (HDFS-11441) Add escaping to error messages in web UIs

2017-02-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Status: Patch Available  (was: Open)

> Add escaping to error messages in web UIs
> -
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Updated] (HDFS-11441) Add escaping to error messages in web UIs

2017-02-22 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-11441:
--
Attachment: HDFS-11441.patch

Attaching a patch which just adds calls to {{HtmlQuoting#quoteHtmlChars}} in a 
few places.
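
For readers unfamiliar with that helper, the change is of this general shape; 
the surrounding servlet code below is illustrative, not the actual patch.

{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.http.HtmlQuoting;

// Illustrative only: escape a message before echoing it into an HTML error page.
final class ErrorPageHelper {
  static void writeError(HttpServletResponse response, Exception e) throws IOException {
    // Quote the message so attacker-controlled text cannot inject markup or script.
    String safe = HtmlQuoting.quoteHtmlChars(String.valueOf(e.getMessage()));
    response.getWriter().println("<p>Request failed: " + safe + "</p>");
  }
}
{code}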

> Add escaping to error messages in web UIs
> -
>
> Key: HDFS-11441
> URL: https://issues.apache.org/jira/browse/HDFS-11441
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Attachments: HDFS-11441.patch
>
>
> There's a handful of places where web UIs don't escape error messages. We 
> should add escaping in these places.






[jira] [Created] (HDFS-11441) Add escaping to error messages in web UIs

2017-02-22 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-11441:
-

 Summary: Add escaping to error messages in web UIs
 Key: HDFS-11441
 URL: https://issues.apache.org/jira/browse/HDFS-11441
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.8.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


There's a handful of places where web UIs don't escape error messages. We 
should add escaping in these places.






[jira] [Commented] (HDFS-10654) Move building of httpfs dependency analysis under "docs" profile

2016-07-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386843#comment-15386843
 ] 

Aaron T. Myers commented on HDFS-10654:
---

Patch looks good to me. +1 pending Jenkins.

Thanks, Andrew.

> Move building of httpfs dependency analysis under "docs" profile
> 
>
> Key: HDFS-10654
> URL: https://issues.apache.org/jira/browse/HDFS-10654
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, httpfs
>Affects Versions: 2.6.4
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Attachments: HDFS-10654.001.patch
>
>
> When built with "-Pdist" but not "-Pdocs", httpfs still generates a 
> share/docs directory since the dependency report is run unconditionally. 
> Let's move it under the "docs" profile like the rest of the site.






[jira] [Updated] (HDFS-10423) Increase default value of httpfs maxHttpHeaderSize

2016-06-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-10423:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha1
   Status: Resolved  (was: Patch Available)

I've just committed this change to trunk.

Thanks again for the contribution, [~npopa].

> Increase default value of httpfs maxHttpHeaderSize
> --
>
> Key: HDFS-10423
> URL: https://issues.apache.org/jira/browse/HDFS-10423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.4, 3.0.0-alpha1
>Reporter: Nicolae Popa
>Assignee: Nicolae Popa
>Priority: Minor
> Fix For: 3.0.0-alpha1
>
> Attachments: HDFS-10423.01.patch, HDFS-10423.02.patch, 
> testing-after-HDFS-10423.txt, 
> testing-after-HDFS-10423_withCustomHeader4.txt, 
> testing-before-HDFS-10423.txt
>
>
> The Tomcat default value of maxHttpHeaderSize is 8k, which is too low for 
> certain Hadoop workloads in Kerberos-enabled environments. This JIRA will 
> change it to 65536 in server.xml.






[jira] [Commented] (HDFS-10423) Increase default value of httpfs maxHttpHeaderSize

2016-06-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340375#comment-15340375
 ] 

Aaron T. Myers commented on HDFS-10423:
---

The latest patch looks good to me, and thanks a lot [~npopa] for describing the 
manual testing you did.

+1, I'm going to commit this momentarily.

> Increase default value of httpfs maxHttpHeaderSize
> --
>
> Key: HDFS-10423
> URL: https://issues.apache.org/jira/browse/HDFS-10423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.4, 3.0.0-alpha1
>Reporter: Nicolae Popa
>Assignee: Nicolae Popa
>Priority: Minor
> Attachments: HDFS-10423.01.patch, HDFS-10423.02.patch, 
> testing-after-HDFS-10423.txt, 
> testing-after-HDFS-10423_withCustomHeader4.txt, 
> testing-before-HDFS-10423.txt
>
>
> The Tomcat default value of maxHttpHeaderSize is 8k, which is too low for 
> certain Hadoop workloads in Kerberos-enabled environments. This JIRA will 
> change it to 65536 in server.xml.






[jira] [Updated] (HDFS-10423) Increase default value of httpfs maxHttpHeaderSize

2016-06-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-10423:
--
Target Version/s: 3.0.0-alpha1

> Increase default value of httpfs maxHttpHeaderSize
> --
>
> Key: HDFS-10423
> URL: https://issues.apache.org/jira/browse/HDFS-10423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.4, 3.0.0-alpha1
>Reporter: Nicolae Popa
>Assignee: Nicolae Popa
>Priority: Minor
> Attachments: HDFS-10423.01.patch, HDFS-10423.02.patch, 
> testing-after-HDFS-10423.txt, 
> testing-after-HDFS-10423_withCustomHeader4.txt, 
> testing-before-HDFS-10423.txt
>
>
> The Tomcat default value of maxHttpHeaderSize is 8k, which is too low for 
> certain Hadoop workloads in Kerberos-enabled environments. This JIRA will 
> change it to 65536 in server.xml.






[jira] [Assigned] (HDFS-10423) Increase default value of httpfs maxHttpHeaderSize

2016-06-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-10423:
-

Assignee: Nicolae Popa

> Increase default value of httpfs maxHttpHeaderSize
> --
>
> Key: HDFS-10423
> URL: https://issues.apache.org/jira/browse/HDFS-10423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.4, 3.0.0-alpha1
>Reporter: Nicolae Popa
>Assignee: Nicolae Popa
>Priority: Minor
> Attachments: HDFS-10423.01.patch, HDFS-10423.02.patch, 
> testing-after-HDFS-10423.txt, 
> testing-after-HDFS-10423_withCustomHeader4.txt, 
> testing-before-HDFS-10423.txt
>
>
> The Tomcat default value of maxHttpHeaderSize is 8k, which is too low for 
> certain Hadoop workloads in Kerberos-enabled environments. This JIRA will 
> change it to 65536 in server.xml.






[jira] [Assigned] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2016-06-16 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-3077:


Assignee: Todd Lipcon

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, namenode
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: QuorumJournalManager (HDFS-3077), 2.0.3-alpha
>
> Attachments: hdfs-3077-branch-2.txt, hdfs-3077-partial.txt, 
> hdfs-3077-test-merge.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
> hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.pdf, qjournal-design.pdf, qjournal-design.pdf, 
> qjournal-design.tex, qjournal-design.tex
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.






[jira] [Commented] (HDFS-10463) TestRollingFileSystemSinkWithHdfs needs some cleanup

2016-05-26 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302822#comment-15302822
 ] 

Aaron T. Myers commented on HDFS-10463:
---

Good point! +1, looks good to me.

Thanks, Daniel.

> TestRollingFileSystemSinkWithHdfs needs some cleanup
> 
>
> Key: HDFS-10463
> URL: https://issues.apache.org/jira/browse/HDFS-10463
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-10463.001.patch, HDFS-10463.branch-2.001.patch
>
>
> There are three primary issues.  The most significant is that the 
> {{testFlushThread()}} method doesn't clean up after itself, which can cause 
> other tests to fail.  The other big issue is that the {{testSilentAppend()}} 
> method is testing the wrong thing.  An additional minor issue is that none of 
> the tests are careful about making sure the metrics system gets shut down in 
> all cases.






[jira] [Commented] (HDFS-10463) TestRollingFileSystemSinkWithHdfs needs some cleanup

2016-05-25 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301465#comment-15301465
 ] 

Aaron T. Myers commented on HDFS-10463:
---

[~templedf] - trunk patch looks pretty good to me, but seems like the branch-2 
patch had this very test fail. I'll be +1 once that's addressed.

> TestRollingFileSystemSinkWithHdfs needs some cleanup
> 
>
> Key: HDFS-10463
> URL: https://issues.apache.org/jira/browse/HDFS-10463
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-10463.001.patch, HDFS-10463.branch-2.001.patch
>
>
> There are three primary issues.  The most significant is that the 
> {{testFlushThread()}} method doesn't clean up after itself, which can cause 
> other tests to fail.  The other big issue is that the {{testSilentAppend()}} 
> method is testing the wrong thing.  An additional minor issue is that none of 
> the tests are careful about making sure the metrics system gets shut down in 
> all cases.






[jira] [Assigned] (HDFS-10463) TestRollingFileSystemSinkWithHdfs needs some cleanup

2016-05-25 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-10463:
-

Assignee: Daniel Templeton

> TestRollingFileSystemSinkWithHdfs needs some cleanup
> 
>
> Key: HDFS-10463
> URL: https://issues.apache.org/jira/browse/HDFS-10463
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: HDFS-10463.001.patch
>
>
> There are three primary issues.  The most significant is that the 
> {{testFlushThread()}} method doesn't clean up after itself, which can cause 
> other tests to fail.  The other big issue is that the {{testSilentAppend()}} 
> method is testing the wrong thing.  An additional minor issue is that none of 
> the tests are careful about making sure the metrics system gets shut down in 
> all cases.






[jira] [Updated] (HDFS-8008) Support client-side back off when the datanodes are congested

2016-05-24 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8008:
-
Issue Type: Improvement  (was: New Feature)

> Support client-side back off when the datanodes are congested
> -
>
> Key: HDFS-8008
> URL: https://issues.apache.org/jira/browse/HDFS-8008
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.8.0
>
> Attachments: HDFS-8008.000.patch, HDFS-8008.001.patch, 
> HDFS-8008.002.patch, HDFS-8008.003.patch
>
>
> HDFS-7270 introduces the mechanism for DataNode to signal congestions. 
> DFSClient should be able to recognize the signals and back off.
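
(As a generic illustration of the client-side behavior being described, and not 
DFSClient's actual logic, an exponential back-off loop looks roughly like this; 
names and constants are illustrative only.)

{code:java}
import java.util.function.BooleanSupplier;

// Generic back-off sketch; not the patch's implementation.
final class CongestionBackoff {
  static void backOffWhileCongested(BooleanSupplier congested) throws InterruptedException {
    long delayMs = 50;                 // initial delay
    final long maxDelayMs = 5_000;
    while (congested.getAsBoolean()) { // stand-in for the DataNode congestion signal
      Thread.sleep(delayMs + (long) (Math.random() * delayMs));  // add jitter
      delayMs = Math.min(maxDelayMs, delayMs * 2);               // grow exponentially
    }
  }
}
{code}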






[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering

2016-05-17 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-10323:
--
Affects Version/s: 2.6.0
 Target Version/s: 3.0.0-beta1

Agree, seems incompatible. Targeting for 3.0.

> transient deleteOnExit failure in ViewFileSystem due to close() ordering
> 
>
> Key: HDFS-10323
> URL: https://issues.apache.org/jira/browse/HDFS-10323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 2.6.0
>Reporter: Ben Podgursky
>
> After switching to using a ViewFileSystem, fs.deleteOnExit calls began 
> failing frequently, displaying this error on failure:
> 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for 
> path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84
> Since FileSystem eats the error involved, it is difficult to be sure what the 
> error is, but I believe what is happening is that the ViewFileSystem’s child 
> FileSystems are being close()’d before the ViewFileSystem, due to the random 
> order ClientFinalizer closes FileSystems; so then when the ViewFileSystem 
> tries to close(), it tries to forward the delete() calls to the appropriate 
> child, and fails because the child is already closed.
> I’m unsure how to write an actual Hadoop test to reproduce this, since it 
> involves testing behavior on actual JVM shutdown.  However, I can verify that 
> while
> {code:java}
> fs.deleteOnExit(randomTemporaryDir);
> {code}
> regularly (~50% of the time) fails to delete the temporary directory, this 
> code:
> {code:java}
> ViewFileSystem viewfs = (ViewFileSystem) fs1;
> for (FileSystem fileSystem : viewfs.getChildFileSystems()) {
>   if (fileSystem.exists(randomTemporaryDir)) {
>     fileSystem.deleteOnExit(randomTemporaryDir);
>   }
> }
> {code}
> always successfully deletes the temporary directory on JVM shutdown.
> I am not very familiar with FileSystem inheritance hierarchies, but at first 
> glance I see two ways to fix this behavior:
> 1)  ViewFileSystem could forward deleteOnExit calls to the appropriate child 
> FileSystem, and not hold onto that path itself.
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all 
> other FileSystems.  
> Would appreciate any thoughts of whether this seems accurate, and thoughts 
> (or help) on the fix.
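A minimal sketch of fix option 1 above, assuming {{getChildFileSystems()}} is 
accessible as in the reporter's snippet. The helper class name and the 
exists()-based ownership probe are illustrative only, not the actual 
ViewFileSystem code:
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.viewfs.ViewFileSystem;

public class ViewFsDeleteOnExitSketch {
  /**
   * Register deleteOnExit on the child FileSystem that actually owns the
   * path, instead of letting the ViewFileSystem remember the path itself.
   */
  public static boolean deleteOnExitViaChild(ViewFileSystem viewFs, Path p)
      throws IOException {
    for (FileSystem child : viewFs.getChildFileSystems()) {
      // exists() is used as a cheap ownership probe, mirroring the
      // reporter's workaround; a real fix would resolve via the mount table.
      if (child.exists(p)) {
        return child.deleteOnExit(p);
      }
    }
    return false;
  }
}
{code}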



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-7285) Erasure Coding Support inside HDFS

2016-03-15 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-7285:


Assignee: Zhe Zhang  (was: Matt Hardy)

> Erasure Coding Support inside HDFS
> --
>
> Key: HDFS-7285
> URL: https://issues.apache.org/jira/browse/HDFS-7285
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Weihua Jiang
>Assignee: Zhe Zhang
> Fix For: 3.0.0
>
> Attachments: Compare-consolidated-20150824.diff, 
> Consolidated-20150707.patch, Consolidated-20150806.patch, 
> Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, 
> HDFS-7285-Consolidated-20150911.patch, HDFS-7285-initial-PoC.patch, 
> HDFS-7285-merge-consolidated-01.patch, 
> HDFS-7285-merge-consolidated-trunk-01.patch, 
> HDFS-7285-merge-consolidated.trunk.03.patch, 
> HDFS-7285-merge-consolidated.trunk.04.patch, 
> HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, 
> HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, 
> HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, 
> HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, 
> HDFSErasureCodingSystemTestPlan-20150824.pdf, 
> HDFSErasureCodingSystemTestReport-20150826.pdf, fsimage-analysis-20150105.pdf
>
>
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing 
> data reliability, compared to the existing HDFS 3-replica approach. For 
> example, with a 10+4 Reed-Solomon code we can tolerate the loss of 4 blocks 
> while the storage overhead is only 40%. This makes EC a quite attractive 
> alternative for big data storage, particularly for cold data. 
> Facebook had a related open source project called HDFS-RAID. It used to be 
> one of the contrib packages in HDFS but was removed in Hadoop 2.0 for 
> maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends 
> on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
> cold files that are not intended to be appended to anymore; 3) the pure Java 
> EC coding implementation is extremely slow in practical use. For these 
> reasons, it might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that 
> gets rid of any external dependencies and makes it self-contained and 
> independently maintained. This design lays the EC feature on top of the 
> storage type support and aims to be compatible with existing HDFS features 
> such as caching, snapshots, encryption, and high availability. The design 
> will also support different EC coding schemes, implementations, and policies 
> for different deployment scenarios. By utilizing advanced libraries (e.g. the 
> Intel ISA-L library), an implementation can greatly improve the performance 
> of EC encoding/decoding and make the EC solution even more attractive. We 
> will post the design document soon. 
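For readers new to EC, a tiny worked example of the overhead arithmetic above, 
assuming a (k=10, m=4) Reed-Solomon layout versus plain 3x replication 
(illustration only, not project code):
{code:java}
public class EcOverheadExample {
  public static void main(String[] args) {
    int dataBlocks = 10;   // k in a (10, 4) Reed-Solomon scheme
    int parityBlocks = 4;  // m
    // Overhead = extra storage / useful data.
    double ecOverhead = (double) parityBlocks / dataBlocks;   // 0.4 -> 40%
    double replicaOverhead = 3.0 - 1.0;                       // 2.0 -> 200%
    System.out.printf("EC: %.0f%% overhead, tolerates loss of %d blocks%n",
        ecOverhead * 100, parityBlocks);
    System.out.printf("3x replication: %.0f%% overhead, tolerates loss of 2 replicas%n",
        replicaOverhead * 100);
  }
}
{code}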



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9934) ReverseXML oiv processor should bail out if the XML file's layoutVersion doesn't match oiv's

2016-03-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188442#comment-15188442
 ] 

Aaron T. Myers commented on HDFS-9934:
--

+1 pending Jenkins, the patch looks good to me.

> ReverseXML oiv processor should bail out if the XML file's layoutVersion 
> doesn't match oiv's
> 
>
> Key: HDFS-9934
> URL: https://issues.apache.org/jira/browse/HDFS-9934
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9934.001.patch
>
>
> ReverseXML oiv processor should bail out if the XML file's layoutVersion 
> doesn't match oiv's



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9933) ReverseXML should be capitalized in oiv usage message

2016-03-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188441#comment-15188441
 ] 

Aaron T. Myers commented on HDFS-9933:
--

+1, patch looks good to me.

> ReverseXML should be capitalized in oiv usage message
> -
>
> Key: HDFS-9933
> URL: https://issues.apache.org/jira/browse/HDFS-9933
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.8.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-9933.001.patch
>
>
> ReverseXML should be capitalized in oiv usage message



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2016-01-21 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111814#comment-15111814
 ] 

Aaron T. Myers commented on HDFS-7037:
--

[~wheat9] - reviving this very old thread... can you please respond to my 
points above? As I've said previously, your only objection seems to be that 
this introduces a security vulnerability, but as I've pointed out several times 
already, we as a project have chosen not to treat this issue as a security 
vulnerability in other areas, and thus I think we should go ahead and check in 
this change.

> Using distcp to copy data from insecure to secure cluster via hftp doesn't 
> work  (branch-2 only)
> 
>
> Key: HDFS-7037
> URL: https://issues.apache.org/jira/browse/HDFS-7037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, tools
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7037.001.patch
>
>
> This is a branch-2 only issue since hftp is only supported there. 
> Issuing "distcp hftp:// hdfs://" gave the 
> following failure exception:
> {code}
> 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
> remote token:
> java.io.IOException: Error when dealing remote token: Internal Server Error
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
>   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
> 14/09/13 22:07:40 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Unable to obtain remote token
> 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Unable to obtain remote token
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> 

[jira] [Commented] (HDFS-9319) Make DatanodeInfo thread safe

2016-01-21 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111789#comment-15111789
 ] 

Aaron T. Myers commented on HDFS-9319:
--

[~jingzhao] - do you plan on committing this soon? I see that [~cnauroth] +1'ed 
it a few months back.

> Make DatanodeInfo thread safe
> -
>
> Key: HDFS-9319
> URL: https://issues.apache.org/jira/browse/HDFS-9319
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-9319.000.patch
>
>
> This jira plans to make DatanodeInfo's internal states independent of 
> external locks. Note because DatanodeInfo extends DatanodeID, we still need 
> to change DatanodeID as a follow-on work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4478) Token operations should not lock namespace

2016-01-21 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111866#comment-15111866
 ] 

Aaron T. Myers commented on HDFS-4478:
--

[~daryn] - is this perhaps a duplicate of HDFS-5029?

> Token operations should not lock namespace
> --
>
> Key: HDFS-4478
> URL: https://issues.apache.org/jira/browse/HDFS-4478
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>
> Token get/renew/cancel operations obtain a write lock on the namespace.  This 
> seems unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6804) race condition between transferring block and appending block causes "Unexpected checksum mismatch exception"

2015-12-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058832#comment-15058832
 ] 

Aaron T. Myers commented on HDFS-6804:
--

[~max_datapath] - what version of Hadoop did you repro this on?

> race condition between transferring block and appending block causes 
> "Unexpected checksum mismatch exception" 
> --
>
> Key: HDFS-6804
> URL: https://issues.apache.org/jira/browse/HDFS-6804
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.2.0
>Reporter: Gordon Wang
>
> We found some error logs in the datanode, like this:
> {noformat}
> 2014-07-22 01:49:51,338 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Ex
> ception for BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248
> java.io.IOException: Terminating due to a checksum error.java.io.IOException: 
> Unexpected checksum mismatch while writing 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997_9248 from 
> /192.168.2.101:39495
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:536)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:703)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:575)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> at java.lang.Thread.run(Thread.java:744)
> {noformat}
> While on the source datanode, the log says the block is transmitted.
> {noformat}
> 2014-07-22 01:49:50,805 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Da
> taTransfer: Transmitted 
> BP-2072804351-192.168.2.104-1406008383435:blk_1073741997
> _9248 (numBytes=16188152) to /192.168.2.103:50010
> {noformat}
> When the destination datanode gets the checksum mismatch, it reports a bad 
> block to the NameNode and the NameNode marks the replica on the source 
> datanode as corrupt. But the replica on the source datanode is actually 
> valid, because it can pass checksum verification.
> In short, the replica on the source datanode is wrongly marked as corrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9427) HDFS should not default to ephemeral ports

2015-11-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15015395#comment-15015395
 ] 

Aaron T. Myers commented on HDFS-9427:
--

+1, let's make this happen. Seems long overdue to me.

> HDFS should not default to ephemeral ports
> --
>
> Key: HDFS-9427
> URL: https://issues.apache.org/jira/browse/HDFS-9427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.0.0
>Reporter: Arpit Agarwal
>Assignee: Xiaobing Zhou
>  Labels: Incompatible
>
> HDFS defaults to ephemeral ports for some of the HTTP/RPC endpoints. This can 
> cause bind exceptions on service startup if the port is in use.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-30 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-9305:
-
Fix Version/s: 2.7.2

Setting the fix version to 2.7.2. [~arpitagarwal] - if that's not right, please 
change it appropriately.

> Delayed heartbeat processing causes storm of subsequent heartbeats
> --
>
> Key: HDFS-9305
> URL: https://issues.apache.org/jira/browse/HDFS-9305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Arpit Agarwal
> Fix For: 2.7.2
>
> Attachments: HDFS-9305.01.patch, HDFS-9305.02.patch
>
>
> A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
> expect heartbeat handling to complete relatively quickly.  However, if 
> something unexpected causes heartbeat processing to get blocked, such as a 
> long GC or heavy lock contention within the NameNode, then heartbeat 
> processing would be delayed.  After recovering from this delay, the DataNode 
> then starts sending a storm of heartbeat messages in a tight loop.  In a 
> large cluster with many DataNodes, this storm of heartbeat messages could 
> cause harmful load on the NameNode and make overall cluster recovery more 
> difficult.
> The bug appears to be caused by incorrect timekeeping inside 
> {{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
> from the previous heartbeat time, without any compensation for possible long 
> latency on an individual heartbeat RPC.  The only mitigation would be 
> restarting all DataNodes to force a reset of the heartbeat schedule, or 
> simply wait out the storm until the scheduling catches up and corrects itself.
> This problem would not manifest after a NameNode restart.  In that case, the 
> NameNode would respond to the first heartbeat by telling the DataNode to 
> re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
> schedule to the current time.  I believe the problem would only manifest if 
> the NameNode process kept alive, but processed heartbeats unexpectedly slowly.
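A minimal sketch of the scheduling difference described above. This is not the 
actual {{BPServiceActor}} code; the class and method names are illustrative:
{code:java}
public class HeartbeatScheduleSketch {
  private final long intervalMs;
  private long nextHeartbeatMs;

  public HeartbeatScheduleSketch(long intervalMs) {
    this.intervalMs = intervalMs;
    this.nextHeartbeatMs = monotonicNowMs();
  }

  /** Buggy flavor: always advances from the previously *scheduled* time, so a
   *  long stall leaves many overdue heartbeats that then fire back-to-back. */
  public void scheduleNextUncompensated() {
    nextHeartbeatMs += intervalMs;
  }

  /** Compensated flavor: base the next heartbeat on the current time, so at
   *  most one heartbeat is late after a stall and no storm follows. */
  public void scheduleNextCompensated() {
    nextHeartbeatMs = monotonicNowMs() + intervalMs;
  }

  public boolean heartbeatDue() {
    return monotonicNowMs() >= nextHeartbeatMs;
  }

  private static long monotonicNowMs() {
    return System.nanoTime() / 1_000_000L;
  }
}
{code}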



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby

2015-10-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970589#comment-14970589
 ] 

Aaron T. Myers commented on HDFS-8808:
--

+1, the latest patch looks good to me.

Thanks very much, Zhe.

> dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
> 
>
> Key: HDFS-8808
> URL: https://issues.apache.org/jira/browse/HDFS-8808
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Gautam Gopalakrishnan
>Assignee: Zhe Zhang
> Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, 
> HDFS-8808-02.patch, HDFS-8808-03.patch, HDFS-8808.04.patch
>
>
> The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the 
> speed with which the fsimage is copied between the namenodes during regular 
> use. However, as a side effect, this also limits transfers when the 
> {{-bootstrapStandby}} option is used. This option is often used during 
> upgrades and could potentially slow down the entire workflow. The request 
> here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth 
> setting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode

2015-10-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944280#comment-14944280
 ] 

Aaron T. Myers commented on HDFS-1312:
--

bq. There are a large number of clusters which are using round-robin 
scheduling. I have been looking around for the data on HDFS-1804 (In fact the 
proposal discusses that issue). It will be good if you have some data on 
HDFS-1804 deployment. Most of the clusters that I am (anecdotal, I know) seeing 
are based on round robin scheduling. Please also see the thread I refer to in 
the proposal on linkedin, and you will see customers are also looking for this 
data.

Presumably that's mostly because the round-robin volume choosing policy is the 
default, and many users don't even know that there's an alternative. 
Independently of the need to implement an active balancer as this JIRA 
proposes, should we consider changing the default to the available space volume 
choosing policy? We'd probably need to do this in Hadoop 3.0, since I'd think 
this should be considered an incompatible change.
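For reference, a sketch of what opting a DataNode into the available-space 
policy looks like today. The config key and class name below are taken from 
HDFS-1804; please verify them against the Hadoop version in use:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class VolumeChoosingPolicyExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Round-robin is the default; this selects the available-space policy.
    conf.set("dfs.datanode.fsdataset.volume.choosing.policy",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset."
            + "AvailableSpaceVolumeChoosingPolicy");
    System.out.println(
        conf.get("dfs.datanode.fsdataset.volume.choosing.policy"));
  }
}
{code}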

> Re-balance disks within a Datanode
> --
>
> Key: HDFS-1312
> URL: https://issues.apache.org/jira/browse/HDFS-1312
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Reporter: Travis Crawford
> Attachments: disk-balancer-proposal.pdf
>
>
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations 
> where certain disks are full while others are significantly less used. Users 
> at many different sites have experienced this issue, and HDFS administrators 
> are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decomissioning nodes & later readding them
> There's a tradeoff between making use of all available spindles, and filling 
> disks at the sameish rate. Possible solutions include:
> - Weighting less-used disks heavier when placing new blocks on the datanode. 
> In write-heavy environments this will still make use of all spindles, 
> equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are 
> added/replaced in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is 
> not needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9001) DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster

2015-09-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936216#comment-14936216
 ] 

Aaron T. Myers commented on HDFS-9001:
--

+1, the patch looks good to me. I ran the failed tests locally both with and 
without the patch and they pass just fine.

I'm going to commit this momentarily.

> DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, 
> non-federated cluster
> 
>
> Key: HDFS-9001
> URL: https://issues.apache.org/jira/browse/HDFS-9001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.6.0, 2.7.0, 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: HDFS-9001.001.patch
>
>
> If defaultFS differs from rpc-address, then DFSUtil.getNsServiceRpcUris() 
> will return two entries: one for the [service] RPC address and one for the 
> default FS.  This behavior violates the expected behavior stated in the 
> JavaDoc header.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9001) DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster

2015-09-29 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-9001:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Daniel.

> DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, 
> non-federated cluster
> 
>
> Key: HDFS-9001
> URL: https://issues.apache.org/jira/browse/HDFS-9001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.6.0, 2.7.0, 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Fix For: 2.8.0
>
> Attachments: HDFS-9001.001.patch
>
>
> If defaultFS differs from rpc-address, then DFSUtil.getNsServiceRpcUris() 
> will return two entries: one for the [service] RPC address and one for the 
> default FS.  This behavior violates the expected behavior stated in the 
> JavaDoc header.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9112) Haadmin fails if multiple name service IDs are configured

2015-09-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14876726#comment-14876726
 ] 

Aaron T. Myers commented on HDFS-9112:
--

Pinging [~templedf], since I know he was just looking at this code as well.

> Haadmin fails if multiple name service IDs are configured
> -
>
> Key: HDFS-9112
> URL: https://issues.apache.org/jira/browse/HDFS-9112
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 2.7.1
>Reporter: Anu Engineer
>Assignee: Anu Engineer
>
> In HDFS-6376 we added a feature for distcp that allows multiple 
> NameService IDs to be specified so that we can copy between two HA-enabled 
> clusters.
> That confuses the haadmin command since we have a check in 
> DFSUtil#getNamenodeServiceAddr which fails if it finds more than one name in 
> that property.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-09-17 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804611#comment-14804611
 ] 

Aaron T. Myers commented on HDFS-7037:
--

bq. Please correct me if I misunderstood. (1) The current behavior of RPC / 
WebHDFS is less than ideal and it is vulnerable to attack. (2) You argue that 
the proposed changes makes HFTP vulnerable for the fallback, but it is no worse 
than what we have in RPC / WebHDFS today.

Correct.

bq. As an analogy, it seems to me that the argument is that it's okay to have a 
broken window given that we have many broken windows already?

I don't think that's a reasonable analogy. The point you were making is that 
this change introduces a possible security vulnerability. I'm saying that this 
is demonstrably not a security vulnerability, since we consciously chose to add 
this capability to other interfaces. HADOOP-11701 will make things configurably 
more secure for all interfaces, but that's a separate discussion.

bq. My question is that is there a need to create yet another workaround, given 
that we know that it is prone for security vulnerability? 

Like I said above, this should not be considered a security vulnerability. If 
it is, then we should have never added this capability to WebHDFS/RPC, and we 
should be reverting it from WebHDFS/RPC right now.

bq. I'd like to understand your use cases better? Can you please elaborate why 
you'll need another workaround in HFTP, given that you guys have put the 
workaround in WebHDFS already?

Simple: because some users use HFTP and not WebHDFS, specifically for distcp 
from older clusters.

> Using distcp to copy data from insecure to secure cluster via hftp doesn't 
> work  (branch-2 only)
> 
>
> Key: HDFS-7037
> URL: https://issues.apache.org/jira/browse/HDFS-7037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, tools
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7037.001.patch
>
>
> This is a branch-2 only issue since hftp is only supported there. 
> Issuing "distcp hftp:// hdfs://" gave the 
> following failure exception:
> {code}
> 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
> remote token:
> java.io.IOException: Error when dealing remote token: Internal Server Error
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
>   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
> 14/09/13 22:07:40 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Unable to obtain remote token
> 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Unable to obtain remote token
>   at 
> 

[jira] [Commented] (HDFS-9072) Fix random failures in TestJMXGet

2015-09-16 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791000#comment-14791000
 ] 

Aaron T. Myers commented on HDFS-9072:
--

Hey [~ste...@apache.org], it looks to me like this commit broke branch-2 
compilation due to a missing import of {{DFSTestUtil}}. I just pushed a little 
commit to fix branch-2 compilation (e533555b577c3678aa7a430c6b084973187de18a) 
and was hoping you could take a look at it when you get a chance.

> Fix random failures in TestJMXGet
> -
>
> Key: HDFS-9072
> URL: https://issues.apache.org/jira/browse/HDFS-9072
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0, 2.8.0
>Reporter: J.Andreina
>Assignee: J.Andreina
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS-9072-01.patch
>
>
> There is a slight delay in updating the JMX values after HDFS operations, but 
> the assertions are made before that delay has elapsed. Hence the following 
> testcases fail (a polling sketch follows after the list): 
>  TestJMXGet#testNameNode()
>  TestJMXGet#testDataNode()
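A sketch of the kind of polling the assertions need: wait for the JMX 
attribute to reach the expected value (or for a timeout to expire) before 
asserting. The helper name is illustrative and not part of TestJMXGet:
{code:java}
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class JmxPollSketch {
  private JmxPollSketch() {}

  /** Poll a numeric JMX attribute until it equals {@code expected} or the
   *  timeout expires, then return the last observed value for the caller
   *  to assert on. */
  public static long waitForJmxLong(MBeanServer mbs, ObjectName bean,
      String attr, long expected, long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    long value;
    do {
      value = ((Number) mbs.getAttribute(bean, attr)).longValue();
      if (value == expected) {
        return value;
      }
      Thread.sleep(100);
    } while (System.currentTimeMillis() < deadline);
    return value;
  }
}
{code}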



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-09-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746406#comment-14746406
 ] 

Aaron T. Myers commented on HDFS-7037:
--

[~wheat9] - with regard to your comment that "the security concerns remain 
unaddressed," could you please respond to this point specifically:

bq. adding this capability to HFTP does not change the security semantics of 
Hadoop at all, since RPC and other interfaces used for remote access already 
support allowing configurable insecure fallback. This is not a security 
vulnerability. If it were, we should be removing the ability to configure 
insecure fallback at all in Hadoop. We're not doing that, because it was a 
deliberate choice to add that feature.

i.e., this change _is not changing the security level of Hadoop_, so I don't 
understand what security concerns you have with this change. This change is 
proposing to expand the fallback capability that already exists in other RPC 
interfaces to the HFTP interface.

> Using distcp to copy data from insecure to secure cluster via hftp doesn't 
> work  (branch-2 only)
> 
>
> Key: HDFS-7037
> URL: https://issues.apache.org/jira/browse/HDFS-7037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, tools
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7037.001.patch
>
>
> This is a branch-2 only issue since hftp is only supported there. 
> Issuing "distcp hftp:// hdfs://" gave the 
> following failure exception:
> {code}
> 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
> remote token:
> java.io.IOException: Error when dealing remote token: Internal Server Error
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
>   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
> 14/09/13 22:07:40 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Unable to obtain remote token
> 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Unable to obtain remote token
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> 

[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-09-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14746325#comment-14746325
 ] 

Aaron T. Myers commented on HDFS-7037:
--

[~wheat9] - it's been 5 months and I've received no response from you on this 
matter, and there's been no progress made on HADOOP-11701. As I said back in 
April, I don't think that fixing this bug in HFTP should be gated on 
implementing that new feature. Would you please consider changing your -1 to a 
-0, so that we can fix this issue for users who are encountering this problem?

> Using distcp to copy data from insecure to secure cluster via hftp doesn't 
> work  (branch-2 only)
> 
>
> Key: HDFS-7037
> URL: https://issues.apache.org/jira/browse/HDFS-7037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security, tools
>Affects Versions: 2.6.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7037.001.patch
>
>
> This is a branch-2 only issue since hftp is only supported there. 
> Issuing "distcp hftp:// hdfs://" gave the 
> following failure exception:
> {code}
> 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
> remote token:
> java.io.IOException: Error when dealing remote token: Internal Server Error
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
>   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
>   at 
> org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
> 14/09/13 22:07:40 WARN security.UserGroupInformation: 
> PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
> cause:java.io.IOException: Unable to obtain remote token
> 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
> java.io.IOException: Unable to obtain remote token
>   at 
> org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
>   at 
> org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
>   at 
> org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
>   at 
> 

[jira] [Commented] (HDFS-9001) DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, non-federated cluster

2015-09-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745770#comment-14745770
 ] 

Aaron T. Myers commented on HDFS-9001:
--

Hey Daniel,

I think this makes sense, thanks very much for the lucid explanation. I think 
the really correct thing to do (ignoring backward compatibility concerns) would 
be to make fs.defaultFS a client-side config only. It's only for historical 
reasons that it's not: I believe the NN falls back to binding to it as its RPC 
address if no {{rpc-address}} is actually configured. We can only change that 
behavior in a major version, though, so I 
don't think we should do that here.

I think the way to go is to fix this JIRA as you propose, and then file a 
separate, backward-incompatible JIRA to change the NN (and all admin commands) 
to no longer look at fs.defaultFS at all. We'd then only commit that to trunk 
(and not branch-2) so it'll show up in Hadoop 3.0.

Does that make sense?
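For anyone following along, a sketch of the configuration shape that triggers 
the duplicate entry described in this issue. The host name and ports are made 
up, and the DFSUtil call is shown only to illustrate where the duplication 
surfaces; verify the method against the source of the Hadoop version in use:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSUtil;

public class NsServiceRpcUrisExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same single NameNode, but defaultFS and the service RPC address are
    // spelled with different ports, so they do not de-duplicate.
    conf.set("fs.defaultFS", "hdfs://nn.example.com:8020");
    conf.set("dfs.namenode.servicerpc-address", "nn.example.com:8040");
    // Before the fix this can print two entries for one non-HA,
    // non-federated NameNode.
    System.out.println(DFSUtil.getNsServiceRpcUris(conf));
  }
}
{code}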

> DFSUtil.getNsServiceRpcUris() can return too many entries in a non-HA, 
> non-federated cluster
> 
>
> Key: HDFS-9001
> URL: https://issues.apache.org/jira/browse/HDFS-9001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS
>Affects Versions: 2.6.0, 2.7.0, 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>
> If defaultFS differs from rpc-address, then DFSUtil.getNsServiceRpcUris() 
> will return two entries: one for the [service] RPC address and one for the 
> default FS.  This behavior violates the expected behavior stated in the 
> JavaDoc header.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8950) NameNode refresh doesn't remove DataNodes

2015-08-25 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8950:
-
Target Version/s: 2.8.0

 NameNode refresh doesn't remove DataNodes
 -

 Key: HDFS-8950
 URL: https://issues.apache.org/jira/browse/HDFS-8950
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.6.0
Reporter: Daniel Templeton
Assignee: Daniel Templeton

 If you remove a DN from the NN's allowed host list (HDFS was HA) and then do 
 an NN refresh, the DN isn't actually removed and the NN UI keeps showing that 
 node. The NN may also try to allocate some blocks to that DN during an MR 
 job.  This issue is independent of DN decommission.
 To reproduce:
 1. Add a DN to dfs_hosts_allow
 2. Refresh NN
 3. Start DN. Now NN starts seeing DN.
 4. Stop DN
 5. Remove DN from dfs_hosts_allow
 6. Refresh NN - NN is still reporting DN as being used by HDFS.
 This is different from decom because there DN is added to exclude list in 
 addition to being removed from allowed list, and in that case everything 
 works correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8808) dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby

2015-08-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702088#comment-14702088
 ] 

Aaron T. Myers commented on HDFS-8808:
--

Hey Zhe, latest patch looks pretty good to me. Just a few small comments:

# You should add an entry in {{hdfs-default.xml}} that describes what this 
config setting does, and perhaps also amend the description of 
{{dfs.image.transfer.bandwidthPerSec}} to make it clear that this does not 
apply to bootstrapping the standby.
# It's kind of unfortunate that this patch is perpetuating the inconsistent 
camel cased config name. I think much better would be to go with something 
consistent with, for example, the block scanner config name, which uses 
bytes.per.second instead of bandwidthPerSec, which removes the camel case 
and is more consistent. Of course, doing that would be inconsistent with 
{{dfs.image.transfer.bandwidthPerSec}}. Perhaps we should go with being 
consistent with the bandwidthPerSec style for now, but file a follow-up JIRA 
to deprecate that form?
# Given that this JIRA will change the applicability of the existing config 
name, I believe we should consider this an incompatible change. That said, I 
think very few if any users are actually using the current config intending to 
limit the bandwidth of bootstrapping the standby, so I'm personally OK putting 
it into branch-2, but I still think we should mark this JIRA incompatible so as 
to call it out appropriately.
# Nit: I think that bootstrapping should have two ps.

 dfs.image.transfer.bandwidthPerSec should not apply to -bootstrapStandby
 

 Key: HDFS-8808
 URL: https://issues.apache.org/jira/browse/HDFS-8808
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Gautam Gopalakrishnan
Assignee: Zhe Zhang
 Attachments: HDFS-8808-00.patch, HDFS-8808-01.patch, 
 HDFS-8808-02.patch, HDFS-8808-03.patch


 The parameter {{dfs.image.transfer.bandwidthPerSec}} can be used to limit the 
 speed with which the fsimage is copied between the namenodes during regular 
 use. However, as a side effect, this also limits transfers when the 
 {{-bootstrapStandby}} option is used. This option is often used during 
 upgrades and could potentially slow down the entire workflow. The request 
 here is to ensure {{-bootstrapStandby}} is unaffected by this bandwidth 
 setting



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8888) Support volumes in HDFS

2015-08-11 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-:
-
Description: 
There are multiple types of zones (e.g., snapshottable directories, encryption 
zones, directories with quotas) which are conceptually close to namespace 
volumes in traditional file systems.

This jira proposes to introduce the concept of volume to simplify the 
implementation of snapshots and encryption zones.

  was:
There are multiple types of zones (e.g., snapshot, encryption zone) which are 
conceptually close to namespace volumes in traditional filesystems.

This jira proposes to introduce the concept of volume to simplify the 
implementation of snapshots and encryption zones.


 Support volumes in HDFS
 ---

 Key: HDFS-
 URL: https://issues.apache.org/jira/browse/HDFS-
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai

 There are multiple types of zones (e.g., snapshottable directories, 
 encryption zones, directories with quotas) which are conceptually close to 
 namespace volumes in traditional file systems.
 This jira proposes to introduce the concept of volume to simplify the 
 implementation of snapshots and encryption zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8849) fsck should report number of missing blocks with replication factor 1

2015-08-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654647#comment-14654647
 ] 

Aaron T. Myers commented on HDFS-8849:
--

Allen, I've seen plenty of users who at some point in the past have run 
TeraSort on their cluster, and for that job the default output replication is 
1. If a DN then goes offline that was containing some TeraSort output, then 
blocks appear missing and users get concerned because they see missing blocks 
on the NN web UI and via dfsadmin -report/fsck, but it's not obvious that those 
blocks were in fact set to replication factor 1. In my experience, this is 
really quite common, so definitely seems like something worthy of addressing to 
me. How we go about addressing this should certainly be discussed, and it could 
be that including this information in fsck doesn't make sense, but let's try to 
come up with something that does address this issue.

Separately, using phrases like "Meanwhile, back in real life" and calling a 
proposed improvement a "useless feature" is not an appropriate way to 
communicate in this forum. Let's please try to keep the communication 
constructive, not unnecessarily hostile. Comments like those contribute to the 
perception that our community is difficult to contribute to.

 fsck should report number of missing blocks with replication factor 1
 -

 Key: HDFS-8849
 URL: https://issues.apache.org/jira/browse/HDFS-8849
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.7.1
Reporter: Zhe Zhang
Assignee: Zhe Zhang
Priority: Minor

 HDFS-7165 supports reporting number of blocks with replication factor 1 in 
 {{dfsadmin}} and NN metrics. But it didn't extend {{fsck}} with the same 
 support, which is the aim of this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-2433) TestFileAppend4 fails intermittently

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-2433.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a long, long time. Going to close this 
out. Please reopen if you disagree.

 TestFileAppend4 fails intermittently
 

 Key: HDFS-2433
 URL: https://issues.apache.org/jira/browse/HDFS-2433
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode, test
Affects Versions: 0.20.205.0, 1.0.0
Reporter: Robert Joseph Evans
Priority: Critical
 Attachments: failed.tar.bz2


 A Jenkins build we have running failed twice in a row with issues from 
 TestFileAppend4.testAppendSyncReplication1. In an attempt to reproduce the 
 error I ran TestFileAppend4 in a loop overnight, saving the results away.  
 (No clean was done in between test runs.)
 When TestFileAppend4 is run in a loop the testAppendSyncReplication[012] 
 tests fail about 10% of the time (14 times out of 130 tries)  They all fail 
 with something like the following.  Often it is only one of the tests that 
 fail, but I have seen as many as two fail in one run.
 {noformat}
 Testcase: testAppendSyncReplication2 took 32.198 sec
 FAILED
 Should have 2 replicas for that block, not 1
 junit.framework.AssertionFailedError: Should have 2 replicas for that block, 
 not 1
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.replicationTest(TestFileAppend4.java:477)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncReplication2(TestFileAppend4.java:425)
 {noformat}
 I also saw several other tests that are a part of TestFileAppend4 fail during 
 this experiment.  They may all be related to one another so I am filing them 
 in the same JIRA.  If it turns out that they are not related then they can be 
 split up later.
 testAppendSyncBlockPlusBbw failed 6 out of the 130 times or about 5% of the 
 time
 {noformat}
 Testcase: testAppendSyncBlockPlusBbw took 1.633 sec
 FAILED
 unexpected file size! received=0 , expected=1024
 junit.framework.AssertionFailedError: unexpected file size! received=0 , 
 expected=1024
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.assertFileSize(TestFileAppend4.java:136)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncBlockPlusBbw(TestFileAppend4.java:401)
 {noformat}
 testAppendSyncChecksum[012] failed 2 out of the 130 times or about 1.5% of 
 the time
 {noformat}
 Testcase: testAppendSyncChecksum1 took 32.385 sec
 FAILED
 Should have 1 replica for that block, not 2
 junit.framework.AssertionFailedError: Should have 1 replica for that block, 
 not 2
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.checksumTest(TestFileAppend4.java:556)
 at 
 org.apache.hadoop.hdfs.TestFileAppend4.testAppendSyncChecksum1(TestFileAppend4.java:500)
 {noformat}
 I will attach logs for all of the failures.  Be aware that I did change some 
 of the logging messages in this test so I could better see when 
 testAppendSyncReplication started and ended.  Other then that the code is 
 stock 0.20.205 RC2



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3660) TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3660.
--
  Resolution: Cannot Reproduce
Target Version/s:   (was: )

This is an ancient/stale flaky test JIRA. Resolving.

 TestDatanodeBlockScanner#testBlockCorruptionRecoveryPolicy2 times out   
 

 Key: HDFS-3660
 URL: https://issues.apache.org/jira/browse/HDFS-3660
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Priority: Minor

 Saw this on a recent jenkins run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-3811) TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-3811.
--
Resolution: Cannot Reproduce

I don't think I've seen this fail in a very long time. Going to resolve this. 
Please reopen if you disagree.

 TestPersistBlocks#TestRestartDfsWithFlush appears to be flaky
 -

 Key: HDFS-3811
 URL: https://issues.apache.org/jira/browse/HDFS-3811
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Andrew Wang
Assignee: Todd Lipcon
 Attachments: stacktrace, testfail-editlog.log, testfail.log, 
 testpersistblocks.txt


 This test failed on a recent Jenkins build, but passes for me locally. Seems 
 flaky.
 See:
 https://builds.apache.org/job/PreCommit-HDFS-Build/3021//testReport/org.apache.hadoop.hdfs/TestPersistBlocks/TestRestartDfsWithFlush/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-4001) TestSafeMode#testInitializeReplQueuesEarly may time out

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-4001.
--
Resolution: Fixed

Haven't seen this fail in a very long time. Closing this out. Feel free to 
reopen if you disagree.

 TestSafeMode#testInitializeReplQueuesEarly may time out
 ---

 Key: HDFS-4001
 URL: https://issues.apache.org/jira/browse/HDFS-4001
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
 Attachments: timeout.txt.gz


 Saw this failure on a recent branch-2 jenkins run, has also been seen on 
 trunk.
 {noformat}
 java.util.concurrent.TimeoutException: Timed out waiting for condition
   at 
 org.apache.hadoop.test.GenericTestUtils.waitFor(GenericTestUtils.java:107)
   at 
 org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly(TestSafeMode.java:191)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6300) Prevent multiple balancers from running simultaneously

2015-07-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634192#comment-14634192
 ] 

Aaron T. Myers commented on HDFS-6300:
--

Given these recent fixes, do we think that HDFS-4505 is now obsolete and should 
therefore be closed?

 Prevent multiple balancers from running simultaneously
 --

 Key: HDFS-6300
 URL: https://issues.apache.org/jira/browse/HDFS-6300
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer & mover
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Critical
 Fix For: 2.7.1

 Attachments: HDFS-6300-001.patch, HDFS-6300-002.patch, 
 HDFS-6300-003.patch, HDFS-6300-004.patch, HDFS-6300-005.patch, 
 HDFS-6300-006.patch, HDFS-6300.patch


 The Javadoc of Balancer.java says it will not allow a second balancer to run 
 if the first one is in progress. But I've noticed multiple balancers can run 
 together, and the balancer.id implementation is not safeguarding against it.
 {code}
  * <li>Another balancer is running. Exiting...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8657) Update docs for mSNN

2015-07-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8657:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

I've just committed this change to trunk.

Thanks very much for the contribution, Jesse.

 Update docs for mSNN
 

 Key: HDFS-8657
 URL: https://issues.apache.org/jira/browse/HDFS-8657
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Fix For: 3.0.0

 Attachments: hdfs-8657-v0.patch, hdfs-8657-v1.patch


 After the commit of HDFS-6440, some docs need to be updated to reflect the 
 new support for more than 2 NNs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8657) Update docs for mSNN

2015-07-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634275#comment-14634275
 ] 

Aaron T. Myers commented on HDFS-8657:
--

+1, latest patch looks good to me. I'm going to commit this momentarily.

 Update docs for mSNN
 

 Key: HDFS-8657
 URL: https://issues.apache.org/jira/browse/HDFS-8657
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: hdfs-8657-v0.patch, hdfs-8657-v1.patch


 After the commit of HDFS-6440, some docs need to be updated to reflect the 
 new support for more than 2 NNs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4746) ClassCastException in BlockManager.addStoredBlock() due to that blockReceived came after file was closed.

2015-07-10 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14623039#comment-14623039
 ] 

Aaron T. Myers commented on HDFS-4746:
--

Should we go ahead and close out this JIRA, since I don't think anyone intends 
to maintain 2.2 and 2.0 anymore?

 ClassCastException in BlockManager.addStoredBlock() due to that blockReceived 
 came after file was closed.
 -

 Key: HDFS-4746
 URL: https://issues.apache.org/jira/browse/HDFS-4746
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.3-alpha
Reporter: Konstantin Shvachko

 In some cases the last block replica of a file can be reported after the file 
 was closed. In this case the file inode is of type INodeFile, but 
 BlockManager.addStoredBlock() expects it to be INodeFileUnderConstruction, and 
 therefore the class cast to MutableBlockCollection fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8657) Update docs for mSNN

2015-06-26 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8657:
-
Affects Version/s: 3.0.0

 Update docs for mSNN
 

 Key: HDFS-8657
 URL: https://issues.apache.org/jira/browse/HDFS-8657
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: hdfs-8657-v0.patch


 After the commit of HDFS-6440, some docs need to be updated to reflect the 
 new support for more than 2 NNs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8657) Update docs for mSNN

2015-06-26 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8657:
-
Fix Version/s: (was: 3.0.0)
   Status: Patch Available  (was: Open)

 Update docs for mSNN
 

 Key: HDFS-8657
 URL: https://issues.apache.org/jira/browse/HDFS-8657
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: hdfs-8657-v0.patch


 After the commit of HDFS-6440, some docs need to be updated to reflect the 
 new support for more than 2 NNs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8657) Update docs for mSNN

2015-06-26 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14603720#comment-14603720
 ] 

Aaron T. Myers commented on HDFS-8657:
--

Hey Jesse, patch looks pretty good to me, except that we should probably also 
amend {{HDFSHighAvailabilityWithNFS.md}} in the same way.

+1 pending addressing that and a clean Jenkins run.

 Update docs for mSNN
 

 Key: HDFS-8657
 URL: https://issues.apache.org/jira/browse/HDFS-8657
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jesse Yates
Assignee: Jesse Yates
Priority: Minor
 Attachments: hdfs-8657-v0.patch


 After the commit of HDFS-6440, some docs need to be updated to reflect the 
 new support for more than 2 NNs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598630#comment-14598630
 ] 

Aaron T. Myers commented on HDFS-6440:
--

I re-ran the failed tests locally and they all passed, and I don't think those 
tests have much of anything to do with this patch anyway.

+1, the latest patch looks good to me. I realized just now doing some final 
looks at the patch that we should also update the 
HDFSHighAvailabilityWithQJM.md document to indicate that more than two NNs are 
now supported, but I think that can be done as a follow-up JIRA since 
continuing to rebase this patch is pretty unwieldy.

I'm going to commit this momentarily.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598700#comment-14598700
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Cool, thanks. I'll review HDFS-8657 whenever you post a patch.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes

2015-06-23 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-6440:
-
  Resolution: Fixed
Target Version/s: 3.0.0  (was: 2.6.0)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I've just committed this change to trunk.

Thanks a lot for the monster contribution, Jesse. Thanks also very much to Eddy 
for doing a bunch of initial reviews, and to Lars for keeping on me to review 
this patch. :)

[~jesse_yates] - mind filing a follow-up JIRA to amend the docs appropriately?

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592816#comment-14592816
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Hey Jesse,

Here's the error that it's failing with on my (and Eddy's) box:

{noformat}
testUpgradeFromRel2ReservedImage(org.apache.hadoop.hdfs.TestDFSUpgradeFromImage)
  Time elapsed: 0.901 sec  <<< ERROR!
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory 
/home/atm/src/apache/hadoop.git/src/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name-0-1
 is in an inconsistent state: storage directory does not exist or is not 
accessible.
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:685)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:584)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:644)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:809)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:793)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1482)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1208)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:971)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:882)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:814)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:473)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:432)
at 
org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromRel2ReservedImage(TestDFSUpgradeFromImage.java:480)
{noformat}

I'll poke around myself a bit as well to see if I can figure out what's going 
on. This happens very reliably for me.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592829#comment-14592829
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Aha, that was totally it. Applied v8 correctly (surprised patch didn't complain 
about not being able to apply the binary diff) and the test passes just fine.

I'll wait for Jenkins to come back on the latest patch and then check that in.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14592174#comment-14592174
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Hey Jesse, I was just about to commit this and did one final run of the 
relevant tests, and discovered that {{TestDFSUpgradeFromImage}} seems to start 
failing after applying the patch. It currently passes on trunk. I also asked 
Eddy to give this a shot to see if this was something local to my box, and it 
fails for him too.

Could you please look into what's going on there? Sorry about this.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-06-17 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590853#comment-14590853
 ] 

Aaron T. Myers commented on HDFS-6440:
--

All these changes look good to me, thanks a lot for making them, Jesse. I'll 
fix the {{TestPipelinesFailover}} whitespace issue on commit.

+1 from me. I'm going to commit this tomorrow morning, unless someone speaks up 
in the meantime.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
 hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
 hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8469) Lockfiles are not being created for datanode storage directories

2015-05-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14557178#comment-14557178
 ] 

Aaron T. Myers commented on HDFS-8469:
--

Agree, this seems unintentional. It'd be pretty difficult to inadvertently start 
up two DNs on the same host, since they'll likely try to bind to the same 
RPC/HTTP/DTP ports and fail, but it still seems like we should fix this anyway, 
if only to get rid of the warning message.
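
For context, here is a rough sketch of what the storage-directory lock file buys 
us. This is illustrative only; the lock file name and helper are assumptions, not 
the actual Storage.lock() code:
{code}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Illustrative sketch: take an exclusive lock on a lock file inside the storage
// directory so that a second process reusing the same directory fails fast
// instead of silently sharing it.
public class StorageLockSketch {
  public static FileLock tryLock(File storageDir) throws IOException {
    RandomAccessFile lockFile =
        new RandomAccessFile(new File(storageDir, "in_use.lock"), "rws");
    FileLock lock = lockFile.getChannel().tryLock();
    if (lock == null) {
      // Another process already holds the lock on this directory.
      lockFile.close();
      throw new IOException("Storage directory already locked: " + storageDir);
    }
    return lock;
  }
}
{code}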

 Lockfiles are not being created for datanode storage directories
 

 Key: HDFS-8469
 URL: https://issues.apache.org/jira/browse/HDFS-8469
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-8469.001.patch


 Lockfiles are not being created for datanode storage directories.  Due to a 
 mixup, we are initializing the StorageDirectory class with shared=true (an 
 option which was only intended for NFS directories used to implement NameNode 
 HA).  Setting shared=true disables lockfile generation and prints a log 
 message like this:
 {code}
 2015-05-22 11:45:16,367 INFO  common.Storage (Storage.java:lock(675)) - 
 Locking is disabled for 
 /home/cmccabe/hadoop2/hadoop-hdfs-project/hadoop-hdfs/target/  
 test/data/dfs/data/data5/current/BP-122766180-127.0.0.1-1432320314834
 {code}
 Without lock files, we could accidentally spawn two datanode processes using 
 the same directories without realizing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-05-14 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544653#comment-14544653
 ] 

Aaron T. Myers commented on HDFS-6440:
--

bq. Ah, OK. Yes, that second setting of the seed will clearly not be used and is 
definitely misleading. Sorry for being dense :-/ I was just looking at the 
usage of the Random, not the seed!

No sweat. I figured we were talking past each other a bit.

bq. I'm thinking to just pull the better log message up to the static 
initialization and remove those two lines (4-5).

I agree, this seems like the right move to me. Just have a single seed for the 
whole test class. It's possible that we may at some point encounter some 
inter-test dependencies, and if so it'll be nice that there's only a single seed 
used across all the tests, instead of having to manually set several seeds to 
reproduce the same sequence. The fact that we already clearly log which NN is 
becoming active should be sufficient for reproducing individual test failures 
if one wants to do that.
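
A minimal sketch of that single-seed setup (class and field names assumed):
{code}
import java.util.Random;

// Minimal sketch of the single-seed approach: one static seed for the whole
// test class, logged exactly once, and never reassigned per-test.
public class SingleSeedSketch {
  private static final long SEED = System.currentTimeMillis();
  private static final Random failoverRandom = new Random(SEED);
  static {
    System.out.println("Failover test seed: " + SEED);
  }

  // Individual tests just draw from failoverRandom; re-running with the logged
  // seed reproduces the same failover order across all tests in the class.
  int nextNameNodeIndex(int numNameNodes) {
    return failoverRandom.nextInt(numNameNodes);
  }
}
{code}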

Thanks, Jesse.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, 
 hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8380) Always call addStoredBlock on blocks which have been shifted from one storage to another

2015-05-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542717#comment-14542717
 ] 

Aaron T. Myers commented on HDFS-8380:
--

Great find, Colin. The patch looks good to me. Pretty confident the test 
failures are unrelated - both fail for me locally, even without the patch 
applied. The checkstyle warning is about the length of the BlockManager class 
file, which I don't think there's much we can do about.

+1

 Always call addStoredBlock on blocks which have been shifted from one storage 
 to another
 

 Key: HDFS-8380
 URL: https://issues.apache.org/jira/browse/HDFS-8380
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-8380.001.patch


 We should always call addStoredBlock on blocks which have been shifted from 
 one storage to another.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-05-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542834#comment-14542834
 ] 

Aaron T. Myers commented on HDFS-6440:
--

bq. By setting the seed, you get the same sequence of NN failures. So one seed 
would do 1-2-1-3, while another might do 1-3-2-1. Then, with the seed you 
could reproduce the series of failovers in the same order, which seems like a 
laudable goal for the test - especially when trying to debug weird error cases. 
Unless I'm missing something?

Right, I get the intended purpose, but one of us must be missing something 
because I still think there's some funny stuff going on with the 
{{FAILOVER_SEED}} variable. :)

In the latest patch, you'll see that the variable {{FAILOVER_SEED}} is used in 
the following steps:

# Statically declare {{FAILOVER_SEED}} and initialize it to the value of 
{{System.currentTimeMillis()}}
# Statically create {{failoverRandom}} to be a new {{Random}} object, 
initialized with the value of {{FAILOVER_SEED}}.
# In a static block, log the value of {{FAILOVER_SEED}}.
# In {{doWriteOverFailoverTest}}, reset the value of {{FAILOVER_SEED}} to again 
be {{System.currentTimeMillis()}}.
# Immediately thereafter in {{doWriteOverFailoverTest}}, log the new value of 
{{FAILOVER_SEED}}.

Note that there is no step 6 that resets {{failoverRandom}} to use the new 
value of {{FAILOVER_SEED}} that was set in step 4, nor is {{FAILOVER_SEED}} 
used for anything else after step 5. Thus, unless I'm missing something, seems 
like steps 4 and 5 are at least superfluous, and at worst misleading since the 
test logs will contain a message about using a random seed that is in fact 
never used.
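
To make that concrete, here is a minimal sketch of the seed handling described 
in the steps above (the surrounding test code is assumed):
{code}
import java.util.Random;

public class FailoverSeedSketch {
  // Step 1: seed initialized statically.
  private static long FAILOVER_SEED = System.currentTimeMillis();
  // Step 2: the Random is constructed from that seed exactly once.
  private static final Random failoverRandom = new Random(FAILOVER_SEED);
  // Step 3: the seed actually used is logged.
  static {
    System.out.println("failover seed = " + FAILOVER_SEED);
  }

  void doWriteOverFailoverTest() {
    // Steps 4-5: the field is reassigned and the new value logged...
    FAILOVER_SEED = System.currentTimeMillis();
    System.out.println("failover seed = " + FAILOVER_SEED);
    // ...but failoverRandom is never re-seeded, so the value logged here has
    // no effect on the failover sequence actually generated below.
    System.out.println("next NN: " + failoverRandom.nextInt(3));
  }
}
{code}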

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, 
 hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-05-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533279#comment-14533279
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Hey Jesse,

Thanks a lot for working through my feedback, responses below.

bq. I'm not sure how we would test this when needing to change the structure of 
the FS to support more than 2 NNs. Would you recommend (1) recognizing the old 
layout and then (2) transferring it into the new layout? The reason this seems 
silly (to me) is that the layout is only enforced by the way the minicluster is 
used/setup, rather than the way things would actually be run. By moving things 
into the appropriate directories per-nn, but keeping everything else below that 
the same, I think we keep the same upgrade properties but don't need to do the 
above contrived/synthetic upgrade.

I'm specifically thinking about just expanding {{TestRollingUpgrade}} with some 
tests that exercise the > 2 NN scenario, e.g. amending or expanding 
{{testRollingUpgradeWithQJM}}.

bq. Maybe some salesforce terminology leak here. <snip>

Cool, that's what I figured. The new comment looks good to me.

bq. Yes, it's for when there is an error and you want to run the exact sequence 
of failovers again in the test. Minor helper, but can be useful when trying to 
track down ordering dependency issues (which there shouldn't be, but sometimes 
these things can creep in).

Sorry, maybe I wasn't clear. I get the point of using the random seed in the 
first place, but I'm specifically talking about the fact that in 
{{doWriteOverFailoverTest}} we change the value of that variable, log the 
value, and then never read it again. Doesn't seem like that's doing anything.

bq. It can either be an InterruptedException or an IOException when transferring 
the checkpoint. Interrupted (ie) is thrown if we are interrupted while waiting 
for any checkpoint to complete. IOE if there is an execution exception when 
doing the checkpoint. <snip>

Right, I get that, but what I was pointing out was just that in the previous 
version of the patch the variable {{ie}} was never being assigned to anything 
but {{null}}. Here was the code in that patch, note the 4th-to-last line:
{code}
+    InterruptedException ie = null;
+    IOException ioe = null;
+    int i = 0;
+    boolean success = false;
+    for (; i < uploads.size(); i++) {
+      Future<TransferFsImage.TransferResult> upload = uploads.get(i);
+      try {
+        // TODO should there be some smarts here about retries nodes that are not the active NN?
+        if (upload.get() == TransferFsImage.TransferResult.SUCCESS) {
+          success = true;
+          // avoid getting the rest of the results - we don't care since we had a successful upload
+          break;
+        }
+
+      } catch (ExecutionException e) {
+        ioe = new IOException("Exception during image upload: " + e.getMessage(),
+            e.getCause());
+        break;
+      } catch (InterruptedException e) {
+        ie = null;
+        break;
+      }
+    }
{code}
That's fixed in the latest version of the patch, where the variable {{ie}} is 
assigned to {{e}} when an {{InterruptedException}} occurs, so I think we're 
good.
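
For reference, a self-contained sketch of the corrected pattern (types 
simplified, names assumed):
{code}
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Sketch only: the point is that the InterruptedException is now recorded in
// 'ie' rather than dropped, so the caller can still react to the interruption.
public class UploadLoopSketch {
  static <T> void waitForFirstSuccess(List<Future<T>> uploads, T successValue)
      throws IOException, InterruptedException {
    InterruptedException ie = null;
    IOException ioe = null;
    for (Future<T> upload : uploads) {
      try {
        if (successValue.equals(upload.get())) {
          break; // one successful upload is enough; ignore the rest
        }
      } catch (ExecutionException e) {
        ioe = new IOException("Exception during image upload: " + e.getMessage(),
            e.getCause());
        break;
      } catch (InterruptedException e) {
        ie = e; // previously 'ie = null', which silently lost the interruption
        break;
      }
    }
    if (ie != null) {
      throw ie;
    }
    if (ioe != null) {
      throw ioe;
    }
  }
}
{code}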

bq. There is {{TestFailoverWithBlockTokensEnabled}} <snip>

Ah, my bad. Yes indeed, that looks good to me. The overlapping range issue is 
exactly what I wanted to see tested.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 3.0.0

 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, 
 hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529699#comment-14529699
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Hi Jesse and Lars, 

My sincere apologies it took so long for me to post a review. No good excuse 
except being busy, but what else is new.

Anyway, the patch looks pretty good to me. Most everything that's below is 
pretty small stuff.

One small potential correctness issue:
# In {{StandbyCheckpointer#doCheckpoint}}, unless I'm missing something, I 
don't think the variable {{ie}} can ever be non-null, and yet we check for 
whether or not it's null later in the method to determine if we should shut 
down.

Two things I'd really like to see some test coverage for:
# The changes to {{BlockTokenSecretManager}} - they look fine to me in general, 
but I'd love to see some extra tests of this functionality with several NNs in 
play. Unless I missed something, I don't think there are any tests that would 
exercise more than 2 {{BlockTokenSecretManager}}s.
# Rolling upgrades/downgrades/rollbacks. I agree with you in general that this 
change should likely not affect anything, but I think it's important that we 
have some test(s) exercising this regardless.

Several little nits:
# In {{MiniZKFCCluster}}, this method now supports more than just two services: 
+   * Set up two services and their failover controllers.
# Recommend making {{intRange}} and {{nnRangeStart}} final in 
{{BlockTokenSecretManager}}.
# Should document the behavior of both of the newly-introduced config keys 
(dfs.namenode.checkpoint.check.quiet-multiplier and 
dfs.ha.tail-edits.namenode-retries) in hdfs-default.xml.
# I think this error message could be a bit clearer:
{quote}
+"Node is currently not in the active state, state: " + state +
+    " does not support reading FSImages from other NameNodes");
{quote}
Recommend something like "NameNode <hostname or IP address> is currently not in 
a state which can accept uploads of new fsimages. State: <state>".
# Would be great for debugging purposes if we could include the hostname or IP 
address of the checkpointer already doing the upload with the higher txid in 
this message:
{quote}
+"Another checkpointer is already in the process of uploading a " +
+    " checkpoint made up to transaction ID " + larger.last());
{quote}
# Spelled "failure" incorrectly here: AUTHENTICATION_FAILRE
# Sorry, I don't quite follow this comment in {{BootstrapStandby}}:
{quote}
+// get the namespace from any active NN. On a fresh cluster, this is 
the active. On a
+// running cluster, this works on any node.
{quote}
What's a fresh cluster vs. a running cluster in this sense?
# In {{HATestUtil#waitForStandbyToCatchUp}}, looks like you changed the method 
comment to indicate that the method takes multiple standbys as an argument, but 
in fact the method functionality is unchanged. There's just some whitespace 
changes in that method.
# In {{TestPipelinesFailover#doWriteOverFailoverTest}}, is changing the value 
of {{FAILOVER_SEED}} going to do anything, given that it's only ever read at 
the static initialization of the {{failoverRandom}}?

Also, not a problem at all, but just want to say that I really like the way 
this patch changes TransferFsImage, and the additional diagnostic info it 
provides when uploads fail. That's a nice little improvement by itself.

I'll be +1 once this stuff is addressed.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529231#comment-14529231
 ] 

Aaron T. Myers commented on HDFS-5223:
--

Hey Chris, thanks a lot for working on this.

Seems like this approach would certainly help with the downgrade/rollback 
issue, but wouldn't do much to make the upgrade itself easier. In cases where 
the only NN metadata change between versions is just the introduction of new 
edit log op codes, I think it'd be much better if we could just swap the 
software during a rolling restart without having to use the {{-rollingUpgrade}} 
functionality at all, and then optionally enable the feature via an 
administrative command afterward - essentially the feature flags proposal 
earlier discussed. That approach will both make non-destructive downgrades 
possible from versions which introduce new op codes, and make upgrades 
substantially easier as well.

What's your reasoning for wanting to stick with a linear layout version number 
approach when introducing new op codes? In general I think it'd be beneficial 
for HDFS to move toward a bit-set denoting which features/op codes are 
enabled/disabled, much like [~tlipcon] described earlier.
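
To illustrate the kind of bit-set approach I mean (purely a sketch, not an 
existing HDFS API; feature names are made up):
{code}
import java.util.EnumSet;

// Sketch of a feature-flag style check: the on-disk metadata records which
// optional features/op codes are actually in use, rather than a single
// linear layout version number.
public class FeatureFlagSketch {
  enum Feature { TRUNCATE_OP, ACL_OP, XATTR_OP }

  private final EnumSet<Feature> enabled = EnumSet.noneOf(Feature.class);

  boolean canDowngradeTo(EnumSet<Feature> supportedByOlderSoftware) {
    // Downgrade is safe iff every feature actually used is understood by the
    // older software, regardless of any linear version number.
    return supportedByOlderSoftware.containsAll(enabled);
  }
}
{code}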

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223-HDFS-Downgrade-Extended-Support.pdf, 
 HDFS-5223.004.patch, HDFS-5223.005.patch


 Currently all HDFS on-disk formats are versioned by a single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process, which requires coordination with DNs, etc. HDFS should 
 support a lighter-weight alternative.
 Copied description from HDFS-8075, which is a duplicate and now closed. (by 
 sanjay on April 7 2015)
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 ** downgrade: go back to the old binary version but use the existing 
 image/edits so that newly created files are not lost
 ** rollback: go back to the checkpoint created before the upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user, since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support a partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-05-05 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529336#comment-14529336
 ] 

Aaron T. Myers commented on HDFS-5223:
--

bq. Complexity in HDFS often arises from combinations of its features rather 
than individual features in isolation. If individual features can be toggled, 
then no two HDFS instances running the same software version are really 
guaranteed to be alike. This becomes another layer of troubleshooting required 
for a technical support team. Testing the possible combinations of features on 
and off becomes a combinatorial explosion that's difficult for a QA team to 
manage.

This is an issue, to be sure, but is this really different with or without 
feature flags present? Even today, users can always choose to use or not use 
all the various features of HDFS in any number of combinations. The fact that 
presently all features are always enabled means that we should consider 
ourselves obligated to make sure that all features work well with all other 
features.

bq. Aside from managing metadata upgrades, we've also found rolling upgrade to 
be valuable because of the OOB ack propagated through write pipelines 
(HDFS-5583) to tell clients to pause rather than aborting the connection. Even 
if it wasn't required from a metadata standpoint, some users might continue to 
use rolling upgrade to get this benefit, even within a minor release line where 
the layout version hasn't changed. Considering that use case, I see value in 
improving our ability to downgrade within the current rolling upgrade scheme.

Fair point, but this suggests to me that the OOB ack feature should perhaps be 
separated from the rolling upgrade feature, since those seem somewhat 
orthogonal. One might want to use the OOB ack feature just when doing a rolling 
restart (no upgrade) to effect a configuration change, without the additional 
complexity of metadata changes, etc.

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223-HDFS-Downgrade-Extended-Support.pdf, 
 HDFS-5223.004.patch, HDFS-5223.005.patch


 Currently all HDFS on-disk formats are versioned by a single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process, which requires coordination with DNs, etc. HDFS should 
 support a lighter-weight alternative.
 Copied description from HDFS-8075, which is a duplicate and now closed. (by 
 sanjay on April 7 2015)
 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 ** downgrade: go back to the old binary version but use the existing 
 image/edits so that newly created files are not lost
 ** rollback: go back to the checkpoint created before the upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user, since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support a partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8178) QJM doesn't purge empty and corrupt inprogress edits files

2015-04-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520617#comment-14520617
 ] 

Aaron T. Myers commented on HDFS-8178:
--

Hi Zhe, thanks a lot for posting this patch. It looks pretty good to me. I have 
one comment, and one question for you:

Comment: I'm a tad concerned that the current patch may be overly aggressive in 
terms of purging edit log segments, if somehow we ended up with two in-progress 
segments, both of which were required to form the complete history of edit log 
transactions. Unless we're somehow guaranteed that that's not a state we could 
end up in, I think we should do something to guarantee that the in-progress 
edits file(s) we're considering purging definitely overlap with finalized 
edit log segments, so that they don't contain any edits that we can't afford to 
lose. Or perhaps we should move them aside with a different name until such 
time as we can be sure that we don't need their transactions anymore, i.e. 
their transactions are all less than minTxIdToKeep, and so we can definitely 
safely discard them.
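
A sketch of the kind of guard I have in mind (names hypothetical):
{code}
// Hypothetical guard: an in-progress segment is only eligible for purging once
// every transaction it could hold is already below the minimum transaction id
// the journal is required to keep (i.e. it is fully covered by finalized
// segments).
public class PurgeGuardSketch {
  static boolean safeToPurgeInProgress(long segmentLastTxId, long minTxIdToKeep) {
    return segmentLastTxId < minTxIdToKeep;
  }

  public static void main(String[] args) {
    System.out.println(safeToPurgeInProgress(99, 100));  // true: nothing we still need
    System.out.println(safeToPurgeInProgress(150, 100)); // false: may hold needed edits
  }
}
{code}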

Question: In the patch for HDFS-5919, we introduced a new regex for identifying 
stale in-progress files. I'm not familiar with why that was necessary in that 
patch, but can you please comment on why it's not necessary in this case? 
Naively, I'd expect either both or neither of the FJM and JN to require accounting 
for that.

 QJM doesn't purge empty and corrupt inprogress edits files
 --

 Key: HDFS-8178
 URL: https://issues.apache.org/jira/browse/HDFS-8178
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: qjm
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-8178.000.patch


 HDFS-5919 fixes the issue for {{FileJournalManager}}. A similar fix is needed 
 for QJM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages

2015-04-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507879#comment-14507879
 ] 

Aaron T. Myers commented on HDFS-8194:
--

Those all seem like decent ideas to me, especially the bit about putting it 
into JMX so that arbitrary other tools can get access to this info as well. The 
crux of the problem is just that there's no way currently to introspect the 
NN's view of the world with respect to the storages it's tracking.
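
A minimal sketch of what exposing this over JMX might look like (the bean and 
attribute names are hypothetical, not an existing NN interface):
{code}
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

// Hypothetical sketch: publish the NN's per-DataNode storage view over JMX so
// that arbitrary external tools can read it.
public class StorageViewJmxSketch {
  public interface DatanodeStoragesMXBean {
    String getDatanodeStorageReport();
  }

  static class DatanodeStorages implements DatanodeStoragesMXBean {
    @Override
    public String getDatanodeStorageReport() {
      return "[]"; // would serialize the NN's view of DN storages, e.g. as JSON
    }
  }

  public static void main(String[] args) throws Exception {
    ManagementFactory.getPlatformMBeanServer().registerMBean(
        new StandardMBean(new DatanodeStorages(), DatanodeStoragesMXBean.class, true),
        new ObjectName("Hadoop:service=NameNode,name=DatanodeStorages"));
  }
}
{code}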

 Add administrative tool to be able to examine the NN's view of DN storages
 --

 Key: HDFS-8194
 URL: https://issues.apache.org/jira/browse/HDFS-8194
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe

 The NN has long had facilities to be able to list all of the DNs that are 
 registered with it. It would be great if there were an administrative tool 
 able to list all of the individual storages that the NN is tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8193) Add the ability to delay replica deletion for a period of time

2015-04-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-8193:


 Summary: Add the ability to delay replica deletion for a period of 
time
 Key: HDFS-8193
 URL: https://issues.apache.org/jira/browse/HDFS-8193
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Zhe Zhang


When doing maintenance on an HDFS cluster, users may be concerned about the 
possibility of administrative mistakes or software bugs deleting replicas of 
blocks that cannot easily be restored. It would be handy if HDFS could be made 
to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-8194) Add administrative tool to be able to examine the NN's view of DN storages

2015-04-20 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created HDFS-8194:


 Summary: Add administrative tool to be able to examine the NN's 
view of DN storages
 Key: HDFS-8194
 URL: https://issues.apache.org/jira/browse/HDFS-8194
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.7.0
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe


The NN has long had facilities to be able to list all of the DNs that are 
registered with it. It would be great if there were an administrative tool 
able to list all of the individual storages that the NN is tracking.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8113) NullPointerException in BlockInfoContiguous causes block report failure

2015-04-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496505#comment-14496505
 ] 

Aaron T. Myers commented on HDFS-8113:
--

That all makes sense to me as well.

[~chengbing.liu] - would you be up for adding a unit test to this patch as 
Harsh and Colin have described?

 NullPointerException in BlockInfoContiguous causes block report failure
 ---

 Key: HDFS-8113
 URL: https://issues.apache.org/jira/browse/HDFS-8113
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: HDFS-8113.patch


 The following copy constructor can throw NullPointerException if {{bc}} is 
 null.
 {code}
 protected BlockInfoContiguous(BlockInfoContiguous from) {
   this(from, from.bc.getBlockReplication());
   this.bc = from.bc;
 }
 {code}
 We have observed that some DataNodes keep failing to do block reports with 
 the NameNode. The stacktrace is as follows. Though we are not using the latest 
 version, the problem still exists.
 {quote}
 2015-03-08 19:28:13,442 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 RemoteException in offerService
 org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
 java.lang.NullPointerException
 at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo.<init>(BlockInfo.java:80)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$BlockToMarkCorrupt.<init>(BlockManager.java:1696)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.checkReplicaCorrupt(BlockManager.java:2185)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReportedBlock(BlockManager.java:2047)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.reportDiff(BlockManager.java:1950)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1823)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1750)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1069)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26382)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1623)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8101) DFSClient use of non-constant DFSConfigKeys pulls in WebHDFS classes at runtime

2015-04-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-8101:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks very much for the contribution, Sean.

 DFSClient use of non-constant DFSConfigKeys pulls in WebHDFS classes at 
 runtime
 ---

 Key: HDFS-8101
 URL: https://issues.apache.org/jira/browse/HDFS-8101
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Minor
 Fix For: 2.8.0

 Attachments: HDFS-8101.1.patch.txt


 Previously, all references to DFSConfigKeys in DFSClient were compile time 
 constants which meant that normal users of DFSClient wouldn't resolve 
 DFSConfigKeys at run time. As of HDFS-7718, DFSClient has a reference to a 
 member of DFSConfigKeys that isn't compile time constant 
 (DFS_CLIENT_KEY_PROVIDER_CACHE_EXPIRY_DEFAULT).
 Since the class must be resolved now, this particular member
 {code}
 public static final String  DFS_WEBHDFS_AUTHENTICATION_FILTER_DEFAULT = 
 AuthFilter.class.getName();
 {code}
 means that javax.servlet.Filter needs to be on the classpath.
 javax-servlet-api is one of the properly listed dependencies for HDFS, 
 however if we replace {{AuthFilter.class.getName()}} with the equivalent 
 String literal then downstream folks can avoid including it while maintaining 
 compatibility.
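
 In other words, the change amounts to something like the following (the string 
 literal is assumed to be AuthFilter's fully-qualified class name):
 {code}
 // Sketch of the change described above.
 public class WebHdfsFilterDefaultSketch {
   // Before: forces AuthFilter, and hence javax.servlet.Filter, to be resolvable.
   // public static final String DFS_WEBHDFS_AUTHENTICATION_FILTER_DEFAULT =
   //     AuthFilter.class.getName();

   // After: the same value as a compile-time constant; no class loading required.
   public static final String DFS_WEBHDFS_AUTHENTICATION_FILTER_DEFAULT =
       "org.apache.hadoop.hdfs.web.AuthFilter";
 }
 {code}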



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8101) DFSClient use of non-constant DFSConfigKeys pulls in WebHDFS classes at runtime

2015-04-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487632#comment-14487632
 ] 

Aaron T. Myers commented on HDFS-8101:
--

+1, the patch looks good to me. Good sleuthing, Sean.

I'm going to commit this momentarily.

 DFSClient use of non-constant DFSConfigKeys pulls in WebHDFS classes at 
 runtime
 ---

 Key: HDFS-8101
 URL: https://issues.apache.org/jira/browse/HDFS-8101
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.7.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Minor
 Attachments: HDFS-8101.1.patch.txt


 Previously, all references to DFSConfigKeys in DFSClient were compile time 
 constants which meant that normal users of DFSClient wouldn't resolve 
 DFSConfigKeys at run time. As of HDFS-7718, DFSClient has a reference to a 
 member of DFSConfigKeys that isn't compile time constant 
 (DFS_CLIENT_KEY_PROVIDER_CACHE_EXPIRY_DEFAULT).
 Since the class must be resolved now, this particular member
 {code}
 public static final String  DFS_WEBHDFS_AUTHENTICATION_FILTER_DEFAULT = 
 AuthFilter.class.getName();
 {code}
 means that javax.servlet.Filter needs to be on the classpath.
 javax-servlet-api is one of the properly listed dependencies for HDFS, 
 however if we replace {{AuthFilter.class.getName()}} with the equivalent 
 String literal then downstream folks can avoid including it while maintaining 
 compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-5223) Allow edit log/fsimage format changes without changing layout version

2015-04-07 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5223:
-
Target Version/s: 3.0.0  (was: )

 Allow edit log/fsimage format changes without changing layout version
 -

 Key: HDFS-5223
 URL: https://issues.apache.org/jira/browse/HDFS-5223
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.1.1-beta
Reporter: Aaron T. Myers
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5223.004.patch


 Currently all HDFS on-disk formats are versioned by a single layout version. 
 This means that even for changes which might be backward compatible, like the 
 addition of a new edit log op code, we must go through the full `namenode 
 -upgrade' process, which requires coordination with DNs, etc. HDFS should 
 support a lighter-weight alternative.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8075) Revist layout version

2015-04-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483601#comment-14483601
 ] 

Aaron T. Myers commented on HDFS-8075:
--

[~sanjay.radia] - Thanks for filing this. Agree that HDFS could use more 
flexibility in this sense. Are you familiar with the discussion on HDFS-5223? 
Seems like this JIRA may have similar goals to that one.

 Revist layout version
 -

 Key: HDFS-8075
 URL: https://issues.apache.org/jira/browse/HDFS-8075
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: HDFS
Affects Versions: 2.6.0
Reporter: Sanjay Radia

 Background
 * HDFS image layout was changed to use Protobufs to allow easier forward and 
 backward compatibility.
 * HDFS has a layout version which is changed on each change (even if only an 
 optional protobuf field was added).
 * Hadoop supports two ways of going back during an upgrade:
 ** downgrade: go back to the old binary version but use the existing 
 image/edits so that newly created files are not lost
 ** rollback: go back to the checkpoint created before the upgrade was started - 
 hence newly created files are lost.
 Layout needs to be revisited if we want to support downgrade in some 
 circumstances, which we don't today. Here are use cases:
 * Some changes can support downgrade even though there was a change in layout, 
 since there is no real data loss but only loss of new functionality. E.g. 
 when we added ACLs one could have downgraded - there is no data loss but you 
 will lose the newly created ACLs. That is acceptable for a user, since one 
 does not expect to retain the newly added ACLs in an old version.
 * Some changes may lead to data loss if the functionality was used. For 
 example, the recent truncate will cause data loss if the functionality was 
 actually used. Now one can tell admins NOT to use such new features 
 till the upgrade is finalized, in which case one could potentially support 
 downgrade.
 * A fairly fundamental change to layout where a downgrade is not possible but 
 a rollback is. Say we change the layout completely from protobuf to something 
 else. Another example is when HDFS moves to support a partial namespace in 
 memory - there is likely to be a fairly fundamental change in layout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-04-07 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484301#comment-14484301
 ] 

Aaron T. Myers commented on HDFS-7037:
--

Thanks for the reply, [~wheat9].

As I've said previously, adding this capability to HFTP does not change the 
security semantics of Hadoop at all, since RPC and other interfaces used for 
remote access already allow configurable insecure fallback. This is 
_not_ a security vulnerability. If it were, we should be removing the ability 
to configure insecure fallback at all in Hadoop. We're not doing that, because 
it was a deliberate choice to add that feature. Given that, I still don't 
understand why you'd be unwilling to fix this issue in HFTP. HFTP, like WebHDFS 
and RPC, is supposed to be able to work with either secure or insecure 
clusters, when configured to do so. It should be viewed as a bug that HFTP 
doesn't currently work, whereas the others do. Implementing HADOOP-11701 is a 
good idea in general, but fixing this bug in HFTP should not be gated on 
implementing that new feature.

So, I'll ask again, would you please consider changing your -1 to a -0?

 Using distcp to copy data from insecure to secure cluster via hftp doesn't 
 work  (branch-2 only)
 

 Key: HDFS-7037
 URL: https://issues.apache.org/jira/browse/HDFS-7037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, tools
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7037.001.patch


 This is a branch-2 only issue since hftp is only supported there. 
 Issuing distcp hftp://insecureCluster hdfs://secureCluster gave the 
 following failure exception:
 {code}
 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
 remote token:
 java.io.IOException: Error when dealing remote token: Internal Server Error
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
   at 
 org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
   at 
 org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
 14/09/13 22:07:40 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Unable to obtain remote token
 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Unable to obtain remote token
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 

[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes

2015-04-06 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481831#comment-14481831
 ] 

Aaron T. Myers commented on HDFS-6440:
--

Sorry, [~jesse_yates], been busy. I got partway through a review of the patch a 
few weeks ago, but then haven't gotten back to it yet. Will post my feedback 
soon here.

 Support more than 2 NameNodes
 -

 Key: HDFS-6440
 URL: https://issues.apache.org/jira/browse/HDFS-6440
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: auto-failover, ha, namenode
Affects Versions: 2.4.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Attachments: Multiple-Standby-NameNodes_V1.pdf, 
 hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
 hdfs-6440-trunk-v1.patch, hdfs-multiple-snn-trunk-v0.patch


 Most of the work is already done to support more than 2 NameNodes (one 
 active, one standby). This would be the last bit to support running multiple 
 _standby_ NameNodes; one of the standbys should be available for fail-over.
 Mostly, this is a matter of updating how we parse configurations, some 
 complexity around managing the checkpointing, and updating a whole lot of 
 tests.
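
Purely as a sketch of the configuration side (the nameservice id and hosts 
below are hypothetical), supporting a third NN mostly means letting the 
existing HA keys carry more than two namenode ids, roughly:
{code}
// Illustrative only; "mycluster", nn3 and the hostnames are made up.
Configuration conf = new Configuration();  // org.apache.hadoop.conf.Configuration
conf.set("dfs.nameservices", "mycluster");
conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2,nn3");             // third id added
conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1:8020");
conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2:8020");
conf.set("dfs.namenode.rpc-address.mycluster.nn3", "host3:8020");
{code}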



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-03-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372012#comment-14372012
 ] 

Aaron T. Myers commented on HDFS-7037:
--

Hey [~wheat9], have you had any chance to think about my last comment? Absent 
this being a security vulnerability (which I don't think this is, for the 
reason stated) I don't see any reason not to fix this in HFTP. We can certainly 
work on more general solutions in HADOOP-11726, but I'd still really like to 
get this issue fixed in HFTP in the meantime.

Thanks very much.

 Using distcp to copy data from insecure to secure cluster via hftp doesn't 
 work  (branch-2 only)
 

 Key: HDFS-7037
 URL: https://issues.apache.org/jira/browse/HDFS-7037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, tools
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7037.001.patch


 This is a branch-2 only issue since hftp is only supported there. 
 Issuing distcp hftp://insecureCluster hdfs://secureCluster gave the 
 following failure exception:
 {code}
 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
 remote token:
 java.io.IOException: Error when dealing remote token: Internal Server Error
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
   at 
 org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
   at 
 org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
 14/09/13 22:07:40 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Unable to obtain remote token
 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Unable to obtain remote token
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 

[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-03-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367665#comment-14367665
 ] 

Aaron T. Myers commented on HDFS-7037:
--

[~wheat9] thanks for the response.

bq. I have strong preferences not to do so where my reasonings can be found in 
relevant jiras. As I pointed out in HDFS-6776, you'll need to fix this issue 
for every single filesystem. I appreciate if you can continue to investigate on 
doing it in distcp.

Of course, but if we do a fix only in distcp, then other relevant tools that 
use the various file systems (e.g. even simple ones like `hadoop fs ...') still 
won't work. So the question is: do we fix all the tools that use FileSystem? Or 
do we fix all the FileSystem implementations? The right answer seems to me 
quite clearly to be that we should fix the FileSystem implementations, as we 
should not require this workaround to be implemented by anyone coding against 
FileSystem.

To be clear, are you -1 on doing this fix for HFTP? Or just -0?

 Using distcp to copy data from insecure to secure cluster via hftp doesn't 
 work  (branch-2 only)
 

 Key: HDFS-7037
 URL: https://issues.apache.org/jira/browse/HDFS-7037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, tools
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7037.001.patch


 This is a branch-2 only issue since hftp is only supported there. 
 Issuing distcp hftp://insecureCluster hdfs://secureCluster gave the 
 following failure exception:
 {code}
 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
 remote token:
 java.io.IOException: Error when dealing remote token: Internal Server Error
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
   at 
 org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
   at 
 org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
 14/09/13 22:07:40 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Unable to obtain remote token
 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Unable to obtain remote token
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 

[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-03-18 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367803#comment-14367803
 ] 

Aaron T. Myers commented on HDFS-7037:
--

bq. My question is how to fix all FileSystem implementations, given that there 
are multiple HCFS implementations (e.g., MapRFs, Ceph) that inherit the public 
FileSystem APIs, all of which sit outside of the repository of hadoop? Should 
we ask them to take care of this issue on their own?

That's up to them, but it still seems obvious to me that we should fix the 
FileSystem implementations that are in our repository. The alternative you've 
proposed, as I mentioned previously, is fixing all _users of FileSystem 
implementations_, of which there are obviously many outside of the Hadoop 
repository.

bq. -1 given the concern on security vulnerability.

Note that in the latest patch, allowing connections to fall back to an insecure 
cluster is configurable and disabled by default. Given that, making this 
change in HFTP is no different from how Hadoop RPC currently works, and thus 
no vulnerability is being introduced here. This proposed change really 
only amounts to addressing a bug in HFTP: even when client fallback is 
enabled, HFTP still can't connect to insecure clusters, since the client can't 
handle the case where a delegation token can't be fetched.

If the reasoning behind your -1 is really only predicated on this being a 
security vulnerability, then I'd ask you to please consider withdrawing it.

I'd really like to get this fixed in HFTP. It's been burning plenty of users 
for a long time.
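
For context, this is what the existing opt-in model looks like on the RPC 
side; whether the HFTP patch keys off this exact property is my assumption, 
not something stated above:
{code}
// Fallback to simple auth is off by default; a client must opt in explicitly,
// which is why enabling the same behavior for HFTP adds no new vulnerability.
// Assumption: the HFTP fix follows this same opt-in model.
Configuration conf = new Configuration();  // org.apache.hadoop.conf.Configuration
conf.setBoolean("ipc.client.fallback-to-simple-auth-allowed", true);
FileSystem fs = FileSystem.get(URI.create("hftp://insecureCluster"), conf);
{code}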

 Using distcp to copy data from insecure to secure cluster via hftp doesn't 
 work  (branch-2 only)
 

 Key: HDFS-7037
 URL: https://issues.apache.org/jira/browse/HDFS-7037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, tools
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7037.001.patch


 This is a branch-2 only issue since hftp is only supported there. 
 Issuing distcp hftp://insecureCluster hdfs://secureCluster gave the 
 following failure exception:
 {code}
 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
 remote token:
 java.io.IOException: Error when dealing remote token: Internal Server Error
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
   at 
 org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
   at 
 org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
 14/09/13 22:07:40 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Unable to obtain remote token
 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Unable to obtain remote token
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
   at 
 

[jira] [Commented] (HDFS-7037) Using distcp to copy data from insecure to secure cluster via hftp doesn't work (branch-2 only)

2015-03-17 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366004#comment-14366004
 ] 

Aaron T. Myers commented on HDFS-7037:
--

I agree with Yongjun - this fix is basically equivalent to the fix done in 
HDFS-6776, but this time for HFTP instead of WebHDFS. The fix for this issue 
should not be implemented in distcp, as this issue affects all users of HFTP, 
including just directly using it from the FS shell.

+1, the latest patch looks good to me.

[~wheat9] - haven't heard from you on this JIRA in a while, despite Yongjun's 
questions. Are you OK with the patch? If I don't hear back from you in the next 
day or so I'm going to go ahead and commit it.

 Using distcp to copy data from insecure to secure cluster via hftp doesn't 
 work  (branch-2 only)
 

 Key: HDFS-7037
 URL: https://issues.apache.org/jira/browse/HDFS-7037
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, tools
Affects Versions: 2.6.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-7037.001.patch


 This is a branch-2 only issue since hftp is only supported there. 
 Issuing distcp hftp://insecureCluster hdfs://secureCluster gave the 
 following failure exception:
 {code}
 14/09/13 22:07:40 INFO tools.DelegationTokenFetcher: Error when dealing 
 remote token:
 java.io.IOException: Error when dealing remote token: Internal Server Error
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.run(DelegationTokenFetcher.java:375)
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:238)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:472)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:501)
   at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
   at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
   at 
 org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:81)
   at 
 org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
 14/09/13 22:07:40 WARN security.UserGroupInformation: 
 PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) 
 cause:java.io.IOException: Unable to obtain remote token
 14/09/13 22:07:40 ERROR tools.DistCp: Exception encountered 
 java.io.IOException: Unable to obtain remote token
   at 
 org.apache.hadoop.hdfs.tools.DelegationTokenFetcher.getDTfromRemote(DelegationTokenFetcher.java:249)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:252)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$2.run(HftpFileSystem.java:247)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.getDelegationToken(HftpFileSystem.java:247)
   at 
 org.apache.hadoop.hdfs.web.TokenAspect.ensureTokenInitialized(TokenAspect.java:140)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.addDelegationTokenParam(HftpFileSystem.java:337)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem.openConnection(HftpFileSystem.java:324)
   at 
 org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:457)
   at 
 

[jira] [Updated] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-03-03 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-7682:
-
   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've just committed this to trunk and branch-2.

Thanks a lot for the contribution, Charlie, and thanks also to Jing for the 
reviews.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Fix For: 2.7.0

 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch, HDFS-7682.003.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 This happens because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum over all of the CRCs from the blocks in the 
 file. But in the case of a snapshotted file, we don't want the checksum to 
 include data that was appended to the last block in the file after the 
 snapshot was taken.
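
To make the failure mode concrete, here is a rough sketch (paths and snapshot 
name invented, and it assumes fs.defaultFS points at an HDFS cluster); it 
illustrates the report above rather than reproducing the patch's test:
{code}
// Sketch only: the checksum of the snapshot path should not change after an
// append to the live file, since the snapshot's contents are frozen.
Configuration conf = new Configuration();
DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
Path dir = new Path("/data");
Path file = new Path(dir, "f");

try (FSDataOutputStream out = dfs.create(file)) {
  out.write("original".getBytes());
}
dfs.allowSnapshot(dir);
dfs.createSnapshot(dir, "s1");
Path snapFile = new Path("/data/.snapshot/s1/f");
FileChecksum before = dfs.getFileChecksum(snapFile);

// Append to the live file; only the last (shared) block grows.
try (FSDataOutputStream out = dfs.append(file)) {
  out.write("appended".getBytes());
}

// Expected to equal 'before'; prior to this fix the appended bytes in the
// shared last block leaked into the snapshot path's checksum.
FileChecksum after = dfs.getFileChecksum(snapFile);
{code}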



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7858) Improve HA Namenode Failover detection on the client

2015-03-02 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343889#comment-14343889
 ] 

Aaron T. Myers commented on HDFS-7858:
--

Hey folks, sorry to come into this discussion so late.

Given that some folks choose to use HDFS HA without auto failover at all, and 
thus without ZKFCs or ZK in sight, I think we should target any solution to 
this problem to work without ZK. I'm also a little leery of using a cache file, 
as I'm afraid of thundering herd effects (if the file is in HDFS or in a home 
dir which is network mounted), and also don't like the fact that in a large 
cluster all users on all machines might need to populate this cache file.

As such, I'd propose that we pursue either of the following two options:

# Optimistically try to connect to both configured NNs simultaneously, thus 
allowing that one (the standby) may take a while to respond, but also expecting 
that the active will always respond rather promptly. This is similar to 
Kihwal's suggestion.
# Have the client connect to the JNs to determine which NN is likely the 
active. In my experience, even those who don't use automatic failover basically 
always use the QJM. I think those that continue to use NFS-based HA are very 
few and far between.

Thoughts?
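
A rough sketch of what option 1 could look like on the client side; everything 
here, including the {{isActive()}} probe, is hypothetical rather than an 
existing API:
{code}
// Probe both configured NNs concurrently and take the first that reports
// itself active, so a slow or unresponsive standby cannot stall the client.
// Needs java.io.IOException, java.net.InetSocketAddress, java.util.List and
// java.util.concurrent.*.
static InetSocketAddress pickLikelyActive(List<InetSocketAddress> nns)
    throws IOException, InterruptedException {
  ExecutorService pool = Executors.newFixedThreadPool(nns.size());
  try {
    CompletionService<InetSocketAddress> done =
        new ExecutorCompletionService<>(pool);
    for (InetSocketAddress nn : nns) {
      done.submit(() -> {
        if (!isActive(nn)) {        // isActive(): hypothetical "are you active?" probe
          throw new IOException(nn + " is not active");
        }
        return nn;
      });
    }
    for (int i = 0; i < nns.size(); i++) {
      try {
        return done.take().get();   // first successful probe wins
      } catch (ExecutionException e) {
        // That NN was standby or unreachable; keep waiting for the others.
      }
    }
    throw new IOException("no active NN responded");
  } finally {
    pool.shutdownNow();
  }
}
{code}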

 Improve HA Namenode Failover detection on the client
 

 Key: HDFS-7858
 URL: https://issues.apache.org/jira/browse/HDFS-7858
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arun Suresh
Assignee: Arun Suresh
 Attachments: HDFS-7858.1.patch


 In an HA deployment, clients are configured with the hostnames of both the 
 Active and Standby Namenodes. Clients will first try one of the NNs 
 (non-deterministically), and if it's a standby NN, it will respond to the 
 client to retry the request on the other Namenode.
 If the client happens to talk to the Standby first, and the standby is 
 undergoing some GC / is busy, then those clients might not get a response 
 soon enough to try the other NN.
 Proposed approach to solve this:
 1) Since Zookeeper is already used as the failover controller, the clients 
 could talk to ZK and find out which is the active namenode before contacting 
 it.
 2) Long-lived DFSClients would have a ZK watch configured which fires when 
 there is a failover, so they do not have to query ZK every time to find out 
 the active NN.
 3) Clients can also cache the last active NN in the user's home directory 
 (~/.lastNN) so that short-lived clients can try that Namenode first before 
 querying ZK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7682) {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes non-snapshotted content

2015-03-02 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344305#comment-14344305
 ] 

Aaron T. Myers commented on HDFS-7682:
--

Latest patch looks good to me, and seems like it should address [~jingzhao]'s 
comments.

[~jingzhao] - does the patch look good to you as well? If so, I'll be +1 on it.

Thanks gents.

 {{DistributedFileSystem#getFileChecksum}} of a snapshotted file includes 
 non-snapshotted content
 

 Key: HDFS-7682
 URL: https://issues.apache.org/jira/browse/HDFS-7682
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-7682.000.patch, HDFS-7682.001.patch, 
 HDFS-7682.002.patch, HDFS-7682.003.patch


 DistributedFileSystem#getFileChecksum of a snapshotted file includes 
 non-snapshotted content.
 This happens because DistributedFileSystem#getFileChecksum 
 simply calculates the checksum over all of the CRCs from the blocks in the 
 file. But in the case of a snapshotted file, we don't want the checksum to 
 include data that was appended to the last block in the file after the 
 snapshot was taken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7790) Do not create optional fields in DFSInputStream unless they are needed

2015-02-12 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319122#comment-14319122
 ] 

Aaron T. Myers commented on HDFS-7790:
--

Patch looks good to me, and I'm confident that the test failures are unrelated. 
+1.

 Do not create optional fields in DFSInputStream unless they are needed
 --

 Key: HDFS-7790
 URL: https://issues.apache.org/jira/browse/HDFS-7790
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-7790.001.patch


 {{DFSInputStream#oneByteBuffer}} and {{DFSInputStream#extendedReadBuffers}} 
 are only used some of the time, and they are always used under the positional 
 lock.  Let's create them on demand to save memory.
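
A minimal sketch of the on-demand pattern (names invented); it is only safe 
without extra synchronization because, as noted above, the field is always 
accessed under the same lock:
{code}
import java.nio.ByteBuffer;

class LazyBuffers {
  private final Object infoLock = new Object();  // stands in for the positional-read lock
  private ByteBuffer oneByteBuf;                 // stays null until first use

  ByteBuffer getOneByteBuf() {
    synchronized (infoLock) {
      if (oneByteBuf == null) {
        oneByteBuf = ByteBuffer.allocate(1);     // pay the memory cost only when needed
      }
      return oneByteBuf;
    }
  }
}
{code}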



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

