[jira] [Comment Edited] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-26 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973802#comment-14973802
 ] 

Yi Liu edited comment on HDFS-7984 at 10/26/15 6:27 AM:


{quote}
there is no way for an end user to create a token file with multiple tokens 
inside it, short of building custom code to do it..
{quote}

No, I think we have a way. When using the existing 
{{Credentials#writeTokenStorageFile}}, all tokens in the credentials will be 
persisted, and {{Credentials#readTokenStorageStream}} will read all of them back. 
So what we need to do is add the different tokens to one {{Credentials}}. Using 
your example with two HDFS clusters, we can get a delegation token for each of 
them; the {{service}} field of the two delegation tokens will be different, so 
we can add both to one {{Credentials}}, either directly or through the UGI API.
Actually, even if we have multiple token files that contain only one token 
each, we can read them separately through {{Credentials#readTokenStorageFile}} 
and add them to one {{Credentials}}.

Back to the original purpose of this JIRA, I don't see why we need to specify 
multiple delegation tokens in one webhdfs:// URI. A delegation token is used by 
some service to access HDFS on behalf of a user, so one HDFS cluster only needs 
one delegation token per user. For the distcp example you gave, I think the 
correct behavior is: the user specifies a delegation token for each webhdfs:// 
URI, and the MR task adds the two delegation tokens to that user's UGI 
{{Credentials}}. I think this is already supported. I have not tried distcp 
across two different secured HDFS clusters; if there is a bug there, the 
correct fix is the one above, not supporting multiple delegation tokens in one 
webhdfs:// URI.
We also should not use HADOOP_TOKEN_FILE_LOCATION to solve the problem.
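
As a minimal sketch (not an existing Hadoop CLI, and the file names here are 
hypothetical; it only uses the public {{Credentials}}/UGI APIs mentioned above), 
merging several single-token files into one {{Credentials}} could look like:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class MergeTokenFiles {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Credentials merged = new Credentials();

    // Read each token file and collect its tokens; tokens with distinct
    // service fields (one per cluster) do not overwrite each other.
    for (String tokenFile : args) {
      Credentials c = Credentials.readTokenStorageFile(new Path(tokenFile), conf);
      merged.addAll(c);
    }

    // Either attach the merged tokens to the current UGI ...
    UserGroupInformation.getCurrentUser().addCredentials(merged);

    // ... or persist one combined token file for later use.
    merged.writeTokenStorageFile(new Path("combined.token"), conf);
  }
}
{code}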



> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers

2015-10-26 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973816#comment-14973816
 ] 

Walter Su commented on HDFS-9079:
-

bq. It seems that some failure cases were not considered. For example, what 
happens if the client dies after some of the streamers have updated the GS but 
some have not?
If so, the last step, "6) Updates block on NN" (aka {{updatePipeline(newGS)}}), 
is not called. This is the same as the existing behavior for non-EC blocks. 
Since the GS of both the updated and the non-updated blocks is >= the GS of the 
stored block, either can be used for block recovery.

> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> 
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch, HDFS-9079.05.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> To simplify the above we can preallocate GS when NN creates a new striped 
> block group ({{FSN#createNewBlock}}). For each new striped block group we can 
> reserve {{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can 
> be saved. If more than {{NUM_PARITY_BLOCKS}} errors have happened we 
> shouldn't try to further recover anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9261) Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group

2015-10-26 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973827#comment-14973827
 ] 

Rakesh R commented on HDFS-9261:


Attached a simple patch that checks the health of the parity streamers before 
{{writeParityCells()}}. Please review the changes!
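
The idea is roughly the following (a self-contained sketch only, not the 
attached patch; {{ParityStreamer}} is a hypothetical stand-in for the real 
{{StripedDataStreamer}}):
{code}
import java.util.List;

/** Hypothetical stand-in for a parity StripedDataStreamer's health state. */
interface ParityStreamer {
  boolean isHealthy();
}

class ParityEncodeCheck {
  /**
   * Encode and write parity cells only if at least one parity streamer is
   * still healthy; if all of them have failed, the encoding work is wasted.
   */
  static boolean shouldWriteParityCells(List<ParityStreamer> parityStreamers) {
    for (ParityStreamer s : parityStreamers) {
      if (s.isHealthy()) {
        return true;
      }
    }
    return false; // all parity streamers failed: skip writeParityCells()
  }
}
{code}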

> Erasure Coding: Skip encoding the data cells if all the parity data streamers 
> are failed for the current block group
> 
>
> Key: HDFS-9261
> URL: https://issues.apache.org/jira/browse/HDFS-9261
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Minor
> Attachments: HDFS-9261-00.patch
>
>
> {{DFSStripedOutputStream}} will continue writing with minimum number 
> (dataBlockNum) of live datanodes. It won't replace the failed datanodes 
> immediately for the current block group. Consider a case where all the parity 
> data streamers are failed, now it is unnecessary to encode the data block 
> cells and generate the parity data. This is a corner case where it can skip 
> {{writeParityCells()}} step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973952#comment-14973952
 ] 

Hadoop QA commented on HDFS-9255:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 30s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 27s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 34s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 28s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 32s | The applied patch generated  1 
new checkstyle issues (total was 286, now 264). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 37s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 52s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 31s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |  53m 26s | Tests passed in hadoop-hdfs. 
|
| | | 102m 37s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768675/HDFS-9255.06.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b57f08c |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13189/console |


This message was automatically generated.

> Consolidate block recovery related implementation into a single class
> -
>
> Key: HDFS-9255
> URL: https://issues.apache.org/jira/browse/HDFS-9255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, 
> HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch, HDFS-9255.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9284) fsck command should not print exception trace when file not found

2015-10-26 Thread Jagadesh Kiran N (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974074#comment-14974074
 ] 

Jagadesh Kiran N commented on HDFS-9284:


[~andrew.wang] Please review it.

> fsck command should not print exception trace when file not found 
> --
>
> Key: HDFS-9284
> URL: https://issues.apache.org/jira/browse/HDFS-9284
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jagadesh Kiran N
>Assignee: Jagadesh Kiran N
> Attachments: HDFS-9284_00.patch, HDFS-9284_01.patch, 
> HDFS-9284_02.patch
>
>
> when the file doesn't exist, fsck throws an exception 
> {code}
> ./hdfs fsck /kiran
> {code}
> the following exception occurs 
> {code}
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable
> FileSystem is inaccessible due to:
> java.io.FileNotFoundException: File does not exist: /kiran
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1273)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1265)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1265)
> at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:755)
> at org.apache.hadoop.hdfs.tools.DFSck.getResolvedPath(DFSck.java:236)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:316)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:155)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:152)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:151)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:383)
> {code}
> but only the {code}File does not exist: /kiran{code} error message should be 
> printed. The current code is:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + StringUtils.stringifyException(ioe));
> }
> {code}
> I think it should use the ioe.getMessage() method instead:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + ioe.getMessage());
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-26 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973802#comment-14973802
 ] 

Yi Liu commented on HDFS-7984:
--

{quote}
there is no way for an end user to create a token file with multiple tokens 
inside it, short of building custom code to do it..
{quote}

No, I think we have a way. When using the existing 
{{Credentials#writeTokenStorageFile}}, all tokens in the credentials will be 
persisted, and {{Credentials#readTokenStorageStream}} will read all of them back. 
So what we need to do is add the different tokens to one {{Credentials}}. Using 
your example with two HDFS clusters, we can get a delegation token for each of 
them; the {{service}} field of the two delegation tokens will be different, so 
we can add both to one {{Credentials}}, either directly or through the UGI API.
Actually, even if we have multiple token files that contain only one token 
each, we can read them separately through {{Credentials#readTokenStorageFile}} 
and add them to one {{Credentials}}.

Back to the original purpose of this JIRA, I don't see why we need to specify 
multiple delegation tokens in one webhdfs:// URI. A delegation token is used by 
some service to access HDFS on behalf of a user, so one HDFS cluster only needs 
one delegation token per user. For the distcp example you gave, I think the 
correct behavior is: the user specifies a delegation token for each webhdfs:// 
URI, and the MR task adds the two delegation tokens to that user's UGI 
{{Credentials}}. I think this is already supported. I have not tried distcp 
across two different secured HDFS clusters; if there is a bug there, the 
correct fix is the one above, not supporting multiple delegation tokens in one 
webhdfs:// URI.
We also should not use HADOOP_TOKEN_FILE_LOCATION to solve the problem.

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9229) Expose size of NameNode directory as a metric

2015-10-26 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-9229:
-
Attachment: HDFS-9229.005.patch

Thanks [~rakeshr] for the review and suggestions.

Attached an updated patch. Addressed the above review comments and changed the 
test code accordingly.

Please review.

> Expose size of NameNode directory as a metric
> -
>
> Key: HDFS-9229
> URL: https://issues.apache.org/jira/browse/HDFS-9229
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Surendra Singh Lilhore
>Priority: Minor
> Attachments: HDFS-9229.001.patch, HDFS-9229.002.patch, 
> HDFS-9229.003.patch, HDFS-9229.004.patch, HDFS-9229.005.patch
>
>
> Useful for admins in reserving / managing NN local file system space. Also 
> useful when transferring NN backups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-26 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973825#comment-14973825
 ] 

Allen Wittenauer commented on HDFS-7984:


bq. No, I think we have a way. When using the existing 
Credentials#writeTokenStorageFile ... (a bunch of other verbiage)

This demonstrates the big disconnect between what we see and what our users 
see. 

You don't seriously expect some data scientist or ops person to write code for 
this, do you?  Yes, there's an API, but where are the command line utilities to 
use it?  Where's the example code? Oh that's right, we expect everyone to build 
their own utilities.  Is it because the APIs are the only thing that ever stay 
stable?  Unless we switch Java versions in the middle of a branch. Or, I guess, 
at least until we move the classes out of jars.  Or, ...

(... and let's not forget that this is in some of the LEAST user-friendly bits 
of the source.  Even long time Hadoop devs shudder in fear when dealing with 
the UGI and token code ...)

bq. Back to the original purpose of the JIRA, I don't know why we need to 
specify multiple delegation tokens in one webhdfs://, the delegation token is 
used in some service to access HDFS on behalf of user, so one hdfs only needs 
one delegation token for one user.

I think you're greatly simplifying the situation.  In our use cases, we almost 
always have multiple realms in play where cross-realm is not and cannot be 
configured. We also don't trust our jobs to work with the given HDFS JARs since 
Hadoop backward compatibility is pretty much a joke at this point.  (See above) 
So there are often two WebHDFS URLs given on the distcp command line.

It's also not unusual to have a *third* cluster in play to act as an 
intermediary.  So yes, there are definitely real world use cases where 
supplying multiple DTs are needed.

bq. the user specifies a delegation token for each webhdfs:// URI

... and today the only way a user can do that is via 
HADOOP_TOKEN_FILE_LOCATION... which I think everyone agrees is pretty terrible. 
Of course, that's after they build an application to actually create a file 
with multiple tokens.

bq.  We also should not use HADOOP_TOKEN_FILE_LOCATION to solve the problem.

... which ultimately brings us back to this and a handful of other patches 
we're working on.

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9261) Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group

2015-10-26 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-9261:
---
Attachment: HDFS-9261-00.patch

> Erasure Coding: Skip encoding the data cells if all the parity data streamers 
> are failed for the current block group
> 
>
> Key: HDFS-9261
> URL: https://issues.apache.org/jira/browse/HDFS-9261
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Minor
> Attachments: HDFS-9261-00.patch
>
>
> {{DFSStripedOutputStream}} will continue writing with minimum number 
> (dataBlockNum) of live datanodes. It won't replace the failed datanodes 
> immediately for the current block group. Consider a case where all the parity 
> data streamers are failed, now it is unnecessary to encode the data block 
> cells and generate the parity data. This is a corner case where it can skip 
> {{writeParityCells()}} step.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8631) WebHDFS : Support list/setQuota

2015-10-26 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-8631:
-
Status: Patch Available  (was: Open)

Resubmitting the patch to trigger a QA build.

> WebHDFS : Support list/setQuota
> ---
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operations can be allowed through the REST API.
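
For reference, quota management through the filesystem object looks roughly 
like this today (a sketch that assumes the default filesystem is HDFS; the 
directory path is hypothetical). The proposal is to expose the same operations 
over the WebHDFS REST API:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class QuotaExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("/user/example");  // hypothetical directory
    DistributedFileSystem dfs =
        (DistributedFileSystem) dir.getFileSystem(conf);

    // Set a namespace quota of 1000 names, leave the space quota unchanged.
    dfs.setQuota(dir, 1000, HdfsConstants.QUOTA_DONT_SET);

    // "List" the quota through the directory's content summary.
    ContentSummary cs = dfs.getContentSummary(dir);
    System.out.println("nsQuota=" + cs.getQuota()
        + " spaceQuota=" + cs.getSpaceQuota());
  }
}
{code}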



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8631) WebHDFS : Support list/setQuota

2015-10-26 Thread Surendra Singh Lilhore (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-8631:
-
Status: Open  (was: Patch Available)

> WebHDFS : Support list/setQuota
> ---
>
> Key: HDFS-8631
> URL: https://issues.apache.org/jira/browse/HDFS-8631
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: nijel
>Assignee: Surendra Singh Lilhore
> Attachments: HDFS-8631-001.patch, HDFS-8631-002.patch
>
>
> The user is able to do quota management from the filesystem object. The same 
> operations can be allowed through the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-26 Thread Liangliang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973851#comment-14973851
 ] 

Liangliang Gu commented on HDFS-9276:
-

*FileSystem#addDelegationTokens* will return the *Token for NameNode HA*.
Is there any public and stable API to update *Token for NameNode 1* and *Token 
for NameNode 2*?
!https://issues.apache.org/jira/secure/attachment/12768210/debug1.PNG!

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client will generate private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens will not be updated, which 
> will cause the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown 

[jira] [Updated] (HDFS-9255) Consolidate block recovery related implementation into a single class

2015-10-26 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9255:

Attachment: HDFS-9255.06.patch

bq. One question is: since DataNode#blockRecoveryWorker is not declared as 
final, can we make sure the BPServiceActor thread can always see its non-null 
value when calling getBlockRecoveryWorker?
No. Uploaded the 06 patch to address that.
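
For context, the concern is the usual safe-publication rule. An illustrative 
sketch only, with hypothetical class names (not necessarily how the 06 patch 
addresses it):
{code}
/** Hypothetical stand-in for DataNode and its block recovery worker. */
class BlockRecoveryWorkerLike { }

class DataNodeLike {
  // Because the field is final and assigned in the constructor, the Java
  // memory model guarantees that any thread which sees a reference to a
  // fully constructed DataNodeLike also sees the non-null assignment
  // (provided 'this' does not escape during construction). Without final
  // (or volatile / synchronization), another thread may observe null.
  private final BlockRecoveryWorkerLike blockRecoveryWorker;

  DataNodeLike() {
    this.blockRecoveryWorker = new BlockRecoveryWorkerLike();
  }

  BlockRecoveryWorkerLike getBlockRecoveryWorker() {
    return blockRecoveryWorker;
  }
}
{code}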

> Consolidate block recovery related implementation into a single class
> -
>
> Key: HDFS-9255
> URL: https://issues.apache.org/jira/browse/HDFS-9255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Walter Su
>Assignee: Walter Su
>Priority: Minor
> Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, 
> HDFS-9255.03.patch, HDFS-9255.04.patch, HDFS-9255.05.patch, HDFS-9255.06.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9276) Failed to Update HDFS Delegation Token for long running application in HA mode

2015-10-26 Thread Liangliang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14973857#comment-14973857
 ] 

Liangliang Gu commented on HDFS-9276:
-

ps: *Token for NameNode 1* and *Token for NameNode 2* are generated by the 
DFSClient and are PrivateTokens.

> Failed to Update HDFS Delegation Token for long running application in HA mode
> --
>
> Key: HDFS-9276
> URL: https://issues.apache.org/jira/browse/HDFS-9276
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs, ha, security
>Affects Versions: 2.7.1
>Reporter: Liangliang Gu
>Assignee: Liangliang Gu
> Attachments: HDFS-9276.01.patch, HDFS-9276.02.patch, 
> HDFS-9276.03.patch, debug1.PNG, debug2.PNG
>
>
> The Scenario is as follows:
> 1. NameNode HA is enabled.
> 2. Kerberos is enabled.
> 3. HDFS Delegation Token (not Keytab or TGT) is used to communicate with 
> NameNode.
> 4. We want to update the HDFS Delegation Token for long running applications. 
> The HDFS client will generate private tokens for each NameNode. When we update 
> the HDFS Delegation Token, these private tokens will not be updated, which 
> will cause the token to expire.
> This bug can be reproduced by the following program:
> {code}
> import java.security.PrivilegedExceptionAction
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.hadoop.security.UserGroupInformation
> object HadoopKerberosTest {
>   def main(args: Array[String]): Unit = {
> val keytab = "/path/to/keytab/xxx.keytab"
> val principal = "x...@abc.com"
> val creds1 = new org.apache.hadoop.security.Credentials()
> val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
> ugi1.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> val fs = FileSystem.get(new Configuration())
> fs.addDelegationTokens("test", creds1)
> null
>   }
> })
> val ugi = UserGroupInformation.createRemoteUser("test")
> ugi.addCredentials(creds1)
> ugi.doAs(new PrivilegedExceptionAction[Void] {
>   // Get a copy of the credentials
>   override def run(): Void = {
> var i = 0
> while (true) {
>   val creds1 = new org.apache.hadoop.security.Credentials()
>   val ugi1 = 
> UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
>   ugi1.doAs(new PrivilegedExceptionAction[Void] {
> // Get a copy of the credentials
> override def run(): Void = {
>   val fs = FileSystem.get(new Configuration())
>   fs.addDelegationTokens("test", creds1)
>   null
> }
>   })
>   UserGroupInformation.getCurrentUser.addCredentials(creds1)
>   val fs = FileSystem.get( new Configuration())
>   i += 1
>   println()
>   println(i)
>   println(fs.listFiles(new Path("/user"), false))
>   Thread.sleep(60 * 1000)
> }
> null
>   }
> })
>   }
> }
> {code}
> To reproduce the bug, please set the following configuration on the NameNode:
> {code}
> dfs.namenode.delegation.token.max-lifetime = 10min
> dfs.namenode.delegation.key.update-interval = 3min
> dfs.namenode.delegation.token.renew-interval = 3min
> {code}
> The bug will occur after 3 minutes.
> The stacktrace is:
> {code}
> Exception in thread "main" 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 330156 for test) is expired
>   at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>   at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>   at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>   at 
> 

[jira] [Updated] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9304:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the 
contribution.

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.
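
A rough, self-contained approximation of the coverage check the test performs 
(the {{_KEY}} name filter below is a simplification of the base class's real 
exclusion rules, and this is not the patch itself):
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.client.HdfsClientConfigKeys;

public class ConfigCoverageCheck {
  public static void main(String[] args) throws Exception {
    // Collect the key constants declared in both config-key classes.
    Set<String> declared = new HashSet<>();
    for (Class<?> clazz : new Class<?>[] {
        DFSConfigKeys.class, HdfsClientConfigKeys.class }) {
      for (Field f : clazz.getFields()) {
        if (Modifier.isStatic(f.getModifiers())
            && f.getType() == String.class
            && f.getName().endsWith("_KEY")) {
          declared.add((String) f.get(null));
        }
      }
    }

    // Compare against the properties shipped in hdfs-default.xml.
    Configuration conf = new Configuration(false);
    conf.addResource("hdfs-default.xml");
    int missing = 0;
    for (Map.Entry<String, String> e : conf) {
      if (!declared.contains(e.getKey())) {
        System.out.println("missing in key classes: " + e.getKey());
        missing++;
      }
    }
    System.out.println(missing + " properties missing");
  }
}
{code}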



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9308) Add truncateMeta() to MiniDFSCluster

2015-10-26 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9308:
--
Attachment: HDFS-9308.001.patch

In this patch:
* Add {{truncateMeta()}} method to truncate meta data files on DataNodes.
* Modify {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} to use the new 
API.
* Enhance {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} to check the 
file size after lease recovery is complete. Truncating the metadata file on 
DataNodes will effectively reduce the file size after lease recovery. The test 
should verify the new file size as well.

> Add truncateMeta() to MiniDFSCluster
> 
>
> Key: HDFS-9308
> URL: https://issues.apache.org/jira/browse/HDFS-9308
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9308.001.patch
>
>
> HDFS-9188 introduced {{corruptMeta()}} method to make corrupting the metadata 
> file filesystem agnostic. There should also be a {{truncateMeta()}} method to 
> allow truncation of metadata files on DataNodes without writing code that's 
> specific to the underlying file system. 
> This will be useful for tests such as 
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9308) Add truncateMeta() to MiniDFSCluster

2015-10-26 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9308:
--
Status: Patch Available  (was: Open)

> Add truncateMeta() to MiniDFSCluster
> 
>
> Key: HDFS-9308
> URL: https://issues.apache.org/jira/browse/HDFS-9308
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9308.001.patch
>
>
> HDFS-9188 introduced {{corruptMeta()}} method to make corrupting the metadata 
> file filesystem agnostic. There should also be a {{truncateMeta()}} method to 
> allow truncation of metadata files on DataNodes without writing code that's 
> specific to the underlying file system. 
> This will be useful for tests such as 
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3745) fsck prints that it's using KSSL even when it's in fact using SPNEGO for authentication

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975189#comment-14975189
 ] 

Hadoop QA commented on HDFS-3745:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  26m 50s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  11m 16s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  14m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 32s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m 30s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m 20s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 44s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   9m 13s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests |   8m 25s | Tests failed in 
hadoop-common. |
| {color:red}-1{color} | mapreduce tests |   5m 53s | Tests failed in 
hadoop-mapreduce-client-hs. |
| {color:red}-1{color} | yarn tests |   0m 23s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:red}-1{color} | hdfs tests |   0m 34s | Tests failed in hadoop-hdfs. |
| | |  83m 46s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.metrics2.impl.TestGangliaMetrics |
|   | hadoop.ha.TestZKFailoverController |
|   | hadoop.metrics2.impl.TestMetricsSystemImpl |
|   | hadoop.mapreduce.v2.hs.TestHistoryServerFileSystemStateStoreService |
|   | hadoop.mapreduce.v2.hs.TestJobHistoryEvents |
| Timed out tests | org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing |
| Failed build | hadoop-yarn-server-resourcemanager |
|   | hadoop-hdfs |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751156/HDFS-3745.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2f1eb2b |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-mapreduce-client-hs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13200/console |


This message was automatically generated.

> fsck prints that it's using KSSL even when it's in fact using SPNEGO for 
> authentication
> ---
>
> Key: HDFS-3745
> URL: https://issues.apache.org/jira/browse/HDFS-3745
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, security
>Affects Versions: 2.0.0-alpha
>Reporter: Aaron T. Myers
>Priority: Trivial
>  Labels: newbie
> Attachments: HDFS-3745.patch
>
>
> In branch-2 (which exclusively uses SPNEGO for HTTP authentication) and in 
> branch-1 (which can optionally use SPNEGO for HTTP authentication), running 
> fsck will print the following, which isn't quite right:
> {quote}
> FSCK started by hdfs (auth:KERBEROS_SSL) from...
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9284) fsck command should not print exception trace when file not found

2015-10-26 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9284:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks for finding and fixing this 
[~jagadesh.kiran]!

> fsck command should not print exception trace when file not found 
> --
>
> Key: HDFS-9284
> URL: https://issues.apache.org/jira/browse/HDFS-9284
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jagadesh Kiran N
>Assignee: Jagadesh Kiran N
> Fix For: 2.8.0
>
> Attachments: HDFS-9284_00.patch, HDFS-9284_01.patch, 
> HDFS-9284_02.patch
>
>
> when the file doesn't exist, fsck throws an exception 
> {code}
> ./hdfs fsck /kiran
> {code}
> the following exception occurs 
> {code}
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable
> FileSystem is inaccessible due to:
> java.io.FileNotFoundException: File does not exist: /kiran
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1273)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1265)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1265)
> at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:755)
> at org.apache.hadoop.hdfs.tools.DFSck.getResolvedPath(DFSck.java:236)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:316)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:155)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:152)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:151)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:383)
> {code}
> but only the {code}File does not exist: /kiran{code} error message should be 
> printed. The current code is:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + StringUtils.stringifyException(ioe));
> }
> {code}
> I think it should use the ioe.getMessage() method instead:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + ioe.getMessage());
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9079) Erasure coding: preallocate multiple generation stamps and serialize updates from data streamers

2015-10-26 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9079:

Description: 
A non-striped DataStreamer goes through the following steps in error handling:
{code}
1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies 
new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on 
NN
{code}

With multiple streamer threads run in parallel, we need to correctly handle a 
large number of possible combinations of interleaved thread events. For 
example, {{streamer_B}} starts step 2 in between events {{streamer_A.2}} and 
{{streamer_A.3}}.

HDFS-9040 moves steps 1, 2, 3, 6 from streamer to {{DFSStripedOutputStream}}. 
This JIRA proposes some further optimizations based on HDFS-9040:

# We can preallocate GS when NN creates a new striped block group 
({{FSN#createNewBlock}}). For each new striped block group we can reserve 
{{NUM_PARITY_BLOCKS}} GS's. If more than {{NUM_PARITY_BLOCKS}} errors have 
happened we shouldn't try to further recover anyway.
# We can use a dedicated event processor to offload the error handling logic 
from {{DFSStripedOutputStream}}, which is not a long running daemon.
# We can limit the lifespan of a streamer to be a single block. A streamer ends 
either after finishing the current block or when encountering a DN failure.

With the proposed change, a {{StripedDataStreamer}}'s flow becomes:
{code}
1) Finds DN error => 2) Notify coordinator (async, not waiting for response) => 
terminates
1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream) => 
3) Ack from DN => 4) Notify coordinator (async, not waiting for response)
{code}

  was:
A non-striped DataStreamer goes through the following steps in error handling:
{code}
1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) Applies 
new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) Updates block on 
NN
{code}
To simplify the above we can preallocate GS when NN creates a new striped block 
group ({{FSN#createNewBlock}}). For each new striped block group we can reserve 
{{NUM_PARITY_BLOCKS}} GS's. Then steps 1~3 in the above sequence can be saved. 
If more than {{NUM_PARITY_BLOCKS}} errors have happened we shouldn't try to 
further recover anyway.


> Erasure coding: preallocate multiple generation stamps and serialize updates 
> from data streamers
> 
>
> Key: HDFS-9079
> URL: https://issues.apache.org/jira/browse/HDFS-9079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding
>Affects Versions: HDFS-7285
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9079-HDFS-7285.00.patch, HDFS-9079.01.patch, 
> HDFS-9079.02.patch, HDFS-9079.03.patch, HDFS-9079.04.patch, HDFS-9079.05.patch
>
>
> A non-striped DataStreamer goes through the following steps in error handling:
> {code}
> 1) Finds error => 2) Asks NN for new GS => 3) Gets new GS from NN => 4) 
> Applies new GS to DN (createBlockOutputStream) => 5) Ack from DN => 6) 
> Updates block on NN
> {code}
> With multiple streamer threads run in parallel, we need to correctly handle a 
> large number of possible combinations of interleaved thread events. For 
> example, {{streamer_B}} starts step 2 in between events {{streamer_A.2}} and 
> {{streamer_A.3}}.
> HDFS-9040 moves steps 1, 2, 3, 6 from streamer to {{DFSStripedOutputStream}}. 
> This JIRA proposes some further optimizations based on HDFS-9040:
> # We can preallocate GS when NN creates a new striped block group 
> ({{FSN#createNewBlock}}). For each new striped block group we can reserve 
> {{NUM_PARITY_BLOCKS}} GS's. If more than {{NUM_PARITY_BLOCKS}} errors have 
> happened we shouldn't try to further recover anyway.
> # We can use a dedicated event processor to offload the error handling logic 
> from {{DFSStripedOutputStream}}, which is not a long running daemon.
> # We can limit the lifespan of a streamer to be a single block. A streamer 
> ends either after finishing the current block or when encountering a DN 
> failure.
> With the proposed change, a {{StripedDataStreamer}}'s flow becomes:
> {code}
> 1) Finds DN error => 2) Notify coordinator (async, not waiting for response) 
> => terminates
> 1) Finds external error => 2) Applies new GS to DN (createBlockOutputStream) 
> => 3) Ack from DN => 4) Notify coordinator (async, not waiting for response)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9310) TestDataNodeHotSwapVolumes fails occasionally

2015-10-26 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9310:

Description: 
TestDataNodeHotSwapVolumes fails occasionally in Jenkins and locally. e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeHotSwapVolumes/testRemoveVolumeBeingWritten/

*Error Message*

Timed out waiting for /test to reach 3 replicas

*Stacktrace*

java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
replicas
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:768)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:644)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:569)

  was:
TestDataNodeHotSwapVolumes fails occasionally in Jenkins and locally. e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeHotSwapVolumes/testRemoveVolumeBeingWritten/

{code}
Error Message

Timed out waiting for /test to reach 3 replicas
Stacktrace

java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
replicas
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:768)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:644)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:569)
{code}


> TestDataNodeHotSwapVolumes fails occasionally
> -
>
> Key: HDFS-9310
> URL: https://issues.apache.org/jira/browse/HDFS-9310
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>
> TestDataNodeHotSwapVolumes fails occasionally in Jenkins and locally. e.g. 
> https://builds.apache.org/job/PreCommit-HDFS-Build/13197/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeHotSwapVolumes/testRemoveVolumeBeingWritten/
> *Error Message*
> Timed out waiting for /test to reach 3 replicas
> *Stacktrace*
> java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
> replicas
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:768)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:644)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:569)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974896#comment-14974896
 ] 

Hudson commented on HDFS-9304:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8709 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8709/])
HDFS-9304. Add HdfsClientConfigKeys class to (wheat9: rev 
67e3d75aed1c1a90cabffc552d5743a69ea28b54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-26 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-7284:

   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

I committed to trunk and branch-2. Thanks [~huLiu] for reporting the issue, and 
Wei-Chiu for the contribution!


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975055#comment-14975055
 ] 

Wei-Chiu Chuang commented on HDFS-9309:
---

HDFS-9277 exhibited the same error message, so it could be due to the same 
issue.

> Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()
> -
>
> Key: HDFS-9309
> URL: https://issues.apache.org/jira/browse/HDFS-9309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
>
> When KeyStoreUtil.setupSSLConfig() is called, several files are created 
> (ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
> However, if they are not deleted upon exit, weird things can happen to any 
> subsequent tests.
> For example, if ssl-client.xml is not deleted, but trustKS.jks is deleted, 
> TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
> {noformat}
> java.io.IOException: Unable to load OAuth2 connection factory.
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:146)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
>   at 
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
>   at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
> {noformat}
> There are currently several tests that do not clean up:
> {noformat}
> 130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
> 'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
> "KeyStoreTestUtil\.cleanupSSLConfig"
> ./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
> ./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
> {noformat}
> This JIRA is the effort to remove the bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-26 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-9231:

Attachment: HDFS-9231.009.patch

Patch 009 fixes the checkstyle warning about comments ending in '.'.
The test failures, findbugs warning, and other checkstyle errors are unrelated.

> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch, 
> HDFS-9231.006.patch, HDFS-9231.007.patch, HDFS-9231.008.patch, 
> HDFS-9231.009.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behaves the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5032) Write pipeline failures caused by slow or busy disk may not be handled properly.

2015-10-26 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975142#comment-14975142
 ] 

Kihwal Lee commented on HDFS-5032:
--

bq. The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
excluded. The 1st DN's packet responder could have done a better job. It didn't 
have any outstanding ACKs to receive. Or the second DN could have tried to hint 
the 1st DN of what happened.

Fixed by HDFS-9178. The absence of a heartbeat during flush will be fixed in a 
separate jira by [~daryn].

bq. copyBlock() could probably wait longer than 3 seconds in 
waitForMinLength(). Or it could check the on-disk size early on and fail early 
even before trying to establish a connection to the target.

If the node stuck in I/O is correctly taken out, this will happen far less. 
Also, HDFS-9106 will make this kind of failure non-fatal.

bq. Failed targets in block write/copy should clean up the record or make it 
recoverable.

Fixed in HDFS-6948.

> Write pipeline failures caused by slow or busy disk may not be handled 
> properly.
> 
>
> Key: HDFS-5032
> URL: https://issues.apache.org/jira/browse/HDFS-5032
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>
> Here is one scenario I have recently encountered in a hbase cluster.
> The 1st datanode in a write pipeline's disk became extremely busy for many 
> minutes and it caused block writes on the disk to slow down. The 2nd 
> datanode's socket read from the 1st datanode timed out in 60 seconds and 
> disconnected. This caused a block recovery. The problem was, the 1st datanode 
> hadn't written the last packet, but the downstream nodes had, and an ACK was 
> sent back to the client. For this reason, the block recovery was issued up to the 
> ACKed size. 
> During the recovery, the first datanode was told to do copyBlock(). Since it 
> didn't have enough data on disk, it waited in waitForMinLength(), which 
> didn't help, so the command failed. The connection was already established to 
> the target node for the copy, but the target never received any data. The 
> data packet was eventually written, but it was too late for the copyBlock() 
> call.
> The destination node for the copy had block metadata in memory, but no file 
> was created on disk. When client contacted this node for block recovery, it 
> too failed. 
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
> excluded. The 1st DN's packet responder could have done a better job. It 
> didn't have any outstanding ACKs to receive.  Or the second DN could have 
> tried to hint the 1st DN of what happened. 
> - copyBlock() could probably wait longer than 3 seconds in 
> waitForMinLength(). Or it could check the on-disk size early on and fail 
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it 
> recoverable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975181#comment-14975181
 ] 

Hudson commented on HDFS-9304:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #587 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/587/])
HDFS-9304. Add HdfsClientConfigKeys class to (wheat9: rev 
67e3d75aed1c1a90cabffc552d5743a69ea28b54)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2015-10-26 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974876#comment-14974876
 ] 

Zhe Zhang commented on HDFS-9260:
-

Quick note: the patches are all named after HDFS-7435

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: HDFS Block and Replica Management 20151013.pdf, 
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch
>
>
> This patch changes the data structures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC-friendly handling of full 
> block reports.
> I would like to hear people's feedback on this change, and also to get some 
> help investigating/understanding a few outstanding issues if we are interested 
> in moving forward with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9292) Make TestFileConcorruption independent to underlying FsDataset Implementation.

2015-10-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974972#comment-14974972
 ] 

Colin Patrick McCabe commented on HDFS-9292:


{code}
System.out.println("Deliberately removing block "
    + brr.getBlockName());
{code}
While we're refactoring this, can we change this to a LOG message?

+1 once that's addressed
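
For reference, a minimal sketch of the kind of change being asked for (class, 
method, and logger names here are illustrative, not the actual test code; the 
real test may use commons-logging rather than SLF4J):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Route the message through a logger instead of System.out.
class BlockRemovalLogging {
  private static final Logger LOG =
      LoggerFactory.getLogger(BlockRemovalLogging.class);

  void logRemoval(String blockName) {
    // Parameterized logging avoids string concatenation when INFO is disabled.
    LOG.info("Deliberately removing block {}", blockName);
  }
}
{code}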

> Make TestFileConcorruption independent to underlying FsDataset Implementation.
> --
>
> Key: HDFS-9292
> URL: https://issues.apache.org/jira/browse/HDFS-9292
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9292.00.patch
>
>
> {{TestFileCorruption}} manipulates the block data by directly accessing the 
> block files on disk.  {{MiniDFSCluster}} has already offered ways to corrupt 
> data. We can use that to make {{TestFileCorruption}} agnostic to the 
> implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975147#comment-14975147
 ] 

Hadoop QA commented on HDFS-9245:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 45s | Pre-patch trunk has 2 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 32s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 24s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 35s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 51s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings, and fixes 2 pre-existing warnings. |
| {color:green}+1{color} | native |   3m 12s | Pre-build of native portion |
| {color:green}+1{color} | hdfs tests |   1m 48s | Tests passed in 
hadoop-hdfs-nfs. |
| | |  44m  8s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768785/HDFS-9245.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2f1eb2b |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13201/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-nfs.html
 |
| hadoop-hdfs-nfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13201/artifact/patchprocess/testrun_hadoop-hdfs-nfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13201/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13201/console |


This message was automatically generated.

> Fix findbugs warnings in hdfs-nfs/WriteCtx
> --
>
> Key: HDFS-9245
> URL: https://issues.apache.org/jira/browse/HDFS-9245
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9245.000.patch, HDFS-9245.001.patch, 
> HDFS-9245.002.patch
>
>
> There are findbugs warnings as follows, introduced by [HDFS-9092].
> It seems fine to ignore them by writing a filter rule in the 
> {{findbugsExcludeFile.xml}} file. 
> {code:xml}
>  instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
> Inconsistent synchronization
> 
> Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time
> 
> 
>  sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
> At WriteCtx.java:[lines 40-314]
> 
> In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx
> 
> {code}
> and
> {code:xml}
>  instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" 
> priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" 
> instanceOccurrenceMax="0">
> Inconsistent synchronization
> 
> Inconsistent synchronization of 
> org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time
> 
> 
>  sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java" end="314">
> At WriteCtx.java:[lines 40-314]
> 
> In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx
> 
>  name="originalCount" primary="true" signature="I">
>  sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" 
> sourcefile="WriteCtx.java">
> In WriteCtx.java
> 
> 
> Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount
> 
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-26 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974893#comment-14974893
 ] 

Yongjun Zhang commented on HDFS-7284:
-

+1 on 005, I will commit momentarily. 


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at a replica loss issue, I got the following info from 
> the log:
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could only tell that a replica was removed, but not which block or its 
> timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info, including the block id and timestamp, to the 
> code snippet:
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974840#comment-14974840
 ] 

Haohui Mai commented on HDFS-9304:
--

+1. The failed unit tests are unrelated. Committing it shortly.

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9083) Replication violates block placement policy.

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974953#comment-14974953
 ] 

Hadoop QA commented on HDFS-9083:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768797/HDFS-9083-branch-2.7.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / baa2998 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13198/console |


This message was automatically generated.

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS, namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Attachments: HDFS-9083-branch-2.7.patch
>
>
> Recently we have been noticing many cases in which all the replicas of a block 
> reside on the same rack.
> During block creation, the block placement policy was honored.
> But after node failure events occurring in a specific manner, the block ends up 
> in such a state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu reassigned HDFS-9307:
---

Assignee: Mingliang Liu

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-5032) Write pipeline failures caused by slow or busy disk may not be handled properly.

2015-10-26 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-5032.
--
Resolution: Fixed

> Write pipeline failures caused by slow or busy disk may not be handled 
> properly.
> 
>
> Key: HDFS-5032
> URL: https://issues.apache.org/jira/browse/HDFS-5032
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>
> Here is one scenario I have recently encountered in a hbase cluster.
> The 1st datanode in a write pipeline's disk became extremely busy for many 
> minutes and it caused block writes on the disk to slow down. The 2nd 
> datanode's socket read from the 1st datanode timed out in 60 seconds and 
> disconnected. This caused a block recovery. The problem was, the 1st datanode 
> hadn't written the last packet, but the downstream nodes had, and an ACK was 
> sent back to the client. For this reason, the block recovery was issued up to the 
> ACKed size. 
> During the recovery, the first datanode was told to do copyBlock(). Since it 
> didn't have enough data on disk, it waited in waitForMinLength(), which 
> didn't help, so the command failed. The connection was already established to 
> the target node for the copy, but the target never received any data. The 
> data packet was eventually written, but it was too late for the copyBlock() 
> call.
> The destination node for the copy had block metadata in memory, but no file 
> was created on disk. When client contacted this node for block recovery, it 
> too failed. 
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
> excluded. The 1st DN's packet responder could have done a better job. It 
> didn't have any outstanding ACKs to receive.  Or the second DN could have 
> tried to hint the 1st DN of what happened. 
> - copyBlock() could probably wait longer than 3 seconds in 
> waitForMinLength(). Or it could check the on-disk size early on and fail 
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it 
> recoverable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9083) Replication violates block placement policy.

2015-10-26 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-9083:
-
Attachment: HDFS-9083-branch-2.7.patch

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS, namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Attachments: HDFS-9083-branch-2.7.patch
>
>
> Recently we have been noticing many cases in which all the replicas of a block 
> reside on the same rack.
> During block creation, the block placement policy was honored.
> But after node failure events occurring in a specific manner, the block ends up 
> in such a state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9083) Replication violates block placement policy.

2015-10-26 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated HDFS-9083:
-
Status: Patch Available  (was: Open)

> Replication violates block placement policy.
> 
>
> Key: HDFS-9083
> URL: https://issues.apache.org/jira/browse/HDFS-9083
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: HDFS, namenode
>Affects Versions: 2.6.0
>Reporter: Rushabh S Shah
>Assignee: Rushabh S Shah
>Priority: Blocker
> Attachments: HDFS-9083-branch-2.7.patch
>
>
> Recently we have been noticing many cases in which all the replicas of a block 
> reside on the same rack.
> During block creation, the block placement policy was honored.
> But after node failure events occurring in a specific manner, the block ends up 
> in such a state.
> On investigating more I found out that BlockManager#blockHasEnoughRacks is 
> dependent on the config (net.topology.script.file.name)
> {noformat}
>  if (!this.shouldCheckForEnoughRacks) {
>   return true;
> }
> {noformat}
> We specify DNSToSwitchMapping implementation (our own custom implementation) 
> via net.topology.node.switch.mapping.impl and no longer use 
> net.topology.script.file.name config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-26 Thread HeeSoo Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

HeeSoo Kim updated HDFS-7984:
-
Attachment: HDFS-7984.002.patch

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9299) Give ReplicationMonitor a readable thread name

2015-10-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974974#comment-14974974
 ] 

Colin Patrick McCabe commented on HDFS-9299:


+1 pending jenkins.

> Give ReplicationMonitor a readable thread name
> --
>
> Key: HDFS-9299
> URL: https://issues.apache.org/jira/browse/HDFS-9299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
>Priority: Trivial
> Attachments: HDFS-9299.001.patch
>
>
> Currently the log output from the ReplicationMonitor uses the full class name; 
> by setting the name on the thread, the output will be easier to read.
> Current
> 2015-10-23 11:07:53,344 
> [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd]
>  INFO  blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.
> After
> 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO  
> blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager

2015-10-26 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975016#comment-14975016
 ] 

Anu Engineer commented on HDFS-9129:


[~liuml07] Thanks for adding me to this review. It looks like patch 10 does 
indeed break HDFS-4015. The issue might be a missing line in 
{{FSNamesystem#setSafeMode}}: we seem to have dropped the safeMode.leave(true) 
line in the SAFEMODE_FORCE_EXIT case, along with making the leaveSafeMode 
function always call blockManager.leaveSafeMode(true).

Since you are refactoring all the safemode-related variables into one class, I 
think it might be a good idea to move BytesInFutureBlocks into the same class 
as well.



> Move the safemode block count into BlockManager
> ---
>
> Key: HDFS-9129
> URL: https://issues.apache.org/jira/browse/HDFS-9129
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Haohui Mai
>Assignee: Mingliang Liu
> Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, 
> HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, 
> HDFS-9129.005.patch, HDFS-9129.006.patch, HDFS-9129.007.patch, 
> HDFS-9129.008.patch, HDFS-9129.009.patch, HDFS-9129.010.patch
>
>
> The {{SafeMode}} needs to track whether there are enough blocks so that the 
> NN can get out of the safemode. These fields can be moved to the 
> {{BlockManager}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Attachment: HDFS-9307.000.patch

Thanks for reporting this. I also think it should be private. The patch simply 
makes it {{static}}.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()

2015-10-26 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-9309:
-

 Summary: Tests that use KeyStoreUtil must call 
KeyStoreUtil.cleanupSSLConfig()
 Key: HDFS-9309
 URL: https://issues.apache.org/jira/browse/HDFS-9309
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wei-Chiu Chuang
Assignee: Wei-Chiu Chuang
Priority: Minor


When KeyStoreUtil.setupSSLConfig() is called, several files are created 
(ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
However, if they are not deleted upon exit, weird things can happen to any 
subsequent tests.

For example, if ssl-client.xml is not deleted, but trustKS.jks is deleted, 
TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
{noformat}
java.io.IOException: Unable to load OAuth2 connection factory.
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:146)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
at 
org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
at 
org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
{noformat}

There are currently several tests that do not clean up:

{noformat}

130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
"KeyStoreTestUtil\.cleanupSSLConfig"
./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
{noformat}

This JIRA is the effort to remove the bug.
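
A minimal sketch of the intended pattern, assuming the setupSSLConfig/cleanupSSLConfig
signatures that KeyStoreTestUtil commonly exposes (the class name, base directory,
and field names below are illustrative, not a specific existing test):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.ssl.KeyStoreTestUtil;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestWithSslConfig {
  private static final String BASEDIR =
      System.getProperty("test.build.dir", "target/test-dir");
  private static String sslConfDir;

  @BeforeClass
  public static void setUpSsl() throws Exception {
    sslConfDir = KeyStoreTestUtil.getClasspathDir(TestWithSslConfig.class);
    // Generates ssl-server.xml, ssl-client.xml and the test keystores.
    KeyStoreTestUtil.setupSSLConfig(BASEDIR, sslConfDir, new Configuration(), false);
  }

  @AfterClass
  public static void tearDownSsl() throws Exception {
    // Deletes the generated files so later tests don't pick up stale SSL config.
    KeyStoreTestUtil.cleanupSSLConfig(BASEDIR, sslConfDir);
  }
}
{code}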



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9309:
--
Description: 
When KeyStoreUtil.setupSSLConfig() is called, several files are created 
(ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
However, if they are not deleted upon exit, weird things can happen to any 
subsequent tests.

For example, if ssl-client.xml is not deleted, but trustKS.jks is deleted, 
TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
{noformat}
java.io.IOException: Unable to load OAuth2 connection factory.
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:146)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
at 
org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
at 
org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
{noformat}

There are currently several tests that do not clean up:

{noformat}

130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
"KeyStoreTestUtil\.cleanupSSLConfig"
./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
{noformat}

This JIRA is the effort to remove the bug.

  was:
When KeyStoreUtil.setupSSLConfig() is called, several files are created 
(ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
However, if they are not deleted upon exit, weird thing can happen to any 
subsequent files.

For example, if ssl-client.xml is not delete, but trustKS.jks is deleted, 
TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with message:
{noformat}
java.io.IOException: Unable to load OAuth2 connection factory.
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:146)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
at 
org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
at 
org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
at 
org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
at 
org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
{noformat}

There are currently several tests that do not clean up:

{noformat}

130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
"KeyStoreTestUtil\.cleanupSSLConfig"
./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java

[jira] [Updated] (HDFS-9007) Fix HDFS Balancer to honor upgrade domain policy

2015-10-26 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9007:
--
Attachment: HDFS-9007.patch

Here is the initial patch. The basic idea is to add a new method called 
{{isMovable}} to {{BlockPlacementPolicy}} for the balancer and other tools to 
ask whether a move is allowed.

The reason we don't use {{verifyBlockPlacement}} in this scenario is that 
{{isMovable}} and {{verifyBlockPlacement}} have different meanings. 
{{isMovable}} means "it is ok as long as it doesn't make things worse." For 
example, say a block has two replicas and they are on the same rack. Moving one 
of the replicas to another node in the same rack is allowed from the balancer's 
point of view, but {{verifyBlockPlacement}} won't allow that.

The policy defined by {{isMovable}} is used by client-side tools. It is the 
same as the policy used by BlockManager to decide the delete hint, so there is 
some refactoring to make sure that policy is shared properly.

The patch also adds a bunch of new tests and cleans up some code along the way.
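
To make the "don't make things worse" rule concrete, here is a rough, 
self-contained sketch, simplified to rack names only (the method name and 
signature are illustrative, not the actual patch):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class MovableCheck {
  /**
   * A move is acceptable as long as the set of racks covered by the block's
   * replicas does not shrink, even if the existing placement is already
   * sub-optimal.
   */
  public static boolean isMovable(List<String> replicaRacks,
                                  String sourceRack, String targetRack) {
    int racksBefore = new HashSet<>(replicaRacks).size();
    List<String> after = new ArrayList<>(replicaRacks);
    after.remove(sourceRack);   // one replica leaves the source rack
    after.add(targetRack);      // and lands on the target rack
    int racksAfter = new HashSet<>(after).size();
    return racksAfter >= racksBefore;
  }

  public static void main(String[] args) {
    // Two replicas already on the same rack: moving one within that rack does
    // not make things worse, so isMovable allows it even though a strict
    // placement verification would flag the layout.
    System.out.println(isMovable(Arrays.asList("rackA", "rackA"), "rackA", "rackA")); // true
    // Moving the only replica on rackB onto rackA reduces rack diversity.
    System.out.println(isMovable(Arrays.asList("rackA", "rackB"), "rackB", "rackA")); // false
  }
}
{code}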

> Fix HDFS Balancer to honor upgrade domain policy
> 
>
> Key: HDFS-9007
> URL: https://issues.apache.org/jira/browse/HDFS-9007
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
> Attachments: HDFS-9007.patch
>
>
> In the current design of HDFS Balancer, it doesn't use BlockPlacementPolicy 
> used by namenode runtime. Instead, it has somewhat redundant code to make 
> sure block allocation conforms with the rack policy.
> When namenode uses upgrade domain based policy, we need to make sure that 
> HDFS balancer doesn't move blocks in a way that could violate upgrade domain 
> block placement policy.
> In the longer term, we should consider how to make Balancer independent of 
> the actual BlockPlacementPolicy as in HDFS-1431. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974847#comment-14974847
 ] 

Mingliang Liu commented on HDFS-9304:
-

Thanks, [~wheat9], for the review and commit.

> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-26 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-9268:
---
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> The JVM crashes when users attempt to use vi to update a file on a fuse file 
> system with insufficient permissions. (I used CDH's hadoop-fuse-dfs wrapper 
> script to reproduce the bug, but the same bug is reproducible in trunk.)
> The root cause is a segfault in a fuse-dfs method.
> To reproduce it do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited# to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-5032) Write pipeline failures caused by slow or busy disk may not be handled properly.

2015-10-26 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp reassigned HDFS-5032:
-

Assignee: Daryn Sharp

> Write pipeline failures caused by slow or busy disk may not be handled 
> properly.
> 
>
> Key: HDFS-5032
> URL: https://issues.apache.org/jira/browse/HDFS-5032
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta, 0.23.9
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>
> Here is one scenario I have recently encountered in a hbase cluster.
> The 1st datanode in a write pipeline's disk became extremely busy for many 
> minutes and it caused block writes on the disk to slow down. The 2nd 
> datanode's socket read from the 1st datanode timed out in 60 seconds and 
> disconnected. This caused a block recovery. The problem was, the 1st datanode 
> hadn't written the last packet, but the downstream nodes had, and an ACK was 
> sent back to the client. For this reason, the block recovery was issued up to the 
> ACKed size. 
> During the recovery, the first datanode was told to do copyBlock(). Since it 
> didn't have enough data on disk, it waited in waitForMinLength(), which 
> didn't help, so the command failed. The connection was already established to 
> the target node for the copy, but the target never received any data. The 
> data packet was eventually written, but it was too late for the copyBlock() 
> call.
> The destination node for the copy had block metadata in memory, but no file 
> was created on disk. When client contacted this node for block recovery, it 
> too failed. 
> There are a few problems:
> - The faulty (slow) node was not detected correctly. Instead, the 2nd DN was 
> excluded. The 1st DN's packet responder could have done a better job. It 
> didn't have any outstanding ACKs to receive.  Or the second DN could have 
> tried to hint the 1st DN of what happened. 
> - copyBlock() could probably wait longer than 3 seconds in 
> waitForMinLength(). Or it could check the on-disk size early on and fail 
> early even before trying to establish a connection to the target.
> - Failed targets in block write/copy should clean up the record or make it 
> recoverable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9279) Decomissioned capacity should not be considered for configured/used capacity

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975088#comment-14975088
 ] 

Hadoop QA commented on HDFS-9279:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  23m  4s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |  10m 23s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  14m  1s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 38s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 53s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 53s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 43s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 59s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   4m 15s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  64m 57s | Tests failed in hadoop-hdfs. |
| | | 124m 50s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestDFSUpgradeFromImage |
|   | hadoop.hdfs.TestDFSOutputStream |
|   | hadoop.hdfs.TestWriteReadStripedFile |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.fs.TestSymlinkHdfsFileContext |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768762/HDFS-9279-v3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 67e3d75 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13196/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13196/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13196/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13196/console |


This message was automatically generated.

> Decomissioned capacity should not be considered for configured/used capacity
> 
>
> Key: HDFS-9279
> URL: https://issues.apache.org/jira/browse/HDFS-9279
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: HDFS-9279-v1.patch, HDFS-9279-v2.patch, 
> HDFS-9279-v3.patch
>
>
> Capacity of a decommissioned node is being accounted as configured and used 
> capacity metrics. This gives incorrect perception of cluster usage.
> Once a node is decommissioned, its capacity should be considered similar to a 
> dead node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975087#comment-14975087
 ] 

Colin Patrick McCabe commented on HDFS-9307:


Thanks, [~liuml07]. Can you remove it from the header as well? +1 once that's 
done.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975085#comment-14975085
 ] 

Andrew Wang commented on HDFS-9305:
---

LGTM, though the test has some whitespace errors. +1 pending, feel free to fix 
at commit time via "git apply --whitespace=fix".

> Delayed heartbeat processing causes storm of subsequent heartbeats
> --
>
> Key: HDFS-9305
> URL: https://issues.apache.org/jira/browse/HDFS-9305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Arpit Agarwal
> Attachments: HDFS-9305.01.patch, HDFS-9305.02.patch
>
>
> A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
> expect heartbeat handling to complete relatively quickly.  However, if 
> something unexpected causes heartbeat processing to get blocked, such as a 
> long GC or heavy lock contention within the NameNode, then heartbeat 
> processing would be delayed.  After recovering from this delay, the DataNode 
> then starts sending a storm of heartbeat messages in a tight loop.  In a 
> large cluster with many DataNodes, this storm of heartbeat messages could 
> cause harmful load on the NameNode and make overall cluster recovery more 
> difficult.
> The bug appears to be caused by incorrect timekeeping inside 
> {{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
> from the previous heartbeat time, without any compensation for possible long 
> latency on an individual heartbeat RPC.  The only mitigation would be 
> restarting all DataNodes to force a reset of the heartbeat schedule, or 
> simply wait out the storm until the scheduling catches up and corrects itself.
> This problem would not manifest after a NameNode restart.  In that case, the 
> NameNode would respond to the first heartbeat by telling the DataNode to 
> re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
> schedule to the current time.  I believe the problem would only manifest if 
> the NameNode process stayed alive but processed heartbeats unexpectedly slowly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9007) Fix HDFS Balancer to honor upgrade domain policy

2015-10-26 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9007:
--
Assignee: Ming Ma
  Status: Patch Available  (was: Open)

> Fix HDFS Balancer to honor upgrade domain policy
> 
>
> Key: HDFS-9007
> URL: https://issues.apache.org/jira/browse/HDFS-9007
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9007.patch
>
>
> In the current design, the HDFS Balancer doesn't use the BlockPlacementPolicy 
> used by the namenode at runtime. Instead, it has somewhat redundant code to 
> make sure block allocation conforms to the rack policy.
> When the namenode uses an upgrade-domain-based policy, we need to make sure 
> that the HDFS Balancer doesn't move blocks in a way that could violate the 
> upgrade domain block placement policy.
> In the longer term, we should consider how to make the Balancer independent 
> of the actual BlockPlacementPolicy, as in HDFS-1431.
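
For illustration, the kind of pre-move check implied above could look like the 
sketch below; the {{isMovable}} helper and its signature are hypothetical, not 
the Balancer's actual API, and the real policy only requires a minimum number 
of distinct upgrade domains rather than full distinctness.

{code}
// Hypothetical pre-move check; names are illustrative, not Balancer code.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UpgradeDomainMoveCheck {
  /** True if moving a replica from sourceDomain to targetDomain keeps domains distinct. */
  static boolean isMovable(List<String> replicaDomains,
                           String sourceDomain, String targetDomain) {
    Set<String> remaining = new HashSet<>(replicaDomains);
    remaining.remove(sourceDomain);           // the replica being moved away
    return !remaining.contains(targetDomain); // target must add a new upgrade domain
  }

  public static void main(String[] args) {
    List<String> domains = Arrays.asList("ud1", "ud2", "ud3");
    System.out.println(isMovable(domains, "ud3", "ud2")); // false: ud2 already used
    System.out.println(isMovable(domains, "ud3", "ud4")); // true: ud4 is new
  }
}
{code}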



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9231) fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot

2015-10-26 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975154#comment-14975154
 ] 

Yongjun Zhang commented on HDFS-9231:
-

Thanks Xiao for the new revs. +1 on 009 pending jenkins.


> fsck doesn't explicitly list when Bad Replicas/Blocks are in a snapshot
> ---
>
> Key: HDFS-9231
> URL: https://issues.apache.org/jira/browse/HDFS-9231
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-9231.001.patch, HDFS-9231.002.patch, 
> HDFS-9231.003.patch, HDFS-9231.004.patch, HDFS-9231.005.patch, 
> HDFS-9231.006.patch, HDFS-9231.007.patch, HDFS-9231.008.patch, 
> HDFS-9231.009.patch
>
>
> Currently for snapshot files, {{fsck -list-corruptfileblocks}} shows corrupt 
> blocks with the original file dir instead of the snapshot dir, and {{fsck 
> -list-corruptfileblocks -includeSnapshots}} behaves the same.
> This can be confusing because even when the original file is deleted, fsck 
> will still show that deleted file as corrupted, although what's actually 
> corrupted is the snapshot. 
> As a side note, {{fsck -files -includeSnapshots}} shows the snapshot dirs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-8128) hadoop-hdfs-client dependency convergence error

2015-10-26 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-8128.
--
Resolution: Cannot Reproduce

Resolving this issue. It looks like it is no longer reproducible in trunk. 

> hadoop-hdfs-client dependency convergence error
> ---
>
> Key: HDFS-8128
> URL: https://issues.apache.org/jira/browse/HDFS-8128
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Haohui Mai
>
> Found the following in 
> https://builds.apache.org/job/PreCommit-HDFS-Build/10258/consoleFull
> {noformat}
> [WARNING] Rule 0: org.apache.maven.plugins.enforcer.DependencyConvergence 
> failed with message:
> Failed while enforcing releasability the error(s) are [
> Dependency convergence error for 
> org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT paths to dependency are:
> +-org.apache.hadoop:hadoop-hdfs-client:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.0.0-SNAPSHOT
> +-org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT
> and
> +-org.apache.hadoop:hadoop-hdfs-client:3.0.0-SNAPSHOT
>   +-org.apache.hadoop:hadoop-annotations:3.0.0-20150410.234534-6484
> ]
> [INFO] 
> 
> [INFO] BUILD FAILURE
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-26 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-9305:

Attachment: HDFS-9305.02.patch

> Delayed heartbeat processing causes storm of subsequent heartbeats
> --
>
> Key: HDFS-9305
> URL: https://issues.apache.org/jira/browse/HDFS-9305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Arpit Agarwal
> Attachments: HDFS-9305.01.patch, HDFS-9305.02.patch
>
>
> A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
> expect heartbeat handling to complete relatively quickly.  However, if 
> something unexpected causes heartbeat processing to get blocked, such as a 
> long GC or heavy lock contention within the NameNode, then heartbeat 
> processing would be delayed.  After recovering from this delay, the DataNode 
> then starts sending a storm of heartbeat messages in a tight loop.  In a 
> large cluster with many DataNodes, this storm of heartbeat messages could 
> cause harmful load on the NameNode and make overall cluster recovery more 
> difficult.
> The bug appears to be caused by incorrect timekeeping inside 
> {{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
> from the previous heartbeat time, without any compensation for possible long 
> latency on an individual heartbeat RPC.  The only mitigation would be 
> restarting all DataNodes to force a reset of the heartbeat schedule, or 
> simply waiting out the storm until the scheduling catches up and corrects itself.
> This problem would not manifest after a NameNode restart.  In that case, the 
> NameNode would respond to the first heartbeat by telling the DataNode to 
> re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
> schedule to the current time.  I believe the problem would only manifest if 
> the NameNode process stayed alive but processed heartbeats unexpectedly slowly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9291) Fix TestInterDatanodeProtocol to be FsDataset-agnostic.

2015-10-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974965#comment-14974965
 ] 

Colin Patrick McCabe commented on HDFS-9291:


+1.  Thanks, [~eddyxu].

> Fix TestInterDatanodeProtocol to be FsDataset-agnostic.
> ---
>
> Key: HDFS-9291
> URL: https://issues.apache.org/jira/browse/HDFS-9291
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9291.00.patch
>
>
> {{TestInterDatanodeProtocol}} assumes the fsdataset is {{FsDatasetImpl}}. 
> This JIRA will make it dataset agnostic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-26 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974983#comment-14974983
 ] 

Colin Patrick McCabe commented on HDFS-9268:


Thanks, [~zhz].  I'll open a follow-on JIRA for making {{fuseConnect}} private 
to {{fuse_connect.c}}.  Committing to 2.8.

> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> JVM crashes when users attempt to use vi to update a file on fuse file system 
> with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to 
> generate the bug, but the same bug is reproducible in trunk)
> The root cause is a segfault in a dfs-fuse method
> To reproduce it do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited# to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9301) HDFS clients can't construct HdfsConfiguration instances

2015-10-26 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974981#comment-14974981
 ] 

Steve Loughran commented on HDFS-9301:
--

OK, I can verify that with this patch, my code compiles again.

However, I still think we ought to consider keeping the hadoop-client pom 
dependent on hadoop-hdfs, just as it pulls in other stuff (jets3t) that we don't 
really like. Then the lean client can stay lean, and we could even have a 
policy for this: "we can remove dependency JARs if we feel like it".

> HDFS clients can't construct HdfsConfiguration instances
> 
>
> Key: HDFS-9301
> URL: https://issues.apache.org/jira/browse/HDFS-9301
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Steve Loughran
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9241.000.patch, HDFS-9241.001.patch, 
> HDFS-9241.002.patch, HDFS-9241.003.patch, HDFS-9241.004.patch, 
> HDFS-9241.005.patch
>
>
> the changes for the hdfs client classpath make instantiating 
> {{HdfsConfiguration}} from the client impossible; it only lives server side. 
> This breaks any app which creates one.
> I know people will look at the {{@Private}} tag and say "don't do that then", 
> but it's worth considering precisely why I, at least, do this: it's the only 
> way to guarantee that the hdfs-default and hdfs-site resources get on the 
> classpath, including all the security settings. It's precisely the use case 
> which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code.
> What am I meant to do now? 
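
For reference, a rough sketch of the use case described above, loading the hdfs 
resources explicitly on the client instead of constructing 
{{HdfsConfiguration}}; this is a workaround sketch under the assumption that 
both XML files are on the client classpath, not a recommended replacement for 
the class.

{code}
// Client-side workaround sketch; assumes hdfs-default.xml and hdfs-site.xml
// are on the classpath, which is exactly what HdfsConfiguration guarantees.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConfWorkaround {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.addResource("hdfs-default.xml"); // loaded from the classpath, if present
    conf.addResource("hdfs-site.xml");    // cluster-specific and security settings
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    System.out.println("Connected to: " + fs.getUri());
  }
}
{code}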



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-9307:
--

 Summary: fuseConnect should be private to fuse_connect.c
 Key: HDFS-9307
 URL: https://issues.apache.org/jira/browse/HDFS-9307
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fuse-dfs
Reporter: Colin Patrick McCabe
Priority: Trivial


fuseConnect should be private to fuse_connect.c, since it's not used outside 
that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Status: Patch Available  (was: Open)

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2015-10-26 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975040#comment-14975040
 ] 

Zhe Zhang commented on HDFS-9260:
-

Thanks for the great work, Staffan.

The only change related to erasure coding is the below block ID translation, 
and I think it is done correctly.
{code}
+  long replicaID = replica.getBlockId();
+  if (BlockIdManager.isStripedBlockID(replicaID)
+  && (!hasNonEcBlockUsingStripedID ||
+  !blocksMap.containsBlock(replica))) {
+replicaID = BlockIdManager.convertToStripedID(replicaID);
   }
{code}

Will post a full review shortly.

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: HDFS Block and Replica Management 20151013.pdf, 
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change, and also get some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.
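
As a rough illustration of why sorted structures help here: two sorted 
sequences of block IDs can be diffed in a single linear pass with no per-block 
hashing or lookups. The code below is a generic sketch, not the patch's data 
structures.

{code}
// Generic two-pointer diff of sorted block-ID arrays; illustrative only.
import java.util.ArrayList;
import java.util.List;

public class SortedReportDiff {
  /** Splits IDs into those only in the stored set and those only in the report. */
  static void diff(long[] storedIds, long[] reportedIds) {
    List<Long> missingFromReport = new ArrayList<>();
    List<Long> newInReport = new ArrayList<>();
    int i = 0, j = 0;
    while (i < storedIds.length && j < reportedIds.length) {
      if (storedIds[i] == reportedIds[j]) { i++; j++; }
      else if (storedIds[i] < reportedIds[j]) { missingFromReport.add(storedIds[i++]); }
      else { newInReport.add(reportedIds[j++]); }
    }
    while (i < storedIds.length) { missingFromReport.add(storedIds[i++]); }
    while (j < reportedIds.length) { newInReport.add(reportedIds[j++]); }
    System.out.println("missing: " + missingFromReport + ", new: " + newInReport);
  }

  public static void main(String[] args) {
    diff(new long[] {1, 2, 5, 9}, new long[] {1, 3, 5, 9, 11});
    // prints: missing: [2], new: [3, 11]
  }
}
{code}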



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975042#comment-14975042
 ] 

Hadoop QA commented on HDFS-9260:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  29m 54s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 19 new or modified test files. |
| {color:green}+1{color} | javac |  12m 32s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  14m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 33s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m  3s | The applied patch generated  
21 new checkstyle issues (total was 883, now 892). |
| {color:red}-1{color} | whitespace |   0m 45s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m 11s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 55s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   3m  8s | The patch appears to introduce 3 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 36s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  57m 38s | Tests failed in hadoop-hdfs. |
| | | 128m 13s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768759/HDFS-7435.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 123b3db |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13195/console |


This message was automatically generated.

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: HDFS Block and Replica Management 20151013.pdf, 
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change, and also get some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9307) fuseConnect should be private to fuse_connect.c

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9307:

Attachment: HDFS-9307.001.patch

Thanks for your review, [~cmccabe]. The v1 patch addresses this and also 
refines the comments for the functions.

> fuseConnect should be private to fuse_connect.c
> ---
>
> Key: HDFS-9307
> URL: https://issues.apache.org/jira/browse/HDFS-9307
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fuse-dfs
>Reporter: Colin Patrick McCabe
>Assignee: Mingliang Liu
>Priority: Trivial
> Attachments: HDFS-9307.000.patch, HDFS-9307.001.patch
>
>
> fuseConnect should be private to fuse_connect.c, since it's not used outside 
> that file



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-26 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975177#comment-14975177
 ] 

Ming Ma commented on HDFS-9259:
---

Thanks, [~liuml07]! The patch looks good. Just some format nits in 
TestDFSClientSocketSize: the indent of the assertTrue statement in 
testAutoTuningSendBufferSize is off, and the last two lines of the 
createSocketForPipeline statement in createSocket can be put on one line.

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch
>
>
> We recently found that cross-DC hdfs write could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs write. The test ran "hadoop fs -copyFromLocal" 
> of a 256MB file across DC with different SendBufferSize and ReceiveBufferSize 
> values. The results showed that c is much faster than b, and b is faster than a.
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128K, ReceiveBufferSize=not set(TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set(TCP auto tuning for both)
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].
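
For context, the socket-level knob being discussed can be sketched with plain 
{{java.net.Socket}}; this is not the DFSClient/DataStreamer code, just an 
illustration of why leaving the buffer unset enables TCP auto-tuning.

{code}
// Plain java.net.Socket illustration; not DFSClient code.
import java.net.Socket;

public class SendBufferSketch {
  public static void main(String[] args) throws Exception {
    try (Socket autoTuned = new Socket(); Socket fixed = new Socket()) {
      // Scenario c: never call setSendBufferSize(); the OS auto-tunes the buffer,
      // which matters on high-latency cross-DC links.
      System.out.println("auto-tuned SO_SNDBUF = " + autoTuned.getSendBufferSize());

      // Scenario a/b: an explicit value (e.g. 128 KB) caps the send window.
      fixed.setSendBufferSize(128 * 1024);
      System.out.println("fixed SO_SNDBUF = " + fixed.getSendBufferSize());
    }
  }
}
{code}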



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975235#comment-14975235
 ] 

Hudson commented on HDFS-9304:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1323 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1323/])
HDFS-9304. Add HdfsClientConfigKeys class to (wheat9: rev 
67e3d75aed1c1a90cabffc552d5743a69ea28b54)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java


> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the logic is to make the {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} 
> together cover all config keys in {{hdfs-default.xml}}, we need to put both 
> of them in {{TestHdfsConfigFields#configurationClasses}}.
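
For illustration, the change described amounts to registering both key classes 
in the test; the fragment below follows the 
{{TestHdfsConfigFields#configurationClasses}} field named above, but it is a 
sketch from memory, not the committed patch.

{code}
// Rough sketch: register both key classes so the test covers every
// hdfs-default.xml property; surrounding method omitted.
configurationClasses = new Class[] {
    HdfsClientConfigKeys.class, // client-side config keys
    DFSConfigKeys.class         // server-side (and deprecated client) keys
};
{code}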



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975237#comment-14975237
 ] 

Hudson commented on HDFS-7284:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1323 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1323/])
HDFS-7284. Add more debug info to (yzhang: rev 
5e718de522328d1112ad38063596c204aa43f539)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}
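
For example, the extra context being asked for could be carried in the log call 
roughly as sketched below; the wording is illustrative and not necessarily the 
committed message format.

{code}
// Illustrative only: include the block and both generation stamps in the message.
NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica of " + this
    + " (expected genstamp=" + genStamp
    + ", replica genstamp=" + r.getGenerationStamp()
    + ") from location: " + r.getExpectedLocation());
{code}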



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9168) Move client side unit test to hadoop-hdfs-client

2015-10-26 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9168:
-
Attachment: HDFS-9168.003.patch

> Move client side unit test to hadoop-hdfs-client
> 
>
> Key: HDFS-9168
> URL: https://issues.apache.org/jira/browse/HDFS-9168
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-9168.000.patch, HDFS-9168.001.patch, 
> HDFS-9168.002.patch, HDFS-9168.003.patch
>
>
> We need to identify and move the unit tests on the client of hdfs to the 
> hdfs-client module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9289) check genStamp when complete file

2015-10-26 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975348#comment-14975348
 ] 

Zhe Zhang commented on HDFS-9289:
-

[~lichangleo] I think the log below shows that the client does have the new GS 
{{1106111511603}}, because the parameter {{newBlock}} is passed in from the 
client. So IIUC, even if we check the GS when completing the file, as the patch 
does, it won't stop the client from completing/closing the file. Or could you 
describe how you think the patch can avoid this error? Thanks.

{code}
2015-10-20 19:49:20,392 [IPC Server handler 63 on 8020] INFO 
namenode.FSNamesystem: 
updatePipeline(BP-1052427332-98.138.108.146-1350583571998:blk_3773617405_1106111498065)
 successfully to 
BP-1052427332-98.138.108.146-1350583571998:blk_3773617405_1106111511603
{code}

> check genStamp when complete file
> -
>
> Key: HDFS-9289
> URL: https://issues.apache.org/jira/browse/HDFS-9289
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Critical
> Attachments: HDFS-9289.1.patch, HDFS-9289.2.patch
>
>
> We have seen a case of a corrupt block caused by a file being completed after 
> a pipelineUpdate, but completed with the old block genStamp. This caused the 
> replicas of two datanodes in the updated pipeline to be viewed as corrupt. 
> Propose to check the genstamp when committing the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs

2015-10-26 Thread Staffan Friberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Staffan Friberg updated HDFS-9260:
--
Attachment: HDFS-9260.008.patch

Using the right name of the bug on the patch...
Fixed whitespace and findbugs warnings.

The remaining checkstyle items should hopefully be OK:

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java:60:35:
 Variable 'storages' must be private and have accessor methods.
Same as {{triplets}} was before.

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:204:16:
 Variable 'storageInfoMonitorThread' must be private and have accessor methods.
Same as {{replicationMonitor}}.

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:1:
 File length is 4,427 lines (max allowed is 2,000).
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2487:
 Comment matches to-do format 'TODO:'.
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:2501:
 Comment matches to-do format 'TODO:'.
The file was already long before this change, and the TODOs are kept from the 
earlier version of diffReport.

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:19:
 Inner assignments should be avoided.
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:28:
 Inner assignments should be avoided.
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:35:
 Inner assignments should be avoided.
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/util/TreeSet.java:221:43:
 Inner assignments should be avoided.
I can change this to separate statements assigning null, but the current 
version of the clear method is more compact, setting them all to null in one 
statement.
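
For context, the "inner assignment" nit refers to the chained-null pattern 
sketched below; this is a generic example, not the actual {{TreeSet}} fields.

{code}
// Generic illustration of the checkstyle nit; not the patch's TreeSet code.
public class ClearSketch {
  private Object root, first, last;

  // Compact form flagged by checkstyle ("inner assignments should be avoided").
  void clearCompact() {
    root = first = last = null;
  }

  // Checkstyle-friendly form: one assignment per statement.
  void clearVerbose() {
    root = null;
    first = null;
    last = null;
  }
}
{code}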

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, performance
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
> Attachments: HDFS Block and Replica Management 20151013.pdf, 
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, 
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, 
> HDFS-7435.007.patch, HDFS-9260.008.patch
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to 
> keep them sorted. This allows faster and more GC friendly handling of full 
> block reports.
> Would like to hear people's feedback on this change, and also get some help 
> investigating/understanding a few outstanding issues if we are interested in 
> moving forward with this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9299) Give ReplicationMonitor a readable thread name

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975409#comment-14975409
 ] 

Hadoop QA commented on HDFS-9299:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  18m 31s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  8s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 29s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 23s | The applied patch generated  1 
new checkstyle issues (total was 161, now 161). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 36s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 19s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  63m 48s | Tests failed in hadoop-hdfs. |
| | | 110m 53s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyIsHot |
|   | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.TestAppendSnapshotTruncate |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768380/HDFS-9299.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 37bf614 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13204/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13204/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13204/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13204/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13204/console |


This message was automatically generated.

> Give ReplicationMonitor a readable thread name
> --
>
> Key: HDFS-9299
> URL: https://issues.apache.org/jira/browse/HDFS-9299
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.7.1
>Reporter: Staffan Friberg
>Assignee: Staffan Friberg
>Priority: Trivial
> Attachments: HDFS-9299.001.patch
>
>
> Currently the log output from the Replication Monitor uses the class name; by 
> setting the name on the thread, the output will be easier to read.
> Current
> 2015-10-23 11:07:53,344 
> [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@2fbdc5dd]
>  INFO  blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.
> After
> 2015-10-23 11:07:53,344 [ReplicationMonitor] INFO  
> blockmanagement.BlockManager (BlockManager.java:run(4125)) - Stopping 
> ReplicationMonitor.
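
A minimal, generic sketch of the change being described, giving the monitor 
thread an explicit name so the log's thread field prints something readable; 
the wiring below is illustrative, not the actual BlockManager code.

{code}
// Generic illustration; BlockManager's actual thread wiring may differ.
public class NamedThreadSketch {
  public static void main(String[] args) throws InterruptedException {
    Runnable monitor = new Runnable() {
      @Override
      public void run() {
        System.out.println("running in thread: " + Thread.currentThread().getName());
      }
    };
    Thread t = new Thread(monitor);
    t.setName("ReplicationMonitor"); // readable name shows up in the log's [%t] field
    t.start();
    t.join();
  }
}
{code}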



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9314) Improve BlockPlacementPolicyDefault's picking of excess replicas

2015-10-26 Thread Ming Ma (JIRA)
Ming Ma created HDFS-9314:
-

 Summary: Improve BlockPlacementPolicyDefault's picking of excess 
replicas
 Key: HDFS-9314
 URL: https://issues.apache.org/jira/browse/HDFS-9314
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma


The test case used in HDFS-9313 identified a NullPointerException as well as a 
limitation in excess replica picking. If the current replicas are on {SSD(rack 
r1), DISK(rack 1), DISK(rack 2), DISK(rack 2)} and the storage policy changes 
to HOT_STORAGE_POLICY_ID, BlockPlacementPolicyDefault won't be able to delete 
the SSD replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9018) Update the pom to add junit dependency and move TestXAttr to client project

2015-10-26 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-9018:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

HDFS-9168 moves all unit tests that only refer to the {{hadoop-hdfs-client}} 
package. Closing this one as a duplicate of HDFS-9168.

> Update the pom to add junit dependency and move TestXAttr to client project
> ---
>
> Key: HDFS-9018
> URL: https://issues.apache.org/jira/browse/HDFS-9018
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Kanaka Kumar Avvaru
>Assignee: Kanaka Kumar Avvaru
> Attachments: HDFS-9018.patch
>
>
> Update the pom to add junit dependency and move 
> {{org.apache.hadoop.fs.TestXAttr}}  to client project to start with test 
> movement



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9308) Add truncateMeta() to MiniDFSCluster

2015-10-26 Thread Tony Wu (JIRA)
Tony Wu created HDFS-9308:
-

 Summary: Add truncateMeta() to MiniDFSCluster
 Key: HDFS-9308
 URL: https://issues.apache.org/jira/browse/HDFS-9308
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: HDFS, test
Affects Versions: 2.7.1
Reporter: Tony Wu
Assignee: Tony Wu
Priority: Minor


HDFS-9188 introduced the {{corruptMeta()}} method to make corrupting the 
metadata file filesystem-agnostic. There should also be a {{truncateMeta()}} 
method to allow truncation of metadata files on DataNodes without writing code 
that's specific to the underlying file system.

This will be useful for tests such as 
{{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.
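
The intent is that tests call an abstract helper and each FsDataset 
implementation supplies its own version; as a rough sketch, a file-based 
implementation of such a helper might look like the following (the method name 
and signature are hypothetical, not the eventual MiniDFSCluster API).

{code}
// Hypothetical file-based implementation; the eventual API may differ.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TruncateMetaSketch {
  /** Truncates a replica's on-disk metadata file to newLength bytes. */
  static void truncateMeta(File metaFile, long newLength) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(metaFile, "rw")) {
      raf.setLength(newLength);
    }
  }
}
{code}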



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9305) Delayed heartbeat processing causes storm of subsequent heartbeats

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975194#comment-14975194
 ] 

Hadoop QA commented on HDFS-9305:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 35s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m 52s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 35s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 39s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 3  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   2m  9s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 50s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 57s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  63m 50s | Tests failed in hadoop-hdfs. |
| | | 116m 57s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.blockmanagement.TestNodeCount |
|   | hadoop.hdfs.TestWriteReadStripedFile |
|   | hadoop.hdfs.server.namenode.TestFileTruncate |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768795/HDFS-9305.02.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3cc7377 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/artifact/patchprocess/whitespace.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/console |


This message was automatically generated.

> Delayed heartbeat processing causes storm of subsequent heartbeats
> --
>
> Key: HDFS-9305
> URL: https://issues.apache.org/jira/browse/HDFS-9305
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.7.1
>Reporter: Chris Nauroth
>Assignee: Arpit Agarwal
> Attachments: HDFS-9305.01.patch, HDFS-9305.02.patch
>
>
> A DataNode typically sends a heartbeat to the NameNode every 3 seconds.  We 
> expect heartbeat handling to complete relatively quickly.  However, if 
> something unexpected causes heartbeat processing to get blocked, such as a 
> long GC or heavy lock contention within the NameNode, then heartbeat 
> processing would be delayed.  After recovering from this delay, the DataNode 
> then starts sending a storm of heartbeat messages in a tight loop.  In a 
> large cluster with many DataNodes, this storm of heartbeat messages could 
> cause harmful load on the NameNode and make overall cluster recovery more 
> difficult.
> The bug appears to be caused by incorrect timekeeping inside 
> {{BPServiceActor}}.  The next heartbeat time is always calculated as a delta 
> from the previous heartbeat time, without any compensation for possible long 
> latency on an individual heartbeat RPC.  The only mitigation would be 
> restarting all DataNodes to force a reset of the heartbeat schedule, or 
> simply waiting out the storm until the scheduling catches up and corrects itself.
> This problem would not manifest after a NameNode restart.  In that case, the 
> NameNode would respond to the first heartbeat by telling the DataNode to 
> re-register, and {{BPServiceActor#reRegister}} would reset the heartbeat 
> schedule to the current time.  I believe the problem would only manifest if 
> the NameNode process stayed alive but processed heartbeats unexpectedly slowly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975240#comment-14975240
 ] 

Hadoop QA commented on HDFS-4937:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  16m 36s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 39s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 37s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   2m 35s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native |   3m 13s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests |  50m 48s | Tests failed in hadoop-hdfs. |
| | |  95m 11s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768765/HDFS-4937.v1.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 2f1eb2b |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13202/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13202/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13202/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13202/console |


This message was automatically generated.

> ReplicationMonitor can infinite-loop in 
> BlockPlacementPolicyDefault#chooseRandom()
> --
>
> Key: HDFS-4937
> URL: https://issues.apache.org/jira/browse/HDFS-4937
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.4-alpha, 0.23.8
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4937.patch, HDFS-4937.v1.patch
>
>
> When a large number of nodes are removed by refreshing node lists, the 
> network topology is updated. If the refresh happens at the right moment, the 
> replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
> This is because the cached cluster size is used in the terminal condition 
> check of the loop. This usually happens when a block with a high replication 
> factor is being processed. Since replicas/rack is also calculated beforehand, 
> no node choice may satisfy the goodness criteria if refreshing removed racks. 
> All nodes will end up in the excluded list, but the size will still be less 
> than the cached cluster size, so it will loop infinitely. This was observed 
> in a production environment.
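
A simplified, generic sketch of the loop pattern being described: if the 
terminal condition uses a cluster size cached before the refresh shrank the 
topology, the excluded set can never reach that stale count, so the loop never 
exits. This is illustrative only, not the actual 
BlockPlacementPolicyDefault#chooseRandom() code.

{code}
// Generic illustration of the stale-terminal-condition loop; not HDFS code.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChooseRandomLoopSketch {
  static void chooseRandom(List<String> liveNodes, int cachedClusterSize) {
    Set<String> excluded = new HashSet<>();
    // The terminal condition uses a cluster size cached *before* a refresh removed
    // nodes. If no candidate satisfies the placement criteria, excluded only ever
    // holds liveNodes.size() entries, which stays below cachedClusterSize forever.
    while (excluded.size() < cachedClusterSize) {
      String candidate = pickUnexcluded(liveNodes, excluded);
      if (candidate == null) {
        System.out.println("all " + excluded.size() + " live nodes excluded, but the"
            + " stale size " + cachedClusterSize + " says keep looping");
        return; // a real fix re-reads the current cluster size instead of spinning
      }
      excluded.add(candidate); // assume the goodness checks rejected this candidate
    }
  }

  static String pickUnexcluded(List<String> liveNodes, Set<String> excluded) {
    for (String n : liveNodes) {
      if (!excluded.contains(n)) {
        return n;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    chooseRandom(Arrays.asList("dn1", "dn2"), 5); // 5 cached, only 2 nodes remain
  }
}
{code}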



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975236#comment-14975236
 ] 

Hudson commented on HDFS-9268:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1323 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1323/])
HDFS-9268. fuse_dfs chown crashes when uid is passed as -1 (cmccabe) (cmccabe: 
rev 2f1eb2bceb1df5f27649a514246b38b9ccf60cba)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_impls_chown.c


> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> JVM crashes when users attempt to use vi to update a file on fuse file system 
> with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to 
> generate the bug, but the same bug is reproducible in trunk)
> The root cause is a segfault in a dfs-fuse method
> To reproduce it do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited# to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9277) IOException "Unable to load OAuth2 connection factory." in TestWebHDFSOAuth2.listStatusReturnsAsExpected

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975248#comment-14975248
 ] 

Wei-Chiu Chuang commented on HDFS-9277:
---

This failure occurred because ssl-client.xml is present in the directory, but 
when the runtime read that file and tried to load the truststore file, the 
truststore file was not in the directory.
This can occur if other tests that execute before this one do not delete their 
files properly (see HDFS-9309 for the ongoing effort to delete the SSL-related 
files properly), so the isolation between tests is broken.

To address this issue, I suggest the SSL-related files be removed before the 
test is executed. In addition, the test should properly create SSL-related 
files to test the OAuth2 connection over SSL.
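
Below is a rough sketch of the kind of pre-test cleanup suggested above; the 
file names and test directory are assumptions for illustration, and the real 
setup would presumably generate fresh SSL material (e.g. via KeyStoreTestUtil) 
when the test actually needs HTTPS.

{code}
// Illustrative JUnit setup only; file names and locations are assumptions.
import java.io.File;
import org.junit.Before;

public class OAuth2SslTestSetupSketch {
  private static final String TEST_DIR =
      System.getProperty("test.build.dir", "target/test-dir");

  @Before
  public void cleanSslArtifacts() {
    // Remove leftovers from earlier tests so a stale ssl-client.xml cannot point
    // at a truststore file that no longer exists.
    new File(TEST_DIR, "ssl-client.xml").delete();
    new File(TEST_DIR, "trustKS.jks").delete();
  }
}
{code}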

> IOException "Unable to load OAuth2 connection factory." in 
> TestWebHDFSOAuth2.listStatusReturnsAsExpected
> 
>
> Key: HDFS-9277
> URL: https://issues.apache.org/jira/browse/HDFS-9277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>
> This test is failing consistently in Hadoop-hdfs-trunk and 
> Hadoop-hdfs-trunk-Java8 since September 22.
> REGRESSION:  
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected
> Error Message:
> Unable to load OAuth2 connection factory.
> Stack Trace:
> java.io.IOException: Unable to load OAuth2 connection factory.
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:146)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
>   at 
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
>   at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:135)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:158)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7284) Add more debug info to BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975269#comment-14975269
 ] 

Hudson commented on HDFS-7284:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #599 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/599/])
HDFS-7284. Add more debug info to (yzhang: rev 
5e718de522328d1112ad38063596c204aa43f539)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/Block.java


> Add more debug info to 
> BlockInfoUnderConstruction#setGenerationStampAndVerifyReplicas
> -
>
> Key: HDFS-7284
> URL: https://issues.apache.org/jira/browse/HDFS-7284
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.5.1
>Reporter: Hu Liu,
>Assignee: Wei-Chiu Chuang
>  Labels: supportability
> Fix For: 2.8.0
>
> Attachments: HDFS-7284.001.patch, HDFS-7284.002.patch, 
> HDFS-7284.003.patch, HDFS-7284.004.patch, HDFS-7284.005.patch
>
>
> When I was looking at some replica loss issue, I got the following info from 
> log
> {code}
> 2014-10-13 01:54:53,104 INFO BlockStateChange: BLOCK* Removing stale replica 
> from location x.x.x.x
> {code}
> I could just know that a replica is removed, but I don't know which block and 
> its timestamp. I need to know the id and timestamp of the block from the log 
> file.
> So it's better to add more info including block id and timestamp to the code 
> snippet
> {code}
> for (ReplicaUnderConstruction r : replicas) {
>   if (genStamp != r.getGenerationStamp()) {
> r.getExpectedLocation().removeBlock(this);
> NameNode.blockStateChangeLog.info("BLOCK* Removing stale replica "
> + "from location: " + r.getExpectedLocation());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975268#comment-14975268
 ] 

Hudson commented on HDFS-9268:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #599 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/599/])
HDFS-9268. fuse_dfs chown crashes when uid is passed as -1 (cmccabe) (cmccabe: 
rev 2f1eb2bceb1df5f27649a514246b38b9ccf60cba)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_impls_chown.c


> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> JVM crashes when users attempt to use vi to update a file on fuse file system 
> with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to 
> generate the bug, but the same bug is reproducible in trunk)
> The root cause is a segfault in a fuse-dfs method.
> To reproduce it, do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited   # to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9277) IOException "Unable to load OAuth2 connection factory." in TestWebHDFSOAuth2.listStatusReturnsAsExpected

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975261#comment-14975261
 ] 

Wei-Chiu Chuang commented on HDFS-9277:
---

Which is to say: test the OAuth2 connection without SSL, and then test the OAuth2 
connection with SSL.

> IOException "Unable to load OAuth2 connection factory." in 
> TestWebHDFSOAuth2.listStatusReturnsAsExpected
> 
>
> Key: HDFS-9277
> URL: https://issues.apache.org/jira/browse/HDFS-9277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>
> This test is failing consistently in Hadoop-hdfs-trunk and 
> Hadoop-hdfs-trunk-Java8 since September 22.
> REGRESSION:  
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected
> Error Message:
> Unable to load OAuth2 connection factory.
> Stack Trace:
> java.io.IOException: Unable to load OAuth2 connection factory.
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:146)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
>   at 
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
>   at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:135)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:158)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9309) Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9309:
--
Attachment: HDFS-9309.001.patch

> Tests that use KeyStoreUtil must call KeyStoreUtil.cleanupSSLConfig()
> -
>
> Key: HDFS-9309
> URL: https://issues.apache.org/jira/browse/HDFS-9309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Minor
> Attachments: HDFS-9309.001.patch
>
>
> When KeyStoreTestUtil.setupSSLConfig() is called, several files are created 
> (ssl-server.xml, ssl-client.xml, trustKS.jks, clientKS.jks, serverKS.jks). 
> However, if they are not deleted upon exit, weird things can happen to 
> subsequent tests.
> For example, if ssl-client.xml is not deleted but trustKS.jks is, 
> TestWebHDFSOAuth2.listStatusReturnsAsExpected will fail with the message:
> {noformat}
> java.io.IOException: Unable to load OAuth2 connection factory.
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:146)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
>   at 
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
>   at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:138)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:112)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:163)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)
> {noformat}
> There are currently several tests that do not clean up:
> {noformat}
> 130 ✗ weichiu@weichiu ~/trunk (trunk) $ grep -rnw . -e 
> 'KeyStoreTestUtil\.setupSSLConfig' | cut -d: -f1 |xargs grep -L 
> "KeyStoreTestUtil\.cleanupSSLConfig"
> ./hadoop-common-project/hadoop-kms/src/test/java/org/apache/hadoop/crypto/key/kms/server/TestKMS.java
> ./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServicesWithSSL.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestWebHdfsTokens.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestSecureNNWithQJM.java
> ./hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRespectsBindHostKeys.java
> ./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/fs/http/client/TestHttpFSFWithSWebhdfsFileSystem.java
> {noformat}
> This JIRA is the effort to fix these tests so they clean up after themselves.
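For reference, the pattern the affected tests need to follow looks roughly like the sketch below. The test class is hypothetical and the KeyStoreTestUtil signatures used are the ones commonly seen in Hadoop test code, so treat this as a sketch rather than the patch itself.

{code:java}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.security.ssl.KeyStoreTestUtil;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestSomethingWithSsl {   // hypothetical test class
  private static final String BASEDIR =
      System.getProperty("test.build.dir", "target/test-dir") + "/testSslCleanup";
  private static String sslConfDir;

  @BeforeClass
  public static void setUpSsl() throws Exception {
    Configuration conf = new Configuration();
    File base = new File(BASEDIR);
    FileUtil.fullyDelete(base);
    base.mkdirs();
    sslConfDir = KeyStoreTestUtil.getClasspathDir(TestSomethingWithSsl.class);
    // Creates ssl-server.xml, ssl-client.xml and the *.jks files listed above.
    KeyStoreTestUtil.setupSSLConfig(BASEDIR, sslConfDir, conf, false);
  }

  @AfterClass
  public static void cleanUpSsl() throws Exception {
    // Without this, a leftover ssl-client.xml pointing at a deleted trustKS.jks
    // breaks later tests such as TestWebHDFSOAuth2.
    KeyStoreTestUtil.cleanupSSLConfig(BASEDIR, sslConfDir);
  }
}
{code}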



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8945) Update the description about replica placement in HDFS Architecture documentation

2015-10-26 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-8945:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2, thanks for the contribution [~iwasakims]!

> Update the description about replica placement in HDFS Architecture 
> documentation
> -
>
> Key: HDFS-8945
> URL: https://issues.apache.org/jira/browse/HDFS-8945
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-8945.001.patch, HDFS-8945.002.patch
>
>
> The description of replica placement should be updated to cover:
> * storage types and storage policies
> * the placement policy for replication factors greater than 4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8945) Update the description about replica placement in HDFS Architecture documentation

2015-10-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975270#comment-14975270
 ] 

Andrew Wang commented on HDFS-8945:
---

My bad for leaving this for so long, +1 LGTM will commit shortly.

> Update the description about replica placement in HDFS Architecture 
> documentation
> -
>
> Key: HDFS-8945
> URL: https://issues.apache.org/jira/browse/HDFS-8945
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Minor
> Attachments: HDFS-8945.001.patch, HDFS-8945.002.patch
>
>
> The description of replica placement should be updated to cover:
> * storage types and storage policies
> * the placement policy for replication factors greater than 4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9304) Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975267#comment-14975267
 ] 

Hudson commented on HDFS-9304:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #599 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/599/])
HDFS-9304. Add HdfsClientConfigKeys class to (wheat9: rev 
67e3d75aed1c1a90cabffc552d5743a69ea28b54)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tools/TestHdfsConfigFields.java


> Add HdfsClientConfigKeys class to TestHdfsConfigFields#configurationClasses
> ---
>
> Key: HDFS-9304
> URL: https://issues.apache.org/jira/browse/HDFS-9304
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: build
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
> Fix For: 2.8.0
>
> Attachments: HDFS-9304.000.patch
>
>
> *tl;dr* Since {{HdfsClientConfigKeys}} holds client side config keys, we need 
> to add this class to {{TestHdfsConfigFields#configurationClasses}}.
> Now the {{TestHdfsConfigFields}} unit test passes because {{DFSConfigKeys}} 
> still contains all the client side config keys, though marked @deprecated. As 
> we add new client config keys (e.g. [HDFS-9259]), the unit test will fail 
> with the following error:
> {quote}
> hdfs-default.xml has 1 properties missing in  class 
> org.apache.hadoop.hdfs.DFSConfigKeys
> {quote}
> If the intent is for {{DFSConfigKeys}} and {{HdfsClientConfigKeys}} together 
> to cover all config keys in {{hdfs-default.xml}}, we need to put both of them 
> in {{TestHdfsConfigFields#configurationClasses}}.
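For illustration, assuming the test exposes its covered classes through a {{configurationClasses}} array as described above, the relevant fragment of the change would be roughly:

{code:java}
// Sketch of the relevant fragment of TestHdfsConfigFields (surrounding code
// omitted): both key classes are listed so that together they cover every
// property in hdfs-default.xml.
configurationClasses =
    new Class[] { HdfsClientConfigKeys.class, DFSConfigKeys.class };
{code}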



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9310) TestDataNodeHotSwapVolumes fails occasionally

2015-10-26 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-9310:
---

 Summary: TestDataNodeHotSwapVolumes fails occasionally
 Key: HDFS-9310
 URL: https://issues.apache.org/jira/browse/HDFS-9310
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.8.0
Reporter: Arpit Agarwal


TestDataNodeHotSwapVolumes fails occasionally in Jenkins and locally. e.g. 
https://builds.apache.org/job/PreCommit-HDFS-Build/13197/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeHotSwapVolumes/testRemoveVolumeBeingWritten/

{code}
Error Message

Timed out waiting for /test to reach 3 replicas
Stacktrace

java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 
replicas
at 
org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:768)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWrittenForDatanode(TestDataNodeHotSwapVolumes.java:644)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten(TestDataNodeHotSwapVolumes.java:569)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9311) Support optional offload of NameNode HA service health checks to a separate RPC server.

2015-10-26 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-9311:
---

 Summary: Support optional offload of NameNode HA service health 
checks to a separate RPC server.
 Key: HDFS-9311
 URL: https://issues.apache.org/jira/browse/HDFS-9311
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Chris Nauroth
Assignee: Chris Nauroth


When a NameNode is overwhelmed with load, it can lead to resource exhaustion of 
the RPC handler pools (both client-facing and service-facing).  Eventually, 
this blocks the health check RPC issued from ZKFC, which triggers a failover.  
Depending on fencing configuration, the former active NameNode may be killed.  
In an overloaded situation, the new active NameNode is likely to suffer the 
same fate, because client load patterns don't change after the failover.  This 
can degenerate into flapping between the 2 NameNodes without real recovery.  If 
a NameNode had been killed by fencing, then it would have to transition through 
safe mode, further delaying time to recovery.

This issue proposes a separate, optional RPC server at the NameNode for 
isolating the HA health checks.  These health checks are lightweight operations 
that do not suffer from contention issues on the namesystem lock or other 
shared resources.  Isolating the RPC handlers is sufficient to avoid this 
situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8871) Decommissioning of a node with a failed volume may not start

2015-10-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975386#comment-14975386
 ] 

Vinod Kumar Vavilapalli commented on HDFS-8871:
---

[~daryn] / [~kihwal], any update on this? Considering this for a 2.7.2 RC this 
weekend. Thanks.

> Decommissioning of a node with a failed volume may not start
> 
>
> Key: HDFS-8871
> URL: https://issues.apache.org/jira/browse/HDFS-8871
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Kihwal Lee
>Assignee: Daryn Sharp
>Priority: Critical
>
> Since staleness may not be properly cleared, a node with a failed volume may 
> not actually get scanned for block replication. Nothing is being replicated 
> from these nodes.
> This bug does not manifest unless the datanode has a unique storage ID per 
> volume. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Ming Ma (JIRA)
Ming Ma created HDFS-9313:
-

 Summary: Possible NullPointerException in BlockManager if no 
excess replica can be chosen
 Key: HDFS-9313
 URL: https://issues.apache.org/jira/browse/HDFS-9313
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ming Ma


HDFS-8647 makes it easier to reason about various block placement scenarios. 
Here is one possible case where BlockManager won't be able to find the excess 
replica to delete: when storage policy changes around the same time balancer 
moves the block. When this happens, it will cause NullPointerException.

{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
{noformat}

Note that this has not been seen in any production cluster; it was found by new 
unit tests. The issue also existed before HDFS-8647.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9313:
--
Assignee: Ming Ma
  Status: Patch Available  (was: Open)

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when storage policy changes around the same time balancer 
> moves the block. When this happens, it will cause NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that this has not been seen in any production cluster; it was found by 
> new unit tests. The issue also existed before HDFS-8647.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HDFS-9313:
--
Attachment: HDFS-9313.patch

Here is the patch that illustrates the scenario. It is better to guard against 
this.

In addition, for this specific test scenario, {{BlockPlacementPolicyDefault}} 
should have been able to delete excessSSD. We can fix it separately.
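Without speaking for the attached patch, the guard presumably looks something like the fragment below; the variable names are assumptions based on the stack trace, not necessarily what the patch uses.

{code:java}
// cur is the storage returned by chooseReplicaToDelete() inside
// chooseReplicasToDelete(); if nothing can be chosen (e.g. the storage policy
// changed while the balancer moved the block), warn and stop instead of
// passing null to adjustSetsWithChosenReplica() and hitting the NPE above.
if (cur == null) {
  LOG.warn("No excess replica can be chosen for " + storedBlock
      + " with excess storage types " + excessTypes);
  break;
}
adjustSetsWithChosenReplica(rackMap, moreThanOne, exactlyOne, cur);
{code}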

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when storage policy changes around the same time balancer 
> moves the block. When this happens, it will cause NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that this has not been seen in any production cluster; it was found by 
> new unit tests. The issue also existed before HDFS-8647.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9282) Make data directory count and storage raw capacity related tests FsDataset-agnostic

2015-10-26 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975193#comment-14975193
 ] 

Tony Wu commented on HDFS-9282:
---

Manually verified hadoop.hdfs.server.datanode.TestDirectoryScanner runs without 
error. The patch does not change anything related to this test.

> Make data directory count and storage raw capacity related tests 
> FsDataset-agnostic
> ---
>
> Key: HDFS-9282
> URL: https://issues.apache.org/jira/browse/HDFS-9282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9282.001.patch, HDFS-9282.002.patch
>
>
> MiniDFSCluster and several tests have a hard-coded assumption that the underlying 
> storage has 2 data directories (volumes). As HDFS-9188 pointed out, with 
> new FsDataset implementations, these hard-coded assumptions about the number of 
> data directories and the raw capacity of storage may change as well.
> We need to extend FsDatasetTestUtils to provide:
> * Number of data directories of underlying storage per DataNode
> * Raw storage capacity of underlying storage per DataNode.
> * Have MiniDFSCluster automatically pick up the correct values.
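For illustration only, the extension could look roughly like the following; the interface name and method names are hypothetical, not the final API:

{code:java}
import java.io.IOException;

/**
 * Hypothetical additions to the FsDataset test utilities (names illustrative):
 * tests ask the dataset implementation for its storage characteristics instead
 * of assuming "2 data directories" and a fixed raw capacity.
 */
public interface FsDatasetStorageTestUtils {
  /** Number of data directories (volumes) per DataNode for this dataset. */
  int getDefaultNumOfDataDirs();

  /** Raw storage capacity, in bytes, per DataNode for this dataset. */
  long getRawCapacity() throws IOException;
}
{code}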



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9268) fuse_dfs chown crashes when uid is passed as -1

2015-10-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975213#comment-14975213
 ] 

Hudson commented on HDFS-9268:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8710 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8710/])
HDFS-9268. fuse_dfs chown crashes when uid is passed as -1 (cmccabe) (cmccabe: 
rev 2f1eb2bceb1df5f27649a514246b38b9ccf60cba)
* 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_impls_chown.c
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> fuse_dfs chown crashes when uid is passed as -1
> ---
>
> Key: HDFS-9268
> URL: https://issues.apache.org/jira/browse/HDFS-9268
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9268.001.patch, HDFS-9268.002.patch
>
>
> JVM crashes when users attempt to use vi to update a file on fuse file system 
> with insufficient permission. (I use CDH's hadoop-fuse-dfs wrapper script to 
> generate the bug, but the same bug is reproducible in trunk)
> The root cause is a segfault in a fuse-dfs method.
> To reproduce it, do as follows:
> mkdir /mnt/fuse
> chmod 777 /mnt/fuse
> ulimit -c unlimited   # to enable coredump
> hadoop-fuse-dfs -odebug hdfs://localhost:9000/fuse /mnt/fuse
> touch /mnt/fuse/y
> chmod 600 /mnt/fuse/y
> vim /mnt/fuse/y
> (in vim, :w to save the file)
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x003b82f27ad6, pid=26606, tid=140079005689600
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 
> 1.7.0_79-b15)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode 
> linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libc.so.6+0x127ad6]  __tls_get_addr@@GLIBC_2.3+0x127ad6
> #
> # Core dump written. Default location: /home/weichiu/core or core.26606
> #
> # An error report file with more information is saved as:
> # /home/weichiu/hs_err_pid26606.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> /usr/bin/hadoop-fuse-dfs: line 29: 26606 Aborted (core 
> dumped) env CLASSPATH="${CLASSPATH}" ${HADOOP_HOME}/bin/fuse_dfs $@
> ===
> The coredump shows the segfault comes from 
> (gdb) bt
> #0  0x003b82e328e5 in raise () from /lib64/libc.so.6
> #1  0x003b82e340c5 in abort () from /lib64/libc.so.6
> #2  0x7f66fc924d75 in os::abort(bool) () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #3  0x7f66fcaa76d7 in VMError::report_and_die() () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #4  0x7f66fc929c8f in JVM_handle_linux_signal () from 
> /etc/alternatives/jre/jre/lib/amd64/server/libjvm.so
> #5  
> #6  0x003b82f27ad6 in __strcmp_sse42 () from /lib64/libc.so.6
> #7  0x004039a0 in hdfsConnTree_RB_FIND ()
> #8  0x00403e8f in fuseConnect ()
> #9  0x004046db in dfs_chown ()
> #10 0x7f66fcf8f6d2 in ?? () from /lib64/libfuse.so.2
> #11 0x7f66fcf940d1 in ?? () from /lib64/libfuse.so.2
> #12 0x7f66fcf910ef in ?? () from /lib64/libfuse.so.2
> #13 0x003b83207851 in start_thread () from /lib64/libpthread.so.0
> #14 0x003b82ee894d in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9290) DFSClient#callAppend() is not backward compatible for slightly older NameNodes

2015-10-26 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975231#comment-14975231
 ] 

Tony Wu commented on HDFS-9290:
---

Hi [~kihwal],
Thanks for taking the time to manually run these tests. I didn't know Hadoop QA 
does not kick off test runs for client changes. Will make sure I include my 
manual run results in the future.
Thanks,
Tony

> DFSClient#callAppend() is not backward compatible for slightly older NameNodes
> --
>
> Key: HDFS-9290
> URL: https://issues.apache.org/jira/browse/HDFS-9290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Blocker
> Fix For: 3.0.0, 2.7.2
>
> Attachments: HDFS-9290.001.patch, HDFS-9290.002.patch
>
>
> HDFS-7210 combined 2 RPC calls used at file append into a single one; 
> specifically, {{getFileInfo()}} is combined with {{append()}}. While backward 
> compatibility for older clients is handled by the new NameNode (via protobuf), 
> a newer client's {{append()}} call does not work with older NameNodes. One will 
> run into an exception like the following:
> {code:java}
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.isLazyPersist(DFSOutputStream.java:1741)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.getChecksum4Compute(DFSOutputStream.java:1550)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1560)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.(DFSOutputStream.java:1670)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForAppend(DFSOutputStream.java:1717)
> at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1861)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1922)
> at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1892)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:340)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:336)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:318)
> at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1164)
> {code}
> The cause is that the new client code is expecting both the last block and 
> file info in the same RPC but the old NameNode only replied with the first. 
> The exception itself does not reflect this and one will have to look at the 
> HDFS source code to really understand what happened.
> We can have the client detect that it's talking to an old NameNode and send an 
> extra {{getFileInfo()}} RPC, or we should improve the exception being thrown 
> to accurately reflect the cause of failure.
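A rough sketch of the first option (detect the missing file status and issue the extra RPC); this is illustrative, not the committed fix:

{code:java}
// Inside the client-side append path (sketch): older NameNodes answer the
// combined append RPC without the HdfsFileStatus (pre-HDFS-7210 behavior),
// so fall back to a separate getFileInfo() call instead of failing later
// with an opaque NullPointerException.
LastBlockWithStatus blkWithStatus = namenode.append(src, clientName, flag);
HdfsFileStatus stat = blkWithStatus.getFileStatus();
if (stat == null) {
  stat = namenode.getFileInfo(src);
}
{code}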



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7984) webhdfs:// needs to support provided delegation tokens

2015-10-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975257#comment-14975257
 ] 

Hadoop QA commented on HDFS-7984:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 45s | Pre-patch trunk has 1 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 3 new or modified test files. |
| {color:green}+1{color} | javac |   9m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  11m 22s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 26s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 54s | The applied patch generated  1 
new checkstyle issues (total was 108, now 109). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 51s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 21s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |   8m 58s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | hdfs tests |  64m 55s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests |   0m 32s | Tests passed in 
hadoop-hdfs-client. |
| | | 134m 33s | |
\\
\\
|| Reason || Tests ||
| Timed out tests | 
org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete |
|   | org.apache.hadoop.hdfs.TestReplication |
|   | org.apache.hadoop.hdfs.TestPread |
|   | org.apache.hadoop.hdfs.TestSafeMode |
|   | org.apache.hadoop.hdfs.TestFileAppend4 |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | org.apache.hadoop.hdfs.TestRollingUpgrade |
|   | org.apache.hadoop.hdfs.server.namenode.TestFileTruncate |
|   | org.apache.hadoop.hdfs.server.mover.TestStorageMover |
|   | org.apache.hadoop.hdfs.crypto.TestHdfsCryptoStreams |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestHAStateTransitions |
|   | org.apache.hadoop.hdfs.server.namenode.TestDeleteRace |
|   | org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover |
|   | org.apache.hadoop.hdfs.TestParallelUnixDomainRead |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12768802/HDFS-7984.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 3cc7377 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/diffcheckstylehadoop-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-hdfs test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/testrun_hadoop-hdfs.txt
 |
| hadoop-hdfs-client test log | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/artifact/patchprocess/testrun_hadoop-hdfs-client.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/13199/console |


This message was automatically generated.

> webhdfs:// needs to support provided delegation tokens
> --
>
> Key: HDFS-7984
> URL: https://issues.apache.org/jira/browse/HDFS-7984
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: HeeSoo Kim
>Priority: Blocker
> Attachments: HDFS-7984.001.patch, HDFS-7984.002.patch, HDFS-7984.patch
>
>
> When using the webhdfs:// filesystem (especially from distcp), we need the 
> ability to inject a delegation token rather than webhdfs initialize its own.  
> This would allow for cross-authentication-zone file system accesses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9312) Fix TestReplication to be FsDataset-agnostic.

2015-10-26 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-9312:

Attachment: HDFS-9312.00.patch

This patch:
* Adds {{TestFsDatasetTestUtils#injectReplica}}.
* Fixes TestReplication to use {{injectReplica}}.

> Fix TestReplication to be FsDataset-agnostic.
> -
>
> Key: HDFS-9312
> URL: https://issues.apache.org/jira/browse/HDFS-9312
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-9312.00.patch
>
>
> {{TestReplication}} uses raw file system access to inject dummy replica 
> files. This makes {{TestReplication}} incompatible with non-FS dataset 
> implementations.
> We can fix it by using the existing {{FsDatasetTestUtils}}.
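Roughly, the intent is for the test to go through the test utilities instead of the raw file system. A sketch of the usage, where the {{injectReplica}} signature and the {{getFsDatasetTestUtils}} accessor are assumptions for illustration:

{code:java}
// Hypothetical usage in TestReplication; the real injectReplica() signature in
// the patch may differ. The dataset implementation decides how the dummy
// replica is materialized, so the test no longer assumes an on-disk layout.
FsDatasetTestUtils utils = cluster.getFsDatasetTestUtils(dnIndex);
for (Block b : blocksToInject) {
  utils.injectReplica(b, /* length */ 0L);
}
{code}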



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9277) IOException "Unable to load OAuth2 connection factory." in TestWebHDFSOAuth2.listStatusReturnsAsExpected

2015-10-26 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-9277:
--
Status: Patch Available  (was: Open)

> IOException "Unable to load OAuth2 connection factory." in 
> TestWebHDFSOAuth2.listStatusReturnsAsExpected
> 
>
> Key: HDFS-9277
> URL: https://issues.apache.org/jira/browse/HDFS-9277
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-9277.001.patch
>
>
> This test is failing consistently in Hadoop-hdfs-trunk and 
> Hadoop-hdfs-trunk-Java8 since September 22.
> REGRESSION:  
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected
> Error Message:
> Unable to load OAuth2 connection factory.
> Stack Trace:
> java.io.IOException: Unable to load OAuth2 connection factory.
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:146)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.loadTrustManager(ReloadingX509TrustManager.java:164)
>   at 
> org.apache.hadoop.security.ssl.ReloadingX509TrustManager.(ReloadingX509TrustManager.java:81)
>   at 
> org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:215)
>   at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:131)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newSslConnConfigurator(URLConnectionFactory.java:135)
>   at 
> org.apache.hadoop.hdfs.web.URLConnectionFactory.newOAuth2URLConnectionFactory(URLConnectionFactory.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:158)
>   at 
> org.apache.hadoop.hdfs.web.TestWebHDFSOAuth2.listStatusReturnsAsExpected(TestWebHDFSOAuth2.java:147)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen

2015-10-26 Thread Mingliang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975453#comment-14975453
 ] 

Mingliang Liu commented on HDFS-9313:
-

Thanks for filing and working on this, [~mingma]. The patch makes sense to me. 
The warning is much better than an NPE.

{code}
+// no replica can't be chosen as the excessive replica as
{code}
Do you mean "no replica *can* be chosen as the excessive replica as"?

> Possible NullPointerException in BlockManager if no excess replica can be 
> chosen
> 
>
> Key: HDFS-9313
> URL: https://issues.apache.org/jira/browse/HDFS-9313
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
> Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. 
> Here is one possible case where BlockManager won't be able to find the excess 
> replica to delete: when storage policy changes around the same time balancer 
> moves the block. When this happens, it will cause NullPointerException.
> {noformat}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that this has not been seen in any production cluster; it was found by 
> new unit tests. The issue also existed before HDFS-8647.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8766) Implement a libhdfs(3) compatible API

2015-10-26 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975188#comment-14975188
 ] 

James Clampffer commented on HDFS-8766:
---

Thanks for the input Haohui!

I have the bulk of the next patch done incorporating your feedback but it 
doesn't look like I'll have it done and tested until tomorrow morning.  In the 
meantime I have a couple questions/comments so we can save a round trip of 
reviewing if you catch this before then:

"The comments for the include files are unnecessary. The order of include files 
should be (1) local headers, (2) headers that are specific to the projects and 
(3) C++ headers. It avoid unnecessary loading."
Just to check: Google's coding style says 1) include the header you're 
implementing, 2) C/C++ headers, 3) project files. Do we want to stick with Google 
here?  I have no preference; I'd just like to make sure we're both on the same 
page.

"There is a race-condition here. You're effectively hitting 
https://github.com/haohui/libhdfspp/issues/31. You'll need to capture the 
promise into the closure of the handler. And please wrap them as a template to 
avoid copying code."
a) We should actually be safe in this case.  Looking back at the old code I 
noticed it wasn't really a race condition so much as a dangling reference issue 
that tended to show up during races.  The code that caused this was "return 
stat.get_future.get()".  The promise, stat, was stack-allocated, so when you 
call get_future you get back an rvalue that holds a reference to data contained 
in (or otherwise managed by) the stack-allocated object.  Because get is being 
called on an rvalue, nothing was left to keep the original stat object alive as 
the scope ended, so in some cases stat's destructor would get called, or the 
stack was unwound and rewound past that address, before the future's get method 
returned.  It would be similar to doing "stringstream ss;; return 
ss.str().c_str()".  In this case it is guaranteed that the promise object stays 
alive for the duration of the call.
b) It turns out that creating one synchronization template is tricky due to a 
compiler bug in GCC.  The only way I can think of doing this correctly with a 
single function involves using a variadic capture list to create a wrapper 
callback (to set the future) and then calling the real callback from there.  
GCC doesn't seem to handle unpacking variadic arguments in capture lists in a 
lot of versions (mine included).  I'm going to put a little more time into 
finding an alternative implementation but it looks like I may have to create 1 
template for each number of arguments; it's better than nothing and portable.  
If you happen to have some time to think about this and come up with something 
that only needs one templated call I'd be happy to test it on GCC and use it.

Now that the FileHandle and FileSystem method implementations are separated 
from the C API wrappers, I think it makes sense to put all of the C++ into 
a namespace to keep things clean.  Do you have any preference about namespace 
nesting/naming conventions here?  I'm thinking hdfs::c_bindings should suffice.


> Implement a libhdfs(3) compatible API
> -
>
> Key: HDFS-8766
> URL: https://issues.apache.org/jira/browse/HDFS-8766
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-8766.HDFS-8707.000.patch, 
> HDFS-8766.HDFS-8707.001.patch, HDFS-8766.HDFS-8707.002.patch, 
> HDFS-8766.HDFS-8707.003.patch, HDFS-8766.HDFS-8707.004.patch, 
> HDFS-8766.HDFS-8707.005.patch, HDFS-8766.HDFS-8707.006.patch, 
> HDFS-8766.HDFS-8707.007.patch, HDFS-8766.HDFS-8707.008.patch, 
> HDFS-8766.HDFS-8707.009.patch
>
>
> Add a synchronous API that is compatible with the hdfs.h header used in 
> libhdfs and libhdfs3.  This will make it possible for projects using 
> libhdfs/libhdfs3 to relink against libhdfspp with minimal changes.
> This also provides a pure C interface that can be linked against projects 
> that aren't built in C++11 mode for various reasons but use the same 
> compiler.  It also allows many other programming languages to access 
> libhdfspp through builtin FFI interfaces.
> The libhdfs API is very similar to the posix file API which makes it easier 
> for programs built using posix filesystem calls to be modified to access HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario

2015-10-26 Thread Mingliang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-9259:

Attachment: HDFS-9259.001.patch

Thanks for your review [~mingma]!

The v1 patch addresses the format problems.

> Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
> --
>
> Key: HDFS-9259
> URL: https://issues.apache.org/jira/browse/HDFS-9259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Mingliang Liu
> Attachments: HDFS-9259.000.patch, HDFS-9259.001.patch
>
>
> We recently found that cross-DC hdfs write could be really slow. Further 
> investigation identified that this is due to the SendBufferSize and 
> ReceiveBufferSize used for hdfs write. The test ran "hadoop fs -copyFromLocal" 
> of a 256MB file across DCs with different SendBufferSize and ReceiveBufferSize 
> values. The results showed that c is much faster than b, and b is faster than a.
> a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting).
> b. SendBufferSize=128K, ReceiveBufferSize=not set(TCP auto tuning).
> c. SendBufferSize=not set, ReceiveBufferSize=not set(TCP auto tuning for both)
> HDFS-8829 has enabled scenario b. We would like to enable scenario c by 
> making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He 
> Tianyi] [~kanaka] [~vinayrpet].
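A minimal sketch of what a client-side configurable send buffer could look like; the config key name and the surrounding variables (conf, dnAddr, connectTimeoutMs) are assumptions for illustration, not necessarily what the patch introduces:

{code:java}
// Sketch: apply an optional client-side send buffer size when creating the
// socket used for writes (needs java.net.Socket and
// org.apache.hadoop.conf.Configuration).
int sendBufSize = conf.getInt("dfs.client.socket.send.buffer.size", 0); // assumed key
Socket sock = new Socket();
if (sendBufSize > 0) {
  sock.setSendBufferSize(sendBufSize);
}
// A value of 0 leaves the buffer unset, so TCP auto-tuning applies (scenario c).
sock.connect(dnAddr, connectTimeoutMs);
{code}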



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9284) fsck command should not print exception trace when file not found

2015-10-26 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975199#comment-14975199
 ] 

Andrew Wang commented on HDFS-9284:
---

Thanks for the reminder [~jagadesh.kiran], +1 LGTM will commit shortly.

> fsck command should not print exception trace when file not found 
> --
>
> Key: HDFS-9284
> URL: https://issues.apache.org/jira/browse/HDFS-9284
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jagadesh Kiran N
>Assignee: Jagadesh Kiran N
> Attachments: HDFS-9284_00.patch, HDFS-9284_01.patch, 
> HDFS-9284_02.patch
>
>
> when the file doesn't exist, fsck throws an exception 
> {code}
> ./hdfs fsck /kiran
> {code}
> the following exception occurs 
> {code}
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your 
> platform... using builtin-java classes where applicable
> FileSystem is inaccessible due to:
> java.io.FileNotFoundException: File does not exist: /kiran
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1273)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1265)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1265)
> at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:755)
> at org.apache.hadoop.hdfs.tools.DFSck.getResolvedPath(DFSck.java:236)
> at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:316)
> at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:73)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:155)
> at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:152)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667)
> at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:151)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:383)
> {code}
> but only the {code}File does not exist: /kiran{code} error message should be 
> shown. The current code is:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + StringUtils.stringifyException(ioe));
> }
> {code}
> I think it should use the ioe.getMessage() method instead:
> {code}
> } catch (IOException ioe) {
>   System.err.println("FileSystem is inaccessible due to:\n"
>       + ioe.getMessage());
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9308) Add truncateMeta() to MiniDFSCluster

2015-10-26 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HDFS-9308:
--
Description: 
HDFS-9188 introduced {{corruptMeta()}} method to make corrupting the metadata 
file filesystem agnostic. There should also be a {{truncateMeta()}} method in 
MiniDFSCluster to allow truncation of metadata files on DataNodes without 
writing code that's specific to the underlying file system. 
{{FsDatasetTestUtils#truncateMeta()}} is already implemented by HDFS-9188 and 
can be exposed easily in {{MiniDFSCluster}}.

This will be useful for tests such as 
{{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.

  was:
HDFS-9188 introduced {{corruptMeta()}} method to make corrupting the metadata 
file filesystem agnostic. There should also be a {{truncateMeta()}} method to 
allow truncation of metadata files on DataNodes without writing code that's 
specific to underling file system. 

This will be useful for tests such as 
{{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.


> Add truncateMeta() to MiniDFSCluster
> 
>
> Key: HDFS-9308
> URL: https://issues.apache.org/jira/browse/HDFS-9308
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: HDFS, test
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
>Priority: Minor
> Attachments: HDFS-9308.001.patch
>
>
> HDFS-9188 introduced {{corruptMeta()}} method to make corrupting the metadata 
> file filesystem agnostic. There should also be a {{truncateMeta()}} method in 
> MiniDFSCluster to allow truncation of metadata files on DataNodes without 
> writing code that's specific to the underlying file system. 
> {{FsDatasetTestUtils#truncateMeta()}} is already implemented by HDFS-9188 and 
> can be exposed easily in {{MiniDFSCluster}}.
> This will be useful for tests such as 
> {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}}.
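A sketch of the kind of wrapper being proposed; the exact signature is illustrative only:

{code:java}
// Hypothetical convenience method on MiniDFSCluster, delegating to the
// dataset-agnostic test utilities added by HDFS-9188.
public void truncateMeta(int dnIndex, ExtendedBlock blk, int newSize)
    throws IOException {
  getFsDatasetTestUtils(dnIndex).truncateMeta(blk, newSize);
}
{code}

A test like {{TestLeaseRecovery#testBlockRecoveryWithLessMetafile}} could then call something like {{cluster.truncateMeta(0, block, newSize)}} without touching the metadata file directly.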



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

