[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371052#comment-16371052
 ] 

genericqa commented on HDFS-13056:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
17s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 12m 
25s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 12m 25s{color} 
| {color:red} root generated 1 new + 1231 unchanged - 0 fixed = 1232 total (was 
1231) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 17s{color} | {color:orange} root: The patch generated 175 new + 609 
unchanged - 1 fixed = 784 total (was 610) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m  
2s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
30s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}114m 18s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}221m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13056 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911312/HDFS-13056.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite 

[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-11187:
-
   Resolution: Fixed
Fix Version/s: 2.7.6
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.7! Thanks Gabor and Erik

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
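To illustrate the optimization being proposed, a rough sketch is shown below: the
finalized replica simply caches the checksum bytes of its last partial chunk so
BlockSender never has to go to disk for them. All names here are illustrative
only, not the actual HDFS-11187 implementation.

{code}
// Rough sketch of the caching idea only; class, field and method names are
// illustrative and not the actual HDFS-11187 implementation.
class FinalizedReplicaSketch {
  // Checksum bytes of the last, partially filled chunk; null when the block
  // length is an exact multiple of the chunk size.
  private volatile byte[] lastPartialChunkChecksum;

  // Updated when the replica is finalized or appended to, so the cached value
  // always matches what is on disk.
  void setLastPartialChunkChecksum(byte[] checksum) {
    this.lastPartialChunkChecksum = checksum;
  }

  // Served to BlockSender without re-reading the meta file or holding the
  // dataset lock for a disk I/O.
  byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum;
  }
}
{code}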






[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371050#comment-16371050
 ] 

Xiao Chen commented on HDFS-11187:
--

Failed tests look unrelated. checkstyle and whitespace are related but trivial, 
I'll fix those at commit time.

+1 on branch-2.7 patch, committing 

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.






[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-20 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371043#comment-16371043
 ] 

Elek, Marton commented on HDFS-13108:
-

The one checkstyle issue is also fixed...

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, 
> HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch, 
> HDFS-13108-HDFS-7240.007.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket /volume should be defined in the defaultFS (eg. 
>  o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from the bucket1)
> It works very well, but there are some limitations.
> B. Problem one 
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C.) Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform 
> unqualified url to qualified url. This is part of the Path.java so it's 
> common for all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the schema 
> (eg. o3:// ) and authority (eg. datanode:9864) from the defaultFS and using 
> the relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
> return o3://datanode:9864/dir1/file, which is obviously wrong (the correct 
> result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code, and it worked from 
> the command line but didn't work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D.) Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema as o3://bucket.volume/ 
> This is similar to the s3a  format where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or KSM, in case of service 
> discovery), but it would be configurable with additional Hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case it 
> should not include dot any more).
> ps: some spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  
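To make the qualification problem concrete, here is a small, self-contained
sketch of the behavior described above. The o3:// URIs come from the
description; the class itself is only illustrative.

{code}
import java.net.URI;
import org.apache.hadoop.fs.Path;

public class O3QualifyExample {
  public static void main(String[] args) {
    URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
    Path workingDir = new Path(defaultUri);

    // Qualifying a path keeps only the scheme and authority of the default FS
    // URI, so the /test/bucket1 (volume/bucket) part of the default FS is lost.
    Path home = new Path("/user/hadoop");
    System.out.println(home.makeQualified(defaultUri, workingDir));
    // prints o3://datanode:9864/user/hadoop

    // With the proposed o3://bucket1.test/ style authority, the bucket and
    // volume live in the authority component and survive makeQualified as-is.
  }
}
{code}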






[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-20 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13108:

Attachment: HDFS-13108-HDFS-7240.007.patch

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, 
> HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch, 
> HDFS-13108-HDFS-7240.007.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket /volume should be defined in the defaultFS (eg. 
>  o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from the bucket1)
> It works very well, but there are some limitations.
> B. Problem one 
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' is not working.
> C.) Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which could transform 
> unqualified url to qualified url. This is part of the Path.java so it's 
> common for all the Hadoop file systems.
> In the current implementation it qualifies a url by keeping the schema 
> (eg. o3:// ) and authority (eg. datanode:9864) from the defaultFS and using 
> the relative path as the end of the qualified url. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
> return o3://datanode:9864/dir1/file, which is obviously wrong (the correct 
> result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried a 
> workaround using a custom makeQualified in the Ozone code, and it worked from 
> the command line but didn't work with Spark, which uses the Hadoop API and the 
> original makeQualified path.
> D.) Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema as o3://bucket.volume/ 
> This is similar to the s3a  format where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or KSM, in case of service 
> discovery), but it would be configurable with additional Hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case it 
> should not include dot any more).
> ps: some spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  






[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-20 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371019#comment-16371019
 ] 

Rakesh R commented on HDFS-13165:
-

Attached another patch fixing,
- checkstyle
- whitespace
- {{TestStoragePolicySatisfier#testSPSWhenFileHasExcessRedundancyBlocks}} test failure

> [SPS]: Collects successfully moved block details via IBR
> 
>
> Key: HDFS-13165
> URL: https://issues.apache.org/jira/browse/HDFS-13165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13165-HDFS-10285-00.patch, 
> HDFS-13165-HDFS-10285-01.patch, HDFS-13165-HDFS-10285-02.patch
>
>
> This task is to make use of the existing IBR to get moved block details and 
> remove the unwanted future-tracking logic that exists in the 
> BlockStorageMovementTracker code; it is no longer needed, as file-level 
> tracking is maintained at the NN itself.
> Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]
> Comment-3)
> {quote}BPServiceActor
> Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}
> Comment-21)
> {quote}
> BlockStorageMovementTracker
> Many data structures are riddled with non-threadsafe race conditions and risk 
> of CMEs.
> Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's 
> list of futures is synchronized. However the run loop does an unsynchronized 
> block get, unsynchronized future remove, unsynchronized isEmpty, possibly 
> another unsynchronized get, only then does it do a synchronized remove of the 
> block. The whole chunk of code should be synchronized.
> Is the problematic moverTaskFutures even needed? It's aggregating futures 
> per-block for seemingly no reason. Why track all the futures at all instead 
> of just relying on the completion service? As best I can tell:
> It's only used to determine if a future from the completion service should be 
> ignored during shutdown. Shutdown sets the running boolean to false and 
> clears the entire datastructure so why not use the running boolean like a 
> check just a little further down?
> As synchronization to sleep up to 2 seconds before performing a blocking 
> moverCompletionService.take, but only when it thinks there are no active 
> futures. I'll ignore the missed notify race that the bounded wait masks, but 
> the real question is why not just do the blocking take?
> Why all the complexity? Am I missing something?
> BlocksMovementsStatusHandler
> Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. 
> blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to 
> return an unmodifiable list which sadly does nothing to protect the caller 
> from CME.
> handle is iterating over a non-thread safe list.
> {quote}
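For context on the quoted review, the pattern it suggests (drive the tracker
with a single blocking take() guarded by the running flag, instead of a
separate map of per-block futures) would look roughly like the sketch below.
This is only an illustration of that pattern, not the HDFS-13165 code.

{code}
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a tracker driven purely by the completion service,
// with no per-block map of futures to keep consistent.
class MovementTrackerSketch implements Runnable {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final CompletionService<Long> completion =
      new ExecutorCompletionService<>(pool);
  private volatile boolean running = true;

  // Submit one block-move task; the block id is returned when the task is done.
  void submit(Runnable moveTask, long blockId) {
    completion.submit(moveTask, blockId);
  }

  @Override
  public void run() {
    while (running) {
      try {
        Future<Long> done = completion.take();   // blocking take, no timed wait
        if (!running) {
          break;                                  // shutting down, ignore results
        }
        Long movedBlockId = done.get();
        // report movedBlockId as completed (in HDFS-13165 this reporting is
        // what the IBR path replaces)
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      } catch (ExecutionException e) {
        // a move task failed; log and continue with the next result
      }
    }
  }

  void shutdown() {
    running = false;
    pool.shutdownNow();
  }
}
{code}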






[jira] [Updated] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-20 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13165:

Attachment: HDFS-13165-HDFS-10285-02.patch

> [SPS]: Collects successfully moved block details via IBR
> 
>
> Key: HDFS-13165
> URL: https://issues.apache.org/jira/browse/HDFS-13165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13165-HDFS-10285-00.patch, 
> HDFS-13165-HDFS-10285-01.patch, HDFS-13165-HDFS-10285-02.patch
>
>
> This task is to make use of the existing IBR to get moved block details and 
> remove the unwanted future-tracking logic that exists in the 
> BlockStorageMovementTracker code; it is no longer needed, as file-level 
> tracking is maintained at the NN itself.
> Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]
> Comment-3)
> {quote}BPServiceActor
> Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}
> Comment-21)
> {quote}
> BlockStorageMovementTracker
> Many data structures are riddled with non-threadsafe race conditions and risk 
> of CMEs.
> Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's 
> list of futures is synchronized. However the run loop does an unsynchronized 
> block get, unsynchronized future remove, unsynchronized isEmpty, possibly 
> another unsynchronized get, only then does it do a synchronized remove of the 
> block. The whole chunk of code should be synchronized.
> Is the problematic moverTaskFutures even needed? It's aggregating futures 
> per-block for seemingly no reason. Why track all the futures at all instead 
> of just relying on the completion service? As best I can tell:
> It's only used to determine if a future from the completion service should be 
> ignored during shutdown. Shutdown sets the running boolean to false and 
> clears the entire datastructure so why not use the running boolean like a 
> check just a little further down?
> As synchronization to sleep up to 2 seconds before performing a blocking 
> moverCompletionService.take, but only when it thinks there are no active 
> futures. I'll ignore the missed notify race that the bounded wait masks, but 
> the real question is why not just do the blocking take?
> Why all the complexity? Am I missing something?
> BlocksMovementsStatusHandler
> Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. 
> blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to 
> return an unmodifiable list which sadly does nothing to protect the caller 
> from CME.
> handle is iterating over a non-thread safe list.
> {quote}






[jira] [Commented] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-20 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370999#comment-16370999
 ] 

Xiao Chen commented on HDFS-13040:
--

Thanks for the review Daryn.

Patch 5 attached to address the comments except the 'current user' one. I agree 
it's the most correct thing to do, but maybe we can leave that to a future 
jira.
{quote} floating the doAs login user up to {{getEditsFromTxid}} 
{quote}
Good idea, done this way and left the stream class untouched.
{quote}Could the unit test just explicitly set the conf keys
{quote}
Not really, because the journal part of the QJMHA cluster needs to be started 
first for us to know the correct journal URI, so we can't know the uri 
beforehand. {{initHAConf}} currently sets the shared edits dir key, presumably 
for the same reason.
{quote}the test
{quote}
Good catch, and helpful explanations. Addressed by using the correct UGIs. 
hdfs@ is the client, and hdfs/localhost@ is the NN user. Verified I can see the 
big beautiful gssapi stack trace without the fix.

One odd thing I found in the test, though, is that I had to set the proxy users 
for it to work; otherwise the mkdirs after the relogin would throw
{quote}AuthorizationException): User: hdfs/localh...@example.com is not allowed 
to impersonate h...@example.com
{quote}
at me. Debugging this, it seems to be a designed RPC server auth behavior from 
this 
[code|https://github.com/apache/hadoop/blob/121e1e1280c7b019f6d2cc3ba9eae1ead0dd8408/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2260].
Though my debugging shows the {{protocolUser}} is {{hdfs@ (auth:SIMPLE)}}, 
while the {{realUser}} is {{hdfs/localhost@ (auth:KERBEROS)}}, it still looks 
weird.
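For readers following along, the "floating the doAs login user up to
getEditsFromTxid" idea amounts to roughly the pattern below. Everything except
UserGroupInformation.getLoginUser()/doAs is a placeholder name; this is a
sketch, not the patch itself.

{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

class EditsReaderSketch {
  // Placeholder result type for the sketch.
  static class EventBatchList { }

  EventBatchList getEditsFromTxid(final long txid) throws IOException {
    try {
      // Read the edit logs as the NameNode's own login (keytab) user rather
      // than the remote inotify caller, so ticket relogin uses the NN principal.
      return UserGroupInformation.getLoginUser().doAs(
          (PrivilegedExceptionAction<EventBatchList>) () -> readEditsInternal(txid));
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new IOException(ie);
    }
  }

  private EventBatchList readEditsInternal(long txid) throws IOException {
    // Placeholder for the real edit-log stream selection and read.
    return new EventBatchList();
  }
}
{code}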

 

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.03.patch, HDFS-13040.04.patch, HDFS-13040.05.patch, 
> HDFS-13040.half.test.patch, TestDFSInotifyEventInputStreamKerberized.java, 
> TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue, where the client is 
> responsible for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After Namenodes starts for longer than kerberos ticket lifetime*, the client 
> fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   

[jira] [Updated] (HDFS-13040) Kerberized inotify client fails despite kinit properly

2018-02-20 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-13040:
-
Attachment: HDFS-13040.05.patch

> Kerberized inotify client fails despite kinit properly
> --
>
> Key: HDFS-13040
> URL: https://issues.apache.org/jira/browse/HDFS-13040
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.10.2
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Attachments: HDFS-13040.001.patch, HDFS-13040.02.patch, 
> HDFS-13040.03.patch, HDFS-13040.04.patch, HDFS-13040.05.patch, 
> HDFS-13040.half.test.patch, TestDFSInotifyEventInputStreamKerberized.java, 
> TransactionReader.java
>
>
> This issue is similar to HDFS-10799.
> HDFS-10799 turned out to be a client-side issue, where the client is 
> responsible for actively renewing its Kerberos ticket.
> However, we found that in a slightly different setup, even if the client has 
> valid Kerberos credentials, inotify still fails.
> Suppose client uses principal h...@example.com, 
>  namenode 1 uses server principal hdfs/nn1.example@example.com
>  namenode 2 uses server principal hdfs/nn2.example@example.com
> *After Namenodes starts for longer than kerberos ticket lifetime*, the client 
> fails with the following error:
> {noformat}
> 18/01/19 11:23:02 WARN security.UserGroupInformation: 
> PriviledgedActionException as:h...@gce.cloudera.com (auth:KERBEROS) 
> cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): We 
> encountered an error reading 
> https://nn2.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3,
>  
> https://nn1.example.com:8481/getJournal?jid=ns1=8662=-60%3A353531113%3A0%3Acluster3.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 8683, but we thought we could read up to transaction 
> 8684.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1701)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1763)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1011)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1490)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> Typically, if the NameNode has an expired Kerberos ticket, the error handling 
> for the usual edit log tailing would let the NameNode relogin with its own 
> Kerberos principal. However, when inotify uses the same code path to retrieve 
> edits, the current user is the inotify client's principal, so unless the 
> client uses the same principal as the NameNode, the NameNode can't do it on 
> behalf of the client.
> Therefore, a more appropriate approach is to use a proxy user so that the 
> NameNode can retrieve edits on behalf of the client.
> I will attach a patch to fix it. The patch has been verified to work on a 
> CDH5.10.2 cluster; however, it seems impossible to craft a unit test for this 
> fix because of the way Hadoop UGI handles Kerberos credentials (I can't have a 
> single process that logs in as two Kerberos principals simultaneously and lets 
> them establish a connection).
> A possible workaround is for the inotify client to use the active NameNode's 
> server principal. However, that's not going to work when there's a namenode 
> failover, because then the client's principal will not be 

[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370929#comment-16370929
 ] 

Hudson commented on HDFS-13175:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13692 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13692/])
HDFS-13175. Add more information for checking argument in (aengineer: rev 
121e1e1280c7b019f6d2cc3ba9eae1ead0dd8408)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/connectors/DBNameNodeConnector.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/datamodel/DiskBalancerVolume.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/command/PlanCommand.java


> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.
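For reference, Guava's checkArgument already accepts a message template, so
carrying the offending values into the exception is straightforward. The sketch
below only illustrates that overload (getPath() is assumed here as the volume
identifier); it is not necessarily what the committed patch does.

{code}
  public void setUsed(long dfsUsedSpace) {
    // Include the actual values so the stack trace alone identifies the bad
    // storage report ("%s" placeholders are formatted by Guava only on failure).
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "dfsUsedSpace %s exceeds capacity %s on volume %s",
        dfsUsedSpace, this.getCapacity(), this.getPath());
    this.used = dfsUsedSpace;
  }
{code}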






[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-13175:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.2
   3.1.0
   Status: Resolved  (was: Patch Available)

[~eddyxu] thank you for the contribution. I have committed this to 3.0, 3.1, and 
trunk.

 

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Fix For: 3.1.0, 3.0.2
>
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts

2018-02-20 Thread Dennis Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Huo updated HDFS-13056:
--
Status: Patch Available  (was: Open)

> Expose file-level composite CRCs in HDFS which are comparable across 
> different instances/layouts
> 
>
> Key: HDFS-13056
> URL: https://issues.apache.org/jira/browse/HDFS-13056
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, distcp, erasure-coding, federation, hdfs
>Affects Versions: 3.0.0
>Reporter: Dennis Huo
>Priority: Major
> Attachments: HDFS-13056-branch-2.8.001.patch, 
> HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, HDFS-13056.002.patch, 
> Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, 
> hdfs-file-composite-crc32-v2.pdf, hdfs-file-composite-crc32-v3.pdf
>
>
> FileChecksum was first introduced in 
> [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then 
> has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are 
> already stored as part of datanode metadata, and the MD5 approach is used to 
> compute an aggregate value in a distributed manner, with individual datanodes 
> computing the MD5-of-CRCs per-block in parallel, and the HDFS client 
> computing the second-level MD5.
>  
> A shortcoming of this approach which is often brought up is the fact that 
> this FileChecksum is sensitive to the internal block-size and chunk-size 
> configuration, and thus different HDFS files with different block/chunk 
> settings cannot be compared. More commonly, one might have different HDFS 
> clusters which use different block sizes, in which case any data migration 
> won't be able to use the FileChecksum for distcp's rsync functionality or for 
> verifying end-to-end data integrity (on top of low-level data integrity 
> checks applied at data transfer time).
>  
> This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 
> during the addition of checksum support for striped erasure-coded files; 
> while there was some discussion of using CRC composability, it still 
> ultimately settled on hierarchical MD5 approach, which also adds the problem 
> that checksums of basic replicated files are not comparable to striped files.
>  
> This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses 
> CRC composition to remain completely chunk/block agnostic, and allows 
> comparison between striped vs replicated files, between different HDFS 
> instances, and possibly even between HDFS and other external storage systems. 
> This feature can also be added in-place to be compatible with existing block 
> metadata, and doesn't need to change the normal path of chunk verification, 
> so is minimally invasive. This also means even large preexisting HDFS 
> deployments could adopt this feature to retroactively sync data. A detailed 
> design document can be found here: 
> https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf
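As a usage-level sketch of what "comparable across instances/layouts" would buy:
with a composite CRC combine mode enabled, getFileChecksum() results can be
compared directly between clusters. The configuration key below is an
assumption for illustration only; the thread just says the combine mode is
settable through config.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CompositeCrcCompare {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical key/value for the proposed combine mode; see the design doc
    // and patch for the real name.
    conf.set("dfs.checksum.combine.mode", "COMPOSITE_CRC");

    Path src = new Path(args[0]);   // e.g. a file on cluster A
    Path dst = new Path(args[1]);   // e.g. a file on cluster B

    FileChecksum srcSum = src.getFileSystem(conf).getFileChecksum(src);
    FileChecksum dstSum = dst.getFileSystem(conf).getFileChecksum(dst);

    // Unlike MD5-of-MD5-of-CRC, a composite CRC does not depend on block size,
    // bytes-per-crc, or EC cell size, so equality is meaningful across layouts.
    System.out.println("checksums match: " + srcSum.equals(dstSum));
  }
}
{code}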






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370896#comment-16370896
 ] 

Anu Engineer commented on HDFS-13175:
-

I will commit this shortly.

 

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370886#comment-16370886
 ] 

genericqa commented on HDFS-13175:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 31s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13175 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911296/HDFS-13175.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f58d0d8187b2 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 6f81cc0 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23136/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23136/testReport/ |
| Max. process+thread count | 

[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370856#comment-16370856
 ] 

Anu Engineer commented on HDFS-13175:
-

+1, patch v1, pending jenkins.

 

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts

2018-02-20 Thread Dennis Huo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Huo updated HDFS-13056:
--
Attachment: HDFS-13056.002.patch

> Expose file-level composite CRCs in HDFS which are comparable across 
> different instances/layouts
> 
>
> Key: HDFS-13056
> URL: https://issues.apache.org/jira/browse/HDFS-13056
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, distcp, erasure-coding, federation, hdfs
>Affects Versions: 3.0.0
>Reporter: Dennis Huo
>Priority: Major
> Attachments: HDFS-13056-branch-2.8.001.patch, 
> HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, HDFS-13056.002.patch, 
> Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, 
> hdfs-file-composite-crc32-v2.pdf, hdfs-file-composite-crc32-v3.pdf
>
>
> FileChecksum was first introduced in 
> [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then 
> has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are 
> already stored as part of datanode metadata, and the MD5 approach is used to 
> compute an aggregate value in a distributed manner, with individual datanodes 
> computing the MD5-of-CRCs per-block in parallel, and the HDFS client 
> computing the second-level MD5.
>  
> A shortcoming of this approach which is often brought up is the fact that 
> this FileChecksum is sensitive to the internal block-size and chunk-size 
> configuration, and thus different HDFS files with different block/chunk 
> settings cannot be compared. More commonly, one might have different HDFS 
> clusters which use different block sizes, in which case any data migration 
> won't be able to use the FileChecksum for distcp's rsync functionality or for 
> verifying end-to-end data integrity (on top of low-level data integrity 
> checks applied at data transfer time).
>  
> This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 
> during the addition of checksum support for striped erasure-coded files; 
> while there was some discussion of using CRC composability, it still 
> ultimately settled on hierarchical MD5 approach, which also adds the problem 
> that checksums of basic replicated files are not comparable to striped files.
>  
> This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses 
> CRC composition to remain completely chunk/block agnostic, and allows 
> comparison between striped vs replicated files, between different HDFS 
> instances, and possibly even between HDFS and other external storage systems. 
> This feature can also be added in-place to be compatible with existing block 
> metadata, and doesn't need to change the normal path of chunk verification, 
> so is minimally invasive. This also means even large preexisting HDFS 
> deployments could adopt this feature to retroactively sync data. A detailed 
> design document can be found here: 
> https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf






[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts

2018-02-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370820#comment-16370820
 ] 

ASF GitHub Bot commented on HDFS-13056:
---

GitHub user dennishuo opened a pull request:

https://github.com/apache/hadoop/pull/344

HDFS-13056. Add support for a new COMPOSITE_CRC FileChecksum which is 
comparable between different block layouts and between striped/replicated files



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dennishuo/hadoop add-composite-crc32

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hadoop/pull/344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #344


commit de06097fa2f4c511d5a107d997c7dfa5862ada82
Author: Dennis Huo 
Date:   2018-01-24T23:04:29Z

Add support for a new COMPOSITE_CRC FileChecksum.

Adds new file-level ChecksumCombineMode options settable through config and
lower-level BlockChecksumOptions to indicate block-checksum types supported 
by
both blockChecksum and blockGroupChecksum in DataTransferProtocol.

CRCs are composed such that they are agnostic to block/chunk/cell layout and
thus can be compared between replicated-files and striped-files of
different underlying blocksize, bytes-per-crc, and cellSize settings.

Does not alter default behavior, and doesn't touch the data-read or
data-write paths at all.

commit 3f8fd5ef9da8c312f60430622d3c95f80cb1fde2
Author: Dennis Huo 
Date:   2018-02-08T00:21:14Z

Fix byte-length property for CRC FileChecksum

commit 1a326e38505bacd6b40a682668f36c2aa1047f86
Author: Dennis Huo 
Date:   2018-02-19T02:53:03Z

Add unittest for CrcUtil.

Minor optimization by starting multiplier at x^8 and fix the behavior of
composing a zero-length crcB.

commit d7c2bc739f3cff0d8ae72bb4f2a940eb5b733279
Author: Dennis Huo 
Date:   2018-02-20T00:47:50Z

Refactor StripedBlockChecksumReconstructor for easier reuse with 
COMPOSITE_CRC.

Update BlockChecksumHelper's CRC composition to use the same data buffer
used in the MD5 case, and factor out shared logic from the
StripedBlockChecksumReconstructor into an abstract base class so that
reconstruction logic can be shared between MD5CRC and COMPOSITE_CRC.

commit ac38f404f1d15c9846f58acf297c7e242c3f8bce
Author: Dennis Huo 
Date:   2018-02-20T03:05:41Z

Extract a helper class CrcComposer.

Encapsulate all the CRC internals such as tracking the CRC polynomial,
precomputing the monomial, etc., into this class so that BlockChecksumHelper
and FileChecksumHelper only need to interact with the clean interfaces
of CrcComposer.

commit 8f7b9fd6f93c8358dd0c4899e41d2a993bcc6294
Author: Dennis Huo 
Date:   2018-02-20T03:40:33Z

Add StripedBlockChecksumCompositeCrcReconstructor.

Wire it in to BlockChecksumHelper and use CrcComposer to regenerate
striped composite CRCs for missing EC data blocks.

commit fd2fc3408346aeb177eaeda50919995ee3c02cab
Author: Dennis Huo 
Date:   2018-02-20T21:56:07Z

Add end-to-end test coverage for COMPOSITE_CRC.

Extract hooks in TestFileChecksum to allow a subclass to share core
tests while modifying expectations of a subset of tests; add
TestFileChecksumCompositeCrc which extends TestFileChecksum to
apply the same test suite to COMPOSITE_CRC, and add a test case
for comparing two replicated files with different block sizes.
The test confirms that MD5CRC yields different checksums between replicated
and striped files, and between two replicated files with different block
sizes, while COMPOSITE_CRC yields the same checksum in all cases.

commit 5cd2d08f2be672e79d931ebb6f89541f38334f0b
Author: Dennis Huo 
Date:   2018-02-20T23:44:11Z

Add unittest for CrcComposer.

Fix a bug in handling byte-array updates with nonzero offset.

commit e65248b077d4e1ad00888112de877afed86dad03
Author: Dennis Huo 
Date:   2018-02-21T00:08:05Z

Remove STRIPED_CRC as a BlockChecksumType.

Refactor to just use stripeLength with COMPOSITE_CRC, where non-striped
COMPOSITE_CRC is just an edge case where stripeLength is longer than the
data range.

commit c2a7701246c07a4906d7540d6bc496364239dafc
Author: Dennis Huo 
Date:   2018-02-21T01:02:08Z

Support file-attribute propagation of bytePerCrc in 
CompositeCrcFileChecksum.

Additionally, fix up remaining TODOs; add wrappers for late-evaluating
hex format of CRCs to pass into debug statements and clean up logging
logic.
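
On the "late-evaluating hex format" point, a generic sketch of the idea (not
the wrapper added by this commit; the class and method names here are made up
for the example): defer the hex formatting into {{toString()}} so the work is
only done when the log level is actually enabled.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LazyHexDemo {
  private static final Logger LOG = LoggerFactory.getLogger(LazyHexDemo.class);

  /** Wraps an int so hex formatting is deferred until toString() is called. */
  static Object lazyHex(final int value) {
    return new Object() {
      @Override
      public String toString() {
        return String.format("0x%08x", value);
      }
    };
  }

  public static void main(String[] args) {
    int crc = 0xCAFEBABE;
    // The hex string is only built if DEBUG is enabled for this logger,
    // because SLF4J only formats the arguments when the event is logged.
    LOG.debug("composite crc = {}", lazyHex(crc));
  }
}
{code}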




> Expose file-level composite CRCs in HDFS which are comparable across 
> different instances/layouts
> 

[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370819#comment-16370819
 ] 

genericqa commented on HDFS-13175:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 34s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
|   | hadoop.hdfs.TestReplication |
|   | hadoop.hdfs.TestHFlush |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
|   | hadoop.hdfs.TestDFSStripedOutputStream |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure050 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.TestSetrepIncreasing |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13175 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911282/HDFS-13175.00.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  

[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370769#comment-16370769
 ] 

Hudson commented on HDFS-13167:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13688 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13688/])
HDFS-13167. DatanodeAdminManager Improvements. Contributed by BELUGA (inigoiri: 
rev 6f81cc0beea00843b44424417f09d8ee12cd7bae)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminManager.java


> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}
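
For reference, the parameterized form the description is pointing at looks
like the fragment below (shown only to make the contrast concrete; the
attached patches define the actual change):

{code}
// Pass node and its admin state as arguments; SLF4J substitutes them into
// the {} placeholders and only builds the string if TRACE is enabled.
LOG.trace("stopDecommission: Node {} in {}, nothing to do.",
    node, node.getAdminState());
{code}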






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370765#comment-16370765
 ] 

Lei (Eddy) Xu commented on HDFS-13175:
--

[~anu] Is this before-stream write trying to write the {{clusterInfo}} obtained
from {{readClusterInfo(cmd)}}? The exception above is thrown from
{{readClusterInfo()}} itself, so it would not be able to write this
{{beforeStream}} in this particular case.

I moved that block of code before {{computePlan}} in the 01 patch. 

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.
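
One possible shape of the extra diagnostics (a sketch of the same setter, not
necessarily what the attached patches do): Guava's
{{Preconditions.checkArgument}} also accepts a message template and arguments,
so the {{IllegalArgumentException}} can carry the offending values.

{code}
 public void setUsed(long dfsUsedSpace) {
    Preconditions.checkArgument(dfsUsedSpace < this.getCapacity(),
        "DiskBalancerVolume.setUsed: dfsUsedSpace (%s) must be smaller than "
            + "the volume capacity (%s).",
        dfsUsedSpace, this.getCapacity());
    this.used = dfsUsedSpace;
  }
{code}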






[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-13175:
-
Attachment: HDFS-13175.01.patch

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch, HDFS-13175.01.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370749#comment-16370749
 ] 

Hudson commented on HDFS-13168:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13687 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13687/])
HDFS-13168. XmlImageVisitor - Prefer Array over LinkedList. Contributed 
(inigoiri: rev 17c592e6cfd1ea3dbe9671c4703caabd095d87cf)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/XmlImageVisitor.java


> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
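
A generic illustration of the swap (variable names and contents here are
arbitrary, not the XmlImageVisitor fields): {{ArrayDeque}} covers both the
stack and the queue role behind the {{Deque}} interface, backed by one
contiguous array instead of one node object per element.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
  public static void main(String[] args) {
    // Used as a stack (LIFO), replacing java.util.Stack or LinkedList push/pop.
    Deque<String> elementPath = new ArrayDeque<>();
    elementPath.push("INodeSection");
    elementPath.push("inode");
    System.out.println(elementPath.pop());   // inode
    System.out.println(elementPath.peek());  // INodeSection

    // Used as a queue (FIFO), replacing LinkedList offer/poll.
    Deque<Integer> pending = new ArrayDeque<>();
    pending.offer(1);
    pending.offer(2);
    System.out.println(pending.poll());      // 1
  }
}
{code}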






[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370741#comment-16370741
 ] 

genericqa commented on HDFS-13109:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
22s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}118m 46s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}179m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13109 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911273/HDFS-13109.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 720859b6050a 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 
11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9028cca |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread Íñigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13167:
---
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Fix For: 3.2.0
>
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}






[jira] [Updated] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread Íñigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13168:
---
   Resolution: Fixed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html






[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread Íñigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370732#comment-16370732
 ] 

Íñigo Goiri commented on HDFS-13168:


Thanks [~belugabehr] for the patch, committed to trunk.

> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370720#comment-16370720
 ] 

Lei (Eddy) Xu commented on HDFS-13175:
--

Thanks a lot for the information, [~anu].  Let me go back to check and get back 
to you.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370713#comment-16370713
 ] 

BELUGA BEHR commented on HDFS-13168:


String concatenation with {{+}} is turned into a {{StringBuilder}} under the covers.
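
A generic illustration of that remark under Java 8 (not tied to any particular
line in the patch; the sample value is arbitrary):

{code:java}
public class ConcatDemo {
  public static void main(String[] args) {
    Object node = "127.0.0.1:9866";
    // What the source says:
    String viaPlus = "stopDecommission: Node " + node;
    // Roughly what javac (Java 8) emits for the line above:
    String viaBuilder = new StringBuilder("stopDecommission: Node ")
        .append(node).toString();
    System.out.println(viaPlus.equals(viaBuilder));  // true
  }
}
{code}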

No preference, but I believe it's been all caps thus far.  Thanks!!!

> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370697#comment-16370697
 ] 

Anu Engineer commented on HDFS-13175:
-

[~eddyxu] I just checked the code; we need to move this block to after we call
{{readClusterInfo(cmd);}}
{code:java}
try (FSDataOutputStream beforeStream = create(String.format(
    DiskBalancerCLI.BEFORE_TEMPLATE,
    cmd.getOptionValue(DiskBalancerCLI.PLAN)))) {
  beforeStream.write(getCluster().toJson()
      .getBytes(StandardCharsets.UTF_8));
}{code}

but before we call {{computePlan}}; that way we will always write
{{before.json}}.

{{List plans = getCluster().computePlan(this.thresholdPercentage);}}
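
For clarity, the suggested ordering would look roughly like the sketch below
(simplified from {{PlanCommand#execute}}; the {{NodePlan}} element type and
the surrounding error handling are assumed from the DiskBalancer code rather
than spelled out in this thread):

{code:java}
// 1. Read the cluster info first (this is where the IllegalArgumentException
//    in the reported stack originates).
readClusterInfo(cmd);

// 2. Persist before.json as soon as the cluster snapshot exists, so the raw
//    datanode reports are always captured before any planning work.
try (FSDataOutputStream beforeStream = create(String.format(
    DiskBalancerCLI.BEFORE_TEMPLATE,
    cmd.getOptionValue(DiskBalancerCLI.PLAN)))) {
  beforeStream.write(getCluster().toJson()
      .getBytes(StandardCharsets.UTF_8));
}

// 3. Only then compute the plan.
List<NodePlan> plans = getCluster().computePlan(this.thresholdPercentage);
{code}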


> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Comment Edited] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370665#comment-16370665
 ] 

Anu Engineer edited comment on HDFS-13175 at 2/20/18 10:29 PM:
---

[~eddyxu], we capture a file called datanode.before.json; that file contains
the whole set of Datanode reports that we read from the Namenode connector. The
default path is {{"/system/diskbalancer/./before.json}};
please see if you have that file, and if so we will be able to reproduce this
issue. It is possible that we crashed before we wrote this file; if so, maybe
we should save the data before we process it.


was (Author: anu):
[~eddyxu], we capture a file called datanode.before.json; that file contains
the whole set of Datanode reports that we read from the Namenode connector. The
default path is {{"/system/diskbalancer/./before.json}};
please see if you have that file, and if so we will be able to reproduce this
issue. It is possible that we crashed before we wrote this file; if so, maybe
we should have the data before we process it.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Comment Edited] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370665#comment-16370665
 ] 

Anu Engineer edited comment on HDFS-13175 at 2/20/18 10:28 PM:
---

[~eddyxu], we capture a file called datanode.before.json; that file contains
the whole set of Datanode reports that we read from the Namenode connector. The
default path is {{"/system/diskbalancer/./before.json}};
please see if you have that file, and if so we will be able to reproduce this
issue. It is possible that we crashed before we wrote this file; if so, maybe
we should have the data before we process it.


was (Author: anu):
[~eddyxu], we capture a file called datanode.before.json; that file contains
the whole set of Datanode reports that we read from the Namenode connector. The
default path is {{"/system/diskbalancer/./before.json}};
please see if you have that file, and if so we will be able to reproduce this issue.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370665#comment-16370665
 ] 

Anu Engineer commented on HDFS-13175:
-

[~eddyxu], we capture a file called datanode.before.json; that file contains
the whole set of Datanode reports that we read from the Namenode connector. The
default path is {{"/system/diskbalancer/./before.json}};
please see if you have that file, and if so we will be able to reproduce this issue.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370655#comment-16370655
 ] 

Anu Engineer commented on HDFS-13175:
-

+1, pending Jenkins. Thanks for filing and fixing this issue.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370650#comment-16370650
 ] 

Lei (Eddy) Xu commented on HDFS-13175:
--

The patch also deleted a duplicated line: {{volume.setUsed(report.getDfsUsed());}}.

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-13175:
-
Status: Patch Available  (was: Open)

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13161) Update comment in start-dfs.sh to mention correct variable for secure datanode user

2018-02-20 Thread Ajay Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370649#comment-16370649
 ] 

Ajay Kumar commented on HDFS-13161:
---

[~vagarychen] Thanks for review and commit.

> Update comment in start-dfs.sh to mention correct variable for secure 
> datanode user 
> 
>
> Key: HDFS-13161
> URL: https://issues.apache.org/jira/browse/HDFS-13161
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Minor
> Attachments: HDFS-13161.000.patch
>
>
>  start-dfs.sh mentions that for secure DN startup we need to set 
> HADOOP_SECURE_DN_USER.
> The correct variable is HDFS_DATANODE_SECURE_USER.






[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-13175:
-
Attachment: HDFS-13175.00.patch

> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
> Attachments: HDFS-13175.00.patch
>
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at the very moment were not captured. We should 
> add more information into the stack trace to better diagnose the issue.






[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370638#comment-16370638
 ] 

genericqa commented on HDFS-13108:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-7240 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
34s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
56s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
2s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
20s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
14s{color} | {color:green} HDFS-7240 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
46s{color} | {color:green} HDFS-7240 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
12s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 20s{color} | {color:orange} root: The patch generated 1 new + 0 unchanged - 
0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 40s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 43s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
29s{color} | {color:green} hadoop-ozone in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}215m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.TestCheckpoint |
|   | hadoop.hdfs.server.namenode.TestNameEditsConfigs |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b |
| JIRA Issue | HDFS-13108 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911263/HDFS-13108-HDFS-7240.006.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2da01179cea9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 

[jira] [Updated] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-13175:
-
Description: 
We have seen the following stack in production

{code}
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
{code}

raised from 
{code}
 public void setUsed(long dfsUsedSpace) {
Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
this.used = dfsUsedSpace;
  }
{code}

However, the datanode reports at the very moment were not captured. We should 
add more information into the stack trace to better diagnose the issue.

  was:
We have seen the following stack in production

{code
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
{code}

raised from 
{code}
 public void setUsed(long dfsUsedSpace) {
Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
this.used = dfsUsedSpace;
  }
{code}

However, the datanode reports at the very moment were not captured. We should 
add more information into the stack trace to better diagnose the issue.


> Add more information for checking argument in DiskBalancerVolume
> 
>
> Key: HDFS-13175
> URL: https://issues.apache.org/jira/browse/HDFS-13175
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: diskbalancer
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>
> We have seen the following stack in production
> {code}
> Exception in thread "main" java.lang.IllegalArgumentException
>   at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
>   at 
> org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
> {code}
> raised from 
> {code}
>  public void setUsed(long dfsUsedSpace) {
> Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
> this.used = dfsUsedSpace;
>   }
> {code}
> However, the datanode reports at that moment were not captured. We should add 
> more information to the exception message to make such failures easier to 
> diagnose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13175) Add more information for checking argument in DiskBalancerVolume

2018-02-20 Thread Lei (Eddy) Xu (JIRA)
Lei (Eddy) Xu created HDFS-13175:


 Summary: Add more information for checking argument in 
DiskBalancerVolume
 Key: HDFS-13175
 URL: https://issues.apache.org/jira/browse/HDFS-13175
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: diskbalancer
Affects Versions: 3.0.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu


We have seen the following stack in production

{code}
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolume.setUsed(DiskBalancerVolume.java:268)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getVolumeInfoFromStorageReports(DBNameNodeConnector.java:141)
at 
org.apache.hadoop.hdfs.server.diskbalancer.connectors.DBNameNodeConnector.getNodes(DBNameNodeConnector.java:90)
at 
org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerCluster.readClusterInfo(DiskBalancerCluster.java:132)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.Command.readClusterInfo(Command.java:123)
at 
org.apache.hadoop.hdfs.server.diskbalancer.command.PlanCommand.execute(PlanCommand.java:107)
{code}

raised from 
{code}
 public void setUsed(long dfsUsedSpace) {
Preconditions.checkArgument(dfsUsedSpace < this.getCapacity());
this.used = dfsUsedSpace;
  }
{code}

However, the datanode reports at that moment were not captured. We should add 
more information to the exception message to make such failures easier to 
diagnose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370621#comment-16370621
 ] 

Lei (Eddy) Xu commented on HDFS-13119:
--

Hi, [~elgoiri] 

This does not look like a blocker for 3.0.1 to me. Let's make it 3.0.2 then.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370607#comment-16370607
 ] 

genericqa commented on HDFS-11187:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m  
2s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2.7 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
10s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
3s{color} | {color:green} branch-2.7 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} branch-2.7 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 172 unchanged - 0 fixed = 173 total (was 172) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 61 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}108m 38s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}151m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Unreaped Processes | hadoop-hdfs:28 |
| Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| Timed out junit tests | org.apache.hadoop.hdfs.TestWriteRead |
|   | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage 
|
|   | org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool |
|   | org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
|   | org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics |
|   | org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade |
|   | org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery |
|   | org.apache.hadoop.hdfs.TestPread |
|   | org.apache.hadoop.hdfs.TestFileAppend4 |
|   | org.apache.hadoop.hdfs.TestRollingUpgradeDowngrade |
|   | org.apache.hadoop.hdfs.server.datanode.TestBatchIbr |
|   | org.apache.hadoop.hdfs.TestDecommission |
|   | 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol 
|
|   | org.apache.hadoop.hdfs.TestDFSUpgrade |
|   | 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
|   | org.apache.hadoop.hdfs.server.namenode.TestCheckpoint |
|   | 

[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13119:
---
Fix Version/s: 3.0.2
   2.9.1

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370604#comment-16370604
 ] 

genericqa commented on HDFS-13165:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 17 new or modified test 
files. {color} |
|| || || || {color:brown} HDFS-10285 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
40s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} HDFS-10285 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} HDFS-10285 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 50s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 26 new + 887 unchanged - 4 fixed = 913 total (was 891) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}156m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.namenode.TestPersistentStoragePolicySatisfier |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.server.namenode.sps.TestStoragePolicySatisfier |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure080 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13165 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911264/HDFS-13165-HDFS-10285-01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 539e3ea12cc8 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality 

[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370606#comment-16370606
 ] 

Íñigo Goiri commented on HDFS-13119:


Thanks [~chris.douglas] for the clarification.
I pushed to {{branch-2.9}} and {{branch-3.0}} and added 2.9.1 and 3.0.2 as fix 
versions.

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370591#comment-16370591
 ] 

Íñigo Goiri commented on HDFS-13167:


[^HDFS-13167.3.patch] LGTM.
Committing to trunk.

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}
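
For comparison, a hedged sketch of the parameterized form the item above calls 
for (assuming the SLF4J-style {{Logger}} already used in the snippet):

{code}
// Pass 'node' as a substitution argument instead of concatenating it into the
// format string; both placeholders are then filled in by the logger.
LOG.trace("stopDecommission: Node {} in {}, nothing to do.",
    node, node.getAdminState());
{code}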



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13109) Support fully qualified hdfs path in EZ commands

2018-02-20 Thread Hanisha Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370572#comment-16370572
 ] 

Hanisha Koneru commented on HDFS-13109:
---

Thanks for reviewing the patch, [~shahrs87].
{quote}The variable {{dfs}} in {{HdfsAdmin}} referred to 
{{DistributedFileSystem}}, whereas in {{DistributedFileSystem}}, {{dfs}} refers 
to {{DFSClient}}. {{DistributedFileSystem#getEZForPath}} resolves the path and 
calls {{DFSClient#getEZForPath}}, whereas after the patch it won't resolve the 
path.
{quote}
I did not resolve the path because the input path to {{private void 
provisionEZTrash}} is already resolved by the calling method. So we can skip 
{{DistributedFileSystem#getEZForPath}} and directly call 
{{DFSClient#getEZForPath}}.
Please correct me if I have misunderstood.
{quote}You have already resolved the path in the calling function public void 
provisionEZTrash. You can just pass the resolved path to the private method 
provisionEZTrash instead of getPathName.
{quote}
Yes, thanks for catching this. We can skip the {{getPathName}} and directly 
pass the resolved path component.

Addressed other review comments and checkstyle issues in patch v03.

> Support fully qualified hdfs path in EZ commands
> 
>
> Key: HDFS-13109
> URL: https://issues.apache.org/jira/browse/HDFS-13109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, 
> HDFS-13109.003.patch
>
>
> When creating an Encryption Zone, if the fully qualified path is specified in 
> the path argument, it throws the following error.
> {code:java}
> ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1
> IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption 
> zone. Do you mean /zone1?
> ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" 
> IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an 
> encryption zone. Do you mean /zone2?
> {code}
> The EZ creation succeeds as the path is resolved in 
> DFS#createEncryptionZone(). But while creating the Trash directory, the path 
> is not resolved and it throws the above error.
>  A fully qualified path should be supported by {{crypto}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13109) Support fully qualified hdfs path in EZ commands

2018-02-20 Thread Hanisha Koneru (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanisha Koneru updated HDFS-13109:
--
Attachment: HDFS-13109.003.patch

> Support fully qualified hdfs path in EZ commands
> 
>
> Key: HDFS-13109
> URL: https://issues.apache.org/jira/browse/HDFS-13109
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Hanisha Koneru
>Assignee: Hanisha Koneru
>Priority: Major
> Attachments: HDFS-13109.001.patch, HDFS-13109.002.patch, 
> HDFS-13109.003.patch
>
>
> When creating an Encryption Zone, if the fully qualified path is specified in 
> the path argument, it throws the following error.
> {code:java}
> ~$ hdfs crypto -createZone -keyName mykey1 -path hdfs://ns1/zone1
> IllegalArgumentException: hdfs://ns1/zone1 is not the root of an encryption 
> zone. Do you mean /zone1?
> ~$ hdfs crypto -createZone -keyName mykey1 -path "hdfs://namenode:9000/zone2" 
> IllegalArgumentException: hdfs://namenode:9000/zone2 is not the root of an 
> encryption zone. Do you mean /zone2?
> {code}
> The EZ creation succeeds as the path is resolved in 
> DFS#createEncryptionZone(). But while creating the Trash directory, the path 
> is not resolved and it throws the above error.
>  A fully qualified path should be supported by {{crypto}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370553#comment-16370553
 ] 

genericqa commented on HDFS-13167:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 16s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}177m 40s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestTruncateQuotaUpdate |
|   | hadoop.hdfs.web.TestWebHdfsTimeouts |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13167 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911255/HDFS-13167.3.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 620e2dca651b 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8896d20 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23129/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-13159) TestTruncateQuotaUpdate fails in trunk

2018-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370537#comment-16370537
 ] 

Hudson commented on HDFS-13159:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13686 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13686/])
HDFS-13159. TestTruncateQuotaUpdate fails in trunk. Contributed by Nanda (arp: 
rev 9028ccaf838621808e5e26a9fa933d28799538dd)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestTruncateQuotaUpdate.java


> TestTruncateQuotaUpdate fails in trunk
> --
>
> Key: HDFS-13159
> URL: https://issues.apache.org/jira/browse/HDFS-13159
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13159.000.patch, HDFS-13159.001.patch
>
>
> Details in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370517#comment-16370517
 ] 

Chris Douglas commented on HDFS-13119:
--

bq. As this is technically a bug, I'd like to push it for 2.9.1 and 3.0.1 (or 
3.0.2).
There's a vote for 3.0.1 in progress, but you can contact the release manager 
([~eddyxu]) in case he rolls another RC.

bq. Any idea what's the current state with the branches? My guess is branch-2.9 
and branch-3.0.
AFAIK:
trunk -> 3.2
3.1.0 -> branch-3.1
3.0.2 -> branch-3.0
3.0.1 -> branch-3.0.1
2.10 -> branch-2
2.9.1 -> branch-2.9

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1

2018-02-20 Thread Nanda kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-13070:
---
Fix Version/s: HDFS-7240

> Ozone: SCM: Support for container replica reconciliation - 1
> 
>
> Key: HDFS-13070
> URL: https://issues.apache.org/jira/browse/HDFS-13070
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13070-HDFS-7240.000.patch, 
> HDFS-13070-HDFS-7240.001.patch
>
>
> SCM should process container reports and identify under-replicated containers 
> for re-replication. {{ContainerSupervisor}} should take one NodePool at a 
> time and start processing the container reports of datanodes in that 
> NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM; 
> the actual reconciliation logic will be handled in follow-up jiras.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12977) Add stateId to RPC headers.

2018-02-20 Thread Plamen Jeliazkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370481#comment-16370481
 ] 

Plamen Jeliazkov commented on HDFS-12977:
-

Thanks for taking a look Konstantin.

With regard to #1 – if I try to do that right away I would have to import the 
hdfs module into common. Perhaps I can find a smarter way around that, though 
it may require changing the constructor for Call.

And with #2 – that changes the logic to fetch the EditLog's Txid without the 
writeLock. As long as this is called after the response is created I think we 
are OK, but I am not sure how this will work out with the async EditLog 
feature. We don't want to end up in a situation where the client gets a 
response whose Txid is actually behind its own request's/response's Txid.
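
One way to picture the "response Txid should never be behind the request's" 
concern (purely illustrative; this is not from the attached patch, and the 
class name is made up):

{code}
import java.util.concurrent.atomic.AtomicLong;

/** Client-side holder for the last seen stateId, advanced monotonically. */
public class LastSeenStateId {
  private final AtomicLong lastSeenStateId = new AtomicLong(Long.MIN_VALUE);

  /** Called with the stateId carried in each RPC response header. */
  public void update(long responseStateId) {
    // max() ensures a late or out-of-order response can never move the
    // client's view of the namespace state backwards.
    lastSeenStateId.accumulateAndGet(responseStateId, Math::max);
  }

  /** Value to attach to outgoing RPC request headers. */
  public long get() {
    return lastSeenStateId.get();
  }
}
{code}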

> Add stateId to RPC headers.
> ---
>
> Key: HDFS-12977
> URL: https://issues.apache.org/jira/browse/HDFS-12977
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc, namenode
>Reporter: Konstantin Shvachko
>Assignee: Plamen Jeliazkov
>Priority: Major
> Attachments: HDFS_12977.trunk.001.patch
>
>
> stateId is a new field in the RPC headers of NameNode proto calls.
> stateId is the journal transaction Id, which represents LastSeenId for the 
> clients and LastWrittenId for NameNodes. See more in [reads from Standby 
> design 
> doc|https://issues.apache.org/jira/secure/attachment/12902925/ConsistentReadsFromStandbyNode.pdf].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1

2018-02-20 Thread Nanda kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370477#comment-16370477
 ] 

Nanda kumar commented on HDFS-13070:


Thanks [~anu] for the review. I have committed this to the feature branch.

> Ozone: SCM: Support for container replica reconciliation - 1
> 
>
> Key: HDFS-13070
> URL: https://issues.apache.org/jira/browse/HDFS-13070
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDFS-13070-HDFS-7240.000.patch, 
> HDFS-13070-HDFS-7240.001.patch
>
>
> SCM should process container reports and identify under-replicated containers 
> for re-replication. {{ContainerSupervisor}} should take one NodePool at a 
> time and start processing the container reports of datanodes in that 
> NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM; 
> the actual reconciliation logic will be handled in follow-up jiras.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1

2018-02-20 Thread Nanda kumar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nanda kumar updated HDFS-13070:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Ozone: SCM: Support for container replica reconciliation - 1
> 
>
> Key: HDFS-13070
> URL: https://issues.apache.org/jira/browse/HDFS-13070
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDFS-13070-HDFS-7240.000.patch, 
> HDFS-13070-HDFS-7240.001.patch
>
>
> SCM should process container reports and identify under-replicated containers 
> for re-replication. {{ContainerSupervisor}} should take one NodePool at a 
> time and start processing the container reports of datanodes in that 
> NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM; 
> the actual reconciliation logic will be handled in follow-up jiras.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13159) TestTruncateQuotaUpdate fails in trunk

2018-02-20 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-13159:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

+1

I've committed this. Thanks for the quick fix [~nandakumar131]!

> TestTruncateQuotaUpdate fails in trunk
> --
>
> Key: HDFS-13159
> URL: https://issues.apache.org/jira/browse/HDFS-13159
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Arpit Agarwal
>Assignee: Nanda kumar
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13159.000.patch, HDFS-13159.001.patch
>
>
> Details in comment below.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370459#comment-16370459
 ] 

Anu Engineer edited comment on HDFS-13078 at 2/20/18 7:06 PM:
--

[~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have 
committed this to the feature branch.

{{TestKeys}} passed when run locally, just FYI.


was (Author: anu):
[~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have 
committed this to the feature branch.

> Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large 
> chunk reads (>4M) from Datanodes
> ---
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch, 
> HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, 
> HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> 

[jira] [Updated] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes

2018-02-20 Thread Anu Engineer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anu Engineer updated HDFS-13078:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

[~msingh] Thanks for the patch. [~szetszwo] Thanks for the review, I have 
committed this to the feature branch.

> Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large 
> chunk reads (>4M) from Datanodes
> ---
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch, 
> HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, 
> HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> 

[jira] [Commented] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-20 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370443#comment-16370443
 ] 

Rakesh R commented on HDFS-13165:
-

Attached a new patch. Following are the changes compared to the previous patch:
- built a data structure to match RECEIVED_BLOCK reports against the expected 
block moves, then update the {{file vs block moves}} track list (see the 
illustrative sketch below).
- made DatanodeProtocol.proto changes by removing the 
{{BlocksStorageMoveAttemptFinishedProto}} status result, which we had added 
earlier to notify the SPS.
- cleaned up BlockStorageMovementTracker.
- made a few minor log changes.

Note: the patch is quite big due to the proto changes and unit test 
refactoring.
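
To illustrate the first bullet, a simplified sketch of the matching idea (the 
names here are illustrative only and do not come from the patch):

{code}
// Illustrative only: remember the storage type each scheduled move should end
// up on, and clear the entry when a matching RECEIVED_BLOCK IBR arrives.
private final Map<Long, StorageType> expectedMoves = new ConcurrentHashMap<>();

void blockMoveScheduled(long blockId, StorageType targetType) {
  expectedMoves.put(blockId, targetType);
}

void blockReceived(long blockId, StorageType reportedType) {
  StorageType expected = expectedMoves.get(blockId);
  if (expected == reportedType) {
    expectedMoves.remove(blockId);
    // ...then update the per-file "file vs block moves" track list.
  }
}
{code}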

> [SPS]: Collects successfully moved block details via IBR
> 
>
> Key: HDFS-13165
> URL: https://issues.apache.org/jira/browse/HDFS-13165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13165-HDFS-10285-00.patch, 
> HDFS-13165-HDFS-10285-01.patch
>
>
> This task is to make use of the existing IBR to get moved-block details and 
> remove the unwanted future-tracking logic that exists in the 
> BlockStorageMovementTracker code; it is no longer needed, as file-level 
> tracking is maintained at the NN itself.
> Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]
> Comment-3)
> {quote}BPServiceActor
> Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}
> Comment-21)
> {quote}
> BlockStorageMovementTracker
> Many data structures are riddled with non-threadsafe race conditions and risk 
> of CMEs.
> Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's 
> list of futures is synchronized. However the run loop does an unsynchronized 
> block get, unsynchronized future remove, unsynchronized isEmpty, possibly 
> another unsynchronized get, only then does it do a synchronized remove of the 
> block. The whole chunk of code should be synchronized.
> Is the problematic moverTaskFutures even needed? It's aggregating futures 
> per-block for seemingly no reason. Why track all the futures at all instead 
> of just relying on the completion service? As best I can tell:
> It's only used to determine if a future from the completion service should be 
> ignored during shutdown. Shutdown sets the running boolean to false and 
> clears the entire datastructure so why not use the running boolean like a 
> check just a little further down?
> It uses synchronization to sleep up to 2 seconds before performing a blocking 
> moverCompletionService.take, but only when it thinks there are no active 
> futures. I'll ignore the missed notify race that the bounded wait masks, but 
> the real question is why not just do the blocking take?
> Why all the complexity? Am I missing something?
> BlocksMovementsStatusHandler
> Suffers same type of thread safety issues as StoragePolicySatisfyWorker. Ex. 
> blockIdVsMovementStatus is inconsistent synchronized. Does synchronize to 
> return an unmodifiable list which sadly does nothing to protect the caller 
> from CME.
> handle is iterating over a non-thread safe list.
> {quote}
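
As a generic illustration of the blocking {{take()}} pattern the quoted review 
suggests (not code from any attached patch; the result type and handler are 
placeholders):

{code}
// Generic consumer loop over an ExecutorCompletionService: block on take()
// and stop via interrupt (or a flag re-checked after take), instead of
// polling a shared futures map with bounded waits.
while (!Thread.currentThread().isInterrupted()) {
  try {
    Future<BlockMovementResult> future = moverCompletionService.take();
    handleResult(future.get());          // placeholder handler
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();  // preserve interrupt status and exit
    break;
  } catch (ExecutionException e) {
    LOG.error("Block move task failed", e);
  }
}
{code}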



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13078) Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large chunk reads (>4M) from Datanodes

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370424#comment-16370424
 ] 

Anu Engineer commented on HDFS-13078:
-

[~szetszwo] Thanks for the review. I will commit this patch shortly.

> Ozone: Update Ratis on Ozone to 0.1.1-alpha-8fd74ed-SNAPSHOT, to fix large 
> chunk reads (>4M) from Datanodes
> ---
>
> Key: HDFS-13078
> URL: https://issues.apache.org/jira/browse/HDFS-13078
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: HDFS-7240
>
> Attachments: HDFS-13078-HDFS-7240.001.patch, 
> HDFS-13078-HDFS-7240.002.patch, HDFS-13078-HDFS-7240.003.patch, 
> HDFS-13078-HDFS-7240.004.patch, HDFS-13078-HDFS-7240.005.patch
>
>
> In Ozone, reads from Ratis read fail because stream is closed before the 
> reply is received.
> {code}
> Jan 23, 2018 1:27:14 PM 
> org.apache.ratis.shaded.io.grpc.netty.NettyServerHandler onStreamError
> WARNING: Stream Error
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception$StreamException:
>  Stream closed before write could take place
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2Exception.streamError(Http2Exception.java:149)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:499)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$FlowState.cancel(DefaultHttp2RemoteFlowController.java:480)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$1.onStreamClosed(DefaultHttp2RemoteFlowController.java:105)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection.notifyClosed(DefaultHttp2Connection.java:349)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.removeFromActiveStreams(DefaultHttp2Connection.java:985)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$ActiveStreams.deactivate(DefaultHttp2Connection.java:941)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:497)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2Connection$DefaultStream.close(DefaultHttp2Connection.java:503)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.closeStream(Http2ConnectionHandler.java:587)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onRstStreamRead(DefaultHttp2ConnectionDecoder.java:356)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onRstStreamRead(Http2InboundFrameLogger.java:80)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readRstStreamFrame(DefaultHttp2FrameReader.java:516)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:260)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:388)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:448)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
> at 
> org.apache.ratis.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> at 
> org.apache.ratis.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
> at 
> 

[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370421#comment-16370421
 ] 

Íñigo Goiri commented on HDFS-13119:


[~linyiqun] I've been running this locally and the test takes a long time to 
run.
Right now, it does two rounds of 10 retries, each with a 1-second timeout and 
a 1-second sleep.
Checking the [test 
results|https://builds.apache.org/job/PreCommit-HDFS-Build/23121/testReport/org.apache.hadoop.hdfs.server.federation.router/TestRouterRPCClientRetries/],
 this makes 40 seconds for {{testRetryWhenOneNameServiceDown}} and 16 for 
{{testRetryWhenAllNameServiceDown}}.
I think this is unnecessary and we could tune:
* IPC_CLIENT_CONNECT_MAX_RETRIES_KEY
* IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY
In addition, to reduce the creation of {{MiniDFSCluster}}, we could use 
{{BeforeClass}} (a rough sketch follows below).
If you are on board, I would open a new JIRA for this.
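
If it helps, a rough sketch of what such a follow-up could look like; the 
concrete values and the test-class layout are assumptions, not a reviewed 
proposal:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class TestRouterRPCClientRetries {
  private static MiniDFSCluster cluster;

  @BeforeClass
  public static void setUp() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Fail fast when a subcluster is down instead of retrying for tens of seconds.
    conf.setInt(CommonConfigurationKeys.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY, 1);
    conf.setInt(CommonConfigurationKeys.IPC_CLIENT_CONNECT_RETRY_INTERVAL_KEY, 100);
    // Build the MiniDFSCluster once for the whole class instead of per test.
    cluster = new MiniDFSCluster.Builder(conf).build();
    cluster.waitActive();
  }

  @AfterClass
  public static void tearDown() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }
}
{code}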


> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of its subclusters down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take up all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Status: Patch Available  (was: Reopened)

Patch submitted for branch-2.7.
Cherry-picking the commit from branch-2.
Conflicts:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FinalizedReplica.java
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial chunk checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.
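
A purely hypothetical sketch of the idea above (not the actual 
{{FinalizedReplica}}/{{BlockSender}} code, and not taken from the attached 
patches): cache the last partial chunk's checksum alongside the replica so a 
reader can serve it without re-reading the meta file under the dataset lock.

{code:java}
// Hypothetical illustration only -- class and field names are made up.
// The replica keeps the checksum of its last partial chunk in memory so a
// reader does not need another disk read while holding the dataset lock.
public class CachedPartialChunkChecksum {
  private final long blockLength;
  private final byte[] lastPartialChunkChecksum; // null if chunk-aligned

  public CachedPartialChunkChecksum(long blockLength, byte[] checksum) {
    this.blockLength = blockLength;
    this.lastPartialChunkChecksum = checksum == null ? null : checksum.clone();
  }

  /** Cached checksum, or null if the block ends on a chunk boundary. */
  public byte[] getLastPartialChunkChecksum() {
    return lastPartialChunkChecksum == null
        ? null : lastPartialChunkChecksum.clone();
  }

  public long getBlockLength() {
    return blockLength;
  }
}
{code}

The writer would refresh such a cached value on finalize or append, and the 
block sender could then consult it instead of opening the meta file; the real 
change in the attached patches is, of course, more involved.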



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reopened HDFS-11187:
---

Reopened to provide patch for branch-2.7

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial chunk checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Gabor Bota (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-11187:
--
Attachment: HDFS-11187-branch-2.7.001.patch

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187-branch-2.7.001.patch, 
> HDFS-11187.001.patch, HDFS-11187.002.patch, HDFS-11187.003.patch, 
> HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial chunk checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13165) [SPS]: Collects successfully moved block details via IBR

2018-02-20 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-13165:

Attachment: HDFS-13165-HDFS-10285-01.patch

> [SPS]: Collects successfully moved block details via IBR
> 
>
> Key: HDFS-13165
> URL: https://issues.apache.org/jira/browse/HDFS-13165
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Rakesh R
>Assignee: Rakesh R
>Priority: Major
> Attachments: HDFS-13165-HDFS-10285-00.patch, 
> HDFS-13165-HDFS-10285-01.patch
>
>
> This task is to make use of the existing IBR to get moved-block details and 
> remove the unwanted future-tracking logic that exists in the 
> BlockStorageMovementTracker code; it is no longer needed, as the file-level 
> tracking is maintained at the NN itself.
> Following comments taken from HDFS-10285, 
> [here|https://issues.apache.org/jira/browse/HDFS-10285?focusedCommentId=16347472=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16347472]
> Comment-3)
> {quote}BPServiceActor
> Is it actually sending back the moved blocks? Aren’t IBRs sufficient?{quote}
> Comment-21)
> {quote}
> BlockStorageMovementTracker
> Many data structures are riddled with non-threadsafe race conditions and risk 
> of CMEs.
> Ex. The moverTaskFutures map. Adding new blocks and/or adding to a block's 
> list of futures is synchronized. However the run loop does an unsynchronized 
> block get, unsynchronized future remove, unsynchronized isEmpty, possibly 
> another unsynchronized get, only then does it do a synchronized remove of the 
> block. The whole chunk of code should be synchronized.
> Is the problematic moverTaskFutures even needed? It's aggregating futures 
> per-block for seemingly no reason. Why track all the futures at all instead 
> of just relying on the completion service? As best I can tell:
> It's only used to determine if a future from the completion service should be 
> ignored during shutdown. Shutdown sets the running boolean to false and 
> clears the entire datastructure so why not use the running boolean like a 
> check just a little further down?
> It also uses synchronization to sleep up to 2 seconds before performing a 
> blocking moverCompletionService.take, but only when it thinks there are no 
> active futures. I'll ignore the missed-notify race that the bounded wait 
> masks, but the real question is: why not just do the blocking take?
> Why all the complexity? Am I missing something?
> BlocksMovementsStatusHandler
> Suffers from the same type of thread-safety issues as 
> StoragePolicySatisfyWorker. Ex. blockIdVsMovementStatus is inconsistently 
> synchronized. It does synchronize to return an unmodifiable list, which sadly 
> does nothing to protect the caller from CMEs.
> handle is iterating over a non-thread-safe list.
> {quote}
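
To make the quoted suggestion concrete, here is a minimal, hypothetical sketch 
(not the SPS code itself; all names are made up) of a tracker loop that relies 
only on the completion service and a running flag, with no per-block future map:

{code:java}
// Illustrative sketch of the reviewer's suggestion: consume results with a
// blocking take() from the completion service instead of tracking futures
// per block in a separate, racy map.
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockMoveResult { /* placeholder for the per-block move outcome */ }

class SimpleMovementTracker implements Runnable {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final CompletionService<BlockMoveResult> completion =
      new ExecutorCompletionService<>(pool);
  private volatile boolean running = true;

  void submit(Callable<BlockMoveResult> moveTask) {
    completion.submit(moveTask);
  }

  @Override
  public void run() {
    while (running) {
      try {
        // Blocking take: no bounded wait, no unsynchronized map look-ups.
        Future<BlockMoveResult> done = completion.take();
        BlockMoveResult result = done.get();
        // A handler would report the moved block here, e.g. via IBR.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } catch (ExecutionException e) {
        // Log and continue; shutdown is signalled only via 'running'.
      }
    }
  }

  void shutdown() {
    running = false;
    pool.shutdownNow();
  }
}
{code}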



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13070) Ozone: SCM: Support for container replica reconciliation - 1

2018-02-20 Thread Anu Engineer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370415#comment-16370415
 ] 

Anu Engineer commented on HDFS-13070:
-

+1, the patch looks good to me. Thanks for taking care of this.

> Ozone: SCM: Support for container replica reconciliation - 1
> 
>
> Key: HDFS-13070
> URL: https://issues.apache.org/jira/browse/HDFS-13070
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Reporter: Nanda kumar
>Assignee: Nanda kumar
>Priority: Major
> Attachments: HDFS-13070-HDFS-7240.000.patch, 
> HDFS-13070-HDFS-7240.001.patch
>
>
> SCM should process container reports and identify under-replicated containers 
> for re-replication. {{ContainerSupervisor}} should take one NodePool at a 
> time and start processing the container reports of the datanodes in that 
> NodePool. In this jira we just integrate {{ContainerSupervisor}} into SCM; the 
> actual reconciliation logic will be handled in follow-up jiras.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-20 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370406#comment-16370406
 ] 

genericqa commented on HDFS-13170:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 17s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-httpfs: The 
patch generated 9 new + 397 unchanged - 7 fixed = 406 total (was 404) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
40s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 54m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-13170 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12911250/HDFS-13170.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux b2e9ac2d1724 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8896d20 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23130/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23130/artifact/out/whitespace-eol.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/23130/testReport/ |
| Max. process+thread count | 686 (vs. ulimit of 5500) |
| modules | C: 

[jira] [Commented] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-20 Thread Elek, Marton (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370405#comment-16370405
 ] 

Elek, Marton commented on HDFS-13108:
-

Final patch has been uploaded. Order of imports and all the assertion messages 
are fixed (Thanks to [~ste...@apache.org]'s comments). Both the contract and 
normal unit tests are passing.

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, 
> HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket /volume should be defined in the defaultFS (eg. 
>  o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from the bucket1)
> It works very well, but there are some limitations.
> B. Problem one 
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C.) Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which can transform an 
> unqualified URL into a qualified URL. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a URL by keeping the schema 
> (eg. o3://) and authority (eg. datanode:9864) from the defaultFS and using 
> the relative path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
> return o3://datanode:9864/dir1/file, which is obviously wrong (the correct 
> result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried to do a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but not with Spark, which uses the Hadoop API and the original 
> makeQualified path.
> D.) Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/ 
> This is similar to the s3a format, where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or ksm in case of service 
> discovery), but it would be configurable with additional Hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include a dot any more).
> ps: some spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  
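
A tiny sketch of the qualification behaviour described in part C above, using 
only the stock Hadoop {{Path}} API (the values mirror the example; nothing here 
is from the patch):

{code:java}
// Shows how Path.makeQualified() keeps only scheme + authority from the
// default FS URI, dropping the /test/bucket1 prefix described above.
import java.net.URI;
import org.apache.hadoop.fs.Path;

public class QualifyExample {
  public static void main(String[] args) {
    URI defaultFs = URI.create("o3://datanode:9864/test/bucket1");
    Path workingDir = new Path("/");      // assumed working directory
    Path relative = new Path("dir1/file");

    // Expected to print o3://datanode:9864/dir1/file -- the bucket/volume
    // part of the default FS is lost, which is why a bucket-in-authority
    // schema like o3://bucket.volume/ avoids the problem.
    System.out.println(relative.makeQualified(defaultFs, workingDir));
  }
}
{code}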



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13108) Ozone: OzoneFileSystem: Simplified url schema for Ozone File System

2018-02-20 Thread Elek, Marton (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDFS-13108:

Attachment: HDFS-13108-HDFS-7240.006.patch

> Ozone: OzoneFileSystem: Simplified url schema for Ozone File System
> ---
>
> Key: HDFS-13108
> URL: https://issues.apache.org/jira/browse/HDFS-13108
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: HDFS-13108-HDFS-7240.001.patch, 
> HDFS-13108-HDFS-7240.002.patch, HDFS-13108-HDFS-7240.003.patch, 
> HDFS-13108-HDFS-7240.005.patch, HDFS-13108-HDFS-7240.006.patch
>
>
> A. Current state
>  
> 1. The datanode host / bucket /volume should be defined in the defaultFS (eg. 
>  o3://datanode:9864/test/bucket1)
> 2. The root file system points to the bucket (eg. 'dfs -ls /' lists all the 
> keys from the bucket1)
> It works very well, but there are some limitations.
> B. Problem one 
> The current code doesn't support fully qualified locations. For example 'dfs 
> -ls o3://datanode:9864/test/bucket1/dir1' does not work.
> C.) Problem two
> I tried to fix the previous problem, but it's not trivial. The biggest 
> problem is that there is a Path.makeQualified call which can transform an 
> unqualified URL into a qualified URL. This is part of Path.java, so it's 
> common to all the Hadoop file systems.
> In the current implementation it qualifies a URL by keeping the schema 
> (eg. o3://) and authority (eg. datanode:9864) from the defaultFS and using 
> the relative path as the end of the qualified URL. For example:
> makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) will 
> return o3://datanode:9864/dir1/file, which is obviously wrong (the correct 
> result would be o3://datanode:9864/TEST/BUCKET1/dir1/file). I tried to do a 
> workaround using a custom makeQualified in the Ozone code; it worked from the 
> command line but not with Spark, which uses the Hadoop API and the original 
> makeQualified path.
> D.) Solution
> We should support makeQualified calls, so we can use any path in the 
> defaultFS.
>  
> I propose to use a simplified schema such as o3://bucket.volume/ 
> This is similar to the s3a format, where the pattern is s3a://bucket.region/ 
> We don't need to set the hostname of the datanode (or ksm in case of service 
> discovery), but it would be configurable with additional Hadoop configuration 
> values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 
> (this is how s3a works today, as far as I know).
> We also need to define restrictions for the volume names (in our case they 
> should not include a dot any more).
> ps: some spark output
> 2018-02-03 18:43:04 WARN  Client:66 - Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2018-02-03 18:43:05 INFO  Client:54 - Uploading resource 
> file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__244044896784490.zip
>  -> 
> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/__spark_libs__244044896784490.zip
> My default fs was o3://datanode:9864/test/bucket1, but spark qualified the 
> name of the home directory.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13102:
---
Comment: was deleted

(was: Thanks Nicholas for the Review. 

There are some issues for which I feel we should maintain a list of the skip 
indices. I think it's better to have a call sometime tomorrow.

If we keep the skip indices maintained in a list, the power logic will also 
work.

I do agree that the addFirst method won't work in the current scenario, and I 
would like to discuss with you how to handle this, as it will be called when 
the NameNode starts up, so it may require different handling.

Let me know in case you are available tomorrow any time.

Thanks
Shashi

On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"  wrote:


[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Some more comments:

- There seems a bug in addFirst -- it should add at index 0, i.e. 
skipNodeList.add(0, node).  Then, checkAndPromoteIfNeeded() won't work for it.

- With remove, we cannot use power to determine the skip indices.  I 
understand that remove() is not implemented here.  Are you going to change the 
computation in combineDiffs() when adding remove()?
{code}
//combineDiffs()
  // At each level no of entries to be combined to promote to a
  // higher level will be equal to skip interval, eg: assuming skip 
interval
  // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3.
  // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1.
  // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct
  // s0-15 and so on.
  Double power = Math.pow(skipInterval, levelIterator);
{code}


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list , it needs 
to combine all the diffs from that particular snapshot to the last snapshotDiff 
record and reverseApply to the current children list of the directory on live 
fs. This can take  a significant time if the no of snapshot diffs are quite 
large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, 
where we store multi level DirectoryDiffs. At each level, the Directory Diff 
will be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
)

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-13102:
---
Comment: was deleted

(was: I am holding on to the other patches until this gets finalized, because 
changing this patch will invariably change the other patches as well.

On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"  wrote:


[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Some more comments:

- There seems a bug in addFirst -- it should add at index 0, i.e. 
skipNodeList.add(0, node).  Then, checkAndPromoteIfNeeded() won't work for it.

- With remove, we cannot use power to determine the skip indices.  I 
understand that remove() is not implemented here.  Are you going to change the 
computation in combineDiffs() when adding remove()?
{code}
//combineDiffs()
  // At each level no of entries to be combined to promote to a
  // higher level will be equal to skip interval, eg: assuming skip 
interval
  // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3.
  // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1.
  // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct
  // s0-15 and so on.
  Double power = Math.pow(skipInterval, levelIterator);
{code}


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list , it needs 
to combine all the diffs from that particular snapshot to the last snapshotDiff 
record and reverseApply to the current children list of the directory on live 
fs. This can take  a significant time if the no of snapshot diffs are quite 
large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, 
where we store multi level DirectoryDiffs. At each level, the Directory Diff 
will be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
)

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370396#comment-16370396
 ] 

Shashikant Banerjee commented on HDFS-13102:


I am holding on to the other patches until this gets finalized, because 
changing this patch will invariably change the other patches as well.

On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"  wrote:


[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Some more comments:

- There seems a bug in addFirst -- it should add at index 0, i.e. 
skipNodeList.add(0, node).  Then, checkAndPromoteIfNeeded() won't work for it.

- With remove, we cannot use power to determine the skip indices.  I 
understand that remove() is not implemented here.  Are you going to change the 
computation in combineDiffs() when adding remove()?
{code}
//combineDiffs()
  // At each level no of entries to be combined to promote to a
  // higher level will be equal to skip interval, eg: assuming skip 
interval
  // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3.
  // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1.
  // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct
  // s0-15 and so on.
  Double power = Math.pow(skipInterval, levelIterator);
{code}


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list , it needs 
to combine all the diffs from that particular snapshot to the last snapshotDiff 
record and reverseApply to the current children list of the directory on live 
fs. This can take  a significant time if the no of snapshot diffs are quite 
large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, 
where we store multi level DirectoryDiffs. At each level, the Directory Diff 
will be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370391#comment-16370391
 ] 

Shashikant Banerjee commented on HDFS-13102:


Thanks Nicholas for the Review. 

There are some issues for which I feel we should maintain a list of the skip 
indices. I think it's better to have a call sometime tomorrow.

If we keep the skip indices maintained in a list, the power logic will also 
work.

I do agree that the addFirst method won't work in the current scenario, and I 
would like to discuss with you how to handle this, as it will be called when 
the NameNode starts up, so it may require different handling.

Let me know in case you are available tomorrow any time.

Thanks
Shashi

On 2/20/18, 11:36 PM, "Tsz Wo Nicholas Sze (JIRA)"  wrote:


[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Some more comments:

- There seems a bug in addFirst -- it should add at index 0, i.e. 
skipNodeList.add(0, node).  Then, checkAndPromoteIfNeeded() won't work for it.

- With remove, we cannot use power to determine the skip indices.  I 
understand that remove() is not implemented here.  Are you going to change the 
computation in combineDiffs() when adding remove()?
{code}
//combineDiffs()
  // At each level no of entries to be combined to promote to a
  // higher level will be equal to skip interval, eg: assuming skip 
interval
  // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3.
  // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1.
  // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct
  // s0-15 and so on.
  Double power = Math.pow(skipInterval, levelIterator);
{code}


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
very long time in case the no of snapshot diffs is quite large for directories. 
For any directory under a snapshot, to construct the children list , it needs 
to combine all the diffs from that particular snapshot to the last snapshotDiff 
record and reverseApply to the current children list of the directory on live 
fs. This can take  a significant time if the no of snapshot diffs are quite 
large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, 
where we store multi level DirectoryDiffs. At each level, the Directory Diff 
will be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370378#comment-16370378
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Some more comments:

- There seems to be a bug in addFirst -- it should add at index 0, i.e. 
skipNodeList.add(0, node).  Then, checkAndPromoteIfNeeded() won't work for it.

- With remove, we cannot use power to determine the skip indices.  I understand 
that remove() is not implemented here.  Are you going to change the computation 
in combineDiffs() when adding remove()?
{code}
//combineDiffs()
  // At each level no of entries to be combined to promote to a
  // higher level will be equal to skip interval, eg: assuming skip interval
  // of 4, at level 0, s0, s1 ,s2 and s3 will be combined to form s0-3.
  // similarly, s4-7, s8-11 and s11-15 will be constructed at level 1.
  // At level 1, s0-3, s4-7, s8-11, s11-15 will be combined to construct
  // s0-15 and so on.
  Double power = Math.pow(skipInterval, levelIterator);
{code}
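
For context, a small self-contained sketch of the grouping the comment above 
describes (purely illustrative, not the patch code): with a skip interval of 4, 
a node at level L summarizes 4^L consecutive level-0 diffs.

{code:java}
// Illustrative only: print which level-0 diff indices each higher-level node
// covers when levels are derived purely from powers of the skip interval.
public class SkipIntervalGrouping {
  public static void main(String[] args) {
    final int skipInterval = 4;
    final int totalDiffs = 16;   // s0 .. s15
    for (int level = 1; level <= 2; level++) {
      // Number of level-0 diffs summarized by one node at this level.
      int span = (int) Math.pow(skipInterval, level);
      for (int start = 0; start + span <= totalDiffs; start += span) {
        System.out.println(
            "level " + level + ": s" + start + "-" + (start + span - 1));
      }
    }
  }
}
{code}

Once an element is removed, the start offsets no longer line up with simple 
multiples of skipInterval^level, which is the point being raised about remove().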


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370367#comment-16370367
 ] 

Íñigo Goiri edited comment on HDFS-13168 at 2/20/18 6:02 PM:
-

{{StringBuilder}} is more optimal, but I'm not sure how + on {{String}} is 
implemented.
Anyway, this is just philosophical at this point; it should be good either way.
I'm committing [^HDFS-13168.2.patch] to {{trunk}}; [~belugabehr], for the 
commit message do I use your name capitalized as it is here or do you prefer 
some other spelling?



was (Author: elgoiri):
{{StringBuilder}} is more optimal but I'm not sure how does + in {{String}} is 
implemented.
Anyway, just philosophical at this point, it should be good either way.
I'm committing [^HDFS-13168.2.patch] to {{trunk}}; [~belugabehr], for the 
commit message do I use for your name capitalized as it is here or do you 
prefer some other spelling?


> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many 
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
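
A minimal illustration of the swap being proposed (generic usage of the JDK 
classes, not the XmlImageVisitor change itself; the element names are made up):

{code:java}
// ArrayDeque as a drop-in for LinkedList/Stack when only stack or queue
// operations are needed: one backing array, no per-element node objects.
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeExample {
  public static void main(String[] args) {
    // Used as a stack (LIFO), e.g. tracking the current element nesting.
    Deque<String> stack = new ArrayDeque<>();
    stack.push("INodeSection");
    stack.push("inode");
    System.out.println(stack.pop());    // inode
    System.out.println(stack.peek());   // INodeSection

    // The same class used as a FIFO queue.
    Deque<String> queue = new ArrayDeque<>();
    queue.addLast("first");
    queue.addLast("second");
    System.out.println(queue.pollFirst()); // first
  }
}
{code}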



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370367#comment-16370367
 ] 

Íñigo Goiri commented on HDFS-13168:


{{StringBuilder}} is more optimal, but I'm not sure how + on {{String}} is 
implemented.
Anyway, this is just philosophical at this point; it should be good either way.
I'm committing [^HDFS-13168.2.patch] to {{trunk}}; [~belugabehr], for the 
commit message do I use your name capitalized as it is here, or do you 
prefer some other spelling?
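
For what it's worth, a tiny illustration of the difference being discussed 
(standard Java behaviour, not specific to this patch): + on {{String}} compiles 
to builder-style concatenation, but in a loop the intermediate objects are 
recreated on every iteration, whereas an explicit {{StringBuilder}} is reused.

{code:java}
// Both methods produce the same result; the explicit StringBuilder avoids
// creating intermediate objects on every loop iteration.
public class ConcatExample {
  static String withPlus(String[] parts) {
    String out = "";
    for (String p : parts) {
      out = out + p;          // new intermediate String each iteration
    }
    return out;
  }

  static String withBuilder(String[] parts) {
    StringBuilder out = new StringBuilder();
    for (String p : parts) {
      out.append(p);          // single builder reused across iterations
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String[] parts = {"<", "inode", ">"};
    System.out.println(withPlus(parts).equals(withBuilder(parts))); // true
  }
}
{code}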


> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many 
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Gabor Bota (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370364#comment-16370364
 ] 

Gabor Bota commented on HDFS-11187:
---

Hi [~xkrogen], I'll check if it can be applied easily soon.


> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of the 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial chunk checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of the in-memory checksum requires a lot more work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370365#comment-16370365
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:



> Removes will be handled as a part HDFS-13171

If remove() is not working yet, please throw UnsupportedOperationException in 
this JIRA.

> ... Now locating the previous multiLevel node and next multiLevelNode might 
> be a little cumbersome as we need to iterate through the list again and check 
> which node is actually a multiLevel node. ...

Just iterate starting at the deleted element but not the entire list.  
Maintaining diffSetIndexList needs extra memory.

> ... We need to have the INodeDirectory passed to DirectoryDiffList, as with 
> INodeDirectory reference itself, we will be able to read the configured value 
> of SkipInterval.

Passing it in the constructor is fine, but not storing it.  It occupies memory 
to store the INodeDirectory.

> ... getMinListForRange actually gives a list of childrenDiff(not 
> DirectoryDiffs ...

Good point! Then, both DiffListByArrayList and unmodifiableList should 
implement getSumForRange.



> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time in case the no of snapshot diffs is quite large for 
> directories. For any directory under a snapshot, to construct the children 
> list , it needs to combine all the diffs from that particular snapshot to the 
> last snapshotDiff record and reverseApply to the current children list of the 
> directory on live fs. This can take  a significant time if the no of snapshot 
> diffs are quite large and changes per diff is significant.
> This Jira proposes to store the Directory diffs in a SnapshotSkip list, where 
> we store multi level DirectoryDiffs. At each level, the Directory Diff will 
> be cumulative diff of k snapshot diffs,
> where k is the level of a node in the list. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370358#comment-16370358
 ] 

Íñigo Goiri commented on HDFS-13119:


I added  [^HDFS-13119.006.patch] with the committed version to trunk for 
completeness.
As this is technically a bug, I'd like to push it for 2.9.1 and 3.0.1 (or 
3.0.2).
Any idea what's the current state with the branches? My guess is {{branch-2.9}} 
and {{branch-3.0}}.
[~chris.douglas] which ones would be the right ones for 2.9.1 and 3.0.X?

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of the subcluster down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13119) RBF: Manage unavailable clusters

2018-02-20 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HDFS-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-13119:
---
Attachment: HDFS-13119.006.patch

> RBF: Manage unavailable clusters
> 
>
> Key: HDFS-13119
> URL: https://issues.apache.org/jira/browse/HDFS-13119
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Yiqun Lin
>Priority: Major
>  Labels: RBF
> Fix For: 3.1.0, 2.10.0, 3.2.0
>
> Attachments: HDFS-13119.001.patch, HDFS-13119.002.patch, 
> HDFS-13119.003.patch, HDFS-13119.004.patch, HDFS-13119.005.patch, 
> HDFS-13119.006.patch
>
>
> When a federated cluster has one of the subcluster down, operations that run 
> in every subcluster ({{RouterRpcClient#invokeAll()}}) may take all the RPC 
> connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370330#comment-16370330
 ] 

Shashikant Banerjee edited comment on HDFS-13102 at 2/20/18 5:37 PM:
-

Thanks [~szetszwo], for the review comments.
{code:java}
TestDirectoryDiffList does not test remove(..). As mentioned, remove(..) seems 
having some bugs.{code}
I agree that TestDirectoryDiffList does not test removes. Removes will be 
handled as a part of HDFS-13171. Removes need to rebalance the list, and hence 
I would like to handle them in a different Jira.
{code:java}
diffSetIndexList does not seem useful since it is the same as the nodes in 
level 1. BTW, diffSetIndexList is not updated when remove an element so that it 
seems a bug. I suggest removing diffSetIndexList since it can be computed if 
necessary.{code}
diffSetIndexList is a list which maintains the indices of all the multi-level 
nodes. I think diffSetIndexList should be kept, as otherwise determining the 
multi-level nodes when a sequence of deletes happens will be cumbersome. For 
example,

let's say we have snapshotDiffs stored for a directory as follows, with a skip 
interval of 3:

s0-->s1-->s2-->s3-->s4-->s5-->s6

In this case, s0 and s3 will be multi-level nodes. Let's say s2 gets deleted, 
followed by s3. Now locating the previous multi-level node and the next 
multi-level node might be a little cumbersome, as we need to iterate through 
the list again and check which node is actually a multi-level node. 
diffSetIndexList will simplify rebalancing the skip list in case of deletions. 
It will be updated accordingly when we handle deletes in HDFS-13171.
{code:java}
Pass INodeDirectory as a parameter in getSumForRange(..). Then, we could remove 
INodeDirectory dir from DirectoryDiffList{code}
We need to have the INodeDirectory passed to DirectoryDiffList, as with the 
INodeDirectory reference itself we will be able to read the configured value 
of SkipInterval. This will be a part of HDFS-13173.
{code:java}
Let's replace getSumForRange with getMinListForRange in DiffList so that we may 
implement it DiffListByArrayList using subList.{code}
getMinListForRange actually gives a list of childrenDiffs (not DirectoryDiffs, 
which are the basic elements stored in the list). Putting this API in the 
DiffList interface might not make much sense. In case this API does seem 
suitable for the DiffList interface, we would need to change it to return a 
list of DirectoryDiffs rather than childrenDiffs.

DiffListByArrayList (which will be used to store FileDiffs in general) does 
not have a childrenDiff element.


was (Author: shashikant):
Thanks [~szetszwo], for the review comments.
{code:java}
TestDirectoryDiffList does not test remove(..). As mentioned, remove(..) seems 
having some bugs.{code}
I agree that TestDirectoryDiffList does not test removes. Removes will be 
handled as a part HDFS-13171.Removes need to balance the list and hence, i 
would like to update it in a different Jira.
{code:java}
diffSetIndexList does not seem useful since it is the same as the nodes in 
level 1. BTW, diffSetIndexList is not updated when remove an element so that it 
seems a bug. I suggest removing diffSetIndexList since it can be computed if 
necessary.{code}
diffSetIndexList is a list which maintains the indices for all the 
multiLevelNodes. I think diffSetIndexList should be kept as otherwise,

for determining multiLevel nodes when sequence of delete happen will be 
cumbersome. For example,

let's say we have snapshotDiff stored for a directory  as follows with a skip 
interval of 3:

s0-->s1->s2>s3>s4>s5->s6-

In this case, s0 and s3 will be multiLevel nodes. Let's say s2 gets deleted 
followed by s3. Now locating the previous multiLevel node and

next multiLevelNode might be a little cumbersome as we need to iterate through 
the list again and check which node is actually a multiLevel node. 
DiffsetIndexList will simplify the balancing the skipList in case of deletions. 
It will be updated accordingly when we handle deletes with HDFS-13171.
{code:java}
Pass INodeDirectory as a parameter in getSumForRange(..). Then, we could remove 
INodeDirectory dir from DirectoryDiffList{code}
We need to have the INodeDirectory passed to DirectoryDiffList, as with 
INodeDirectory reference itself, we will be able to read the configured value 
of SkipInterval. This will be a part of HDFS-13173.
{code:java}
Let's replace getSumForRange with getMinListForRange in DiffList so that we may 
implement it DiffListByArrayList using subList.{code}
getMinListForRange actually gives a list of childrenDiff(not DirectoryDiffs 
which is the basic element stored in the List). Putting this API

in the diffList interface method might not make much sense. In case, this API 
seems suitable for DiffList interface method, we need to change the API to 
return list of DirectoryDiffs rather than childrenDiff here.

For 

[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Shashikant Banerjee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370330#comment-16370330
 ] 

Shashikant Banerjee commented on HDFS-13102:


Thanks [~szetszwo], for the review comments.
{code:java}
TestDirectoryDiffList does not test remove(..). As mentioned, remove(..) seems 
to have some bugs.{code}
I agree that TestDirectoryDiffList does not test removes. Removes will be 
handled as part of HDFS-13171. Removes need to rebalance the list, and hence I 
would like to address them in a separate Jira.
{code:java}
diffSetIndexList does not seem useful since it is the same as the nodes in 
level 1. BTW, diffSetIndexList is not updated when removing an element, so it 
seems a bug. I suggest removing diffSetIndexList since it can be computed if 
necessary.{code}
diffSetIndexList is a list which maintains the indices of all the 
multiLevelNodes. I think diffSetIndexList should be kept, as otherwise 
determining the multiLevel nodes when a sequence of deletes happens will be 
cumbersome. For example, let's say we have snapshotDiffs stored for a directory 
as follows, with a skip interval of 3:

s0-->s1-->s2-->s3-->s4-->s5-->s6

In this case, s0 and s3 will be multiLevel nodes. Let's say s2 gets deleted, 
followed by s3. Locating the previous multiLevel node and the next multiLevel 
node then becomes a little cumbersome, as we need to iterate through the list 
again and check which node is actually a multiLevel node. diffSetIndexList will 
simplify rebalancing the skipList in case of deletions. It will be updated 
accordingly when we handle deletes in HDFS-13171.
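
To make the role of diffSetIndexList concrete, here is a minimal, 
self-contained sketch (illustrative names only, assuming a flat list with 
every skipInterval-th entry treated as a multi-level node; this is not the 
actual DirectoryDiffList/SnapshotSkipList code) of keeping an index of 
multi-level nodes so that, on a delete, the previous and next multi-level 
nodes can be found without rescanning the whole diff list:
{code:java}
import java.util.ArrayList;
import java.util.List;

public class SkipIndexSketch {
  private final List<String> diffs = new ArrayList<>();        // stand-in for DirectoryDiffs
  private final List<Integer> diffSetIndexList = new ArrayList<>();
  private final int skipInterval;

  public SkipIndexSketch(int skipInterval) {
    this.skipInterval = skipInterval;
  }

  public void add(String diff) {
    // Every skipInterval-th entry is remembered as a multi-level node.
    if (diffs.size() % skipInterval == 0) {
      diffSetIndexList.add(diffs.size());
    }
    diffs.add(diff);
  }

  /** Previous and next multi-level node indices around a deleted position,
   *  found from the index list instead of rescanning all diffs. */
  public int[] multiLevelNeighbours(int deletedIndex) {
    int prev = -1, next = -1;
    for (int idx : diffSetIndexList) {
      if (idx < deletedIndex) {
        prev = idx;
      } else if (idx > deletedIndex) {
        next = idx;
        break;
      }
    }
    return new int[] {prev, next};
  }

  public static void main(String[] args) {
    SkipIndexSketch list = new SkipIndexSketch(3);
    for (int i = 0; i <= 6; i++) {
      list.add("s" + i);                      // s0 .. s6
    }
    int[] around = list.multiLevelNeighbours(2);   // deleting s2
    System.out.println("prev=" + around[0] + ", next=" + around[1]);
  }
}
{code}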
{code:java}
Pass INodeDirectory as a parameter in getSumForRange(..). Then, we could remove 
INodeDirectory dir from DirectoryDiffList{code}
We need to pass the INodeDirectory to DirectoryDiffList because, with the 
INodeDirectory reference itself, we will be able to read the configured value 
of SkipInterval. This will be done as part of HDFS-13173.
{code:java}
Let's replace getSumForRange with getMinListForRange in DiffList so that we may 
implement it in DiffListByArrayList using subList.{code}
getMinListForRange actually returns a list of childrenDiffs (not of 
DirectoryDiffs, which are the basic elements stored in the list). Putting this 
API in the DiffList interface might not make much sense. If this API does seem 
suitable as a DiffList interface method, we need to change it to return a list 
of DirectoryDiffs rather than childrenDiffs here.

Also, DiffListByArrayList (which will be used to store FileDiffs in general) 
does not have a childrenDiff element.
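
For illustration only (the actual DiffList/DirectoryDiff types in HDFS differ), 
the API shape under discussion is roughly a range query on the generic list 
that returns the stored elements themselves, rather than their childrenDiff 
payload, so that it also makes sense for a DiffListByArrayList holding 
FileDiffs:
{code:java}
import java.util.List;

// Illustrative sketch only; not the real o.a.h.hdfs DiffList interface.
interface DiffList<T> {
  T get(int index);
  int size();

  /**
   * Returns the minimal set of stored diff elements (of type T, e.g.
   * DirectoryDiff or FileDiff) covering [fromIndex, toIndex), rather than
   * a childrenDiff payload, so the method is meaningful for every element
   * type the list can hold.
   */
  List<T> getMinListForRange(int fromIndex, int toIndex);
}
{code}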

> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs for a directory is quite 
> large. For any directory under a snapshot, to construct the children list, it 
> needs to combine all the diffs from that particular snapshot up to the last 
> snapshotDiff record and reverse-apply them to the current children list of the 
> directory on the live fs. This can take a significant time if the number of 
> snapshot diffs is quite large and the changes per diff are significant.
> This Jira proposes to store the DirectoryDiffs in a SnapshotSkipList, where 
> we store multi-level DirectoryDiffs. At each level, the DirectoryDiff will be 
> the cumulative diff of k snapshot diffs, where k is the level of the node in 
> the list.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-20 Thread Stephen O'Donnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-13170:
-
Affects Version/s: 3.2.0
   Status: Patch Available  (was: Open)

> Port webhdfs unmaskedpermission parameter to HTTPFS
> ---
>
> Key: HDFS-13170
> URL: https://issues.apache.org/jira/browse/HDFS-13170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-13170.001.patch
>
>
> HDFS-6962 fixed a long standing issue where default ACLs are not correctly 
> applied to files when they are created from the hadoop shell.
> With this change, if you create a file with default ACLs against the parent 
> directory, with dfs.namenode.posix.acl.inheritance.enabled=false, the result 
> is:
> {code}
> # file: /test_acl/file_from_shell_off
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:r--
> user:user2:rwx    #effective:r--
> group::r-x    #effective:r--
> group:users:rwx    #effective:r--
> mask::r--
> other::r--
> {code}
> And if you enable this, to fix the bug above, the result is as you would 
> expect:
> {code}
> # file: /test_acl/file_from_shell
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:rw-
> user:user2:rwx    #effective:rw-
> group::r-x    #effective:r--
> group:users:rwx    #effective:rw-
> mask::rw-
> other::r--
> {code}
> If I then create a file over HTTPFS or webHDFS, the behaviour is not the same 
> as above:
> {code}
> # file: /test_acl/default_permissions
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx    #effective:r-x
> user:user2:rwx    #effective:r-x
> group::r-x
> group:users:rwx    #effective:r-x
> mask::r-x
> other::r-x
> {code}
> Notice the mask is set to r-x, and this removes the write permission on the new 
> file.
> As part of HDFS-6962 a new parameter, 'unmaskedpermission', was added to 
> webhdfs. Passing it in a webhdfs call results in the same behaviour as when a 
> file is written from the CLI:
> {code}
> curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream"  
> "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE=user1=namenode:8020=false=770;
> # file: /test_acl/unmasked__770
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx
> user:user2:rwx
> group::r-x
> group:users:rwx
> mask::rwx
> other::---
> {code}
> However, this parameter was never ported to HTTPFS.
> This Jira is to replicate the same changes to HTTPFS so this parameter is 
> available there too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11187) Optimize disk access for last partial chunk checksum of Finalized replica

2018-02-20 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370287#comment-16370287
 ] 

Erik Krogen commented on HDFS-11187:


Hi [~gabor.bota] / [~xiaochen], thanks for the work! I see that the target 
version is 2.7.6, but this was only backported to branch-2.8. Do you plan 
to put it in branch-2.7? It seems it should go there, given that IIUC HDFS-11160 
introduced a performance regression in 2.7.

> Optimize disk access for last partial chunk checksum of Finalized replica
> -
>
> Key: HDFS-11187
> URL: https://issues.apache.org/jira/browse/HDFS-11187
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: HDFS-11187-branch-2.001.patch, 
> HDFS-11187-branch-2.002.patch, HDFS-11187-branch-2.003.patch, 
> HDFS-11187-branch-2.004.patch, HDFS-11187.001.patch, HDFS-11187.002.patch, 
> HDFS-11187.003.patch, HDFS-11187.004.patch, HDFS-11187.005.patch
>
>
> The patch at HDFS-11160 ensures BlockSender reads the correct version of 
> metafile when there are concurrent writers.
> However, the implementation is not optimal, because it must always read the 
> last partial chunk checksum from disk while holding the FsDatasetImpl lock for 
> every reader. It is possible to optimize this by keeping an up-to-date 
> version of the last partial checksum in memory and reducing disk access.
> I am separating the optimization into a new jira, because maintaining the 
> state of in-memory checksum requires a lot more work.
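
As a rough illustration of the idea (hypothetical names, not the actual 
FsDatasetImpl/FinalizedReplica/BlockSender code), the optimization amounts to 
caching the last partial chunk checksum when the replica is finalized or 
appended to, and serving readers from that cache instead of re-reading the meta 
file under the dataset lock:
{code:java}
// Illustrative sketch only; the real replica and BlockSender classes differ.
class CachedPartialChunkChecksum {
  private volatile byte[] lastPartialChunkChecksum;  // null if block ends on a chunk boundary

  /** Writer path: called when the replica is finalized or appended to. */
  void update(byte[] checksumOfLastPartialChunk) {
    this.lastPartialChunkChecksum = (checksumOfLastPartialChunk == null)
        ? null : checksumOfLastPartialChunk.clone();
  }

  /** Reader path: returns the cached checksum, avoiding a meta-file read. */
  byte[] getForReader() {
    byte[] cached = lastPartialChunkChecksum;
    return (cached == null) ? null : cached.clone();
  }
}
{code}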



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370285#comment-16370285
 ] 

BELUGA BEHR commented on HDFS-13167:


Test failures are unrelated.

I have uploaded a new patch to address the one check-style error.

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}
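
For reference, the parameterized form that the description calls for would look 
like this (standard SLF4J-style substitution, reusing the identifiers from the 
snippet above; shown only to contrast with the buggy concatenation):
{code:java}
// Both placeholders are filled by the logger; nothing is concatenated eagerly.
LOG.trace("stopDecommission: Node {} in {}, nothing to do.",
    node, node.getAdminState());
{code}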



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13167:
---
Attachment: HDFS-13167.3.patch

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13167:
---
Status: Patch Available  (was: Open)

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch, 
> HDFS-13167.3.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13167) DatanodeAdminManager Improvements

2018-02-20 Thread BELUGA BEHR (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13167:
---
Status: Open  (was: Patch Available)

> DatanodeAdminManager Improvements
> -
>
> Key: HDFS-13167
> URL: https://issues.apache.org/jira/browse/HDFS-13167
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HDFS-13167.1.patch, HDFS-13167.2.patch
>
>
> # Use Collection type Set instead of List for tracking nodes
> # Fix logging statements that are erroneously appending variables instead of 
> using parameters
> # Miscellaneous small improvements
> As an example, the {{node}} variable is being appended to the string instead 
> of being passed as an argument to the {{trace}} method for variable 
> substitution.
> {code}
> LOG.trace("stopDecommission: Node {} in {}, nothing to do." +
>   node, node.getAdminState());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13168) XmlImageVisitor - Prefer Array over LinkedList

2018-02-20 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370271#comment-16370271
 ] 

BELUGA BEHR commented on HDFS-13168:


[~elgoiri] Using the 'char' type is intentional.  It's faster to add a 'char' to 
a StringBuilder than a String.  Adding a String requires a _null_ check and a 
_length_ check.  A char cannot be null and can only have a length of 1, so 
appending it is faster.
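
A minimal illustration of the point (plain JDK StringBuilder, nothing 
HDFS-specific):
{code:java}
StringBuilder sb = new StringBuilder();
sb.append('<');   // append(char): the char is written directly, no null or length handling
sb.append("<");   // append(String): null check plus a length-based character copy
{code}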

 

Please consider this patch for inclusion into the project.  Thanks!

> XmlImageVisitor - Prefer Array over LinkedList
> --
>
> Key: HDFS-13168
> URL: https://issues.apache.org/jira/browse/HDFS-13168
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13168.1.patch, HDFS-13168.2.patch
>
>
> {{ArrayDeque}}
> {quote}This class is likely to be faster than Stack when used as a stack, and 
> faster than LinkedList when used as a queue.{quote}
> .. not to mention less memory fragmentation (a single backing array vs. many 
> LinkedList nodes).
> https://docs.oracle.com/javase/8/docs/api/java/util/ArrayDeque.html
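
For context, this is the kind of usage the change targets (plain 
java.util.ArrayDeque, used here as a stack the way the quoted Javadoc 
describes; a generic example, not the XmlImageVisitor code itself):
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeAsStackExample {
  public static void main(String[] args) {
    // Single backing array, no per-element node objects as in LinkedList.
    Deque<String> stack = new ArrayDeque<>();
    stack.push("a");
    stack.push("b");
    System.out.println(stack.pop());  // prints "b" (LIFO order)
  }
}
{code}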



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-20 Thread Stephen O'Donnell (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370249#comment-16370249
 ] 

Stephen O'Donnell commented on HDFS-13170:
--

I have added a patch for this and a couple of tests. I have tried as much as 
possible to mirror the changes that were done in webhdfs, but the HTTPFS code 
is quite different, so it's not a straight copy and paste.

I think the only code paths affected here are CREATE and MKDIRS. None of the 
other operations create objects that need the permissions applied in HDFS.

> Port webhdfs unmaskedpermission parameter to HTTPFS
> ---
>
> Key: HDFS-13170
> URL: https://issues.apache.org/jira/browse/HDFS-13170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-13170.001.patch
>
>
> HDFS-6962 fixed a long standing issue where default ACLs are not correctly 
> applied to files when they are created from the hadoop shell.
> With this change, if you create a file with default ACLs against the parent 
> directory, with dfs.namenode.posix.acl.inheritance.enabled=false, the result 
> is:
> {code}
> # file: /test_acl/file_from_shell_off
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:r--
> user:user2:rwx    #effective:r--
> group::r-x    #effective:r--
> group:users:rwx    #effective:r--
> mask::r--
> other::r--
> {code}
> And if you enable this, to fix the bug above, the result is as you would 
> expect:
> {code}
> # file: /test_acl/file_from_shell
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:rw-
> user:user2:rwx    #effective:rw-
> group::r-x    #effective:r--
> group:users:rwx    #effective:rw-
> mask::rw-
> other::r--
> {code}
> If I then create a file over HTTPFS or webHDFS, the behaviour is not the same 
> as above:
> {code}
> # file: /test_acl/default_permissions
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx    #effective:r-x
> user:user2:rwx    #effective:r-x
> group::r-x
> group:users:rwx    #effective:r-x
> mask::r-x
> other::r-x
> {code}
> Notice the mask is set to r-x, and this removes the write permission on the new 
> file.
> As part of HDFS-6962 a new parameter, 'unmaskedpermission', was added to 
> webhdfs. Passing it in a webhdfs call results in the same behaviour as when a 
> file is written from the CLI:
> {code}
> curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream"  
> "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE=user1=namenode:8020=false=770;
> # file: /test_acl/unmasked__770
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx
> user:user2:rwx
> group::r-x
> group:users:rwx
> mask::rwx
> other::---
> {code}
> However, this parameter was never ported to HTTPFS.
> This Jira is to replicate the same changes to HTTPFS so this parameter is 
> available there too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13170) Port webhdfs unmaskedpermission parameter to HTTPFS

2018-02-20 Thread Stephen O'Donnell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-13170:
-
Attachment: HDFS-13170.001.patch

> Port webhdfs unmaskedpermission parameter to HTTPFS
> ---
>
> Key: HDFS-13170
> URL: https://issues.apache.org/jira/browse/HDFS-13170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-13170.001.patch
>
>
> HDFS-6962 fixed a long standing issue where default ACLs are not correctly 
> applied to files when they are created from the hadoop shell.
> With this change, if you create a file with default ACLs against the parent 
> directory, with dfs.namenode.posix.acl.inheritance.enabled=false, the result 
> is:
> {code}
> # file: /test_acl/file_from_shell_off
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:r--
> user:user2:rwx    #effective:r--
> group::r-x    #effective:r--
> group:users:rwx    #effective:r--
> mask::r--
> other::r--
> {code}
> And if you enable this, to fix the bug above, the result is as you would 
> expect:
> {code}
> # file: /test_acl/file_from_shell
> # owner: user1
> # group: supergroup
> user::rw-
> user:user1:rwx    #effective:rw-
> user:user2:rwx    #effective:rw-
> group::r-x    #effective:r--
> group:users:rwx    #effective:rw-
> mask::rw-
> other::r--
> {code}
> If I then create a file over HTTPFS or webHDFS, the behaviour is not the same 
> as above:
> {code}
> # file: /test_acl/default_permissions
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx    #effective:r-x
> user:user2:rwx    #effective:r-x
> group::r-x
> group:users:rwx    #effective:r-x
> mask::r-x
> other::r-x
> {code}
> Notice the mask is set to r-x, and this removes the write permission on the new 
> file.
> As part of HDFS-6962 a new parameter, 'unmaskedpermission', was added to 
> webhdfs. Passing it in a webhdfs call results in the same behaviour as when a 
> file is written from the CLI:
> {code}
> curl -i -X PUT -T test.txt --header "Content-Type:application/octet-stream"  
> "http://namenode:50075/webhdfs/v1/test_acl/unmasked__770?op=CREATE=user1=namenode:8020=false=770;
> # file: /test_acl/unmasked__770
> # owner: user1
> # group: supergroup
> user::rwx
> user:user1:rwx
> user:user2:rwx
> group::r-x
> group:users:rwx
> mask::rwx
> other::---
> {code}
> However, this parameter was never ported to HTTPFS.
> This Jira is to replicate the same changes to HTTPFS so this parameter is 
> available there too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13102) Implement SnapshotSkipList class to store Multi level DirectoryDiffs

2018-02-20 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370199#comment-16370199
 ] 

Tsz Wo Nicholas Sze commented on HDFS-13102:


Thanks [~shashikant] for working on this.  Some comments on the patch:
- Pass INodeDirectory as a parameter in getSumForRange(..).  Then, we could 
remove INodeDirectory dir from DirectoryDiffList.
- Let's replace getSumForRange with getMinListForRange in DiffList so that we 
may implement it in DiffListByArrayList using subList.
- diffSetIndexList does not seem useful since it is the same as the nodes in 
level 1.  BTW, diffSetIndexList is not updated when removing an element, so it 
seems a bug.  I suggest removing diffSetIndexList since it can be computed 
if necessary.
- TestDirectoryDiffList does not test remove(..).  As mentioned, remove(..) 
seems to have some bugs.


> Implement SnapshotSkipList class to store Multi level DirectoryDiffs
> 
>
> Key: HDFS-13102
> URL: https://issues.apache.org/jira/browse/HDFS-13102
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-13102.001.patch, HDFS-13102.002.patch, 
> HDFS-13102.003.patch
>
>
> HDFS-11225 explains an issue where deletion of older snapshots can take a 
> very long time when the number of snapshot diffs for a directory is quite 
> large. For any directory under a snapshot, to construct the children list, it 
> needs to combine all the diffs from that particular snapshot up to the last 
> snapshotDiff record and reverse-apply them to the current children list of the 
> directory on the live fs. This can take a significant time if the number of 
> snapshot diffs is quite large and the changes per diff are significant.
> This Jira proposes to store the DirectoryDiffs in a SnapshotSkipList, where 
> we store multi-level DirectoryDiffs. At each level, the DirectoryDiff will be 
> the cumulative diff of k snapshot diffs, where k is the level of the node in 
> the list.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13169) Ambari UI deploy fails during startup of Ambari Metrics

2018-02-20 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee resolved HDFS-13169.
---
Resolution: Invalid

> Ambari UI deploy fails during startup of Ambari Metrics
> ---
>
> Key: HDFS-13169
> URL: https://issues.apache.org/jira/browse/HDFS-13169
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Aravindan Vijayan
>Priority: Major
>
> {noformat}
> HDP version:HDP-3.0.0.0-702
> Ambari version: 2.99.99.0-77
> {noformat}
> /var/lib/ambari-agent/data/errors-52.txt:
> {noformat}
> Traceback (most recent call last):
>   File 
> "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py",
>  line 90, in 
> AmsCollector().execute()
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 371, in execute
> self.execute_prefix_function(self.command_name, 'post', env)
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 392, in execute_prefix_function
> method(env)
>   File 
> "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py",
>  line 434, in post_start
> raise Fail("Pid file {0} doesn't exist after starting of the 
> component.".format(pid_file))
> resource_management.core.exceptions.Fail: Pid file 
> /var/run/ambari-metrics-collector//hbase-ams-master.pid doesn't exist after 
> starting of the component.
> {noformat}
> /var/lib/ambari-agent/data/output-52.txt:
> {noformat}
> 2018-01-11 13:03:40,753 - Stack Feature Version Info: Cluster Stack=3.0, 
> Command Stack=None, Command Version=3.0.0.0-702 -> 3.0.0.0-702
> 2018-01-11 13:03:40,755 - Using hadoop conf dir: 
> /usr/hdp/3.0.0.0-702/hadoop/conf
> 2018-01-11 13:03:40,884 - Stack Feature Version Info: Cluster Stack=3.0, 
> Command Stack=None, Command Version=3.0.0.0-702 -> 3.0.0.0-702
> 2018-01-11 13:03:40,885 - Using hadoop conf dir: 
> /usr/hdp/3.0.0.0-702/hadoop/conf
> 2018-01-11 13:03:40,886 - Group['hdfs'] {}
> 2018-01-11 13:03:40,887 - Group['hadoop'] {}
> 2018-01-11 13:03:40,887 - Group['users'] {}
> 2018-01-11 13:03:40,887 - User['hive'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,890 - User['infra-solr'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,891 - User['zookeeper'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,892 - User['atlas'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,893 - User['ams'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,893 - User['ambari-qa'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None}
> 2018-01-11 13:03:40,894 - User['kafka'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,894 - User['tez'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'users'], 'uid': None}
> 2018-01-11 13:03:40,895 - User['hdfs'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hdfs', 'hadoop'], 'uid': None}
> 2018-01-11 13:03:40,895 - User['yarn'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,896 - User['mapred'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,897 - User['hbase'] {'gid': 'hadoop', 
> 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
> 2018-01-11 13:03:40,897 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] 
> {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2018-01-11 13:03:40,898 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh 
> ambari-qa 
> /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
>  0'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
> 2018-01-11 13:03:40,903 - Skipping 
> Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa 
> /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa
>  0'] due to not_if
> 2018-01-11 13:03:40,903 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 
> 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
> 2018-01-11 13:03:40,904 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] 
> {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2018-01-11 13:03:40,905 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] 
> {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
> 2018-01-11 13:03:40,906 - 

[jira] [Commented] (HDFS-12070) Failed block recovery leaves files open indefinitely and at risk for data loss

2018-02-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370138#comment-16370138
 ] 

Daryn Sharp commented on HDFS-12070:


Back when I filed this, I played around with a fix and didn't use close=false.  
I too read the append design.  It reads as if the PD is supposed to obtain a new 
genstamp and retry, but I don't think a DN can do that.  The reasoning for 
another round of commit sync wasn't explained.  Perhaps it was due to the 
earlier implementation or concerns over concurrent commit syncs, but the 
recovery id feature should allow the NN to weed out prior commit syncs.

My concern is that the NN has claimed the lease during commit sync.  Append, 
truncate, and non-overwrite creates will trigger an implicit commit sync.  
Normally it completes almost immediately, roughly up to the heartbeat interval, 
and the client succeeds on retry.  If another round of commit sync is required 
due to close=false, the client can re-trigger commit sync only after the soft 
lease period (5 mins) – I don't think a client does or should retry for that 
long.  Which means the operation will unnecessarily fail.  Also, it will take up 
to the hard lease period (1 hour) for the NN to fix the under-replication.

In either case (close=true/false), the NN has removed the failed DNs from the 
expected locations.  Bad blocks should be invalidated if/when "failed" DNs 
block-report the wrong genstamp and/or size, so I think it's safe for the PD 
to ignore failed nodes and close?

> Failed block recovery leaves files open indefinitely and at risk for data loss
> --
>
> Key: HDFS-12070
> URL: https://issues.apache.org/jira/browse/HDFS-12070
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Kihwal Lee
>Priority: Major
> Attachments: HDFS-12070.0.patch, lease.patch
>
>
> Files will remain open indefinitely if block recovery fails, which creates a 
> high risk of data loss.  The replication monitor will not replicate these 
> blocks.
> The NN provides the primary node with a list of candidate nodes for recovery, 
> which involves a 2-stage process. The primary node removes any candidates that 
> cannot init replica recovery (essentially: be alive and know about the block) 
> to create a sync list.  Stage 2 issues updates to the sync list – _but, unlike 
> the first stage, it fails if any node fails_.  The NN should be informed of 
> the nodes that did succeed.
> Manual recovery will also fail until the problematic node is temporarily 
> stopped, so that a connection refused induces the bad node to be pruned from 
> the candidates.  Then recovery succeeds, the lease is released, under 
> replication is fixed, and the block is invalidated on the bad node.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13113) Use Log.*(Object, Throwable) overload to log exceptions

2018-02-20 Thread Andras Bokor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated HDFS-13113:

Attachment: HADOOP-10571-branch-3.0.002.patch

> Use Log.*(Object, Throwable) overload to log exceptions
> ---
>
> Key: HDFS-13113
> URL: https://issues.apache.org/jira/browse/HDFS-13113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, namenode, nfs
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Andras Bokor
>Priority: Major
> Attachments: HADOOP-10571-branch-3.0.002.patch, 
> HADOOP-10571.05.patch, HADOOP-10571.07.patch
>
>
> FYI, In HADOOP-10571, [~boky01] is going to clean up a lot of the log 
> statements, including some in Datanode and elsewhere.
> I'm provisionally +1 on that, but want to run it on the standalone tests 
> (Yetus has already done them), and give the HDFS developers warning of a 
> change which is going to touch their codebase.
> If anyone doesn't want the logging improvements, now is your chance to say so



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-02-20 Thread Istvan Fajth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Fajth reassigned HDFS-13174:
---

Assignee: Istvan Fajth

> hdfs mover -p /path times out after 20 min
> --
>
> Key: HDFS-13174
> URL: https://issues.apache.org/jira/browse/HDFS-13174
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.8.0, 2.7.4, 3.0.0-alpha2
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 2.8.4, 2.7.6
>
>
> In HDFS-11015 an iteration timeout was introduced in the Dispatcher.Source 
> class; it is checked while dispatching the moves that the Balancer and the 
> Mover perform. This timeout is hardwired to 20 minutes.
> In the Balancer we have iterations, and even if an iteration times out the 
> Balancer keeps running and does another iteration; it fails only if no moves 
> happened in a few iterations.
> The Mover, on the other hand, does not have iterations, so if moving a path 
> runs for more than 20 minutes, the Mover will stop with the following 
> exception reported to the console (line numbers might differ, as this 
> exception came from a CDH5.12.1 installation):
> java.io.IOException: Block move timed out
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
> at 
> org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13174) hdfs mover -p /path times out after 20 min

2018-02-20 Thread Istvan Fajth (JIRA)
Istvan Fajth created HDFS-13174:
---

 Summary: hdfs mover -p /path times out after 20 min
 Key: HDFS-13174
 URL: https://issues.apache.org/jira/browse/HDFS-13174
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover
Affects Versions: 3.0.0-alpha2, 2.7.4, 2.8.0
Reporter: Istvan Fajth
 Fix For: 3.1.0, 3.0.1, 2.8.4, 2.7.6


In HDFS-11015 an iteration timeout was introduced in the Dispatcher.Source 
class; it is checked while dispatching the moves that the Balancer and the 
Mover perform. This timeout is hardwired to 20 minutes.

In the Balancer we have iterations, and even if an iteration times out the 
Balancer keeps running and does another iteration; it fails only if no moves 
happened in a few iterations.

The Mover, on the other hand, does not have iterations, so if moving a path runs 
for more than 20 minutes, the Mover will stop with the following exception 
reported to the console (line numbers might differ, as this exception came from 
a CDH5.12.1 installation):
java.io.IOException: Block move timed out
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.receiveResponse(Dispatcher.java:382)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.dispatch(Dispatcher.java:328)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher$PendingMove.access$2500(Dispatcher.java:186)
at 
org.apache.hadoop.hdfs.server.balancer.Dispatcher$1.run(Dispatcher.java:956)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
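
A simplified sketch of the kind of check involved (illustrative field and 
method names, not the actual Dispatcher.Source/PendingMove code), showing how a 
single hardwired constant ends the whole Mover run once a path takes longer 
than 20 minutes:
{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative only; the real Dispatcher logic differs.
class BlockMoveTimeoutSketch {
  // Hardwired limit, as described above.
  private static final long MAX_ITERATION_TIME_MS = TimeUnit.MINUTES.toMillis(20);

  private final long startTimeMs = System.currentTimeMillis();

  /** Once this returns true, pending moves are treated as timed out. */
  boolean isIterationOver() {
    return System.currentTimeMillis() - startTimeMs > MAX_ITERATION_TIME_MS;
  }
}
{code}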



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org