[jira] [Commented] (HDFS-14998) Update Observer Namenode doc for ZKFC after HDFS-14130

2019-11-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983255#comment-16983255
 ] 

Ayush Saxena commented on HDFS-14998:
-

Thanx [~ferhui], seems fine to me.
[~csun], please give it a check.
If there are no further comments, I will push this by tomorrow EOD.

> Update Observer Namenode doc for ZKFC after HDFS-14130
> --
>
> Key: HDFS-14998
> URL: https://issues.apache.org/jira/browse/HDFS-14998
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Minor
> Attachments: HDFS-14998.001.patch, HDFS-14998.002.patch, 
> HDFS-14998.003.patch
>
>
> After HDFS-14130, we should update the observer namenode doc: the observer 
> namenode can now run with ZKFC running.






[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983253#comment-16983253
 ] 

Xiaoqiao He commented on HDFS-14997:


[~elgoiri] v005 tries to fix checkstyle and re-runs all the failed unit tests 
Jenkins reported above; it looks like both of them pass now. Please help take 
another review and double check. Thanks.

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch
>
>
> There are two core functions, report (#sendHeartbeat, #blockReport, 
> #cacheReport) and #processCommand, in the #BPServiceActor main process flow. 
> If processCommand takes a long time, it blocks the report flow. Meanwhile, 
> processCommand can take a long time (over 1000s in the worst case I have 
> met) when the IO load of the DataNode is very high. Since some IO operations 
> are under #datasetLock, it has to wait a long time to acquire #datasetLock 
> when processing some commands (such as #DNA_INVALIDATE). In such cases, 
> #heartbeat is not sent to the NameNode in time, which triggers other 
> disasters.
> I propose to run #processCommand asynchronously so it does not block 
> #BPServiceActor from sending heartbeats back to the NameNode under high IO 
> load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve that in another JIRA.
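For illustration, a minimal sketch of the asynchronous direction proposed here, assuming a single-threaded executor per actor; the names (AsyncCommandProcessor, enqueueCommand) are ours, not the patch's:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: offload command processing so the heartbeat loop never blocks
// on a slow command (e.g. one stuck waiting for the dataset lock).
class AsyncCommandProcessor {
  private final ExecutorService commandExecutor =
      Executors.newSingleThreadExecutor();

  // Called from the heartbeat thread; returns immediately.
  void enqueueCommand(Runnable command) {
    commandExecutor.submit(command);
  }

  void shutdown() {
    commandExecutor.shutdown();
  }
}
{code}

A single-threaded executor keeps commands ordered per actor while still decoupling them from the heartbeat.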






[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues

2019-11-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983252#comment-16983252
 ] 

Ayush Saxena commented on HDFS-14852:
-

Thanx [~ferhui] for the ping. There is already a lot of discussion here, which 
I haven't followed. I need to read it in full to ensure I don't miss any 
concern. [~kihwal], can you give it a check? You have already done most of the 
work here. :)

If no one volunteers, I will try to follow it up, maybe this weekend.

> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> -
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, 
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, 
> screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
>   + " from priority queue {}",
>   block, priLevel);
>   decrementBlockStat(block, priLevel, oldExpectedReplicas);
>   return true;
> } else {
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
>   for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
>   " {} from priority queue {}", block, i);
>   decrementBlockStat(block, i, oldExpectedReplicas);
>   return true;
> }
>   }
> }
> return false;
>   }
> {code}
> The source code is above; the comment reads as follows:
> {quote}
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
> {quote}
> The function "remove" does NOT remove the block from all queues.
> The add function in LowRedundancyBlocks.java is called from several places, 
> so one block may end up in two or more queues.
> We found that corrupt blocks mismatch corrupt files on the NN web UI. Maybe 
> that is related to this.
> Uploading an initial patch.
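A sketch of one possible fix direction, removing the block from every queue it appears in (simplified standalone types for illustration, not the actual LowRedundancyBlocks code or the attached patch):

{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: scan every priority level and remove the block from each
// queue it appears in, instead of returning after the first hit.
class AllQueueRemoveSketch<B> {
  static final int LEVEL = 5;
  private final List<Set<B>> priorityQueues = new ArrayList<>();

  AllQueueRemoveSketch() {
    for (int i = 0; i < LEVEL; i++) {
      priorityQueues.add(new HashSet<>());
    }
  }

  boolean remove(B block) {
    boolean removed = false;
    for (int i = 0; i < LEVEL; i++) {
      if (priorityQueues.get(i).remove(block)) {
        removed = true; // keep scanning: the block may sit in other queues
      }
    }
    return removed;
  }
}
{code}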






[jira] [Updated] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14997:
---
Attachment: HDFS-14997.005.patch

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch
>
>
> There are two core functions, report (#sendHeartbeat, #blockReport, 
> #cacheReport) and #processCommand, in the #BPServiceActor main process flow. 
> If processCommand takes a long time, it blocks the report flow. Meanwhile, 
> processCommand can take a long time (over 1000s in the worst case I have 
> met) when the IO load of the DataNode is very high. Since some IO operations 
> are under #datasetLock, it has to wait a long time to acquire #datasetLock 
> when processing some commands (such as #DNA_INVALIDATE). In such cases, 
> #heartbeat is not sent to the NameNode in time, which triggers other 
> disasters.
> I propose to run #processCommand asynchronously so it does not block 
> #BPServiceActor from sending heartbeats back to the NameNode under high IO 
> load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve that in another JIRA.






[jira] [Commented] (HDFS-15009) FSCK "-list-corruptfileblocks" return Invalid Entries

2019-11-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983251#comment-16983251
 ] 

Ayush Saxena commented on HDFS-15009:
-

Thanx [~hemanthboyina] for the patch.

{code:java}
-import static 
org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.FEDERATION_MOUNT_TABLE_MAX_CACHE_SIZE;
-import static 
org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.FEDERATION_MOUNT_TABLE_MAX_CACHE_SIZE_DEFAULT;
+import static 
org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.FEDERATION_MOUNT_TABLE_MAX_CACHE_SIZE;
+import static 
org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.FEDERATION_MOUNT_TABLE_MAX_CACHE_SIZE_DEFAULT;
{code}

Avoid this change; it reorders the imports unnecessarily.

For the test, add a line of comment explaining what it is testing, as done for 
the other cases.
Apart from that, LGTM.


> FSCK "-list-corruptfileblocks" return Invalid Entries
> -
>
> Key: HDFS-15009
> URL: https://issues.apache.org/jira/browse/HDFS-15009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15009.001.patch, HDFS-15009.002.patch
>
>
> Scenario: if we have two directories, dir1 and dir10, and only dir10 has 
> corrupt files, then running -list-corruptfileblocks for dir1 shows the 
> corrupt file count of dir10.
> {code:java}
>   while (blkIterator.hasNext()) {
> BlockInfo blk = blkIterator.next();
> final INodeFile inode = getBlockCollection(blk);
> skip++;
> if (inode != null) {
>   String src = inode.getFullPathName();
>   if (src.startsWith(path)){
> corruptFiles.add(new CorruptFileBlockInfo(src, blk));
> count++;
> if (count >= DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED)
>   break;
>   }
> }
>   } {code}
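The mismatch likely comes from the plain {{src.startsWith(path)}} check above: "/dir10/file" starts with "/dir1". A hedged sketch of a path-boundary-aware match (the helper name is ours, not from the patch):

{code:java}
// Sketch: treat src as inside path only when the prefix ends on a
// path-component boundary, so "/dir1" no longer matches "/dir10/file".
static boolean isParentOf(String path, String src) {
  if (src.equals(path)) {
    return true;
  }
  String prefix = path.endsWith("/") ? path : path + "/";
  return src.startsWith(prefix);
}
{code}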






[jira] [Commented] (HDFS-15013) Reduce NameNode overview tab response time

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983223#comment-16983223
 ] 

hemanthboyina commented on HDFS-15013:
--

+1 (non-binding)

> Reduce NameNode overview tab response time
> --
>
> Key: HDFS-15013
> URL: https://issues.apache.org/jira/browse/HDFS-15013
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-15013.001.patch, image-2019-11-26-10-05-39-640.png, 
> image-2019-11-26-10-09-07-952.png
>
>
> Currently, the overview tab loads /conf synchronously, as shown in the 
> following picture.
>  !image-2019-11-26-10-05-39-640.png! 
> This issue changes it to an asynchronous load. The effect is shown below.
>  !image-2019-11-26-10-09-07-952.png! 






[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Aiphago (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983213#comment-16983213
 ] 

Aiphago commented on HDFS-14986:


Hi [~linyiqun], thank you for your valuable advice. I improved the patch per 
your comments. In addition, I renamed shouldInitRefresh to shouldFirstRefresh 
to make it easier to distinguish. [^HDFS-14986.005.patch]

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get the used space from ReplicaInfo in memory. However, the 
> new du threads throw the exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.<init>(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}
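The trace shows the replica set being copied while another thread mutates it. A minimal standalone sketch of the usual remedy, taking the snapshot under the same lock that writers hold (types and lock name are assumptions for illustration, not the HDFS-14986 patch):

{code:java}
import java.util.HashSet;
import java.util.Set;

// Sketch: snapshot and mutation are guarded by one lock, so the copy's
// iterator can never observe a concurrent modification.
class ReplicaSnapshot<T> {
  private final Object datasetLock = new Object();
  private final Set<T> replicas = new HashSet<>();

  Set<T> deepCopy() {
    synchronized (datasetLock) {
      return new HashSet<>(replicas); // the copy is taken atomically
    }
  }

  void add(T replica) {
    synchronized (datasetLock) {
      replicas.add(replica);
    }
  }
}
{code}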






[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Aiphago (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aiphago updated HDFS-14986:
---
Attachment: HDFS-14986.005.patch

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get the used space from ReplicaInfo in memory. However, the 
> new du threads throw the exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.<init>(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983209#comment-16983209
 ] 

Surendra Singh Lilhore commented on HDFS-15010:
---

Attached v4 patch.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch, HDFS-15010.04.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load the block pool in parallel, they may create different 
> instances. 
> So the {{BlockPoolSlice#initializeAddReplicaPool()}} method should be a 
> static method.
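A sketch of the shape being proposed, guarding initialization so concurrent callers end up sharing one pool (the class here is a standalone illustration, not the actual BlockPoolSlice):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockPoolSliceSketch {
  private static ExecutorService addReplicaThreadPool;

  // static + synchronized: two BPServiceActors racing to load block
  // pools now observe a single shared pool instead of each creating one.
  static synchronized void initializeAddReplicaPool(int threads) {
    if (addReplicaThreadPool == null) {
      addReplicaThreadPool = Executors.newFixedThreadPool(threads);
    }
  }
}
{code}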






[jira] [Updated] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-15010:
--
Attachment: HDFS-15010.04.patch

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch, HDFS-15010.04.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load the block pool in parallel, they may create different 
> instances. 
> So the {{BlockPoolSlice#initializeAddReplicaPool()}} method should be a 
> static method.






[jira] [Commented] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983207#comment-16983207
 ] 

hemanthboyina commented on HDFS-14901:
--

Thanks for the review [~elgoiri].

Thread.sleep(100) is just to make sure the cluster is ready.
I think it is not always necessary, and maybe not in this case.
{quote} is there any option we can do something smarter than sleeping 100ms
{quote}
I am not aware of any.
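One option that may be worth checking (stated as an assumption about what the test needs, not a verdict): MiniDFSCluster exposes waitActive(), which blocks until the NameNode reports the DataNodes live, e.g.:

{code:java}
// Sketch, inside a test that declares throws IOException: wait for the
// mini cluster to become ready instead of sleeping a fixed 100 ms.
MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(3).build();
try {
  cluster.waitActive(); // blocks until all DataNodes have registered
  // ... run the router/encryption-zone assertions here ...
} finally {
  cluster.shutdown();
}
{code}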

 

> RBF: Add Encryption Zone related ClientProtocol APIs
> 
>
> Key: HDFS-14901
> URL: https://issues.apache.org/jira/browse/HDFS-14901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14901.001.patch, HDFS-14901.002.patch
>
>
> Currently the listEncryptionZones, reencryptEncryptionZone, and 
> listReencryptionStatus APIs are not implemented in the Router.
> This JIRA intends to implement the above-mentioned APIs.






[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983201#comment-16983201
 ] 

Surendra Singh Lilhore commented on HDFS-15010:
---

{quote}{{TestFsVolumeList}} looks suspicious.
{quote}
{{addReplicaThreadPool}} is a static instance, and if some other test case is 
using {{MiniDFSCluster}} then it will be initialized with 
{{Runtime.getRuntime().availableProcessors()}}. The test would then only pass 
on a machine whose available processor count happens to be 5. So it is better 
to reinitialize {{addReplicaThreadPool}} before the test case executes.
{quote}BTW, I checked the test and is a little long (~5 seconds).
{quote}
It depends on the machine; the latest test report shows it taking only 3.8 
sec.

[https://builds.apache.org/job/PreCommit-HDFS-Build/28400/testReport/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestFsVolumeList/]

I will attach a new patch with the fix.

 

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load the block pool in parallel, they may create different 
> instances. 
> So the {{BlockPoolSlice#initializeAddReplicaPool()}} method should be a 
> static method.






[jira] [Commented] (HDFS-14960) TesteBalancerWithNodeGroup should not succeed with DFSNetworkTopology

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983200#comment-16983200
 ] 

hemanthboyina commented on HDFS-14960:
--

[~elgoiri] [~Jim_Brennan]

I think there should be a validation in BlockPlacementPolicyWithNodeGroup.java
{code:java}
 @Override
  public void initialize(Configuration conf,  FSClusterStats stats,
  NetworkTopology clusterMap, 
  Host2NodesMap host2datanodeMap) {
super.initialize(conf, stats, clusterMap, host2datanodeMap);
  } {code}
The clusterMap should be an instance of NetworkTopologyWithNodeGroup; if it is 
not, we should throw an exception, as sketched below.

Please correct me if I am wrong.
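A hypothetical sketch of that validation; only the instanceof check is new, the rest mirrors the snippet above:

{code:java}
@Override
public void initialize(Configuration conf, FSClusterStats stats,
    NetworkTopology clusterMap,
    Host2NodesMap host2datanodeMap) {
  // Assumption: fail fast when the topology class is misconfigured.
  if (!(clusterMap instanceof NetworkTopologyWithNodeGroup)) {
    throw new IllegalArgumentException(
        "BlockPlacementPolicyWithNodeGroup requires "
            + "NetworkTopologyWithNodeGroup, got "
            + clusterMap.getClass().getName());
  }
  super.initialize(conf, stats, clusterMap, host2datanodeMap);
}
{code}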

> TesteBalancerWithNodeGroup should not succeed with DFSNetworkTopology
> -
>
> Key: HDFS-14960
> URL: https://issues.apache.org/jira/browse/HDFS-14960
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Priority: Minor
>
> As reported in HDFS-14958, TestBalancerWithNodeGroup was succeeding even 
> though it was using DFSNetworkTopology instead of 
> NetworkTopologyWithNodeGroup.
> [~inigoiri] rightly suggested that this indicates the test is not very good - 
> it should fail when run without NetworkTopologyWithNodeGroup.
> We should improve this test.
>  






[jira] [Updated] (HDFS-14651) DeadNodeDetector checks dead node periodically

2019-11-26 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14651:
---
Description: 
DeadNodeDetector checks dead nodes periodically.
DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode; 
if a probe succeeds, the node is removed from DeadNodeDetector#deadnode. 
Continuous detection of dead nodes is necessary because a DataNode may need 
to rejoin the cluster after a service restart or machine repair. Without such 
a probe mechanism, the DataNode could be permanently excluded.

  was:DeadNodeDetector checks dead node periodically.


> DeadNodeDetector checks dead node periodically
> --
>
> Key: HDFS-14651
> URL: https://issues.apache.org/jira/browse/HDFS-14651
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14651.001.patch, HDFS-14651.002.patch, 
> HDFS-14651.003.patch, HDFS-14651.004.patch, HDFS-14651.005.patch, 
> HDFS-14651.006.patch, HDFS-14651.007.patch, HDFS-14651.008.patch
>
>
> DeadNodeDetector checks dead nodes periodically.
> DeadNodeDetector periodically probes the nodes in DeadNodeDetector#deadnode; 
> if a probe succeeds, the node is removed from DeadNodeDetector#deadnode. 
> Continuous detection of dead nodes is necessary because a DataNode may need 
> to rejoin the cluster after a service restart or machine repair. Without 
> such a probe mechanism, the DataNode could be permanently excluded.
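A minimal standalone sketch of the periodic re-probe loop described above; the probe() body and the interval are placeholders, not the actual DeadNodeDetector code:

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: periodically probe every node currently marked dead and
// un-mark any node that answers, so a restarted or repaired DataNode
// is not excluded forever.
class DeadNodeProbeSketch {
  private final Set<String> deadNodes = ConcurrentHashMap.newKeySet();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  void start(long intervalSeconds) {
    scheduler.scheduleWithFixedDelay(() -> {
      for (String node : deadNodes) {
        if (probe(node)) {        // node answered: no longer dead
          deadNodes.remove(node);
        }
      }
    }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
  }

  private boolean probe(String node) {
    return false; // placeholder: a real probe would contact the DataNode
  }
}
{code}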






[jira] [Resolved] (HDFS-14650) DeadNodeDetector redetects Suspicious Node

2019-11-26 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun resolved HDFS-14650.

Resolution: Implemented

> DeadNodeDetector redetects Suspicious Node
> --
>
> Key: HDFS-14650
> URL: https://issues.apache.org/jira/browse/HDFS-14650
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>







[jira] [Commented] (HDFS-14650) DeadNodeDetector redetects Suspicious Node

2019-11-26 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983185#comment-16983185
 ] 

Lisheng Sun commented on HDFS-14650:


HDFS-14649 already implements this jira, so I will close it.

> DeadNodeDetector redetects Suspicious Node
> --
>
> Key: HDFS-14650
> URL: https://issues.apache.org/jira/browse/HDFS-14650
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>







[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983181#comment-16983181
 ] 

Íñigo Goiri commented on HDFS-15010:


{{TestFsVolumeList}} looks suspicious.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load the block pool in parallel, they may create different 
> instances. 
> So the {{BlockPoolSlice#initializeAddReplicaPool()}} method should be a 
> static method.






[jira] [Commented] (HDFS-9786) HttpFS doesn't write the proxyuser information in logfile

2019-11-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983162#comment-16983162
 ] 

Ayush Saxena commented on HDFS-9786:


It has already been 4 years now; [~weichiu], it seems safe for you to take over.
[~sookim] doesn't seem to be active.

> HttpFS doesn't write the proxyuser information in logfile
> -
>
> Key: HDFS-9786
> URL: https://issues.apache.org/jira/browse/HDFS-9786
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Heesoo Kim
>Assignee: Heesoo Kim
>Priority: Major
>
> According to the httpfs-log4j.properties, the log pattern indicates that
> {code}
> log4j.appender.httpfsaudit.layout.ConversionPattern=%d{ISO8601} %5p 
> [%X{hostname}][%X{user}:%X{doAs}] %X{op} %m%n
> {code}
> However, the httpfsaudit appender doesn't write the right user and proxyuser 
> information. It would be better to write the UGI to the audit log in the 
> HttpFS GW.






[jira] [Comment Edited] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983153#comment-16983153
 ] 

Yiqun Lin edited comment on HDFS-14986 at 11/27/19 4:04 AM:


The comment makes sense to me. I have a minor suggestion for {{shouldInitRefresh}}. 

The meaning of shouldInitRefresh should be 'should do the initial refresh 
operation', so the default value should be true rather than false. I prefer 
to change this and add a comment:

{code}
   void init() {
 if (used.get() < 0) {
   used.set(0);
// Skip initial refresh operation, so we need to do first refresh 
// operation immediately in refresh thread.
+  if (!shouldInitRefresh) {
+initRefeshThread(true);
+return;
+  }
   refresh();
 }
+initRefeshThread(false);
+  }
{code}

Can we move this method into CachingGetSpaceUsed? The variable lives in 
CachingGetSpaceUsed, so it would be better to also put the set operation in 
that class and let sub-classes use it.
{code}
+  /**
+   * Reset that if we need to do the initial refresh.
+   * @param shouldInitRefresh The flag value to set.
+   */
+  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
+this.shouldInitRefresh = shouldInitRefresh;
+  }
{code}

Can you update the method name and add a comment for this method?
{code}
void initRefeshThread (boolean runImmediately)
{code}
to
{code}
// add comment here.
private void initRefeshThread (boolean runImmediately)
{code}


was (Author: linyiqun):
The comment makes sense to me. I have a minor suggestion for {{shouldInitRefresh}}. 

The meaning of shouldInitRefresh should be 'should do the initial refresh 
operation', so the default value should be true rather than false. I prefer 
to change this and add a comment:

{code}
   void init() {
 if (used.get() < 0) {
   used.set(0);
// Skip initial refresh operation, so we need to do first refresh 
// operation immediately in refresh thread.
+  if (!shouldInitRefresh) {
+initRefeshThread(true);
+return;
+  }
   refresh();
 }
+initRefeshThread(false);
+  }
{code}

Can we move this method into CachingGetSpaceUsed? The variable lives in 
CachingGetSpaceUsed, so it would be better to also put the set operation in 
that class and let sub-classes use it.
{code}
+  /**
+   * Reset that if we need to do the initial refresh.
+   * @param shouldInitRefresh The flag value to set.
+   */
+  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
+this.shouldInitRefresh = shouldInitRefresh;
+  }
{code}

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get the used space from ReplicaInfo in memory. However, the 
> new du threads throw the exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.<init>(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983153#comment-16983153
 ] 

Yiqun Lin commented on HDFS-14986:
--

The comment makes sense to me. I have a minor suggestion for {{shouldInitRefresh}}. 

The meaning of shouldInitRefresh should be 'should do the initial refresh 
operation', so the default value should be true rather than false. I prefer 
to change this and add a comment:

{code}
   void init() {
 if (used.get() < 0) {
   used.set(0);
// Skip initial refresh operation, so we need to do first refresh 
// operation immediately in refresh thread.
+  if (!shouldInitRefresh) {
+initRefeshThread(true);
+return;
+  }
   refresh();
 }
+initRefeshThread(false);
+  }
{code}

Can we move this method into CachingGetSpaceUsed? The variable lives in 
CachingGetSpaceUsed, so it would be better to also put the set operation in 
that class and let sub-classes use it.
{code}
+  /**
+   * Reset that if we need to do the initial refresh.
+   * @param shouldInitRefresh The flag value to set.
+   */
+  protected void setShouldInitRefresh(boolean shouldInitRefresh) {
+this.shouldInitRefresh = shouldInitRefresh;
+  }
{code}

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get the used space from ReplicaInfo in memory. However, the 
> new du threads throw the exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR 
> [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517]
>  
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed:
>  ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of 
> iterator
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
> 
> at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
> 
> at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
> at java.util.HashSet.<init>(HashSet.java:120)
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
> 
> at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
> 
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>    
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Updated] (HDFS-15018) DataNode doesn't shutdown although the number of failed disks reaches dfs.datanode.failed.volumes.tolerated

2019-11-26 Thread Toshihiro Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toshihiro Suzuki updated HDFS-15018:

Description: 
In our case, we set dfs.datanode.failed.volumes.tolerated=0, but a DataNode 
didn't shut down when a disk in the DataNode host failed for some reason.

The following log messages in the DataNode log indicate that the DataNode 
detected the disk failure, but the DataNode didn't shut down:
{code}
2019-09-17T13:15:43.262-0400 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskErrorAsync callback 
got 1 failed volumes: [/data2/hdfs/current]
2019-09-17T13:15:43.262-0400 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Removing scanner for 
volume /data2/hdfs (StorageID DS-329dec9d-a476-4334-9570-651a7e4d1f44)
2019-09-17T13:15:43.263-0400 INFO 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
VolumeScanner(/data2/hdfs, DS-329dec9d-a476-4334-9570-651a7e4d1f44) exiting.
{code}

Looking at the HDFS code, it looks like when the DataNode detects a disk 
failure, DataNode waits until the volume reference of the disk is released.
https://github.com/hortonworks/hadoop/blob/HDP-2.6.5.0-292-tag/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L246

I suspect that the volume reference is not released after the failure 
detection, but I'm not sure of the reason.

And we took thread dumps when the issue was happening. It looks like the 
following thread is waiting for the volume reference of the disk to be released:
{code}
"pool-4-thread-1" #174 daemon prio=5 os_prio=0 tid=0x7f9e7c7bf800 
nid=0x8325 in Object.wait() [0x7f9e629cb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:262)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.handleVolumeFailures(FsVolumeList.java:246)
- locked <0x000670559278> (a java.lang.Object)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.handleVolumeFailures(FsDatasetImpl.java:2178)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.handleVolumeFailures(DataNode.java:3410)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.access$100(DataNode.java:248)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$4.call(DataNode.java:2013)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.invokeCallback(DatasetVolumeChecker.java:394)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.cleanup(DatasetVolumeChecker.java:387)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.onFailure(DatasetVolumeChecker.java:370)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.executeListener(AbstractFuture.java:991)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.complete(AbstractFuture.java:885)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.setException(AbstractFuture.java:739)
at 
org.apache.hadoop.hdfs.server.datanode.checker.TimeoutFuture$Fire.run(TimeoutFuture.java:137)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

We found a similar issue, HDFS-13339, but we didn't see any deadlock in the 
thread dump.

Attaching the full thread dumps of the problematic DataNode.


  was:
In our case, we set dfs.datanode.failed.volumes.tolerated=0, but a DataNode 
didn't shut down when a disk in the DataNode host failed for some reason.

The following log messages in the DataNode log indicate that the DataNode 
detected the disk failure, but the DataNode didn't shut down:
{code}
2019-09-17T13:15:43.262-0400 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskErrorAsync callback 
got 1 failed volumes: [/data2/hdfs/current]
2019-09-17T13:15:43.262-0400 INFO 

[jira] [Created] (HDFS-15018) DataNode doesn't shutdown although the number of failed disks reaches dfs.datanode.failed.volumes.tolerated

2019-11-26 Thread Toshihiro Suzuki (Jira)
Toshihiro Suzuki created HDFS-15018:
---

 Summary: DataNode doesn't shutdown although the number of failed 
disks reaches dfs.datanode.failed.volumes.tolerated
 Key: HDFS-15018
 URL: https://issues.apache.org/jira/browse/HDFS-15018
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.3
 Environment: HDP-2.6.5
Reporter: Toshihiro Suzuki
 Attachments: thread_dumps.txt

In our case, we set dfs.datanode.failed.volumes.tolerated=0, but a DataNode 
didn't shut down when a disk in the DataNode host failed for some reason.

The following log messages in the DataNode log indicate that the DataNode 
detected the disk failure, but the DataNode didn't shut down:
{code}
2019-09-17T13:15:43.262-0400 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: checkDiskErrorAsync callback 
got 1 failed volumes: [/data2/hdfs/current]
2019-09-17T13:15:43.262-0400 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockScanner: Removing scanner for 
volume /data2/hdfs (StorageID DS-329dec9d-a476-4334-9570-651a7e4d1f44)
2019-09-17T13:15:43.263-0400 INFO 
org.apache.hadoop.hdfs.server.datanode.VolumeScanner: 
VolumeScanner(/data2/hdfs, DS-329dec9d-a476-4334-9570-651a7e4d1f44) exiting.
{code}

Looking at the HDFS code, it looks like when the DataNode detects a disk 
failure, DataNode waits until the volume reference of the disk is released.
https://github.com/hortonworks/hadoop/blob/HDP-2.6.5.0-292-tag/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeList.java#L246

I suspect that the volume reference is not released after the failure 
detection, but I'm not sure of the reason.

And we took thread dumps when the issue was happening. Attaching them to this Jira.

It looks like the following thread is waiting for the volume reference of the 
disk to be released:
{code}
"pool-4-thread-1" #174 daemon prio=5 os_prio=0 tid=0x7f9e7c7bf800 
nid=0x8325 in Object.wait() [0x7f9e629cb000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:262)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.handleVolumeFailures(FsVolumeList.java:246)
- locked <0x000670559278> (a java.lang.Object)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.handleVolumeFailures(FsDatasetImpl.java:2178)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.handleVolumeFailures(DataNode.java:3410)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.access$100(DataNode.java:248)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$4.call(DataNode.java:2013)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.invokeCallback(DatasetVolumeChecker.java:394)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.cleanup(DatasetVolumeChecker.java:387)
at 
org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker$ResultHandler.onFailure(DatasetVolumeChecker.java:370)
at com.google.common.util.concurrent.Futures$6.run(Futures.java:977)
at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.executeListener(AbstractFuture.java:991)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.complete(AbstractFuture.java:885)
at 
org.apache.hadoop.hdfs.server.datanode.checker.AbstractFuture.setException(AbstractFuture.java:739)
at 
org.apache.hadoop.hdfs.server.datanode.checker.TimeoutFuture$Fire.run(TimeoutFuture.java:137)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

We found a similar issue, HDFS-13339, but we didn't see any deadlock in the 
thread dump.

Attaching the full thread dumps of the problematic DataNode.





[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983127#comment-16983127
 ] 

Hudson commented on HDFS-14649:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17701 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17701/])
HDFS-14649. Add suspect probe for DeadNodeDetector. Contributed by (yqlin: rev 
c8bef4d6a6d7d5affd00cff6ea4a2e2ef778050e)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DeadNodeDetector.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDeadNodeDetection.java


> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, the DataNode is 
> placed in the Suspicious Node list. When a DataNode is not accessible, it is 
> likely that the replica has been removed from it; therefore this needs to be 
> confirmed by re-probing, which requires higher-priority processing.
> Meanwhile, a DataNode placed in the Suspicious Node list can still be 
> accessed by other DFSInputStreams.






[jira] [Commented] (HDFS-14650) DeadNodeDetector redetects Suspicious Node

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983124#comment-16983124
 ] 

Yiqun Lin commented on HDFS-14650:
--

[~leosun08], is this sub-task still needed?

> DeadNodeDetector redetects Suspicious Node
> --
>
> Key: HDFS-14650
> URL: https://issues.apache.org/jira/browse/HDFS-14650
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
>







[jira] [Updated] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Yiqun Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-14649:
-
Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed this to trunk.

Thanks [~leosun08] for the contribution.

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, the DataNode is 
> placed in the Suspicious Node list. When a DataNode is not accessible, it is 
> likely that the replica has been removed from it; therefore this needs to be 
> confirmed by re-probing, which requires higher-priority processing.
> Meanwhile, a DataNode placed in the Suspicious Node list can still be 
> accessed by other DFSInputStreams.






[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983121#comment-16983121
 ] 

Yiqun Lin commented on HDFS-14649:
--

The latest patch LGTM, +1.

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, the DataNode is 
> placed in the Suspicious Node list. When a DataNode is not accessible, it is 
> likely that the replica has been removed from it; therefore this needs to be 
> confirmed by re-probing, which requires higher-priority processing.
> Meanwhile, a DataNode placed in the Suspicious Node list can still be 
> accessed by other DFSInputStreams.






[jira] [Commented] (HDFS-15016) RBF: getDatanodeReport() should return the latest update

2019-11-26 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983104#comment-16983104
 ] 

Ayush Saxena commented on HDFS-15016:
-

Thanx [~elgoiri] for the report.
Makes sense to correct, and the proposed fix seems straightforward too.
A way to refactor it could be like this:

{code:java}
DatanodeInfo existingNodeId = datanodesMap.get(nodeId);
if (existingNodeId == null ||
    node.getLastUpdate() > existingNodeId.getLastUpdate()) {
  // keep the entry with the most recent update
  datanodesMap.put(nodeId, node);
}
{code}

Not sure whether it will save anything or not; just give it a check once you 
upload the fix. Otherwise I am OK with the way in the comment too. :)


> RBF: getDatanodeReport() should return the latest update
> 
>
> Key: HDFS-15016
> URL: https://issues.apache.org/jira/browse/HDFS-15016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, when the Router calls getDatanodeReport() (or 
> getDatanodeStorageReport()) and the DN is in multiple clusters, it just takes 
> the one that comes first. It should consider the latest update.






[jira] [Commented] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983102#comment-16983102
 ] 

Takanobu Asanuma commented on HDFS-9695:


Since HttpFS has the same interface as WebHDFS, shouldn't ACCESS be CHECKACCESS 
and FSACTION_MODE_PARAM be _fsaction_?

[https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Check_access]

> HTTPFS - CHECKACCESS operation missing
> --
>
> Key: HDFS-9695
> URL: https://issues.apache.org/jira/browse/HDFS-9695
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bert Hekman
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-9695.001.patch
>
>
> Hi,
> The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
> following error:
> {code}
> QueryParamException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
> {code}
> A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
> reveals that CHECKACCESS is not defined at all.






[jira] [Assigned] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-11-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun reassigned HDFS-15017:
---

Assignee: Chao Sun

> Remove redundant import of AtomicBoolean in NameNodeConnector.
> --
>
> Key: HDFS-15017
> URL: https://issues.apache.org/jira/browse/HDFS-15017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover, hdfs
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>  Labels: newbie
>
> Should remove redundant import.
> Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it.






[jira] [Created] (HDFS-15017) Remove redundant import of AtomicBoolean in NameNodeConnector.

2019-11-26 Thread Konstantin Shvachko (Jira)
Konstantin Shvachko created HDFS-15017:
--

 Summary: Remove redundant import of AtomicBoolean in 
NameNodeConnector.
 Key: HDFS-15017
 URL: https://issues.apache.org/jira/browse/HDFS-15017
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer  mover, hdfs
Affects Versions: 2.10.0
Reporter: Konstantin Shvachko


Should remove redundant import.
Looks like it is specific to branch 2.10. Trunk and 3x branches don't have it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983042#comment-16983042
 ] 

Erik Krogen commented on HDFS-14973:


Just committed v005 to branch-2 (and branch-2.10 for good measure, since the 
discussion on the mailing list hasn't finalized yet AFAICT). Thanks for the 
reviews [~shv]!

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.
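
For illustration, a standalone sketch (not the actual Balancer code; the names 
are made up) of the intended dispersal: derive each task's delay from its 
index, so at most MAX_QPS tasks start per second regardless of the threadpool 
size.
{{code}}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DispersedRpcSketch {
  private static final int MAX_QPS = 20; // the hardcoded getBlocks QPS cap

  public static void main(String[] args) throws InterruptedException {
    // A pool smaller than MAX_QPS on purpose: the dispersal must still hold
    // in that configuration, which is where the bug was worst.
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(10);
    for (int i = 0; i < 200; i++) {
      long delayMs = (i / MAX_QPS) * 1000L; // tasks 0-19 now, 20-39 after 1s, ...
      final int id = i;
      pool.schedule(() -> System.out.println("getBlocks #" + id),
          delayMs, TimeUnit.MILLISECONDS);
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}
{{code}}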



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Fix Version/s: 2.11.0
   2.10.1

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1, 2.11.0
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983030#comment-16983030
 ] 

Konstantin Shvachko commented on HDFS-14973:


+1 for v05 patch.

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer  mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982945#comment-16982945
 ] 

Hadoop QA commented on HDFS-15010:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 40s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m  5s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15010 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986834/HDFS-15010.03.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2e0efb6d4134 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3161813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28400/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28400/testReport/ |
| Max. process+thread count | 3339 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28400/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Commented] (HDFS-15016) RBF: getDatanodeReport() should return the latest update

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982940#comment-16982940
 ] 

Íñigo Goiri commented on HDFS-15016:


Right now, we have DataNodes that change subclusters.
The Router is just returning a random one, so it may be returning a dead one.

It could just do something like:
{code}
for (Entry<FederationNamespaceInfo, DatanodeInfo[]> entry :
    results.entrySet()) {
  FederationNamespaceInfo ns = entry.getKey();
  DatanodeInfo[] result = entry.getValue();
  for (DatanodeInfo node : result) {
    String nodeId = node.getXferAddr();
    if (!datanodesMap.containsKey(nodeId) ||
        node.getLastUpdate() > datanodesMap.get(nodeId).getLastUpdate()) {
      // Add the subcluster as a suffix to the network location
      node.setNetworkLocation(
          NodeBase.PATH_SEPARATOR_STR + ns.getNameserviceId() +
          node.getNetworkLocation());
      datanodesMap.put(nodeId, node);
    } else {
      LOG.debug("{} is in multiple subclusters", nodeId);
    }
  }
}
{code}

> RBF: getDatanodeReport() should return the latest update
> 
>
> Key: HDFS-15016
> URL: https://issues.apache.org/jira/browse/HDFS-15016
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Priority: Major
>
> Currently, when the Router calls getDatanodeReport() (or 
> getDatanodeStorageReport()) and the DN is in multiple clusters, it just takes 
> the one that comes first. It should consider the latest update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15016) RBF: getDatanodeReport() should return the latest update

2019-11-26 Thread Jira
Íñigo Goiri created HDFS-15016:
--

 Summary: RBF: getDatanodeReport() should return the latest update
 Key: HDFS-15016
 URL: https://issues.apache.org/jira/browse/HDFS-15016
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Íñigo Goiri


Currently, when the Router calls getDatanodeReport() (or 
getDatanodeStorageReport()) and the DN is in multiple clusters, it just takes 
the one that comes first. It should consider the latest update.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982915#comment-16982915
 ] 

Íñigo Goiri commented on HDFS-15010:


BTW, I checked the test and it is a little long (~5 seconds).
Is there anything we can save? We should definitely stop the MiniDFSCluster.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently initializes 
> the static thread pool instance. But when two {{BPServiceActor}} actors try to 
> load block pools in parallel, they may create different instances. 
> So the {{BlockPoolSlice#initializeAddReplicaPool()}} method should be a static 
> method.
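
A standalone sketch of the race and the proposed fix (the class and field names 
below are illustrative, not the actual BlockPoolSlice code): a static 
synchronized initializer serializes concurrent callers on the class lock, so 
only the first one creates the pool.
{{code}}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolHolder {
  private static ExecutorService addReplicaPool; // one pool shared by all slices

  static synchronized void initializeAddReplicaPool(int threads) {
    if (addReplicaPool == null) { // checked under the class lock
      addReplicaPool = Executors.newFixedThreadPool(threads);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Two "actors" racing to initialize still end up with the same instance.
    Runnable init = () -> initializeAddReplicaPool(4);
    Thread a = new Thread(init);
    Thread b = new Thread(init);
    a.start();
    b.start();
    a.join();
    b.join();
    addReplicaPool.shutdown();
  }
}
{{code}}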



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982908#comment-16982908
 ] 

Íñigo Goiri commented on HDFS-14901:


In the test setup, is there any way we can do something smarter than
sleeping 100ms?
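Something like {{GenericTestUtils#waitFor}} could poll the condition instead 
(a sketch; the awaited condition here is hypothetical):
{{code}}
// import org.apache.hadoop.test.GenericTestUtils;
GenericTestUtils.waitFor(
    () -> router.getRouterState() == RouterServiceState.RUNNING,
    100,     // check every 100 ms
    10000);  // fail the test after 10 s
{{code}}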

> RBF: Add Encryption Zone related ClientProtocol APIs
> 
>
> Key: HDFS-14901
> URL: https://issues.apache.org/jira/browse/HDFS-14901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14901.001.patch, HDFS-14901.002.patch
>
>
> Currently the listEncryptionZones, reencryptEncryptionZone, and 
> listReencryptionStatus APIs are not implemented in the Router.
> This JIRA intends to implement the above-mentioned APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982901#comment-16982901
 ] 

Íñigo Goiri commented on HDFS-14997:


In addition to the checkstyle issues, we should use the logger format in 
{{CommandProcessingThread}}.
Other than that, it looks good.
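For example, the parameterized SLF4J style (the message below is just 
illustrative):
{{code}}
// Preferred: placeholders, no string concatenation on the hot path.
LOG.info("Processed command {} in {} ms.", command, elapsedMs);
// Avoid: LOG.info("Processed command " + command + " in " + elapsedMs + " ms.");
{{code}}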

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch
>
>
> There are two core functions, report (#sendHeartbeat, #blockReport, 
> #cacheReport) and #processCommand, in the #BPServiceActor main process flow. If 
> processCommand takes a long time, it blocks the report flow. processCommand 
> can take a long time (over 1000s in the worst case I have met) when the IO 
> load of the DataNode is very high. Since some IO operations are under 
> #datasetLock, it has to wait a long time to acquire #datasetLock when 
> processing some commands (such as #DNA_INVALIDATE). In such cases, the 
> #heartbeat will not be sent to the NameNode in time, and triggers other 
> disasters.
> I propose to process commands asynchronously so that #BPServiceActor is not 
> blocked from sending heartbeats back to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should solve 
> it in another JIRA.
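
A minimal sketch of the idea (the names are illustrative, not the actual 
BPServiceActor code): commands from the heartbeat response are enqueued and 
drained by a separate daemon thread, so a slow command no longer delays the 
next heartbeat.
{{code}}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CommandProcessingSketch {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

  public CommandProcessingSketch() {
    Thread processor = new Thread(() -> {
      try {
        while (true) {
          // May block for a long time (e.g. waiting on the dataset lock);
          // the heartbeat loop is no longer stuck behind it.
          queue.take().run();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "CommandProcessingThread");
    processor.setDaemon(true);
    processor.start();
  }

  /** Called from the heartbeat loop: enqueue and return immediately. */
  public void enqueue(Runnable command) {
    queue.offer(command);
  }
}
{{code}}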



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982896#comment-16982896
 ] 

Hadoop QA commented on HDFS-15010:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 26s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 50s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
|   | hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover |
|   | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.namenode.ha.TestObserverNode |
|   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
|   | hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
|   | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDistributedFileSystemWithECFile |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestLocalDFS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15010 |
| JIRA Patch URL | 

[jira] [Commented] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982884#comment-16982884
 ] 

Íñigo Goiri commented on HDFS-9695:
---

Minor comments:
* {{fsActionSpecToString()}} is a little weird; can it just be {{return 
aclSpec.toString()}}? Actually, you could just do it directly in {{access()}}.
* In the test, the {{assertEquals()}} should go the other way (expected first); 
see the snippet below.
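For example (the variable name is made up):
{{code}}
// JUnit's assertEquals(expected, actual): the expected value goes first so
// failure messages read correctly.
assertEquals("rw-", fsActionParam);
// not: assertEquals(fsActionParam, "rw-");
{{code}}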

> HTTPFS - CHECKACCESS operation missing
> --
>
> Key: HDFS-9695
> URL: https://issues.apache.org/jira/browse/HDFS-9695
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bert Hekman
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-9695.001.patch
>
>
> Hi,
> The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
> following error:
> {code}
> QueryParamException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
> {code}
> A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
> reveals that CHECKACCESS is not defined at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HDFS-9695:
-

Assignee: hemanthboyina

> HTTPFS - CHECKACCESS operation missing
> --
>
> Key: HDFS-9695
> URL: https://issues.apache.org/jira/browse/HDFS-9695
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bert Hekman
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-9695.001.patch
>
>
> Hi,
> The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
> following error:
> {code}
> QueryParamException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
> {code}
> A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
> reveals that CHECKACCESS is not defined at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2019-11-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated HDFS-15014:
---
Summary: RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport   
(was: [RBF] WebHdfs chooseDatanode shouldn't call getDatanodeReport )

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as processing heartbeats. Check HDFS-14366 for a similar issue.
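
One possible mitigation, sketched under assumptions (the field and method names 
below are hypothetical, and {{Time}} is {{org.apache.hadoop.util.Time}}): cache 
the LIVE report in the Router for a short TTL, so these operations do not each 
fan out a full {{getDatanodeReport}}.
{{code}}
private DatanodeInfo[] cachedLiveReport;
private long cachedLiveReportTime;
private static final long DN_REPORT_TTL_MS = 10000;

private synchronized DatanodeInfo[] getLiveDatanodes(RouterRpcServer rpcServer)
    throws IOException {
  long now = Time.monotonicNow();
  if (cachedLiveReport == null || now - cachedLiveReportTime > DN_REPORT_TTL_MS) {
    cachedLiveReport = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
    cachedLiveReportTime = now;
  }
  return cachedLiveReport;
}
{{code}}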



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982852#comment-16982852
 ] 

Hadoop QA commented on HDFS-9695:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-httpfs: The 
patch generated 1 new + 428 unchanged - 0 fixed = 429 total (was 428) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
36s{color} | {color:green} hadoop-hdfs-httpfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 60m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-9695 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986836/HDFS-9695.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 09aa7cf4da37 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3161813 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28401/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-httpfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28401/testReport/ |
| Max. process+thread count | 629 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-httpfs U: 
hadoop-hdfs-project/hadoop-hdfs-httpfs |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28401/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |

[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982829#comment-16982829
 ] 

Hadoop QA commented on HDFS-14997:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
46s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 39s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 2 new + 112 unchanged - 5 fixed = 114 total (was 117) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 22s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}183m  5s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDecommission |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.TestDFSUpgradeFromImage |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14997 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986825/HDFS-14997.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 9d136b260516 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 52e9ee3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28396/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
| unit | 

[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982827#comment-16982827
 ] 

Hadoop QA commented on HDFS-14973:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m  
5s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
44s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 41s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch 
generated 1 new + 644 unchanged - 1 fixed = 645 total (was 645) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m  9s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}139m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestDeleteRace |
|   | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery 
|
|   | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:f555aa740b5 |
| JIRA Issue | HDFS-14973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986827/HDFS-14973-branch-2.005.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  

[jira] [Commented] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982820#comment-16982820
 ] 

Hadoop QA commented on HDFS-14901:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 23m 
10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} hadoop-hdfs-project/hadoop-hdfs-rbf: The patch 
generated 0 new + 4 unchanged - 1 fixed = 4 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 56s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m  
1s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 85m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.4 Server=19.03.4 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14901 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986832/HDFS-14901.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3616f5b87ed9 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 52e9ee3 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28399/testReport/ |
| Max. process+thread count | 2440 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28399/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RBF: Add Encryption Zone related ClientProtocol APIs
> 

[jira] [Updated] (HDFS-15015) Backport HDFS-5040 to branch-2

2019-11-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15015:

Attachment: (was: HDFS-15015-branch-2.000.patch)

> Backport HDFS-5040 to branch-2
> --
>
> Key: HDFS-15015
> URL: https://issues.apache.org/jira/browse/HDFS-15015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> HDFS-5040 added audit logging for several admin commands which are useful for 
> diagnosing and debugging. For instance, {{getDatanodeReport}} is an expensive 
> call and can be invoked by components such as RBF for metrics and others. 
> It's better to track them in the audit log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15015) Backport HDFS-5040 to branch-2

2019-11-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15015:

Attachment: HDFS-15015-branch-2.000.patch

> Backport HDFS-5040 to branch-2
> --
>
> Key: HDFS-15015
> URL: https://issues.apache.org/jira/browse/HDFS-15015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-15015-branch-2.000.patch
>
>
> HDFS-5040 added audit logging for several admin commands which are useful for 
> diagnosing and debugging. For instance, {{getDatanodeReport}} is an expensive 
> call and can be invoked by components such as RBF for metrics and others. 
> It's better to track them in the audit log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15015) Backport HDFS-5040 to branch-2

2019-11-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HDFS-15015:

Status: Patch Available  (was: Open)

> Backport HDFS-5040 to branch-2
> --
>
> Key: HDFS-15015
> URL: https://issues.apache.org/jira/browse/HDFS-15015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: logging
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Attachments: HDFS-15015-branch-2.000.patch
>
>
> HDFS-5040 added audit logging for several admin commands which are useful for 
> diagnosing and debugging. For instance, {{getDatanodeReport}} is an expensive 
> call and can be invoked by components such as RBF for metrics and others. 
> It's better to track them in the audit log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15015) Backport HDFS-5040 to branch-2

2019-11-26 Thread Chao Sun (Jira)
Chao Sun created HDFS-15015:
---

 Summary: Backport HDFS-5040 to branch-2
 Key: HDFS-15015
 URL: https://issues.apache.org/jira/browse/HDFS-15015
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: logging
Reporter: Chao Sun
Assignee: Chao Sun


HDFS-5040 added audit logging for several admin commands which are useful for 
diagnosing and debugging. For instance, {{getDatanodeReport}} is an expensive 
call and can be invoked by components such as RBF for metrics and others. It's 
better to track them in the audit log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
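
As a rough illustration of the audit-logging pattern HDFS-5040 introduces (all names below are hypothetical stand-ins, not the actual FSNamesystem code): wrap the admin command and emit one audit record per invocation, carrying the command name and the outcome.

{code:java}
import java.util.logging.Logger;

public class AuditLogSketch {
  private static final Logger AUDIT_LOG = Logger.getLogger("hdfs.audit");

  // Hypothetical stand-in for an expensive admin call like getDatanodeReport.
  static String[] datanodeReport() {
    return new String[] {"dn1:9866", "dn2:9866"};
  }

  public static void main(String[] args) {
    final String operationName = "datanodeReport";
    boolean success = false;
    try {
      String[] report = datanodeReport();
      success = true;
      System.out.println("live datanodes: " + report.length);
    } finally {
      // One audit line per invocation, whether it succeeded or not.
      AUDIT_LOG.info("allowed=" + success + "\tcmd=" + operationName);
    }
  }
}
{code}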



[jira] [Created] (HDFS-15014) [RBF] WebHdfs chooseDatanode shouldn't call getDatanodeReport

2019-11-26 Thread Chao Sun (Jira)
Chao Sun created HDFS-15014:
---

 Summary: [RBF] WebHdfs chooseDatanode shouldn't call 
getDatanodeReport 
 Key: HDFS-15014
 URL: https://issues.apache.org/jira/browse/HDFS-15014
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: Chao Sun


Currently the {{chooseDatanode}} call (which is shared by {{open}}, {{create}}, 
{{append}} and {{getFileChecksum}}) in RBF WebHDFS calls {{getDatanodeReport}} 
from ALL downstream namenodes:

{code}
  private DatanodeInfo chooseDatanode(final Router router,
  final String path, final HttpOpParam.Op op, final long openOffset,
  final String excludeDatanodes) throws IOException {
// We need to get the DNs as a privileged user
final RouterRpcServer rpcServer = getRPCServer(router);
UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
RouterRpcServer.setCurrentUser(loginUser);

DatanodeInfo[] dns = null;
try {
  dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
} catch (IOException e) {
  LOG.error("Cannot get the datanodes from the RPC server", e);
} finally {
  // Reset ugi to remote user for remaining operations.
  RouterRpcServer.resetCurrentUser();
}

HashSet<DatanodeInfo> excludes = new HashSet<>();
if (excludeDatanodes != null) {
  Collection<String> collection =
      getTrimmedStringCollection(excludeDatanodes);
  for (DatanodeInfo dn : dns) {
if (collection.contains(dn.getName())) {
  excludes.add(dn);
}
  }
}
...
{code}

The {{getDatanodeReport}} call is very expensive (particularly in a large 
cluster) as it needs to lock the {{DatanodeManager}}, which is also shared by 
calls such as heartbeat processing. See HDFS-14366 for a similar issue.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
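
One mitigation, sketched under the assumption that a slightly stale datanode list is acceptable for redirect selection (an illustration only, not necessarily the fix this JIRA will adopt): cache the live datanode list for a short TTL so each WebHDFS redirect does not trigger a fresh {{getDatanodeReport}}.

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class CachedDatanodeList {
  private final Supplier<List<String>> expensiveReport; // e.g. the RPC call
  private final long ttlMillis;
  private List<String> cached;
  private long fetchedAt;

  CachedDatanodeList(Supplier<List<String>> expensiveReport, long ttlMillis) {
    this.expensiveReport = expensiveReport;
    this.ttlMillis = ttlMillis;
  }

  // Only hits the NameNode when the cached list has expired.
  synchronized List<String> get() {
    long now = System.currentTimeMillis();
    if (cached == null || now - fetchedAt > ttlMillis) {
      cached = expensiveReport.get();
      fetchedAt = now;
    }
    return cached;
  }

  public static void main(String[] args) {
    CachedDatanodeList dns = new CachedDatanodeList(
        () -> Arrays.asList("dn1", "dn2"), 10_000L);
    System.out.println(dns.get()); // first call fetches
    System.out.println(dns.get()); // second call is served from the cache
  }
}
{code}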



[jira] [Commented] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982799#comment-16982799
 ] 

hemanthboyina commented on HDFS-9695:
-

can you please review the patch [~elgoiri] [~tasanuma]

> HTTPFS - CHECKACCESS operation missing
> --
>
> Key: HDFS-9695
> URL: https://issues.apache.org/jira/browse/HDFS-9695
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bert Hekman
>Priority: Major
> Attachments: HDFS-9695.001.patch
>
>
> Hi,
> The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
> following error:
> {code}
> QueryParamException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
> {code}
> A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
> reveals that CHECKACCESS is not defined at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
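
A toy reproduction of the failure mode and the shape of the fix (the enum below is illustrative, not the real HttpFSFileSystem.Operation): Enum.valueOf throws exactly this "No enum constant" IllegalArgumentException until a CHECKACCESS constant is declared, alongside a matching server-side handler.

{code:java}
public class OperationLookup {
  enum Operation {
    OPEN("GET"), GETFILESTATUS("GET"),
    CHECKACCESS("GET"); // the previously missing constant
    final String httpMethod;
    Operation(String httpMethod) { this.httpMethod = httpMethod; }
  }

  public static void main(String[] args) {
    // Before the constant existed, this lookup threw IllegalArgumentException.
    Operation op = Operation.valueOf("CHECKACCESS");
    System.out.println(op + " -> HTTP " + op.httpMethod);
  }
}
{code}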



[jira] [Updated] (HDFS-9695) HTTPFS - CHECKACCESS operation missing

2019-11-26 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-9695:

Attachment: HDFS-9695.001.patch
Status: Patch Available  (was: Open)

> HTTPFS - CHECKACCESS operation missing
> --
>
> Key: HDFS-9695
> URL: https://issues.apache.org/jira/browse/HDFS-9695
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Bert Hekman
>Priority: Major
> Attachments: HDFS-9695.001.patch
>
>
> Hi,
> The CHECKACCESS operation seems to be missing in HTTPFS. I'm getting the 
> following error:
> {code}
> QueryParamException: java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.CHECKACCESS
> {code}
> A quick look into the org.apache.hadoop.fs.http.client.HttpFSFileSystem class 
> reveals that CHECKACCESS is not defined at all.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982789#comment-16982789
 ] 

Surendra Singh Lilhore commented on HDFS-15010:
---

Thanks [~elgoiri], attached the v3 patch.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load block pools in parallel, they may create different 
> instances. 
> So {{BlockPoolSlice#initializeAddReplicaPool()}} should be a static method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
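
A minimal sketch of the race and of the proposed shape of the fix, with illustrative names rather than the actual BlockPoolSlice code: the lazy initializer synchronizes on the class (double-checked with a volatile field), so two BPServiceActors always observe the same pool.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StaticPoolInit {
  private static volatile ExecutorService addReplicaThreadPool;

  // Static initializer: all callers lock the class, so only one pool is built.
  static ExecutorService initializeAddReplicaPool(int threads) {
    if (addReplicaThreadPool == null) {
      synchronized (StaticPoolInit.class) {
        if (addReplicaThreadPool == null) {
          addReplicaThreadPool = Executors.newFixedThreadPool(threads);
        }
      }
    }
    return addReplicaThreadPool;
  }

  public static void main(String[] args) throws InterruptedException {
    Runnable init = () -> System.out.println(
        System.identityHashCode(initializeAddReplicaPool(4)));
    Thread a = new Thread(init), b = new Thread(init);
    a.start(); b.start();
    a.join(); b.join(); // both threads print the same identity hash
    initializeAddReplicaPool(4).shutdown();
  }
}
{code}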



[jira] [Updated] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-15010:
--
Attachment: HDFS-15010.03.patch

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch, 
> HDFS-15010.03.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load block pools in parallel, they may create different 
> instances. 
> So {{BlockPoolSlice#initializeAddReplicaPool()}} should be a static method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982759#comment-16982759
 ] 

hemanthboyina commented on HDFS-14901:
--

Updated the patch, please review.

> RBF: Add Encryption Zone related ClientProtocol APIs
> 
>
> Key: HDFS-14901
> URL: https://issues.apache.org/jira/browse/HDFS-14901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14901.001.patch, HDFS-14901.002.patch
>
>
> Currently the listEncryptionZones, reencryptEncryptionZone, and 
> listReencryptionStatus APIs are not implemented in the Router.
> This JIRA intends to implement the above-mentioned APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
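
A very rough sketch of the router-side delegation these APIs need (hypothetical interfaces; the real implementation would go through the RBF remote-method machinery): resolve the target namespace and forward the encryption zone call downstream.

{code:java}
import java.util.List;
import java.util.Map;

public class RouterEzSketch {
  interface ClientProtocolSubset {          // the calls this JIRA adds
    List<String> listEncryptionZones();
  }

  // Toy router: pick the downstream namespace and forward the call to it.
  static List<String> listEncryptionZones(
      Map<String, ClientProtocolSubset> namespaces, String ns) {
    return namespaces.get(ns).listEncryptionZones();
  }

  public static void main(String[] args) {
    Map<String, ClientProtocolSubset> ns =
        Map.of("ns0", () -> List.of("/secure/zone1"));
    System.out.println(listEncryptionZones(ns, "ns0")); // [/secure/zone1]
  }
}
{code}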



[jira] [Updated] (HDFS-14901) RBF: Add Encryption Zone related ClientProtocol APIs

2019-11-26 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-14901:
-
Attachment: HDFS-14901.002.patch

> RBF: Add Encryption Zone related ClientProtocol APIs
> 
>
> Key: HDFS-14901
> URL: https://issues.apache.org/jira/browse/HDFS-14901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14901.001.patch, HDFS-14901.002.patch
>
>
> Currently the listEncryptionZones, reencryptEncryptionZone, and 
> listReencryptionStatus APIs are not implemented in the Router.
> This JIRA intends to implement the above-mentioned APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-11-26 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982748#comment-16982748
 ] 

Surendra Singh Lilhore commented on HDFS-15012:
---

[~ericlin], can you attach a test patch to reproduce this issue?

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Priority: Critical
>
> After applying HDFS-13101, and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as reverting 
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken 
> and fails to parse the edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982745#comment-16982745
 ] 

Íñigo Goiri commented on HDFS-15010:


Probably we should make the assertTrue an assertEquals.
Other than that it looks good.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load block pools in parallel, they may create different 
> instances. 
> So {{BlockPoolSlice#initializeAddReplicaPool()}} should be a static method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982744#comment-16982744
 ] 

Surendra Singh Lilhore commented on HDFS-15010:
---

Thanks [~elgoiri] for the review.

Just added a test case to verify that this will not happen anymore; it does 
not reproduce the original issue, though.

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load block pools in parallel, they may create different 
> instances. 
> So {{BlockPoolSlice#initializeAddReplicaPool()}} should be a static method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15010) BlockPoolSlice#addReplicaThreadPool static pool should be initialized by static method

2019-11-26 Thread Surendra Singh Lilhore (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Surendra Singh Lilhore updated HDFS-15010:
--
Attachment: HDFS-15010.02.patch

> BlockPoolSlice#addReplicaThreadPool static pool should be initialized by 
> static method
> --
>
> Key: HDFS-15010
> URL: https://issues.apache.org/jira/browse/HDFS-15010
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.1.2
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Attachments: HDFS-15010.001.patch, HDFS-15010.02.patch
>
>
> The {{BlockPoolSlice#initializeAddReplicaPool()}} method currently 
> initializes the static thread pool instance. But when two {{BPServiceActor}} 
> actors try to load block pools in parallel, they may create different 
> instances. 
> So {{BlockPoolSlice#initializeAddReplicaPool()}} should be a static method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982721#comment-16982721
 ] 

Erik Krogen commented on HDFS-14973:


Thanks for the review [~shv]! I addressed your comments in 
[^HDFS-14973-branch-2.005.patch]:

# Great idea, thanks.
# I updated it to avoid the use of {{Preconditions}}. It still throws an 
{{IllegalArgumentException}}, as I think this is a much better semantic 
representation of the issue than an {{IOException}}. The description of an IAE: 
"Thrown to indicate that a method has been passed an illegal or inappropriate 
argument."
# Yup, totally makes sense. My mistake. Fixed.
# I had a bug in the implementation of {{setAtomicLongToMinMax()}} that would 
cause the {{FSNamesystem}} spy to hang sometimes. I fixed this.

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
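
A small sketch of the flawed delay schedule as described in the report (constants illustrative): only submissions below the pool size can receive a delay, and dividing by the QPS limit leaves the first 20 of those undelayed.

{code:java}
public class GetBlocksDelaySketch {
  static final int POOL_SIZE = 100; // dispatcher threadpool size
  static final int MAX_QPS = 20;    // allowed getBlocks calls per second

  // Delay (seconds) assigned to the i-th submitted getBlocks task.
  static int delaySeconds(int i) {
    if (i >= POOL_SIZE) {
      return 0;           // bug: everything past the pool size is undelayed
    }
    return i / MAX_QPS;   // 0s for 0-19, 1s for 20-39, ... 4s for 80-99
  }

  public static void main(String[] args) {
    for (int i : new int[] {0, 19, 20, 99, 100, 119}) {
      System.out.println("task " + i + " -> delay " + delaySeconds(i) + "s");
    }
    // With POOL_SIZE = 10 (< MAX_QPS), every task gets i / 20 == 0 seconds:
    // the rate limiting never engages, matching the second failure mode above.
  }
}
{code}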



[jira] [Commented] (HDFS-14908) LeaseManager should check parent-child relationship when filter open files.

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982719#comment-16982719
 ] 

Íñigo Goiri commented on HDFS-14908:


+1 on [^HDFS-14908.006.patch].

> LeaseManager should check parent-child relationship when filter open files.
> ---
>
> Key: HDFS-14908
> URL: https://issues.apache.org/jira/browse/HDFS-14908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.1
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Minor
> Attachments: HDFS-14908.001.patch, HDFS-14908.002.patch, 
> HDFS-14908.003.patch, HDFS-14908.004.patch, HDFS-14908.005.patch, 
> HDFS-14908.006.patch, HDFS-14908.TestV4.patch, Test.java, TestV2.java, 
> TestV3.java
>
>
> Now, when doing listOpenFiles(), the LeaseManager only checks whether the 
> filter path is a prefix of the open file paths. We should check whether the 
> filter path is a parent/ancestor of the open files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
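
A worked example of the distinction, assuming '/'-separated paths: a plain prefix test accepts /dir10/a for the filter /dir1, while an ancestor test requires an exact match or a path-separator boundary after the filter.

{code:java}
public class AncestorCheck {
  static boolean isPrefix(String filter, String openFile) {
    return openFile.startsWith(filter);
  }

  static boolean isAncestor(String filter, String openFile) {
    if (filter.equals("/")) {
      return true; // the root is an ancestor of everything
    }
    return openFile.equals(filter) || openFile.startsWith(filter + "/");
  }

  public static void main(String[] args) {
    String filter = "/dir1";
    for (String f : new String[] {"/dir1/a", "/dir10/a"}) {
      System.out.println(f + ": prefix=" + isPrefix(filter, f)
          + " ancestor=" + isAncestor(filter, f));
    }
    // /dir1/a:  prefix=true  ancestor=true
    // /dir10/a: prefix=true  ancestor=false  <- the spurious match
  }
}
{code}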



[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-11-26 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-14973:
---
Attachment: HDFS-14973-branch-2.005.patch

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this functionality does not function as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15013) Reduce NameNode overview tab response time

2019-11-26 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982714#comment-16982714
 ] 

Íñigo Goiri commented on HDFS-15013:


My bad, that was internal code we have.
+1 on [^HDFS-15013.001.patch].

> Reduce NameNode overview tab response time
> --
>
> Key: HDFS-15013
> URL: https://issues.apache.org/jira/browse/HDFS-15013
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-15013.001.patch, image-2019-11-26-10-05-39-640.png, 
> image-2019-11-26-10-09-07-952.png
>
>
> Now, the overview tab loads /conf synchronously, as shown in the following 
> picture.
>  !image-2019-11-26-10-05-39-640.png! 
> This issue changes it to an asynchronous load. The effect is shown in the 
> diagram below.
>  !image-2019-11-26-10-09-07-952.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15009) FSCK "-list-corruptfileblocks" return Invalid Entries

2019-11-26 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982707#comment-16982707
 ] 

hemanthboyina commented on HDFS-15009:
--

Updated the patch, please review.

> FSCK "-list-corruptfileblocks" return Invalid Entries
> -
>
> Key: HDFS-15009
> URL: https://issues.apache.org/jira/browse/HDFS-15009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15009.001.patch, HDFS-15009.002.patch
>
>
> Scenario: if we have two directories, dir1 and dir10, and only dir10 has 
> corrupt files, 
> then running -list-corruptfileblocks for dir1 shows the corrupt file count 
> of dir10.
> {code:java}
>   while (blkIterator.hasNext()) {
> BlockInfo blk = blkIterator.next();
> final INodeFile inode = getBlockCollection(blk);
> skip++;
> if (inode != null) {
>   String src = inode.getFullPathName();
>   if (src.startsWith(path)){
> corruptFiles.add(new CorruptFileBlockInfo(src, blk));
> count++;
> if (count >= DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED)
>   break;
>   }
> }
>   } {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
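
A minimal sketch of a separator-aware filter for the loop quoted above (illustrative only, not the attached patch): with it, a filter of /dir1 no longer matches files under /dir10.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class CorruptFileFilter {
  static boolean matchesPath(String src, String path) {
    return src.equals(path) || src.startsWith(path + "/");
  }

  public static void main(String[] args) {
    List<String> corrupt = new ArrayList<>();
    String path = "/dir1";
    for (String src : new String[] {"/dir1/blk_a", "/dir10/blk_b"}) {
      if (matchesPath(src, path)) { // was: src.startsWith(path)
        corrupt.add(src);
      }
    }
    System.out.println(corrupt);    // [/dir1/blk_a]; /dir10 no longer counted
  }
}
{code}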



[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982686#comment-16982686
 ] 

Xiaoqiao He commented on HDFS-14997:


Thanks [~elgoiri] for your comments. v004 tries to fix the concerns mentioned 
above. Please take another review. Thanks.

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch
>
>
> There are two core functions, reporting (#sendHeartbeat, #blockReport, 
> #cacheReport) and #processCommand, in the #BPServiceActor main processing 
> flow. If processCommand takes a long time, it blocks the report flow. 
> processCommand can indeed take a long time (over 1000s in the worst case I 
> have met) when the IO load of the DataNode is very high: since some IO 
> operations are under #datasetLock, processing some commands (such as 
> #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In such a 
> case, #heartbeat is not sent to the NameNode in time, which triggers other 
> disasters.
> I propose to process commands asynchronously so that #BPServiceActor is not 
> blocked from sending heartbeats back to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve it in another JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
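
A minimal sketch of the proposal, assuming a single worker thread is acceptable (hypothetical names, not the attached patch): the heartbeat path only enqueues commands, and a dedicated thread drains the queue, so a command stuck on #datasetLock delays only itself, not the next heartbeat.

{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncCommandSketch {
  private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
  private final Thread processor = new Thread(() -> {
    try {
      while (true) {
        commands.take().run(); // slow commands block only this thread
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }, "commandProcessor");

  void start() { processor.setDaemon(true); processor.start(); }

  // Called from the heartbeat loop: O(1), never waits on the dataset lock.
  void enqueue(Runnable command) { commands.offer(command); }

  public static void main(String[] args) throws InterruptedException {
    AsyncCommandSketch actor = new AsyncCommandSketch();
    actor.start();
    actor.enqueue(() -> System.out.println("processing DNA_INVALIDATE..."));
    System.out.println("heartbeat sent on time"); // returns immediately
    Thread.sleep(200);
  }
}
{code}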



[jira] [Updated] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-11-26 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14997:
---
Attachment: HDFS-14997.004.patch

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch
>
>
> There are two core functions, reporting (#sendHeartbeat, #blockReport, 
> #cacheReport) and #processCommand, in the #BPServiceActor main processing 
> flow. If processCommand takes a long time, it blocks the report flow. 
> processCommand can indeed take a long time (over 1000s in the worst case I 
> have met) when the IO load of the DataNode is very high: since some IO 
> operations are under #datasetLock, processing some commands (such as 
> #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In such a 
> case, #heartbeat is not sent to the NameNode in time, which triggers other 
> disasters.
> I propose to process commands asynchronously so that #BPServiceActor is not 
> blocked from sending heartbeats back to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve it in another JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15009) FSCK "-list-corruptfileblocks" return Invalid Entries

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982675#comment-16982675
 ] 

Hadoop QA commented on HDFS-15009:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 25s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 23s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
14s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}167m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15009 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986796/HDFS-15009.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 17f41f690ed0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 448ffb1 |
| 

[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982617#comment-16982617
 ] 

Hadoop QA commented on HDFS-14649:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
56s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
8s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}144m 41s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
50s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}240m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestSafeModeWithStripedFile |
|   | hadoop.hdfs.TestDFSClientRetries |
|   | hadoop.hdfs.TestDistributedFileSystem |
|   | hadoop.hdfs.TestReadStripedFileWithDNFailure |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
|   | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.server.diskbalancer.TestDiskBalancerRPC |
|   | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.TestErasureCodingExerciseAPIs |
|   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
|   | hadoop.hdfs.TestQuota |
|   | 

[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982570#comment-16982570
 ] 

Lisheng Sun commented on HDFS-14649:


{quote}
We can introduce one additional variable in Detector class, like 
disabledProbeThreadForTest and then reset this value in the UT. We don't need 
to add new config for this.
{quote}
the v004 patch fixes this problem. Resetting disabledProbeThreadForTest in the 
UT controls whether probeThread is started.

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, that DataNode is 
> placed in the suspicious node list, because when a DataNode is not 
> accessible, it is likely that the replica has been removed from it. 
> Therefore, this needs to be confirmed by re-probing, and it requires 
> higher-priority processing.
> And while a DataNode is in the suspicious node list, it can still be 
> accessed by other DFSInputStreams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
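
A rough sketch of the two-level scheme the description outlines (names hypothetical): a first failure puts the DataNode in a suspect set that is re-probed with higher priority, and only a failed confirmation probe moves it to the shared dead set.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

public class SuspectProbeSketch {
  private final Set<String> suspect = ConcurrentHashMap.newKeySet();
  private final Set<String> dead = ConcurrentHashMap.newKeySet();
  private final Predicate<String> probe; // true if the DataNode answers

  SuspectProbeSketch(Predicate<String> probe) { this.probe = probe; }

  void reportInaccessible(String dn) { suspect.add(dn); }

  // Runs frequently: confirms or clears suspects before marking them dead.
  void probeSuspects() {
    for (String dn : suspect) {
      suspect.remove(dn);
      if (!probe.test(dn)) {
        dead.add(dn); // confirmed: now excluded by all input streams
      }
    }
  }

  public static void main(String[] args) {
    SuspectProbeSketch d = new SuspectProbeSketch(dn -> dn.equals("dn-ok"));
    d.reportInaccessible("dn-ok");
    d.reportInaccessible("dn-bad");
    d.probeSuspects();
    System.out.println("dead = " + d.dead); // [dn-bad]
  }
}
{code}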



[jira] [Comment Edited] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982570#comment-16982570
 ] 

Lisheng Sun edited comment on HDFS-14649 at 11/26/19 2:53 PM:
--

{quote}
We can introduce one additional variable in Detector class, like 
disabledProbeThreadForTest and then reset this value in the UT. We don't need 
to add new config for this.
{quote}
the v005 patch fixes this problem. Resetting disabledProbeThreadForTest in the 
UT controls whether probeThread is started.


was (Author: leosun08):
{quote}
We can introduce one additional variable in Detector class, like 
disabledProbeThreadForTest and then reset this value in the UT. We don't need 
to add new config for this.
{quote}
the v004 patch fixes this problem. Resetting disabledProbeThreadForTest in the 
UT controls whether probeThread is started.

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, that DataNode is 
> placed in the suspicious node list, because when a DataNode is not 
> accessible, it is likely that the replica has been removed from it. 
> Therefore, this needs to be confirmed by re-probing, and it requires 
> higher-priority processing.
> And while a DataNode is in the suspicious node list, it can still be 
> accessed by other DFSInputStreams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15009) FSCK "-list-corruptfileblocks" return Invalid Entries

2019-11-26 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15009:
-
Attachment: HDFS-15009.002.patch

> FSCK "-list-corruptfileblocks" return Invalid Entries
> -
>
> Key: HDFS-15009
> URL: https://issues.apache.org/jira/browse/HDFS-15009
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15009.001.patch, HDFS-15009.002.patch
>
>
> Scenario: if we have two directories, dir1 and dir10, and only dir10 has 
> corrupt files, 
> then running -list-corruptfileblocks for dir1 shows the corrupt file count 
> of dir10.
> {code:java}
>   while (blkIterator.hasNext()) {
> BlockInfo blk = blkIterator.next();
> final INodeFile inode = getBlockCollection(blk);
> skip++;
> if (inode != null) {
>   String src = inode.getFullPathName();
>   if (src.startsWith(path)){
> corruptFiles.add(new CorruptFileBlockInfo(src, blk));
> count++;
> if (count >= DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED)
>   break;
>   }
> }
>   } {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14649:
---
Attachment: HDFS-14649.005.patch

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, that DataNode is 
> placed in the suspicious node list, because when a DataNode is not 
> accessible, it is likely that the replica has been removed from it. 
> Therefore, this needs to be confirmed by re-probing, and it requires 
> higher-priority processing.
> And while a DataNode is in the suspicious node list, it can still be 
> accessed by other DFSInputStreams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14649:
---
Attachment: (was: HDFS-14649.005.patch)

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When some DataNode of a block is found to be inaccessible, that DataNode is 
> placed in the suspicious node list, because when a DataNode is not 
> accessible, it is likely that the replica has been removed from it. 
> Therefore, this needs to be confirmed by re-probing, and it requires 
> higher-priority processing.
> And while a DataNode is in the suspicious node list, it can still be 
> accessed by other DFSInputStreams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-14649:
---
Attachment: HDFS-14649.005.patch

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch, HDFS-14649.005.patch
>
>
> Add suspect probe for DeadNodeDetector.
> When a DataNode holding a replica of the block is found to be inaccessible, 
> it will be placed in the suspicious node list. When a DataNode is not 
> accessible, it is likely that the replica has been removed from it, so this 
> needs to be confirmed by re-probing, which deserves higher-priority handling.
> While a DataNode sits in the suspicious node list, it can still be accessed 
> by other DFSInputStreams.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982366#comment-16982366
 ] 

Hadoop QA commented on HDFS-14649:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
10s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 48s{color} | {color:orange} hadoop-hdfs-project: The patch generated 2 new + 
28 unchanged - 0 fixed = 30 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
0s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}115m 49s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
51s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}193m 11s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.server.balancer.TestBalancer |
|   | hadoop.hdfs.server.datanode.TestBlockRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-14649 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12986750/HDFS-14649.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux c8346c68674d 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 

[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-11-26 Thread Eric Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982314#comment-16982314
 ] 

Eric Lin commented on HDFS-15012:
-

Thanks [~weichiu],

Hopefully we can nail it down soon.


> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Priority: Critical
>
> After applying HDFS-13101 and then deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as reverting 
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken 
> and fails to parse the edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues

2019-11-26 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982297#comment-16982297
 ] 

Hadoop QA commented on HDFS-14852:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-14852 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14852 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28393/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> -
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, 
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, 
> screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
>   + " from priority queue {}",
>   block, priLevel);
>   decrementBlockStat(block, priLevel, oldExpectedReplicas);
>   return true;
> } else {
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
>   for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
>   " {} from priority queue {}", block, i);
>   decrementBlockStat(block, i, oldExpectedReplicas);
>   return true;
> }
>   }
> }
> return false;
>   }
> {code}
> The source code is above; the comment in it reads:
> {quote}
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
> {quote}
> Despite that comment, "remove" returns as soon as it removes the block from 
> one queue, so it does NOT remove the block from all queues.
> The add function in LowRedundancyBlocks.java is called from several places, 
> so one block may end up in two or more queues.
> We found that the corrupt block count does not match the corrupt file count 
> on the NN web UI; this may be related.
> Uploading an initial patch.
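> A minimal sketch of the direction a fix could take (reusing the names from 
> the snippet above; not the actual patch): scan every priority level instead 
> of returning on the first hit, so a block queued at several levels is fully 
> removed.
> {code:java}
> boolean removeFromAllQueues(BlockInfo block, int oldExpectedReplicas) {
>   boolean removed = false;
>   for (int i = 0; i < LEVEL; i++) {
>     if (priorityQueues.get(i).remove(block)) {
>       decrementBlockStat(block, i, oldExpectedReplicas);
>       removed = true;   // keep scanning: the block may sit in more queues
>     }
>   }
>   return removed;
> }
> {code}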



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues

2019-11-26 Thread Fei Hui (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982296#comment-16982296
 ] 

Fei Hui commented on HDFS-14852:


This problem has occurred again!
 !screenshot-1.png! 
Should we move this forward? Thanks
[~kihwal][~weichiu] [~ayushtkn]

> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> -
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, 
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, 
> screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
>   + " from priority queue {}",
>   block, priLevel);
>   decrementBlockStat(block, priLevel, oldExpectedReplicas);
>   return true;
> } else {
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
>   for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
>   " {} from priority queue {}", block, i);
>   decrementBlockStat(block, i, oldExpectedReplicas);
>   return true;
> }
>   }
> }
> return false;
>   }
> {code}
> The source code is above; the comment in it reads:
> {quote}
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
> {quote}
> Despite that comment, "remove" returns as soon as it removes the block from 
> one queue, so it does NOT remove the block from all queues.
> The add function in LowRedundancyBlocks.java is called from several places, 
> so one block may end up in two or more queues.
> We found that the corrupt block count does not match the corrupt file count 
> on the NN web UI; this may be related.
> Uploading an initial patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues

2019-11-26 Thread Fei Hui (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fei Hui updated HDFS-14852:
---
Attachment: screenshot-1.png

> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> -
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, 
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, 
> screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if(priLevel >= 0 && priLevel < LEVEL
> && priorityQueues.get(priLevel).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
>   + " from priority queue {}",
>   block, priLevel);
>   decrementBlockStat(block, priLevel, oldExpectedReplicas);
>   return true;
> } else {
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
>   for (int i = 0; i < LEVEL; i++) {
> if (i != priLevel && priorityQueues.get(i).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>   "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
>   " {} from priority queue {}", block, i);
>   decrementBlockStat(block, i, oldExpectedReplicas);
>   return true;
> }
>   }
> }
> return false;
>   }
> {code}
> The source code is above; the comment in it reads:
> {quote}
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
> {quote}
> Despite that comment, "remove" returns as soon as it removes the block from 
> one queue, so it does NOT remove the block from all queues.
> The add function in LowRedundancyBlocks.java is called from several places, 
> so one block may end up in two or more queues.
> We found that the corrupt block count does not match the corrupt file count 
> on the NN web UI; this may be related.
> Uploading an initial patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14649) Add suspect probe for DeadNodeDetector

2019-11-26 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982255#comment-16982255
 ] 

Yiqun Lin commented on HDFS-14649:
--

Some review comments for the unit test:
 * We can introduce one additional variable in the Detector class, like 
disabledProbeThreadForTest, and set it from the UT. We don't need to add a new 
config for this.
{code:java}
  /**
   * Disables starting the probe suspect/dead threads, for testing only.
   */
  private volatile boolean disabledProbeThreadForTest = false;
  ...
  @VisibleForTesting
  public void disabledProbeThreadForTest() {
    this.disabledProbeThreadForTest = true;
  }
{code}

 * Can we update the condition 
dfsClient.getClientContext().getDeadNodeDetector().getSuspectNodesProbeQueue().size()
 >= 1 to 
dfsClient.getClientContext().getDeadNodeDetector().getSuspectNodesProbeQueue().size()
 > 0?
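If it helps, a rough sketch of how the UT could use such a hook (the method 
name and the waitFor timings here are assumptions, not existing API):
{code:java}
// Hypothetical test usage of the suggested flag.
DeadNodeDetector detector =
    dfsClient.getClientContext().getDeadNodeDetector();
detector.disabledProbeThreadForTest();  // suggested @VisibleForTesting hook

// ... trigger a DataNode access failure, then wait for the suspect entry:
GenericTestUtils.waitFor(
    () -> detector.getSuspectNodesProbeQueue().size() > 0, 100, 10000);
{code}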

> Add suspect probe for DeadNodeDetector
> --
>
> Key: HDFS-14649
> URL: https://issues.apache.org/jira/browse/HDFS-14649
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14649.001.patch, HDFS-14649.002.patch, 
> HDFS-14649.003.patch, HDFS-14649.004.patch
>
>
> Add suspect probe for DeadNodeDetector.
> when  some DataNode of the block is found to inaccessible, put the DataNode 
> into it will be placed in the Suspicious Node list. Because when DataNode is 
> not accessible, it is likely that the replica has been removed from the 
> DataNode.Therefore, it needs to be confirmed by re-probing and requires a 
> higher priority processing.
> And when DataNode is placed in the Suspicious Node list, it is accessed by 
> other dfsinputstream.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15013) Reduce NameNode overview tab response time

2019-11-26 Thread HuangTao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16982235#comment-16982235
 ] 

HuangTao commented on HDFS-15013:
-

[~inigoiri] I have checked federationhealth.js; there is no separate 
synchronous ajax request like the one in dfshealth.js.
{code:java}
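 // 'async': false below is what makes this request block page rendering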
 $.ajax({'url': '/conf', 'dataType': 'xml', 'async': false}).done(
  function(d) {
var $xml = $(d);
var namespace, nnId;
$xml.find('property').each(function(idx,v) {
  if ($(v).find('name').text() === 'dfs.nameservice.id') {
namespace = $(v).find('value').text();
  }
  if ($(v).find('name').text() === 'dfs.ha.namenode.id') {
nnId = $(v).find('value').text();
  }
});
if (namespace && nnId) {
  data['HAInfo'] = {"Namespace": namespace, "NamenodeID": nnId};
}
});{code}

> Reduce NameNode overview tab response time
> --
>
> Key: HDFS-15013
> URL: https://issues.apache.org/jira/browse/HDFS-15013
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-15013.001.patch, image-2019-11-26-10-05-39-640.png, 
> image-2019-11-26-10-09-07-952.png
>
>
> Now, the overview tab loads /conf synchronously, as the following picture 
> shows.
>  !image-2019-11-26-10-05-39-640.png! 
> This issue changes it to an asynchronous request. The effect is shown below.
>  !image-2019-11-26-10-09-07-952.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org