date:20191226

[jira] [Commented] (HDFS-15074) DataNode.DataTransfer thread should catch all the exception and log it.

2019-12-26 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003931#comment-17003931
 ] 

Surendra Singh Lilhore commented on HDFS-15074:
---

+1

> DataNode.DataTransfer thread should catch all the exception and log it.
> 
>
> Key: HDFS-15074
> URL: https://issues.apache.org/jira/browse/HDFS-15074
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.1.1
>Reporter: Surendra Singh Lilhore
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15074.001.patch, HDFS-15074.002.patch
>
>
> Sometimes, if this thread throws an exception other than IOException, we will 
> not be able to trace it.
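
A minimal sketch of the idea in the description, assuming a simplified task rather than the real DataNode.DataTransfer class: the run() method keeps its IOException handling and adds a catch-all branch so that any other failure is logged as well. The names LoggingTransferTask, Transfer and LOG are hypothetical and exist only for illustration; a real patch would use the project's logger.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transfer thread; not the actual DataNode code. */
class LoggingTransferTask implements Runnable {
  /** Collects log lines so the behaviour is easy to inspect in this sketch. */
  static final List<String> LOG = new ArrayList<>();

  /** Hypothetical unit of work that may fail with an IOException. */
  interface Transfer {
    void execute() throws IOException;
  }

  private final Transfer transfer;

  LoggingTransferTask(Transfer transfer) {
    this.transfer = transfer;
  }

  @Override
  public void run() {
    try {
      transfer.execute();
    } catch (IOException ioe) {
      // The failure mode that is already handled.
      LOG.add("Failed to transfer block: " + ioe);
    } catch (Throwable t) {
      // Without this branch a RuntimeException or Error would end the
      // thread without leaving any entry in this log.
      LOG.add("Unexpected failure in transfer thread: " + t);
    }
  }
}
```

With this shape, a task that throws, say, an IllegalStateException still leaves one line in the log instead of disappearing silently.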



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-14957) INodeReference Space Consumed was not same in QuotaUsage and ContentSummary

2019-12-26 Thread Surendra Singh Lilhore (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003930#comment-17003930
 ] 

Surendra Singh Lilhore commented on HDFS-14957:
---

+1, I will commit this today.

> INodeReference Space Consumed was not same in QuotaUsage and ContentSummary
> ---
>
> Key: HDFS-14957
> URL: https://issues.apache.org/jira/browse/HDFS-14957
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.4
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14957.001.patch, HDFS-14957.002.patch, 
> HDFS-14957.JPG
>
>
> For INodeReferences, the space consumed differs between QuotaUsage and 
> ContentSummary.




[jira] [Updated] (HDFS-15051) RBF: Propose to revoke WRITE MountTableEntry privilege to super user only

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15051:
---
Status: Open  (was: Patch Available)

> RBF: Propose to revoke WRITE MountTableEntry privilege to super user only
> -
>
> Key: HDFS-15051
> URL: https://issues.apache.org/jira/browse/HDFS-15051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15051.001.patch, HDFS-15051.002.patch, 
> HDFS-15051.003.patch, HDFS-15051.004.patch, HDFS-15051.005.patch
>
>
> The current permission checker of #MountTableStoreImpl is not very strict. 
> In some cases, any user can add/update/remove a MountTableEntry without the 
> expected permission check.
> The following code segment tries to check permission when operating on a 
> MountTableEntry; however, the mountTable object comes from the 
> Client/RouterAdmin ({{MountTable mountTable = request.getEntry();}}), so a 
> user can pass any mode and thereby bypass the permission checker.
> {code:java}
>   public void checkPermission(MountTable mountTable, FsAction access)
>       throws AccessControlException {
>     if (isSuperUser()) {
>       return;
>     }
>     FsPermission mode = mountTable.getMode();
>     if (getUser().equals(mountTable.getOwnerName())
>         && mode.getUserAction().implies(access)) {
>       return;
>     }
>     if (isMemberOfGroup(mountTable.getGroupName())
>         && mode.getGroupAction().implies(access)) {
>       return;
>     }
>     if (!getUser().equals(mountTable.getOwnerName())
>         && !isMemberOfGroup(mountTable.getGroupName())
>         && mode.getOtherAction().implies(access)) {
>       return;
>     }
>     throw new AccessControlException(
>         "Permission denied while accessing mount table "
>             + mountTable.getSourcePath()
>             + ": user " + getUser() + " does not have " + access.toString()
>             + " permissions.");
>   }
> {code}
> I propose restricting the WRITE privilege on MountTableEntry to the super 
> user only.
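
To make the weakness concrete, here is a small self-contained model, not the RBF code itself. It assumes a simplified Entry with only an owner and an owner-write flag, and a hypothetical MountTableWriteGuard class; none of these names come from Hadoop. The first method trusts fields that the caller filled in, as in the snippet quoted above; the second applies the proposed rule and ignores the request contents.

```java
import java.util.Set;

/** Hypothetical, simplified model of a mount-table write check. */
class MountTableWriteGuard {
  /** Minimal stand-in for a mount table entry as sent by a client. */
  static final class Entry {
    final String path;
    final String owner;
    final boolean ownerCanWrite;

    Entry(String path, String owner, boolean ownerCanWrite) {
      this.path = path;
      this.owner = owner;
      this.ownerCanWrite = ownerCanWrite;
    }
  }

  private final Set<String> superUsers;

  MountTableWriteGuard(Set<String> superUsers) {
    this.superUsers = superUsers;
  }

  /** Mirrors the weakness: the decision trusts data supplied in the request. */
  boolean allowsWriteTrustingRequest(String user, Entry fromRequest) {
    return superUsers.contains(user)
        || (user.equals(fromRequest.owner) && fromRequest.ownerCanWrite);
  }

  /** The proposed rule: only a super user may write, whatever the request says. */
  boolean allowsWriteSuperUserOnly(String user) {
    return superUsers.contains(user);
  }
}
```

A caller who names themselves as owner and sets the write flag passes the first check but not the second, which is the gap the proposal closes.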




[jira] [Updated] (HDFS-15051) RBF: Propose to revoke WRITE MountTableEntry privilege to super user only

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15051:
---
Status: Patch Available  (was: Open)

re-trigger Jenkins manually


[jira] [Updated] (HDFS-15075) Remove process command timing from BPServiceActor

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15075:
---
Status: Patch Available  (was: Open)

re-trigger Jenkins manually

> Remove process command timing from BPServiceActor
> -
>
> Key: HDFS-15075
> URL: https://issues.apache.org/jira/browse/HDFS-15075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Íñigo Goiri
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15075.001.patch, HDFS-15075.002.patch
>
>
> HDFS-14997 made command processing asynchronous.
> Right now, we only measure the time it takes to add a command to the queue.
> We should remove this timing and maybe move it into the processing thread.
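
One way to read "move the timing within the thread" is to wrap each command so that the measurement covers its execution rather than the enqueue call. The sketch below assumes a hypothetical TimedCommand wrapper and an arbitrary warning threshold; it is an illustration of the idea, not the attached patch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper that times a command where it actually runs. */
class TimedCommand implements Runnable {
  private final Runnable delegate;
  private final long warnThresholdMillis;
  // -1 means the command has not run yet.
  private volatile long lastDurationMillis = -1;

  TimedCommand(Runnable delegate, long warnThresholdMillis) {
    this.delegate = delegate;
    this.warnThresholdMillis = warnThresholdMillis;
  }

  @Override
  public void run() {
    long start = System.nanoTime();
    try {
      delegate.run();
    } finally {
      // Measured on the processing thread, so it reflects the real cost
      // of the command and not the cost of adding it to a queue.
      lastDurationMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      if (lastDurationMillis > warnThresholdMillis) {
        System.err.println("Command processing took " + lastDurationMillis + " ms");
      }
    }
  }

  long lastDurationMillis() {
    return lastDurationMillis;
  }
}
```

The producer side then only enqueues the wrapped command and needs no timing code of its own.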




[jira] [Updated] (HDFS-15075) Remove process command timing from BPServiceActor

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15075:
---
Status: Open  (was: Patch Available)


[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003880#comment-17003880
 ] 

Hudson commented on HDFS-15003:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17795 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17795/])
HDFS-15003. RBF: Make Router support storage type quota. Contributed by 
(ayushsaxena: rev 8730a7bf6025a3b2b7d6e6686533283b854af192)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaUsage.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterAdminServer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaUpdateService.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/records/impl/pb/MountTablePBImpl.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterQuotaManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/Quota.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterAdminCLI.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MountTableStoreImpl.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterQuota.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/tools/federation/RouterAdmin.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/site/markdown/HDFSRouterFederation.md


> RBF: Make Router support storage type quota.
> 
>
> Key: HDFS-15003
> URL: https://issues.apache.org/jira/browse/HDFS-15003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, 
> HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, 
> HDFS-15003.006.patch, HDFS-15003.007.patch, HDFS-15003.008.patch
>
>
> Make Router support storage type quota.




[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Hudson (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003874#comment-17003874
 ] 

Hudson commented on HDFS-14997:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17794 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17794/])
HDFS-14997. Addendum: BPServiceActor processes commands from NameNode 
(ayushsaxena: rev 80f91d14ab0fb385252d4eeb19141bd059303d59)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java


> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> HDFS-14997.addendum.patch, image-2019-12-26-16-15-44-814.png
>
>
> The #BPServiceActor main loop has two core functions: reporting 
> (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the reporting flow. 
> #processCommand can take a long time (over 1000s in the worst case I have 
> seen) when the IO load of the DataNode is very high. Because some IO 
> operations run under #datasetLock, processing some commands (such as 
> #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that 
> case, heartbeats are not sent to the NameNode in time, which triggers other 
> failures.
> I propose making #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should 
> solve it in another JIRA.
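
The proposal can be sketched with a queue and a single worker thread. This is an illustration under stated assumptions, not the committed change: AsyncCommandProcessor and its methods are hypothetical names, commands are modelled as plain Runnables, and awaitProcessed exists only so the behaviour can be observed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; the names do not come from BPServiceActor. */
class AsyncCommandProcessor {
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger processed = new AtomicInteger();
  private final Thread worker;

  AsyncCommandProcessor() {
    worker = new Thread(this::drain, "command-processor");
    worker.setDaemon(true);
    worker.start();
  }

  /** Called from the reporting loop; returns at once, however slow the command is. */
  void enqueue(Runnable command) {
    queue.add(command);
  }

  int processedCount() {
    return processed.get();
  }

  /** Polls until the given number of commands has run or the timeout expires. */
  boolean awaitProcessed(int expected, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (processed.get() < expected) {
      if (System.currentTimeMillis() > deadline) {
        return false;
      }
      try {
        Thread.sleep(5);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  private void drain() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Runnable command = queue.take();
        try {
          command.run();
        } catch (Throwable t) {
          // Keep the worker alive if a single command fails.
          System.err.println("Command failed: " + t);
        } finally {
          processed.incrementAndGet();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

The design choice is that a slow command, for example one waiting on a lock, only delays later commands in the queue; the thread that sends heartbeats is never the one that waits.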




[jira] [Updated] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Ayush Saxena (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15003:

Fix Version/s: 3.3.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk.

Thanks [~LiJinglun] for the contribution and [~elgoiri] for the review!


[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003871#comment-17003871
 ] 

Ayush Saxena commented on HDFS-15003:
-

Thanks [~LiJinglun] for the confirmation. The heap issue isn't related; a fix 
has already been pushed, so it should be resolved now.

Committing this shortly.

 


[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003868#comment-17003868
 ] 

Xiaoqiao He commented on HDFS-14997:


Thanks [~ayushtkn], [~elgoiri], [~Kevin_Zheng] for your help and reviews.


[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003867#comment-17003867
 ] 

Xiaoqiao He commented on HDFS-15003:


Hi [~LiJinglun], in my opinion the OOM error is not related to this PR, so 
please ignore it. HDFS-14997 is tracking the 'java.lang.OutOfMemoryError: Java 
heap space' that Jenkins reported.


[jira] [Updated] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Ayush Saxena (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-14997:

Resolution: Fixed
Status: Resolved  (was: Patch Available)


[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003863#comment-17003863
 ] 

Ayush Saxena commented on HDFS-14997:
-

Committed Addendum to trunk.
Thanks [~hexiaoqiao] for the contribution, and [~elgoiri] and [~Kevin_Zheng] 
for the reviews!


[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Jinglun (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003861#comment-17003861
 ] 

Jinglun commented on HDFS-15003:


Hi [~ayushtkn], I ran all the failed tests on my PC and they pass, except that 
many cases in TestFileChecksum fail because of 
'java.lang.OutOfMemoryError: Java heap space'. I removed the v08 patch and 
still got the same error, so I think it is unrelated.


[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003860#comment-17003860
 ] 

Ayush Saxena commented on HDFS-14997:
-

+1 for the addendum


[jira] [Commented] (HDFS-14934) [SBN Read] Standby NN throws many InterruptedExceptions when dfs.ha.tail-edits.period is 0

2019-12-26 Thread Takanobu Asanuma (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003830#comment-17003830
 ] 

Takanobu Asanuma commented on HDFS-14934:
-

[~ayushtkn] Thanks for investigating it!
I've confirmed the change suppresses the warn logs. Could you submit the patch?

> [SBN Read] Standby NN throws many InterruptedExceptions when 
> dfs.ha.tail-edits.period is 0
> --
>
> Key: HDFS-14934
> URL: https://issues.apache.org/jira/browse/HDFS-14934
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Priority: Major
>
> When dfs.ha.tail-edits.period is 0ms (or a very short interval), the standby 
> NN logs many warnings.
> {noformat}
> 2019-10-25 16:25:46,945 [Logger channel (from parallel executor) to  hostname>/:] WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(55)) - Thread 
> (Thread[Logger channel (from parallel executor) to / address>:,5,main]) interrupted: 
> java.lang.InterruptedException
>   at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:509)
>   at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:82)
>   at org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:48)
>   at org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor.afterExecute(HadoopThreadPoolExecutor.java:90)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}




[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Zhenyu Zheng (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003827#comment-17003827
 ] 

Zhenyu Zheng commented on HDFS-14997:
-

[~hexiaoqiao] Thanks for the fix. I manually patched and tested it, and it 
works, so +1.


[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Íñigo Goiri (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003776#comment-17003776
 ] 

Íñigo Goiri commented on HDFS-14997:


The addendum looks good to me.

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> HDFS-14997.addendum.patch, image-2019-12-26-16-15-44-814.png
>
>
> The #BPServiceActor main loop has two core functions: reporting 
> (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the reporting flow. 
> #processCommand can take a long time (over 1000s in the worst case I have 
> seen) when the IO load of the DataNode is very high: some IO operations run 
> under #datasetLock, so processing some commands (such as #DNA_INVALIDATE) has 
> to wait a long time to acquire #datasetLock. In that case, the heartbeat is 
> not sent to the NameNode in time, which triggers other failures.
> I propose making #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution, but some old branches do not 
> support this feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should 
> solve it in another JIRA.
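
The proposal in the description can be illustrated with a small, self-contained sketch. It is not the HDFS-14997 patch: the class name `AsyncCommandSketch`, its methods, and the queue-plus-worker design are assumptions made for illustration only. The idea shown is that the heartbeat loop only enqueues commands, while a separate thread executes them, so a slow command cannot delay heartbeats.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model: the heartbeat loop only enqueues commands; a worker runs them. */
final class AsyncCommandSketch {
  private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
  private final Thread worker;
  private volatile boolean running = true;

  AsyncCommandSketch() {
    worker = new Thread(() -> {
      // Drain the queue until shutdown is requested and nothing is left to run.
      while (running || !commands.isEmpty()) {
        try {
          Runnable command = commands.poll(50, TimeUnit.MILLISECONDS);
          if (command != null) {
            command.run();
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }, "command-processor");
    worker.setDaemon(true);
    worker.start();
  }

  /** Called by the heartbeat loop; returns immediately. */
  void enqueue(Runnable command) {
    commands.add(command);
  }

  /** Stops accepting work and waits briefly for the worker to finish. */
  void shutdown() {
    running = false;
    try {
      worker.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Counts heartbeats sent while a slow command is still pending. */
  static int heartbeatsSentWhileCommandPending(int beats) {
    AsyncCommandSketch actor = new AsyncCommandSketch();
    CountDownLatch release = new CountDownLatch(1);
    AtomicBoolean commandDone = new AtomicBoolean(false);
    // A "slow" command: it cannot finish until every heartbeat has gone out.
    actor.enqueue(() -> {
      try {
        release.await();
        commandDone.set(true);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    int sent = 0;
    for (int i = 0; i < beats; i++) {
      if (!commandDone.get()) {
        sent++; // stands in for sendHeartbeat(); not blocked by the command
      }
    }
    release.countDown();
    actor.shutdown();
    return sent;
  }
}
```

If the command were run inline in the heartbeat loop instead, the loop in this model would never reach the heartbeats, because the command waits for them to complete first.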




[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003723#comment-17003723
 ] 

Ayush Saxena commented on HDFS-15003:
-

There are a bunch of test failures, which seem unrelated. [~LiJinglun], can you double-check?
v008 LGTM, +1.

> RBF: Make Router support storage type quota.
> 
>
> Key: HDFS-15003
> URL: https://issues.apache.org/jira/browse/HDFS-15003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, 
> HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, 
> HDFS-15003.006.patch, HDFS-15003.007.patch, HDFS-15003.008.patch
>
>
> Make Router support storage type quota.




[jira] [Commented] (HDFS-14934) [SBN Read] Standby NN throws many InterruptedExceptions when dfs.ha.tail-edits.period is 0

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003719#comment-17003719
 ] 

Ayush Saxena commented on HDFS-14934:
-

Sorry for the late reply, [~tasanuma]. The logs seem to come from the use of 
{{HadoopThreadPoolExecutor}}, whose {{afterExecute}} method logs this 
exception.
Changing {{HadoopThreadPoolExecutor}} to {{ThreadPoolExecutor}} should fix the 
problem. 
Can you check?
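
To show the mechanism being described, here is a minimal sketch using only JDK classes. `LoggingExecutor` is a hypothetical stand-in, not Hadoop's {{HadoopThreadPoolExecutor}}; it follows the pattern given in the {{ThreadPoolExecutor}} javadoc, where an overridden {{afterExecute}} calls {{Future.get()}} on the finished task and records whatever it throws. A plain {{ThreadPoolExecutor}} has an empty {{afterExecute}}, so it records nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class AfterExecuteSketch {
  /** Hypothetical executor that records task failures in afterExecute. */
  static final class LoggingExecutor extends ThreadPoolExecutor {
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    LoggingExecutor() {
      super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
      super.afterExecute(r, t);
      // Submitted tasks are wrapped in a Future, which swallows their exception;
      // calling get() here surfaces it so that it can be logged.
      if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
        try {
          ((Future<?>) r).get();
        } catch (CancellationException | ExecutionException e) {
          log.add("task failed: " + e);
        } catch (InterruptedException e) {
          log.add("interrupted: " + e);
          Thread.currentThread().interrupt();
        }
      }
    }
  }

  /** Submits one failing task and returns how many log entries it produced. */
  static int failuresLogged() {
    LoggingExecutor pool = new LoggingExecutor();
    Runnable failing = () -> {
      throw new IllegalStateException("boom");
    };
    pool.submit(failing);
    pool.shutdown();
    try {
      // Waiting for termination guarantees that afterExecute has already run.
      pool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return pool.log.size();
  }
}
```

Note that this sketch demonstrates the hook itself; the {{InterruptedException}} in the quoted stack trace is thrown from a Guava future's {{get()}}, which the JDK-only example does not reproduce.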



> [SBN Read] Standby NN throws many InterruptedExceptions when 
> dfs.ha.tail-edits.period is 0
> --
>
> Key: HDFS-14934
> URL: https://issues.apache.org/jira/browse/HDFS-14934
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Priority: Major
>
> When dfs.ha.tail-edits.period is 0ms (or very short), there are many WARN 
> logs in the standby NN.
> {noformat}
> 2019-10-25 16:25:46,945 [Logger channel (from parallel executor) to  hostname>/:] WARN  concurrent.ExecutorHelper 
> (ExecutorHelper.java:logThrowableFromAfterExecute(55)) - Thread 
> (Thread[Logger channel (from parallel executor) to / address>:,5,main]) interrupted: 
> java.lang.InterruptedException
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:509)
>   at 
> com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:82)
>   at 
> org.apache.hadoop.util.concurrent.ExecutorHelper.logThrowableFromAfterExecute(ExecutorHelper.java:48)
>   at 
> org.apache.hadoop.util.concurrent.HadoopThreadPoolExecutor.afterExecute(HadoopThreadPoolExecutor.java:90)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}




[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003708#comment-17003708
 ] 

Hadoop QA commented on HDFS-15003:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 54s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 24s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 28s | trunk passed |
| +1 | compile | 3m 38s | trunk passed |
| +1 | checkstyle | 0m 57s | trunk passed |
| +1 | mvnsite | 1m 53s | trunk passed |
| +1 | shadedclient | 16m 43s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 29s | trunk passed |
| +1 | javadoc | 2m 9s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 11s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 3m 26s | the patch passed |
| +1 | javac | 3m 26s | the patch passed |
| +1 | checkstyle | 0m 48s | the patch passed |
| +1 | mvnsite | 1m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 42s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 54s | the patch passed |
| +1 | javadoc | 2m 12s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 130m 46s | hadoop-hdfs in the patch failed. |
| +1 | unit | 8m 17s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 51s | The patch does not generate ASF License warnings. |
| | | 217m 9s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDeadNodeDetection |
|   | hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits |
|   | hadoop.hdfs.TestDFSInputStream |
|   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
|   | hadoop.hdfs.TestDatanodeRegistration |
|   | hadoop.hdfs.TestStateAlignmentContextWithHA |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-15003 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12989495/HDFS-15003.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |

[jira] [Commented] (HDFS-14528) Failover from Active to Standby Failed

2019-12-26 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003707#comment-17003707
 ] 

Hadoop QA commented on HDFS-14528:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 37s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 0s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 20s | trunk passed |
| +1 | compile | 15m 44s | trunk passed |
| +1 | checkstyle | 2m 32s | trunk passed |
| +1 | mvnsite | 2m 44s | trunk passed |
| +1 | shadedclient | 19m 30s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 11s | trunk passed |
| +1 | javadoc | 2m 49s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 26s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 1s | the patch passed |
| +1 | compile | 16m 8s | the patch passed |
| +1 | javac | 16m 8s | the patch passed |
| +1 | checkstyle | 2m 42s | the patch passed |
| +1 | mvnsite | 2m 39s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 28s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 23s | the patch passed |
| +1 | javadoc | 2m 48s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 9m 45s | hadoop-common in the patch passed. |
| -1 | unit | 130m 12s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 1m 34s | The patch does not generate ASF License warnings. |
| | | 252m 34s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
|   | hadoop.hdfs.TestMaintenanceState |
|   | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
|   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
|   | hadoop.hdfs.tools.TestDFSZKFailoverController |
|   | hadoop.hdfs.server.namenode.TestRedudantBlocks |
|   | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
|   | hadoop.hdfs.TestFileChecksumCompositeCrc |
|   | hadoop.hdfs.TestSafeModeWithStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestRollingUpgrade |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-14528 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12986408/HDFS-14528.0

[jira] [Updated] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-12-26 Thread Ravuri Sushma sree (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravuri Sushma sree updated HDFS-14442:
--
Attachment: HDFS-14442.004.patch

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.patch, HDFS-14442.004.patch
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.
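
The mismatch described above can be modeled in a few lines. Everything in this sketch is hypothetical: `ActiveAddressSketch`, `ProxyProvider`, the host names, and port 8020 are illustrative stand-ins and not Hadoop classes or configuration. It only shows that "the address of the proxy currently in use" and "the address of the active NameNode" differ as soon as the current proxy points at an Observer.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

final class ActiveAddressSketch {
  enum Role { ACTIVE, STANDBY, OBSERVER }

  static final class Node {
    final InetSocketAddress address;
    final Role role;

    Node(String host, Role role) {
      // Unresolved, so the sketch never performs a DNS lookup; the port is arbitrary.
      this.address = InetSocketAddress.createUnresolved(host, 8020);
      this.role = role;
    }
  }

  /** Hypothetical proxy provider that may currently point at any node. */
  static final class ProxyProvider {
    private final List<Node> nodes;
    private final int current;

    ProxyProvider(List<Node> nodes, int current) {
      this.nodes = nodes;
      this.current = current;
    }

    /** What a lookup based on the current proxy's connection would return. */
    InetSocketAddress currentAddress() {
      return nodes.get(current).address;
    }

    /** What a caller asking for the active NameNode actually wants. */
    InetSocketAddress activeAddress() {
      for (Node node : nodes) {
        if (node.role == Role.ACTIVE) {
          return node.address;
        }
      }
      return null;
    }
  }

  /** True only when the proxy in use happens to point at the active node. */
  static boolean currentIsActive(int currentIndex) {
    List<Node> nodes = Arrays.asList(
        new Node("nn1", Role.ACTIVE),
        new Node("nn2", Role.STANDBY),
        new Node("nn3", Role.OBSERVER));
    ProxyProvider provider = new ProxyProvider(nodes, currentIndex);
    return provider.currentAddress().equals(provider.activeAddress());
  }
}
```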




[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId

2019-12-26 Thread Ravuri Sushma sree (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003692#comment-17003692
 ] 

Ravuri Sushma sree commented on HDFS-14442:
---

Hi [~xkrogen], thank you for your input on simplifying the test. I have 
uploaded a patch implementing it. Please review.

> Disagreement between HAUtil.getAddressOfActive and 
> RpcInvocationHandler.getConnectionId
> ---
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Erik Krogen
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, 
> HDFS-14442.003.patch
>
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling 
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
>   /**
>    * Returns the connection id associated with the InvocationHandler instance.
>    * @return ConnectionId
>    */
>   ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an 
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
>   /**
>    * Get the internet address of the currently-active NN. This should rarely be
>    * used, since callers of this method who connect directly to the NN using the
>    * resulting InetSocketAddress will not be able to connect to the active NN if
>    * a failover were to occur after this method has been called.
>    *
>    * @param fs the file system to get the active address of.
>    * @return the internet address of the currently-active NN.
>    * @throws IOException if an error occurs while resolving the active NN.
>    */
>   public static InetSocketAddress getAddressOfActive(FileSystem fs)
>       throws IOException {
>     if (!(fs instanceof DistributedFileSystem)) {
>       throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>     }
>     // force client address resolution.
>     fs.exists(new Path("/"));
>     DistributedFileSystem dfs = (DistributedFileSystem) fs;
>     DFSClient dfsClient = dfs.getClient();
>     return RPC.getServerAddress(dfsClient.getNamenode());
>   }
> {code}
> Where the call {{RPC.getServerAddress()}} eventually terminates into 
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> 
> {{RPC.getConnectionIdForProxy()}} -> 
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making 
> an incorrect assumption that {{RpcInvocationHandler}} will necessarily return 
> an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a 
> counter-example to this, since the current connection ID may be pointing at, 
> for example, an Observer NameNode.




[jira] [Commented] (HDFS-15079) RBF: Client maybe get an unexpected result with network anomaly

2019-12-26 Thread Fei Hui (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003619#comment-17003619
 ] 

Fei Hui commented on HDFS-15079:


Thanks [~hexiaoqiao] 
{quote}
ClientId & CallId of request from Router to NameNode are both created by Router 
itself 
{quote}
Yes, I was wrong: I mistook clientName for clientId :(
Digging in.

> RBF: Client maybe get an unexpected result with network anomaly 
> 
>
> Key: HDFS-15079
> URL: https://issues.apache.org/jira/browse/HDFS-15079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Fei Hui
>Priority: Critical
> Attachments: UnexpectedOverWriteUT.patch
>
>
> I found a critical problem in RBF. HDFS-15078 can resolve it in some 
> scenarios, but I have no idea about an overall resolution.
> The problem is as follows:
> A client using RBF (r0, r1) creates an HDFS file via r0, gets an exception, 
> and fails over to r1.
> r0 has already sent the create RPC to the NameNode (1st create).
> The client creates the HDFS file via r1 (2nd create).
> The client writes the HDFS file and finally closes it (3rd close).
> The NameNode may receive the RPCs in the following order:
> 2nd create
> 3rd close
> 1st create
> Since overwrite is true by default, the file that has been written ends up 
> empty. This is a critical problem.
> We have encountered this problem. Many Hive and Spark jobs run on our 
> cluster, and it occurs occasionally.
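
The reordering in the description can be replayed against a toy in-memory model. This is only a sketch: `CreateReorderSketch` and its methods are hypothetical, and a `HashMap` stands in for the NameNode namespace, so none of the real RPC, lease, or retry-cache behavior is represented. It shows why a delayed create with overwrite enabled leaves an empty file behind.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a namespace that receives create and close RPCs. */
final class CreateReorderSketch {
  private final Map<String, String> files = new HashMap<>();

  /** Creates an empty file; replaces an existing one only when overwrite is set. */
  void create(String path, boolean overwrite) {
    if (files.containsKey(path) && !overwrite) {
      throw new IllegalStateException("file exists: " + path);
    }
    files.put(path, "");
  }

  /** Stands in for the client writing its data and closing the file. */
  void writeAndClose(String path, String data) {
    files.put(path, data);
  }

  String read(String path) {
    return files.get(path);
  }

  /** Replays the order from the report: 2nd create, 3rd close, then the delayed 1st create. */
  static String replay(boolean overwrite) {
    CreateReorderSketch namespace = new CreateReorderSketch();
    namespace.create("/f", overwrite);           // 2nd create, sent via r1
    namespace.writeAndClose("/f", "payload");    // 3rd: write and close
    try {
      namespace.create("/f", overwrite);         // delayed 1st create, sent via r0
    } catch (IllegalStateException e) {
      // With overwrite disabled, the late create is rejected and the data survives.
    }
    return namespace.read("/f");
  }
}
```

In this model `replay(true)` leaves the file empty, while `replay(false)` keeps the written data, which matches the role the description assigns to the default overwrite flag.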




[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Jinglun (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003618#comment-17003618
 ] 

Jinglun commented on HDFS-15003:


Hi [~ayushtkn], thanks for the reminder! Uploaded v008.

> RBF: Make Router support storage type quota.
> 
>
> Key: HDFS-15003
> URL: https://issues.apache.org/jira/browse/HDFS-15003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, 
> HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, 
> HDFS-15003.006.patch, HDFS-15003.007.patch, HDFS-15003.008.patch
>
>
> Make Router support storage type quota.




[jira] [Updated] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Jinglun (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinglun updated HDFS-15003:
---
Attachment: HDFS-15003.008.patch

> RBF: Make Router support storage type quota.
> 
>
> Key: HDFS-15003
> URL: https://issues.apache.org/jira/browse/HDFS-15003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, 
> HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, 
> HDFS-15003.006.patch, HDFS-15003.007.patch, HDFS-15003.008.patch
>
>
> Make Router support storage type quota.




[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003613#comment-17003613
 ] 

Xiaoqiao He commented on HDFS-14997:


[~elgoiri] Please ignore the failed unit tests reported in the second-to-last 
QA run, since I submitted the wrong addendum patch at first. Please refer to 
the latest one. Thanks.

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> HDFS-14997.addendum.patch, image-2019-12-26-16-15-44-814.png
>
>
> The #BPServiceActor main loop has two core functions: reporting 
> (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the reporting flow. 
> #processCommand can take a long time (over 1000s in the worst case I have 
> seen) when the IO load of the DataNode is very high: some IO operations run 
> under #datasetLock, so processing some commands (such as #DNA_INVALIDATE) has 
> to wait a long time to acquire #datasetLock. In that case, the heartbeat is 
> not sent to the NameNode in time, which triggers other failures.
> I propose making #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution, but some old branches do not 
> support this feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should 
> solve it in another JIRA.




[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003609#comment-17003609
 ] 

Hadoop QA commented on HDFS-14997:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 27s | trunk passed |
| +1 | compile | 1m 1s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 6s | trunk passed |
| +1 | shadedclient | 13m 24s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 8s | trunk passed |
| +1 | javadoc | 1m 15s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 58s | the patch passed |
| +1 | compile | 0m 51s | the patch passed |
| +1 | javac | 0m 51s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvnsite | 0m 59s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 12s | the patch passed |
| +1 | javadoc | 1m 16s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 93m 57s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 38s | The patch does not generate ASF License warnings. |
| | | 151m 27s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:c44943d1fc3 |
| JIRA Issue | HDFS-14997 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12989487/HDFS-14997.addendum.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e72cf1da6758 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 300505c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/28574/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/28574/testReport/ |
| Max. process+thread count | 3843 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console o

[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Íñigo Goiri (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003607#comment-17003607
 ] 

Íñigo Goiri commented on HDFS-14997:


I think the failed unit tests are related as we are interrupting the threads 
twice.
We should probably also set it to null once we are done in addition to checking 
if it was interrupted. 
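
The suggestion above can be sketched as follows. This is a minimal illustration, not the code from the patch: the class `WorkerOwner`, its fields, and the `interruptCalls` counter are hypothetical names introduced only for this example. It shows a stop method that interrupts the worker at most once and then clears the reference, so a second shutdown path becomes a no-op.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical owner of a worker thread; not the actual BPServiceActor code. */
class WorkerOwner {
  private volatile Thread worker;              // null once the worker is stopped
  final AtomicInteger interruptCalls = new AtomicInteger();

  synchronized void start() {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);          // stand-in for blocking on a command queue
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();    // restore the flag, then exit
      }
    }, "command-processor");
    t.setDaemon(true);
    t.start();
    worker = t;
  }

  /** Safe to call from several shutdown paths; only the first call interrupts. */
  synchronized void stop() {
    Thread t = worker;
    if (t == null) {
      return;                                  // already stopped by another path
    }
    if (!t.isInterrupted()) {
      t.interrupt();
      interruptCalls.incrementAndGet();
    }
    try {
      t.join(5000);                            // wait for the worker to exit
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    worker = null;                             // drop the reference once we are done
  }

  boolean isStopped() {
    return worker == null;
  }

  public static void main(String[] args) {
    WorkerOwner owner = new WorkerOwner();
    owner.start();
    owner.stop();
    owner.stop();                              // second call does nothing
    System.out.println("interrupts sent: " + owner.interruptCalls.get());
  }
}
```

Clearing the reference also lets the stopped thread and everything it captured be garbage collected, which matters when a test class starts and stops many instances.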

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> HDFS-14997.addendum.patch, image-2019-12-26-16-15-44-814.png
>
>
> There are two core functions in the #BPServiceActor main process flow: 
> report (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the report flow. 
> #processCommand can take a long time (over 1000s in the worst case I have 
> seen) when the IO load of the DataNode is very high: some IO operations run 
> under #datasetLock, so processing some commands (such as #DNA_INVALIDATE) 
> has to wait a long time to acquire #datasetLock. In that case, #heartbeat 
> is not sent to the NameNode in time, which triggers other failures.
> I propose to make #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve it in another JIRA.
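
The proposal in the description above can be illustrated with a small sketch. It is not the HDFS-14997 patch: `AsyncCommandActor`, `enqueue`, and `heartbeatNotBlocked` are hypothetical names, and the "command" is just a `Runnable`. The idea shown is that the heartbeat loop only puts commands on a queue, while a dedicated thread executes them, so a slow command cannot delay the next heartbeat.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical actor: the heartbeat loop enqueues commands, a separate thread runs them. */
class AsyncCommandActor {
  private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
  private final Thread processor;

  AsyncCommandActor() {
    processor = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          commands.take().run();               // may be slow (e.g. waiting for a lock)
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();    // exit on shutdown
      }
    }, "command-processor");
    processor.setDaemon(true);
    processor.start();
  }

  /** Called from the heartbeat loop; returns immediately. */
  void enqueue(Runnable command) {
    commands.add(command);
  }

  void shutdown() {
    processor.interrupt();
  }

  private static boolean awaitQuietly(CountDownLatch latch) {
    try {
      return latch.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Returns true if enqueue() came back while the slow command was still unfinished. */
  static boolean heartbeatNotBlocked() {
    AsyncCommandActor actor = new AsyncCommandActor();
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(1);
    actor.enqueue(() -> {
      awaitQuietly(release);                   // simulates a command stuck on IO
      done.countDown();
    });
    boolean returnedEarly = done.getCount() == 1;
    release.countDown();                       // let the slow command finish
    boolean finished = awaitQuietly(done);
    actor.shutdown();
    return returnedEarly && finished;
  }

  public static void main(String[] args) {
    System.out.println("heartbeat not blocked: " + heartbeatNotBlocked());
  }
}
```

Note that the processing thread must be interrupted when its owner stops; otherwise it stays parked on the queue and keeps the owner reachable.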




[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003578#comment-17003578
 ] 

Hadoop QA commented on HDFS-14997:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  3s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 13m 
31s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 38m 21s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestSetrepDecreasing |
|   | hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages |
|   | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
|   | hadoop.fs.TestResolveHdfsSymlink |
|   | hadoop.hdfs.TestClose |
|   | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
|   | hadoop.fs.viewfs.TestViewFsDefaultValue |
|   | hadoop.hdfs.TestWriteBlockGetsBlockLengthHint |
|   | hadoop.hdfs.TestDatanodeConfig |
|   | hadoop.hdfs.tools.TestECAdmin |
|   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.server.namenode.TestXAttrConfigFlag |
|   | hadoop.hdfs.web.TestWebHdfsTokens |
|   | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
|   | hadoop.hdfs.server.namenode.TestQuotaByStorageType |
|   | hadoop.hdfs.TestDFSFinalize |
|   | hadoop.hdfs.TestDFSStripedInputStream |
|   | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewerForStoragePolicy |
|   | hadoop.hdfs.se

[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003574#comment-17003574
 ] 

Xiaoqiao He commented on HDFS-14997:


I ran {{TestFileChecksum}} locally with the addendum patch and dumped memory 
again. The number of DataNode instances and the number of 
commandProcessingThread threads are both back to normal. Please help to 
double-check, [~ayushtkn], [~Kevin_Zheng]. Thanks again.


[jira] [Commented] (HDFS-15079) RBF: Client maybe get an unexpected result with network anomaly

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003570#comment-17003570
 ] 

Xiaoqiao He commented on HDFS-15079:


Hi [~ferhui], IIUC, the ClientId and CallId of a request from the Router to 
the NameNode are both created by the Router itself, and each ClientId maps 
one-to-one to a connection. Ref. ConnectionPool#newConnection
{code:java}
Object proxy = RPC.getProtocolProxy(classes.protoPb, version, socket, ugi,
conf, factory, RPC.getRpcTimeout(conf), defaultPolicy, null).getProxy();
{code}
If the Router could reuse the ClientId and CallId received from the client 
when it calls the NameNode, we would not need to worry about a non-idempotent 
operation being sent through different Routers and returning the wrong result 
in some cases.
The only concern is that this solution involves RPC changes: we need to expose 
getters and setters for ClientId and CallId so that the Router can handle them.
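
To make the idea concrete, here is a toy model, written under stated assumptions rather than against the real Hadoop RPC or retry-cache classes. `CallKey` and `ToyNameNode` are hypothetical, ids are a plain `String` and `int` for illustration, and the sketch assumes the client's retry through the second Router carries the same ClientId and CallId as the original call. With that assumption, the delayed first create is recognized as a duplicate and does not overwrite the written file.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical key identifying one logical client call. */
final class CallKey {
  final String clientId;
  final int callId;

  CallKey(String clientId, int callId) {
    this.clientId = clientId;
    this.callId = callId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof CallKey)) {
      return false;
    }
    CallKey other = (CallKey) o;
    return callId == other.callId && clientId.equals(other.clientId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clientId, callId);
  }
}

/** Toy server that remembers completed calls; not the real NameNode. */
class ToyNameNode {
  private final Map<CallKey, Boolean> completedCalls = new ConcurrentHashMap<>();
  private final Map<String, String> files = new ConcurrentHashMap<>();

  /** create with overwrite=true, executed at most once per (clientId, callId). */
  boolean create(String clientId, int callId, String path) {
    return completedCalls.computeIfAbsent(new CallKey(clientId, callId), k -> {
      files.put(path, "");                     // truncates any existing content
      return true;
    });
  }

  void write(String path, String data) {
    files.put(path, data);
  }

  String read(String path) {
    return files.get(path);
  }

  public static void main(String[] args) {
    ToyNameNode nn = new ToyNameNode();
    nn.create("client-A", 7, "/f");            // 2nd create, arrives first via r1
    nn.write("/f", "data");                    // write and close
    nn.create("client-A", 7, "/f");            // delayed 1st create via r0, same ids
    System.out.println(nn.read("/f"));
  }
}
```

If each Router instead generates its own ids, the delayed create looks like a new call and truncates the file, which is the failure described in this issue.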

> RBF: Client maybe get an unexpected result with network anomaly 
> 
>
> Key: HDFS-15079
> URL: https://issues.apache.org/jira/browse/HDFS-15079
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Fei Hui
>Priority: Critical
> Attachments: UnexpectedOverWriteUT.patch
>
>
> I found a critical problem in RBF. HDFS-15078 can resolve it in some 
> scenarios, but I have no idea about the overall resolution.
> The problem is as follows:
> A client with RBF (r0, r1) creates an HDFS file via r0, gets an exception, 
> and fails over to r1.
> r0 has already sent the create RPC to the NameNode (1st create).
> The client creates the HDFS file via r1 (2nd create).
> The client writes the HDFS file and finally closes it (3rd close).
> The NameNode may receive the RPCs in the following order:
> 2nd create
> 3rd close
> 1st create
> Since overwrite is true by default, the file that was just written becomes 
> an empty file. This is a critical problem.
> We have encountered this problem. Many Hive and Spark jobs run on our 
> cluster, and it occurs sometimes.




[jira] [Updated] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14997:
---
Attachment: HDFS-14997.addendum.patch


[jira] [Updated] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14997:
---
Attachment: (was: HDFS-14997.addendum.patch)


[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003535#comment-17003535
 ] 

Xiaoqiao He commented on HDFS-14997:


Thanks [~ayushtkn], [~Kevin_Zheng] for the reminder and the very valuable 
comments, and sorry for not checking this logic carefully. I dumped memory to 
analyze what occupies it and observed that, when running {{TestFileChecksum}}, 
there are over 200 live DataNode objects, each with over 600K of retained 
heap, and the {{commandProcessingThread}} thread is not interrupted when the 
DataNode stops. In short, it is easy to hit OutOfMemoryError if a single test 
class has many test methods and sets up a MiniCluster with many DataNodes.
[^HDFS-14997.addendum.patch] tries to fix this issue. Please review it if you 
have time. cc [~elgoiri], [~weichiu]


[jira] [Updated] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-14997:
---
Attachment: HDFS-14997.addendum.patch
Status: Patch Available  (was: Reopened)


[jira] [Reopened] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Xiaoqiao He (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reopened HDFS-14997:


> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> image-2019-12-26-16-15-44-814.png
>
>

[jira] [Commented] (HDFS-15080) Fix the issue in reading persistent memory cache with an offset

2019-12-26 Thread Hadoop QA (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003520#comment-17003520
 ] 

Hadoop QA commented on HDFS-15080:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-3.2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 28m 
 6s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
37s{color} | {color:green} branch-3.2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} branch-3.2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 37s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}174m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
|   | hadoop.hdfs.server.diskbalancer.TestDiskBalancer |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.hdfs.server.balancer.TestBalancer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:0f25cbbb251 |
| JIRA Issue | HDFS-15080 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12989478/HDFS-15080-branch-3.2-000.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 31dba5851532 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 
05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.2 / 440b7ab |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_232 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28572/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apach

[jira] [Comment Edited] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Zhenyu Zheng (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003518#comment-17003518
 ] 

Zhenyu Zheng edited comment on HDFS-14997 at 12/26/19 8:15 AM:
---

Hi [~hexiaoqiao], here is the hprof we got from our tests for 
{{TestFileChecksum}}:

!image-2019-12-26-16-15-44-814.png!

As you can see from our CI, 
[https://builds.apache.org/job/Hadoop-qbt-linux-ARM-trunk/], we run the full 
tests once per day, and these tests started to fail several days ago.

I've also triggered one test on the upstream CI; it also fails for 
DataNode-related tests:

[https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-multibranch/view/change-requests/job/PR-1783/1/#showFailuresLink]
 


was (Author: kevin_zheng):
Hi [~hexiaoqiao], here is the hprof we got from our tests for 
{{TestFileChecksum}}:

!image-2019-12-26-16-08-53-472.png! \

As you can see from our CI, 
[https://builds.apache.org/job/Hadoop-qbt-linux-ARM-trunk/], we run the full 
tests once per day, and these tests started to fail several days ago.

I've also triggered one test on the upstream CI; it also fails for 
DataNode-related tests:

[https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-multibranch/view/change-requests/job/PR-1783/1/#showFailuresLink]
 

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch, 
> image-2019-12-26-16-15-44-814.png
>
>

[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2019-12-26 Thread Zhenyu Zheng (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003518#comment-17003518
 ] 

Zhenyu Zheng commented on HDFS-14997:
-

Hi [~hexiaoqiao], here is the hprof we got from our tests for 
{{TestFileChecksum}}:

!image-2019-12-26-16-08-53-472.png! \

As you can see from our CI, 
[https://builds.apache.org/job/Hadoop-qbt-linux-ARM-trunk/], we run the full 
tests once per day, and these tests started to fail several days ago.

I've also triggered one test on the upstream CI; it also fails for 
DataNode-related tests:

[https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-multibranch/view/change-requests/job/PR-1783/1/#showFailuresLink]
 

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch
>
>
> There are two core functions in the #BPServiceActor main process flow: 
> report (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the report flow. Meanwhile, 
> #processCommand can take a long time (over 1000s in the worst case I have 
> met) when the IO load of the DataNode is very high. Since some IO operations 
> are under #datasetLock, it has to wait a long time to acquire #datasetLock 
> when processing some commands (such as #DNA_INVALIDATE). In such cases, 
> #heartbeat will not be sent to the NameNode in time, which triggers other 
> disasters.
> I propose to make #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats back to the NameNode under high IO 
> load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve it in another JIRA.
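
The idea in the description above can be illustrated with a minimal Java 
sketch. This is not the code from the attached patches: the class 
AsyncCommandSketch, its methods, and the string commands are all hypothetical, 
and a sleep stands in for slow IO done under #datasetLock. It only shows the 
general pattern of handing commands to a queue that a separate thread drains, 
so the heartbeat path returns without waiting for command processing.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only; names here are NOT taken from the HDFS-14997 patch.
class AsyncCommandSketch {
    // Commands handed over by the heartbeat path, waiting to be processed.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Commands that the worker thread has finished processing, in order.
    private final List<String> processed = new CopyOnWriteArrayList<>();
    private final long processMillis;
    private final Thread worker;

    AsyncCommandSketch(long processMillis) {
        this.processMillis = processMillis;
        this.worker = new Thread(this::drain, "command-processor");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Worker loop: take one command at a time and "process" it slowly.
    private void drain() {
        try {
            while (true) {
                String cmd = queue.take();
                Thread.sleep(processMillis); // stands in for slow IO under a lock
                processed.add(cmd);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly on shutdown
        }
    }

    // Simulates the heartbeat path: enqueue the commands and return at once.
    // Returns the time spent in this call, in milliseconds.
    long heartbeat(List<String> commands) {
        long start = System.nanoTime();
        queue.addAll(commands);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Polls until at least `count` commands are processed or the timeout expires.
    boolean awaitProcessed(int count, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (processed.size() < count) {
                if (System.nanoTime() > deadline) {
                    return false;
                }
                Thread.sleep(5);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> processed() {
        return processed;
    }

    void shutdown() {
        worker.interrupt();
    }

    public static void main(String[] args) {
        AsyncCommandSketch actor = new AsyncCommandSketch(200);
        long elapsed = actor.heartbeat(List.of("DNA_INVALIDATE#1", "DNA_INVALIDATE#2"));
        System.out.println("heartbeat returned after ~" + elapsed + " ms");
        actor.awaitProcessed(2, 5000);
        System.out.println("processed: " + actor.processed());
        actor.shutdown();
    }
}
```

With a single worker thread the commands are still applied in arrival order, 
which is probably what one would want for NameNode commands; the trade-off is 
that a slow command delays later commands, but no longer delays the heartbeat.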




[jira] [Commented] (HDFS-15003) RBF: Make Router support storage type quota.

2019-12-26 Thread Ayush Saxena (Jira)



[ 
https://issues.apache.org/jira/browse/HDFS-15003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17003515#comment-17003515
 ] 

Ayush Saxena commented on HDFS-15003:
-

Thanks [~LiJinglun]. It seems you missed adding the description? You added 
just the entry, not the description, in the Commands page. Here:

{noformat}

COMMAND_OPTION                                    Description
-add source nameservices destination              Add a mount table entry or update if it exists.
-update source nameservices destination           Update a mount table entry or create one if it does not exist.
-rm source                                        Remove mount point of specified path.
-ls path                                          List mount points under specified path.
-setQuota path -nsQuota nsQuota -ssQuota ssQuota  Set quota for specified path. See HDFS Quotas Guide for the quota detail.
-clrQuota path                                    Clear quota of given mount point. See HDFS Quotas Guide for the quota detail.
-safemode enter leave get                         Manually set the Router entering or leaving safe mode. The option get will be used for verifying if the Router is in safe mode state.
-nameservice disable enable nameservice           Disable/enable a name service from the federation. If disabled, requests will not go to that name service.
-getDisabledNameservices                          Get the name services that are disabled in the federation.
{noformat}



> RBF: Make Router support storage type quota.
> 
>
> Key: HDFS-15003
> URL: https://issues.apache.org/jira/browse/HDFS-15003
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15003.001.patch, HDFS-15003.002.patch, 
> HDFS-15003.003.patch, HDFS-15003.004.patch, HDFS-15003.005.patch, 
> HDFS-15003.006.patch, HDFS-15003.007.patch
>
>
> Make Router support storage type quota.



