[jira] [Updated] (HDFS-15044) [Dynamometer] Show the line of audit log when parsing it unsuccessfully

2019-12-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15044:

Status: Patch Available  (was: Open)

> [Dynamometer] Show the line of audit log when parsing it unsuccessfully
> ---
>
> Key: HDFS-15044
> URL: https://issues.apache.org/jira/browse/HDFS-15044
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15044) [Dynamometer] Show the line of audit log when parsing it unsuccessfully

2019-12-09 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-15044:
---

 Summary: [Dynamometer] Show the line of audit log when parsing it 
unsuccessfully
 Key: HDFS-15044
 URL: https://issues.apache.org/jira/browse/HDFS-15044
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: tools
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma









[jira] [Commented] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts

2019-12-09 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992267#comment-16992267
 ] 

Takanobu Asanuma commented on HDFS-14953:
-

Sorry for the late response. I ported your PR and tested with it, but it 
didn't fix my problem. I will investigate further.

> [Dynamometer] Missing blocks gradually increase after NN starts
> ---
>
> Key: HDFS-14953
> URL: https://issues.apache.org/jira/browse/HDFS-14953
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Reporter: Takanobu Asanuma
>Priority: Major
>







[jira] [Commented] (HDFS-15040) RBF: Secured Router should not run when SecretManager is not running

2019-12-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992263#comment-16992263
 ] 

Hudson commented on HDFS-15040:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17746 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17746/])
HDFS-15040. RBF: Secured Router should not run when SecretManager is not 
(github: rev c4733377d0fa375a8d585f5cb1db79bf20ec6710)
* (add) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/MockNotRunningSecretManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/TestRouterSecurityManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/RouterSecurityManager.java


> RBF: Secured Router should not run when SecretManager is not running
> 
>
> Key: HDFS-15040
> URL: https://issues.apache.org/jira/browse/HDFS-15040
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0
>
>
> We hit an issue where the Router keeps running while the SecretManager is not 
> running. HDFS-14835 is a similar fix that checks whether the SecretManager is 
> null, but it does not cover this case, so we also need to check the 
> running status.
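
The two checks described above (null, then running status) can be sketched as follows. This is only an illustration under stated assumptions: `SecretManagerLike` and `verifyUsable` are hypothetical names invented for the example, not the classes or methods touched by the actual patch.

```java
/** Minimal sketch; SecretManagerLike is a hypothetical stand-in, not a Hadoop class. */
class SecretManagerCheckSketch {

  /** Hypothetical view of a secret manager that can report its running state. */
  interface SecretManagerLike {
    boolean isRunning();
  }

  /**
   * Fails fast unless the secret manager exists and is running.
   * A null check alone would let a stopped manager through.
   */
  static void verifyUsable(SecretManagerLike secretManager) {
    if (secretManager == null) {
      throw new IllegalStateException("SecretManager is not configured");
    }
    if (!secretManager.isRunning()) {
      throw new IllegalStateException("SecretManager is not running");
    }
  }

  public static void main(String[] args) {
    verifyUsable(() -> true); // non-null and running: passes both checks
    try {
      verifyUsable(() -> false); // non-null but stopped: rejected by the second check
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

The point of the second check is that a non-null manager whose threads failed to start is still unusable for issuing tokens.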






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992261#comment-16992261
 ] 

zhuqi commented on HDFS-15041:
--

Thanks [~hexiaoqiao] for helping to cc [~weichiu].

I am currently a Hadoop YARN contributor. Could you help add me as a Hadoop 
HDFS contributor?

It's my honor to contribute to Hadoop HDFS.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.






[jira] [Commented] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992257#comment-16992257
 ] 

Hudson commented on HDFS-15043:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17745 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17745/])
HDFS-15043. RBF: The detail of the Exception is not shown in (github: rev 
9f098520517e3adfad0a2721284ccc19af3e6673)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/ZKDelegationTokenSecretManagerImpl.java


> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Fix For: 3.3.0
>
>
> In the constructor of ZKDTSMImpl, when IOException occurs in 
> super.startThreads(), the message of the exception is not logged.
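
As an illustration of the problem described above, the sketch below contrasts a handler that drops the original `IOException` with one that logs it and keeps it as the cause. It is not the committed change: `startThreads`, the class name, and the messages are hypothetical stand-ins, and `java.util.logging` is used only to keep the example self-contained.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Minimal sketch; all names here are hypothetical, not the Hadoop code. */
class StartupErrorSketch {
  private static final Logger LOG = Logger.getLogger("StartupErrorSketch");

  /** Stand-in for a startup step that fails with a descriptive message. */
  static void startThreads() throws IOException {
    throw new IOException("connection to ZooKeeper refused");
  }

  /** Loses the detail: the original exception is neither logged nor chained. */
  static void startDroppingDetail() {
    try {
      startThreads();
    } catch (IOException e) {
      throw new IllegalStateException("Could not start secret manager");
    }
  }

  /** Keeps the detail: logs the exception and passes it on as the cause. */
  static void startKeepingDetail() {
    try {
      startThreads();
    } catch (IOException e) {
      LOG.log(Level.SEVERE, "Could not start secret manager", e);
      throw new IllegalStateException(
          "Could not start secret manager: " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    try {
      startKeepingDetail();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
    }
  }
}
```

With the second variant, an operator reading the log or the stack trace can see why startup failed rather than only that it failed.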






[jira] [Updated] (HDFS-15040) RBF: Secured Router should not run when SecretManager is not running

2019-12-09 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15040:

Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged the PR into trunk.

> RBF: Secured Router should not run when SecretManager is not running
> 
>
> Key: HDFS-15040
> URL: https://issues.apache.org/jira/browse/HDFS-15040
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0
>
>
> We hit an issue where the Router keeps running while the SecretManager is not 
> running. HDFS-14835 is a similar fix that checks whether the SecretManager is 
> null, but it does not cover this case, so we also need to check the 
> running status.






[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-12-09 Thread Xieming Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992259#comment-16992259
 ] 

Xieming Li commented on HDFS-14983:
---

[~inigoiri], thank you for your review.

I have fixed the Checkstyle error.

 

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Minor
> Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, 
> HDFS-14983.004.patch, HDFS-14983.draft.001.patch
>
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot. It would be better for DFSRouter to 
> have such functionality to be compatible with NameNode.






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992256#comment-16992256
 ] 

Xiaoqiao He commented on HDFS-15041:


Thanks [~zhuqi] for your quick response. It makes sense to me. cc [~weichiu]: 
could you help add [~zhuqi] as a contributor and assign this JIRA to him? 
BTW, please fix the code style (such as alignment) following 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute. 

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.






[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-12-09 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14983:
--
Attachment: HDFS-14983.004.patch
Status: Patch Available  (was: Open)

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Minor
> Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, 
> HDFS-14983.004.patch, HDFS-14983.draft.001.patch
>
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot. It would be better for DFSRouter to 
> have such functionality to be compatible with NameNode.






[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-12-09 Thread Xieming Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xieming Li updated HDFS-14983:
--
Status: Open  (was: Patch Available)

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Minor
> Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, 
> HDFS-14983.draft.001.patch
>
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot. It would be better for DFSRouter to 
> have such functionality to be compatible with NameNode.






[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15043:
-
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged the PR into trunk.

> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Fix For: 3.3.0
>
>
> In the constructor of ZKDTSMImpl, when IOException occurs in 
> super.startThreads(), the message of the exception is not logged.






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992240#comment-16992240
 ] 

zhuqi commented on HDFS-15041:
--

Hi [~hexiaoqiao] 

One of our clusters sees a very large number of delete and write operations 
because our new Hive lifetime system has too many partitions. In some cases 
4 ms is too short, so I want to make it configurable. Also, some Presto-based 
real-time workloads that do not read from the standby may want a shorter max 
lock time in order to improve read performance.

Thanks.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.
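
A rough sketch of what "configurable with a default" could look like is given below. It uses `java.util.Properties` rather than Hadoop's configuration classes so that it stays self-contained. The property keys are hypothetical, the 4 ms default is the value cited in this thread, and the queue size default of 1024 is only a placeholder assumption.

```java
import java.util.Properties;

/** Minimal sketch; keys and the queue default are assumptions, not the patch. */
class LockHoldConfigSketch {
  // Hypothetical property names chosen for this example only.
  static final String MAX_LOCK_HOLD_MS_KEY = "example.blockreport.max-lock-hold-ms";
  static final String QUEUE_SIZE_KEY = "example.blockreport.queue-size";

  static final long MAX_LOCK_HOLD_MS_DEFAULT = 4L; // default cited in the thread
  static final int QUEUE_SIZE_DEFAULT = 1024;      // placeholder assumption

  /** Reads the lock hold limit, falling back to the default when unset. */
  static long maxLockHoldMs(Properties conf) {
    String raw = conf.getProperty(MAX_LOCK_HOLD_MS_KEY);
    long value = (raw == null) ? MAX_LOCK_HOLD_MS_DEFAULT : Long.parseLong(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(MAX_LOCK_HOLD_MS_KEY + " must be positive");
    }
    return value;
  }

  /** Reads the queue capacity, falling back to the default when unset. */
  static int queueSize(Properties conf) {
    String raw = conf.getProperty(QUEUE_SIZE_KEY);
    int value = (raw == null) ? QUEUE_SIZE_DEFAULT : Integer.parseInt(raw.trim());
    if (value <= 0) {
      throw new IllegalArgumentException(QUEUE_SIZE_KEY + " must be positive");
    }
    return value;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAX_LOCK_HOLD_MS_KEY, "10"); // a cluster that tolerates longer holds
    System.out.println(maxLockHoldMs(conf) + " ms, queue size " + queueSize(conf));
  }
}
```

Keeping the existing values as defaults means clusters that never set the keys see no change in behavior.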






[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously

2019-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992225#comment-16992225
 ] 

Xiaoqiao He commented on HDFS-14997:


Thanks [~elgoiri] for your comments. I have had this feature deployed in our 
production cluster for weeks, and it works as expected. As for separating out 
the #CommandProcessingThread class: since it is only used by #BPServiceActor, 
I think keeping it as an inner class of #BPServiceActor is fine. Thanks.

> BPServiceActor process command from NameNode asynchronously
> ---
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, 
> HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch
>
>
> There are two core functions in the #BPServiceActor main process flow: report 
> (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. If 
> #processCommand takes a long time, it blocks the report flow. #processCommand 
> can take a long time (over 1000s in the worst case I have seen) when the IO 
> load of the DataNode is very high. Because some IO operations run under 
> #datasetLock, processing some commands (such as #DNA_INVALIDATE) has to wait 
> a long time to acquire #datasetLock. In that case, heartbeats are not sent to 
> the NameNode in time, which can trigger other failures.
> I propose making #processCommand asynchronous so that it does not block 
> #BPServiceActor from sending heartbeats to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be an effective solution, but some old branches do not 
> support this feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should 
> address it in another JIRA.
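
The proposal above can be sketched generically as a queue drained by a dedicated thread, so that the loop sending heartbeats only enqueues commands and never waits for them. This is an illustration, not the attached patch: the class and method names are hypothetical, and a latch stands in for a slow wait on #datasetLock.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Minimal sketch of asynchronous command processing; names are hypothetical. */
class AsyncCommandSketch {
  private final BlockingQueue<Runnable> commands = new LinkedBlockingQueue<>();
  private final Thread processor;
  private volatile boolean running = true;

  AsyncCommandSketch() {
    processor = new Thread(this::drain, "command-processor");
    processor.setDaemon(true);
    processor.start();
  }

  /** Called from the heartbeat loop; returns without running the command. */
  void submit(Runnable command) {
    commands.add(command);
  }

  /** Runs queued commands one at a time on the dedicated thread. */
  private void drain() {
    while (running) {
      try {
        Runnable next = commands.poll(100, TimeUnit.MILLISECONDS);
        if (next != null) {
          next.run();
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      } catch (RuntimeException e) {
        System.err.println("Command failed: " + e);
      }
    }
  }

  void stop() {
    running = false;
    processor.interrupt();
  }

  /** Counts heartbeats sent while a submitted command is still blocked. */
  static int heartbeatsSentWhileCommandBlocked(int heartbeats) {
    AsyncCommandSketch actor = new AsyncCommandSketch();
    CountDownLatch lockReleased = new CountDownLatch(1); // stands in for a slow lock
    CountDownLatch commandDone = new CountDownLatch(1);
    actor.submit(() -> {
      try {
        lockReleased.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      commandDone.countDown();
    });
    int sent = 0;
    for (int i = 0; i < heartbeats; i++) {
      if (commandDone.getCount() == 1) {
        sent++; // the command is still pending, yet the heartbeat goes out
      }
    }
    lockReleased.countDown();
    try {
      if (!commandDone.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("command did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      actor.stop();
    }
    return sent;
  }

  public static void main(String[] args) {
    System.out.println(heartbeatsSentWhileCommandBlocked(3));
  }
}
```

One trade-off of this shape is that commands are applied later than they are received, so any ordering assumption between a heartbeat response and the execution of its commands has to be checked separately.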






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992223#comment-16992223
 ] 

Xiaoqiao He commented on HDFS-15041:


Thanks [~weichiu] for the invite. In my experience, the 4 ms default works 
for me and I no longer hit issues, since it applies only to the 
#blockReceivedAndDeleted RPC request. I also wonder why this parameter needs 
tuning. Some concrete cases would help decide whether it is needed. 
Thanks again.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.






[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124
 ] 

zhuqi edited comment on HDFS-15041 at 12/10/19 5:48 AM:


Hi [~weichiu]

Yeah, I mean to make MAX_LOCK_HOLD_MS configurable after my change in 
HDFS-14553, because of the different needs for client latency and for 
balancing the pressure on the RPC queue. Sorry for the mistake: I did not mean 
the HDFS balancer; I have already moved the balancer pressure to the standby.


was (Author: zhuqi):
Hi [~weichiu]

Yeah, I mean to make MAX_LOCK_HOLD_MS configurable after my change in 
HDFS-14553, because of the different needs for client latency and for 
balancing the pressure on the RPC queue. Sorry for the mistake: I did not mean 
the HDFS balancer; I have already moved the balancer pressure to the standby. 
Also, can we add this queue size to metrics if needed?

 

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.






[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated HDFS-15041:
-
Attachment: HDFS-15041.001.patch
Status: Patch Available  (was: Open)

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
> Attachments: HDFS-15041.001.patch
>
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed. But different 
> clusters have different needs for latency and queue health, so we should 
> make these two parameters configurable.






[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15043:
-
Assignee: Akira Ajisaka
  Status: Patch Available  (was: Open)

> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>
> In the constructor of ZKDTSMImpl, when IOException occurs in 
> super.startThreads(), the message of the exception is not logged.






[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15043:
-
Target Version/s: 3.3.0

> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>
> In the constructor of ZKDTSMImpl, when IOException occurs in 
> super.startThreads(), the message of the exception is not logged.






[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15043:
-
Description: In the constructor of ZKDTSMImpl, when IOException occurs in 
super.startThreads(), the message of the exception is not logged.

> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Priority: Major
>
> In the constructor of ZKDTSMImpl, when IOException occurs in 
> super.startThreads(), the message of the exception is not logged.






[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15043:
-
Summary: RBF: The detail of the Exception is not shown in 
ZKDelegationTokenSecretManagerImpl  (was: RBF: The detail of the Exception is 
not logged in ZKDelegationTokenSecretManagerImpl)

> RBF: The detail of the Exception is not shown in 
> ZKDelegationTokenSecretManagerImpl
> ---
>
> Key: HDFS-15043
> URL: https://issues.apache.org/jira/browse/HDFS-15043
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Akira Ajisaka
>Priority: Major
>







[jira] [Created] (HDFS-15043) RBF: The detail of the Exception is not logged in ZKDelegationTokenSecretManagerImpl

2019-12-09 Thread Akira Ajisaka (Jira)
Akira Ajisaka created HDFS-15043:


 Summary: RBF: The detail of the Exception is not logged in 
ZKDelegationTokenSecretManagerImpl
 Key: HDFS-15043
 URL: https://issues.apache.org/jira/browse/HDFS-15043
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Akira Ajisaka









[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992155#comment-16992155
 ] 

Hadoop QA commented on HDFS-15036:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 43s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  2m 17s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 20s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 32s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}162m 55s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
|   | hadoop.hdfs.TestReconstructStripedFile |
|   | hadoop.hdfs.server.namenode.TestFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988378/HDFS-15036.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 0e77d17e1e66 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc66de7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/28488/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit | 

[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992138#comment-16992138
 ] 

Konstantin Shvachko commented on HDFS-15036:


Good investigation and findings, [~vagarychen].
# Could you add a comment explaining that {{ImageServlet}} should not reject 
images other than checkpoints?
# I am still concerned about the "silent" part. Should we add some logging, so 
that next time we can see what happened on both nodes?

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15036.001.patch
>
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124
 ] 

zhuqi edited comment on HDFS-15041 at 12/10/19 2:45 AM:


Hi [~weichiu]

Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in 
HDFS-14553, because of the differing needs for client latency and for balancing 
the pressure on the RPC queue. Sorry for the confusion: I did not mean the HDFS 
balancer; I have already moved the balancer pressure to the standby. Could we 
also add this queue size to the metrics if needed?
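
A rough sketch of what "configurable" could look like, under stated assumptions: it uses plain {{java.util.Properties}} rather than the Hadoop configuration classes, and the class name, the two key names and the default values are all hypothetical placeholders, not existing HDFS settings.

```java
import java.util.Properties;

/** Hypothetical holder for the two tunables; not an existing HDFS class. */
final class LockHoldConfigSketch {
  // Key names are invented for this sketch only.
  static final String MAX_LOCK_HOLD_MS_KEY = "example.max-lock-hold-ms";
  static final String FULL_QUEUE_SIZE_KEY = "example.full-queue-size";
  // Placeholder defaults, used when a key is absent or invalid.
  static final long MAX_LOCK_HOLD_MS_DEFAULT = 4L;
  static final long FULL_QUEUE_SIZE_DEFAULT = 1024L;

  final long maxLockHoldMs;
  final int fullQueueSize;

  LockHoldConfigSketch(Properties conf) {
    this.maxLockHoldMs =
        positiveLong(conf, MAX_LOCK_HOLD_MS_KEY, MAX_LOCK_HOLD_MS_DEFAULT);
    long size = positiveLong(conf, FULL_QUEUE_SIZE_KEY, FULL_QUEUE_SIZE_DEFAULT);
    // Clamp so that the cast to int cannot overflow.
    this.fullQueueSize = (int) Math.min(Integer.MAX_VALUE, size);
  }

  /** Returns the configured value, or the default if missing, non-numeric or not positive. */
  static long positiveLong(Properties conf, String key, long defaultValue) {
    String raw = conf.getProperty(key);
    if (raw == null) {
      return defaultValue;
    }
    try {
      long value = Long.parseLong(raw.trim());
      return value > 0 ? value : defaultValue;
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAX_LOCK_HOLD_MS_KEY, "10");
    LockHoldConfigSketch c = new LockHoldConfigSketch(conf);
    System.out.println("maxLockHoldMs=" + c.maxLockHoldMs
        + " fullQueueSize=" + c.fullQueueSize);
  }
}
```

Falling back to the default on a bad value keeps a typo in the configuration from changing behaviour silently in the wrong direction; whether to fail fast instead is a separate design choice.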

 


was (Author: zhuqi):
Hi [~weichiu]

Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in 
HDFS-14553, because of the differing needs for client latency and for balancing 
the pressure on the RPC queue. I have already moved the balancer pressure to 
the standby. We can also add this queue size to the metrics if needed.

 

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and for the queue health standard. 
> We should make the two parameters configurable.






[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124
 ] 

zhuqi edited comment on HDFS-15041 at 12/10/19 2:26 AM:


Hi [~weichiu]

Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in 
HDFS-14553, because of the differing needs for client latency and for balancing 
the pressure on the RPC queue. I have already moved the balancer pressure to 
the standby. We can also add this queue size to the metrics if needed.

 


was (Author: zhuqi):
Hi [~weichiu]

Yes, I mean to make MAX_LOCK_HOLD_MS configurable after your change in 
HDFS-14553, because of the differing needs for client latency and for balancing 
the pressure on the RPC queue. I have already moved the balancer pressure to 
the standby.

 

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and for the queue health standard. 
> We should make the two parameters configurable.






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124
 ] 

zhuqi commented on HDFS-15041:
--

Hi [~weichiu]

Yes, I mean to make MAX_LOCK_HOLD_MS configurable after your change in 
HDFS-14553, because of the differing needs for client latency and for balancing 
the pressure on the RPC queue. I have already moved the balancer pressure to 
the standby.

 

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and for the queue health standard. 
> We should make the two parameters configurable.






[jira] [Updated] (HDFS-14522) Allow compact property description in xml in httpfs

2019-12-09 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-14522:
-
Fix Version/s: 3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thank you [~iwasakims] and [~inigoiri]!

> Allow compact property description in xml in httpfs
> ---
>
> Key: HDFS-14522
> URL: https://issues.apache.org/jira/browse/HDFS-14522
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
> Fix For: 3.3.0
>
>
> HADOOP-6964 allowed compact property description in Hadoop configuration, 
> however, it is not allowed in httpfs.






[jira] [Commented] (HDFS-14522) Allow compact property description in xml in httpfs

2019-12-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992105#comment-16992105
 ] 

Hudson commented on HDFS-14522:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17744 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17744/])
HDFS-14522. Allow compact property description in xml in httpfs. (#1737) 
(github: rev 4dffd81bb75efaa5742d2246354ebdc86cbd1aab)
* (add) 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/resources/test-compact-format-property.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/util/TestConfigurationUtils.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/util/ConfigurationUtils.java
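
For readers unfamiliar with the feature, here is a minimal sketch of accepting both property styles. It assumes the compact form from HADOOP-6964 is the attribute style {{<property name="..." value="..."/>}}; the class and method names are hypothetical and this is not the code in {{ConfigurationUtils}}.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical parser sketch; not the httpfs implementation. */
final class CompactPropertySketch {

  /** Parses <property> elements written in either the long or the compact form. */
  static Map<String, String> parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Map<String, String> props = new LinkedHashMap<>();
      NodeList nodes = doc.getElementsByTagName("property");
      for (int i = 0; i < nodes.getLength(); i++) {
        Element property = (Element) nodes.item(i);
        String name = field(property, "name");
        String value = field(property, "value");
        if (name != null && value != null) {
          props.put(name, value);
        }
      }
      return props;
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse configuration XML", e);
    }
  }

  /** Prefers the attribute (compact form), then falls back to the child element. */
  private static String field(Element property, String field) {
    if (property.hasAttribute(field)) {
      return property.getAttribute(field);
    }
    NodeList children = property.getElementsByTagName(field);
    return children.getLength() == 0
        ? null
        : children.item(0).getTextContent().trim();
  }

  public static void main(String[] args) {
    String xml = "<configuration>"
        + "<property name=\"a\" value=\"1\"/>"
        + "<property><name>b</name><value>2</value></property>"
        + "</configuration>";
    System.out.println(parse(xml));
  }
}
```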


> Allow compact property description in xml in httpfs
> ---
>
> Key: HDFS-14522
> URL: https://issues.apache.org/jira/browse/HDFS-14522
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Reporter: Akira Ajisaka
>Assignee: Masatake Iwasaki
>Priority: Major
>
> HADOOP-6964 allowed compact property description in Hadoop configuration, 
> however, it is not allowed in httpfs.






[jira] [Updated] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-15036:
--
Target Version/s: 3.3.0, 2.10.1  (was: 2.10.1)
  Status: Patch Available  (was: Open)

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15036.001.patch
>
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Updated] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang updated HDFS-15036:
--
Attachment: HDFS-15036.001.patch

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
> Attachments: HDFS-15036.001.patch
>
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992069#comment-16992069
 ] 

Konstantin Shvachko commented on HDFS-15032:


Hey Erik, I see what you mean now. But it looks like an IntelliJ-specific 
thing; I don't see this in my Eclipse. When I click on the ProxyCombiner 
variable, Eclipse shows what your toString() specifies. Performance with 
{{Method.equals()}} is better, as it compares references rather than strings, 
but I don't think it is worth it.
Looks like {{TestBalancerWithHANameNodes}} timed out on Jenkins. I saw it 
timing out on my Linux box once as well.
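
To illustrate the {{Method.equals()}} point, a small self-contained sketch follows. The class and helper names are hypothetical, and comparing by method name is only a stand-in for whatever string comparison the patch uses, which is not shown in this thread. Per the Javadoc, two {{Method}} objects are equal when they have the same declaring class, name, parameter types and return type.

```java
import java.lang.reflect.Method;

/** Illustrative only; not part of ProxyCombiner. */
final class MethodEqualsSketch {

  /** Looks up a public no-arg method, converting the checked exception. */
  static Method lookup(Class<?> type, String name) {
    try {
      return type.getMethod(name);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }

  /** Structural comparison provided by java.lang.reflect.Method. */
  static boolean sameByEquals(Method a, Method b) {
    return a.equals(b);
  }

  /** String-based comparison on the method name only. */
  static boolean sameByName(Method a, Method b) {
    return a.getName().equals(b.getName());
  }

  public static void main(String[] args) {
    Method objectToString = lookup(Object.class, "toString");
    Method objectToStringAgain = lookup(Object.class, "toString");
    Method stringToString = lookup(String.class, "toString");

    // Two lookups of the same method are equal even if they are distinct objects.
    System.out.println(sameByEquals(objectToString, objectToStringAgain));
    // Same name, different declaring class: equals() tells them apart.
    System.out.println(sameByEquals(objectToString, stringToString));
    System.out.println(sameByName(objectToString, stringToString));
  }
}
```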

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, 
> debugger_with_tostring.png, debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Assigned] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Liang reassigned HDFS-15036:
-

Assignee: Chen Liang  (was: Chao Sun)

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chen Liang
>Priority: Major
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992029#comment-16992029
 ] 

Chen Liang commented on HDFS-15036:
---

[~csun] np, sure, thanks for asking :) . Assigning to myself then.

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues

2019-12-09 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992025#comment-16992025
 ] 

Stephen O'Donnell commented on HDFS-14852:
--

Looking at the original code, in BlockManager.removeBlock(), it does not guess 
the level to remove from, it always passes LowRedundancyBlocks.LEVEL.

{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL);
{code}

This means the first part of the method is never executed, and it will always 
iterate all the queues until it finds an entry to remove:

{code}
if (priLevel >= 0 && priLevel < LEVEL  // Never executed on block delete as priLevel == LEVEL
    && priorityQueues.get(priLevel).remove(block)) {
  ...
  return true;
} else {
  // Try to remove the block from all queues if the block was
  // not found in the queue for the given priority level.
  for (int i = 0; i < LEVEL; i++) {
    if (i != priLevel && priorityQueues.get(i).remove(block)) {
      NameNode.blockStateChangeLog.debug(
          "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
          " {} from priority queue {}", block, i);
      decrementBlockStat(block, i, oldExpectedReplicas);
      return true;
    }
  }
}
return false;
}
{code}

In the most common case, for a delete of a given block, there will be no 
reference in the lowRedundancyQueue (most blocks are perfectly replicated), but 
based on the above, it has always been checking all 5 queues the majority of 
the time, so I wonder if the performance concern of deleting from all queues is 
as bad as we think.

I wonder if the call to remove I mentioned above should always have been:

{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL - 1);
{code}

That way it would always attempt to delete from the corrupt list and if it gets 
nothing, try the other queues. If something is left behind in the other queues 
it would get deleted anyway later by the redundancy monitor.

Other calls to neededReconstruction.remove() pass a priority, but that is 
because those calls know the queue the block was taken from (they don't really 
guess / calculate the priority, they just know where it came from), but as the 
write lock is dropped after getting the list of blocks the block could be moved 
to another level:

{code}
  int computeBlockReconstructionWork(int blocksToProcess) {
    List<List<BlockInfo>> blocksToReconstruct = null;
    namesystem.writeLock();
    try {
      // Choose the blocks to be reconstructed
      blocksToReconstruct = neededReconstruction
          .chooseLowRedundancyBlocks(blocksToProcess);
    } finally {
      namesystem.writeUnlock();
    }
    return computeReconstructionWorkForBlocks(blocksToReconstruct);
  }
{code}

I need to check the 005 patch a bit more tomorrow and think on this a bit more. 
Based on my logic above, where the common case for deletes already checks all 
queues until it finds a match, and the other cases pass a priority which is 
almost always correct and rarely iterate over the queues, I do wonder if simply 
deleting from all queues is the simplest solution.

> Remove of LowRedundancyBlocks do NOT remove the block from all queues
> -
>
> Key: HDFS-14852
> URL: https://issues.apache.org/jira/browse/HDFS-14852
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>Reporter: Fei Hui
>Assignee: Fei Hui
>Priority: Major
> Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, 
> HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, 
> HDFS-14852.005.patch, screenshot-1.png
>
>
> LowRedundancyBlocks.java
> {code:java}
> // Some comments here
> if (priLevel >= 0 && priLevel < LEVEL
>     && priorityQueues.get(priLevel).remove(block)) {
>   NameNode.blockStateChangeLog.debug(
>       "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}"
>       + " from priority queue {}",
>       block, priLevel);
>   decrementBlockStat(block, priLevel, oldExpectedReplicas);
>   return true;
> } else {
>   // Try to remove the block from all queues if the block was
>   // not found in the queue for the given priority level.
>   for (int i = 0; i < LEVEL; i++) {
>     if (i != priLevel && priorityQueues.get(i).remove(block)) {
>       NameNode.blockStateChangeLog.debug(
>           "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" +
>           " {} from priority queue {}", block, i);
>       decrementBlockStat(block, i, oldExpectedReplicas);
>       return true;
>     }
>   }
> }
> return false;
> }
> {code}
> Source code is above, the comments as follow
> {quote}
> 

[jira] [Commented] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS

2019-12-09 Thread Íñigo Goiri (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992023#comment-16992023
 ] 

Íñigo Goiri commented on HDFS-6874:
---

{quote}
I think we need to implement getblocklocations in httpfs and call 
getfileblocklocations.
{quote}

Yes, I think that's the way to go.

> Add GETFILEBLOCKLOCATIONS operation to HttpFS
> -
>
> Key: HDFS-6874
> URL: https://issues.apache.org/jira/browse/HDFS-6874
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 2.4.1, 2.7.3
>Reporter: Gao Zhong Liang
>Assignee: Weiwei Yang
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, 
> HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, 
> HDFS-6874.05.patch, HDFS-6874.06.patch, HDFS-6874.07.patch, 
> HDFS-6874.08.patch, HDFS-6874.09.patch, HDFS-6874.10.patch, HDFS-6874.patch
>
>
> GETFILEBLOCKLOCATIONS operation is missing in HttpFS, which is already 
> supported in WebHDFS.  For the request of GETFILEBLOCKLOCATIONS in 
> org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far:
> ...
>   case GETFILEBLOCKLOCATIONS: {
>     response = Response.status(Response.Status.BAD_REQUEST).build();
>     break;
>   }
>  






[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992018#comment-16992018
 ] 

Chao Sun commented on HDFS-15036:
-

[~vagarychen] sorry for grabbing this JIRA too soon :) Since you have done much 
study on this, do you want to take this JIRA instead?

> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Comment Edited] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991998#comment-16991998
 ] 

Chen Liang edited comment on HDFS-15036 at 12/9/19 10:36 PM:
-

I spent some time debugging this issue, and I think I have found the cause.

In HDFS-12979 we introduced logic that rejects an image upload request if the 
image being uploaded is not far enough ahead of the previous image. This 
prevents the scenario where, with multiple SbNs, all of them upload images to 
the ANN too frequently. The rejection is considered correct behavior, so there 
is no logging that indicates an error here (the "silent" part). Both ANN and 
SbN simply ignore it and proceed.

But it now appears that, as a side effect of this change, the rollback image 
also has to go through this check during RU, and it can be rejected too. If 
this happens, the SbN proceeds assuming the upload is done, while the ANN 
proceeds without having received the rollback image. The upload silently fails 
in this case.

The check that rejects the upload is in {{ImageServlet}}. In my earlier test I 
just commented out the whole block below and the issue seemed to go away. But 
I think the fix is probably just to add a new check so that this rejection 
applies only to regular image uploads, not to rollback images, like the newly 
added line in the following code snippet. I haven't actually tested changing 
it this way, though:
{code:java}
  if (checkRecentImageEnable &&
      NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue, as NameNodeFile.IMAGE_ROLLBACK should bypass this
      timeDelta < checkpointPeriod &&
      txid - lastCheckpointTxid < checkpointTxnCount) {
    // only when at least one of two conditions are met we accept
    // a new fsImage
    // 1. most recent image's txid is too far behind
    // 2. last checkpoint time was too old
    response.sendError(HttpServletResponse.SC_CONFLICT,
        "Most recent checkpoint is neither too far behind in "
        + "txid, nor too old. New txnid cnt is "
        + (txid - lastCheckpointTxid)
        + ", expecting at least " + checkpointTxnCount
        + " unless too long since last upload.");
    return null;
  }
{code}
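
The intended effect of the extra condition can be checked in isolation with a small stand-alone sketch. This is not {{ImageServlet}}: the class, the enum and the method signature are simplified assumptions that only mirror the names in the snippet above, and the numbers in {{main}} are made up.

```java
/** Hypothetical, simplified model of the rejection check; not ImageServlet. */
final class ImageUploadCheckSketch {
  /** Stand-in for the two file types discussed above. */
  enum NameNodeFile { IMAGE, IMAGE_ROLLBACK }

  /**
   * Returns true when the upload would be rejected: the check is enabled,
   * the file is a regular image, and the image is neither old enough nor
   * far enough ahead in transactions.
   */
  static boolean shouldReject(boolean checkRecentImageEnable, NameNodeFile file,
      long timeDelta, long checkpointPeriod,
      long txid, long lastCheckpointTxid, long checkpointTxnCount) {
    return checkRecentImageEnable
        && file == NameNodeFile.IMAGE // rollback images bypass the check
        && timeDelta < checkpointPeriod
        && txid - lastCheckpointTxid < checkpointTxnCount;
  }

  public static void main(String[] args) {
    // A recent upload with only a few new transactions.
    System.out.println("regular image rejected: "
        + shouldReject(true, NameNodeFile.IMAGE, 60, 3600, 1100, 1000, 1000000));
    System.out.println("rollback image rejected: "
        + shouldReject(true, NameNodeFile.IMAGE_ROLLBACK, 60, 3600, 1100, 1000, 1000000));
  }
}
```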


was (Author: vagarychen):
I spent some time debugging this issue, and I think I have found the cause.

In HDFS-12979 we introduced logic that rejects an image upload request if the 
image being uploaded is not far enough ahead of the previous image. This 
prevents the scenario where, with multiple SbNs, all of them upload images to 
the ANN too frequently. The rejection is considered correct behavior, so there 
is no logging that indicates an error here (the "silent" part). Both ANN and 
SbN simply ignore it and proceed.

But it now appears that, as a side effect of this change, the rollback image 
also has to go through this check during RU, and it can be rejected too. If 
this happens, the SbN proceeds assuming the upload is done, while the ANN 
proceeds without having received the rollback image. The upload silently fails 
in this case.

The check that rejects the upload is in {{ImageServlet}}. In my earlier test I 
just commented out the whole block below and the issue seemed to go away. But 
I think the fix is probably just to add a new check so that this rejection 
applies only to regular image uploads, like the newly added line in the 
following code snippet. I haven't actually tested changing it this way, though:
{code}
  if (checkRecentImageEnable &&
      NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue
      timeDelta < checkpointPeriod &&
      txid - lastCheckpointTxid < checkpointTxnCount) {
    // only when at least one of two conditions are met we accept
    // a new fsImage
    // 1. most recent image's txid is too far behind
    // 2. last checkpoint time was too old
    response.sendError(HttpServletResponse.SC_CONFLICT,
        "Most recent checkpoint is neither too far behind in "
        + "txid, nor too old. New txnid cnt is "
        + (txid - lastCheckpointTxid)
        + ", expecting at least " + checkpointTxnCount
        + " unless too long since last upload.");
    return null;
  }
{code}


> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: 

[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer

2019-12-09 Thread Chen Liang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991998#comment-16991998
 ] 

Chen Liang commented on HDFS-15036:
---

I spent some time debugging this issue, and I think I have found the cause.

In HDFS-12979 we introduced logic that rejects an image upload request if the 
image being uploaded is not far enough ahead of the previous image. This 
prevents the scenario where, with multiple SbNs, all of them upload images to 
the ANN too frequently. The rejection is considered correct behavior, so there 
is no logging that indicates an error here (the "silent" part). Both ANN and 
SbN simply ignore it and proceed.

But it now appears that, as a side effect of this change, the rollback image 
also has to go through this check during RU, and it can be rejected too. If 
this happens, the SbN proceeds assuming the upload is done, while the ANN 
proceeds without having received the rollback image. The upload silently fails 
in this case.

The check that rejects the upload is in {{ImageServlet}}. In my earlier test I 
just commented out the whole block below and the issue seemed to go away. But 
I think the fix is probably just to add a new check so that this rejection 
applies only to regular image uploads, like the newly added line in the 
following code snippet. I haven't actually tested changing it this way, though:
{code}
  if (checkRecentImageEnable &&
      NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue
      timeDelta < checkpointPeriod &&
      txid - lastCheckpointTxid < checkpointTxnCount) {
    // only when at least one of two conditions are met we accept
    // a new fsImage
    // 1. most recent image's txid is too far behind
    // 2. last checkpoint time was too old
    response.sendError(HttpServletResponse.SC_CONFLICT,
        "Most recent checkpoint is neither too far behind in "
        + "txid, nor too old. New txnid cnt is "
        + (txid - lastCheckpointTxid)
        + ", expecting at least " + checkpointTxnCount
        + " unless too long since last upload.");
    return null;
  }
{code}


> Active NameNode should not silently fail the image transfer
> ---
>
> Key: HDFS-15036
> URL: https://issues.apache.org/jira/browse/HDFS-15036
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0
>Reporter: Konstantin Shvachko
>Assignee: Chao Sun
>Priority: Major
>
> Image transfer from Standby NameNode to  Active silently fails on Active, 
> without any logging and not notifying the receiver side.






[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option

2019-12-09 Thread Íñigo Goiri (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991984#comment-16991984
 ] 

Íñigo Goiri commented on HDFS-14983:


For the checkstyle warning you can break the line in 
TestRouterRefreshSuperUserGroupsConfiguration.
Other than that this looks good to go.
Hopefully the next Yetus run will be clean.

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---
>
> Key: HDFS-14983
> URL: https://issues.apache.org/jira/browse/HDFS-14983
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Minor
> Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, 
> HDFS-14983.draft.001.patch
>
>
> NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration 
> without restarting but DFSRouter cannot. It would be better for DFSRouter to 
> have such functionality to be compatible with NameNode.






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991983#comment-16991983
 ] 

Wei-Chiu Chuang commented on HDFS-15041:


[~hexiaoqiao] does it make sense to make MAX_LOCK_HOLD_MS configurable after 
your change in HDFS-14553?

Also, let's understand the use case better: if the purpose is to keep the 
balancer from overwhelming the NameNode, there are other solutions that look 
even more promising, such as HDFS-13183, or HDFS-14162 (if consistent read from 
standby is enabled).

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and for the queue health standard. 
> We should make the two parameters configurable.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991971#comment-16991971
 ] 

Hadoop QA commented on HDFS-15032:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 45s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 3s | trunk passed |
| +1 | compile | 16m 18s | trunk passed |
| +1 | checkstyle | 2m 34s | trunk passed |
| +1 | mvnsite | 2m 31s | trunk passed |
| +1 | shadedclient | 18m 42s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 25s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. |
| +1 | javadoc | 2m 42s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 44s | the patch passed |
| +1 | compile | 15m 33s | the patch passed |
| +1 | javac | 15m 33s | the patch passed |
| -0 | checkstyle | 2m 34s | root: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) |
| +1 | mvnsite | 2m 21s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 53s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 19s | the patch passed |
| +1 | javadoc | 2m 39s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 54s | hadoop-common in the patch passed. |
| -1 | unit | 100m 44s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 1m 7s | The patch does not generate ASF License warnings. |
| | | 218m 16s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
|   | hadoop.hdfs.server.namenode.TestFsck |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15032 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988365/HDFS-15032.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 98be0587895c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc66de7 |
| maven | 

[jira] [Updated] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14667:
-
Fix Version/s: (was: 2.11.0)

> Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
> 
>
> Key: HDFS-14667
> URL: https://issues.apache.org/jira/browse/HDFS-14667
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 2.10.0
>
> Attachments: HDFS-14403-branch-2.000.patch
>
>
> We would like to target pulling HDFS-14403, an important operability 
> enhancement, into branch-2.
> It's only present in trunk now so we also need to backport through the 3.x 
> lines.






[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-15005:
-
Fix Version/s: (was: 2.11.0)

> Backport HDFS-12300 to branch-2
> ---
>
> Key: HDFS-15005
> URL: https://issues.apache.org/jira/browse/HDFS-15005
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: HDFS-15005-branch-2.000.patch, 
> HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, 
> HDFS-15005-branch-2.003.patch
>
>
> Having DT-related information in the audit log is very useful. This tracks 
> the effort to backport HDFS-12300 to branch-2.






[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991917#comment-16991917
 ] 

Jonathan Hung commented on HDFS-15005:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Backport HDFS-12300 to branch-2
> ---
>
> Key: HDFS-15005
> URL: https://issues.apache.org/jira/browse/HDFS-15005
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 2.10.1
>
> Attachments: HDFS-15005-branch-2.000.patch, 
> HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, 
> HDFS-15005-branch-2.003.patch
>
>
> Having DT-related information in the audit log is very useful. This tracks 
> the effort to backport HDFS-12300 to branch-2.






[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991911#comment-16991911
 ] 

Jonathan Hung commented on HDFS-14986:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get used space from ReplicaInfo in memory. However, the new du 
> threads throw the following exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of iterator
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>     at java.util.HashSet.<init>(HashSet.java:120)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
>     at org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
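
The failure mode in the trace above can be reproduced in isolation. The following is a minimal sketch, not the HDFS-14986 patch: it uses java.util.TreeSet as a stand-in for FoldedTreeSet, and the names ReplicaCopySketch, deepCopy and unsafeIterationFails are hypothetical. It shows a fail-fast iterator throwing when the set changes mid-iteration, and one common way to avoid that, namely taking the copy under the same lock that guards writers.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; TreeSet stands in for FoldedTreeSet.
class ReplicaCopySketch {
  private final Set<Long> replicas = new TreeSet<>();
  private final Object lock = new Object();

  // Writers mutate the set only while holding the lock.
  void add(long blockId) {
    synchronized (lock) {
      replicas.add(blockId);
    }
  }

  // The copy iterates the set under the same lock, so no writer can
  // modify it while the HashSet constructor is walking the iterator.
  Set<Long> deepCopy() {
    synchronized (lock) {
      return new HashSet<>(replicas);
    }
  }

  // Modifies a set between two iterator calls and reports whether the
  // fail-fast iterator raised ConcurrentModificationException.
  static boolean unsafeIterationFails() {
    Set<Long> set = new TreeSet<>();
    set.add(1L);
    set.add(2L);
    set.add(3L);
    try {
      Iterator<Long> it = set.iterator();
      it.next();
      set.add(4L); // structural change made outside the iterator
      it.next();   // the iterator detects the change here
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("unsafe iteration throws: " + unsafeIterationFails());
    ReplicaCopySketch sketch = new ReplicaCopySketch();
    sketch.add(10L);
    sketch.add(20L);
    System.out.println("copied replicas: " + sketch.deepCopy().size());
  }
}
```

Whether locking, a concurrent collection, or a retry is the right choice depends on how hot the write path is; the sketch only shows why an unguarded copy can fail.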






[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14986:
-
Fix Version/s: (was: 2.11.0)

> ReplicaCachingGetSpaceUsed throws  ConcurrentModificationException
> --
>
> Key: HDFS-14986
> URL: https://issues.apache.org/jira/browse/HDFS-14986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, performance
>Affects Versions: 2.10.0
>Reporter: Ryan Wu
>Assignee: Aiphago
>Priority: Major
> Fix For: 3.3.0, 2.10.1
>
> Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, 
> HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, 
> HDFS-14986.006.patch
>
>
> Running DU across lots of disks is very expensive. We applied the patch 
> HDFS-14313 to get used space from ReplicaInfo in memory. However, the new du 
> threads throw the following exception:
> {code:java}
> // 2019-11-08 18:07:13,858 ERROR [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: ReplicaCachingGetSpaceUsed refresh error
> java.util.ConcurrentModificationException: Tree has been modified outside of iterator
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311)
>     at org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
>     at java.util.HashSet.<init>(HashSet.java:120)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052)
>     at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73)
>     at org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178)
>     at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991910#comment-16991910
 ] 

Jonathan Hung commented on HDFS-14973:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this feature does not work as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.
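
The delay assignment described above can be modeled in a few lines. This is a sketch built only from the description in this issue, not the Dispatcher source: the class GetBlocksDelaySketch, its method names, and the assumption that the delay grows by one second per group of R tasks are all hypothetical.

```java
// Hypothetical model of the delay assignment described in the issue text;
// it is not taken from the Hadoop Dispatcher implementation.
class GetBlocksDelaySketch {

  // Delay (in seconds) given to the task submitted at position `index`,
  // for a dispatcher pool of `poolSize` threads and `rpcPerSec` allowed
  // getBlocks calls per second. Tasks at or beyond the pool size get none.
  static long delaySeconds(int index, int poolSize, int rpcPerSec) {
    if (index >= poolSize) {
      return 0;
    }
    return index / rpcPerSec;
  }

  // Number of tasks, out of the first `tasks` submitted, with a non-zero delay.
  static int countDelayed(int tasks, int poolSize, int rpcPerSec) {
    int delayed = 0;
    for (int i = 0; i < tasks; i++) {
      if (delaySeconds(i, poolSize, rpcPerSec) > 0) {
        delayed++;
      }
    }
    return delayed;
  }

  public static void main(String[] args) {
    // Pool of 100, limit of 20 per second: only tasks 20-99 are delayed.
    System.out.println(countDelayed(200, 100, 20));
    // Pool of 10, limit of 20 per second: nothing is ever delayed.
    System.out.println(countDelayed(200, 10, 20));
  }
}
```

Under this model the delay depends only on submission order, which is why tasks submitted after the first N run unthrottled as soon as a slot frees up.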






[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14973:
-
Fix Version/s: (was: 2.11.0)

> Balancer getBlocks RPC dispersal does not function properly
> ---
>
> Key: HDFS-14973
> URL: https://issues.apache.org/jira/browse/HDFS-14973
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14973-branch-2.003.patch, 
> HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, 
> HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, 
> HDFS-14973.003.patch, HDFS-14973.test.patch
>
>
> In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls 
> issued by the Balancer/Mover more dispersed, to alleviate load on the 
> NameNode, since {{getBlocks}} can be very expensive and the Balancer should 
> not impact normal cluster operation.
> Unfortunately, this feature does not work as expected, especially 
> when the dispatcher thread count is low. The primary issue is that the delay 
> is applied only to the first N threads that are submitted to the dispatcher's 
> executor, where N is the size of the dispatcher's threadpool, but *not* to 
> the first R threads, where R is the number of allowed {{getBlocks}} QPS 
> (currently hardcoded to 20). For example, if the threadpool size is 100 (the 
> default), threads 0-19 have no delay, 20-99 have increased levels of delay, 
> and 100+ have no delay. As I understand it, the intent of the logic was that 
> the delay applied to the first 100 threads would force the dispatcher 
> executor's threads to all be consumed, thus blocking subsequent (non-delayed) 
> threads until the delay period has expired. However, threads 0-19 can finish 
> very quickly (their work can often be fulfilled in the time it takes to 
> execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), 
> thus opening up 20 new slots in the executor, which are then consumed by 
> non-delayed threads 100-119, and so on. So, although 80 threads have had a 
> delay applied, the non-delay threads rush through in the 20 non-delay slots.
> This problem gets even worse when the dispatcher threadpool size is less than 
> the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no 
> threads ever have a delay applied_, and the feature is not enabled at all.
> This problem wasn't surfaced in the original JIRA because the test 
> incorrectly measured the period across which {{getBlocks}} RPCs were 
> distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} 
> were used to track the time over which the {{getBlocks}} calls were made. 
> However, {{startGetBlocksTime}} was initialized at the time of creation of 
> the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even 
> worse, the Balancer in this test takes 2 iterations to complete balancing the 
> cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} 
> actually represents:
> {code}
> (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the 
> Dispatcher to complete an iteration of moving blocks)
> {code}
> Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen 
> during the period of initial block fetching.






[jira] [Commented] (HDFS-14952) Skip safemode if blockTotal is 0 in new NN

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991904#comment-16991904
 ] 

Jonathan Hung commented on HDFS-14952:
--

Renaming 2.11.0 fix version to 2.10.1 after branch-2 -> branch-2.10 rename

> Skip safemode if blockTotal is 0 in new NN
> --
>
> Key: HDFS-14952
> URL: https://issues.apache.org/jira/browse/HDFS-14952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Rajesh Balamohan
>Assignee: Xiaoqiao He
>Priority: Trivial
>  Labels: performance
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14952.001.patch, HDFS-14952.002.patch, 
> HDFS-14952.003.patch
>
>
> When a new NN is installed, it spends 30-45 seconds in safemode. When 
> {{blockTotal}} is 0, it should be possible to short-circuit the safemode 
> check in {{BlockManagerSafeMode::areThresholdsMet}}.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java#L571
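
A minimal sketch of the proposed short circuit, assuming a simplified model in which safemode waits for a block threshold and then for an extension period before leaving. The class, enum and parameter names are hypothetical and are not taken from BlockManagerSafeMode; the actual change may be placed differently.

```java
// Hypothetical, simplified model of the safemode decision; the names here
// are illustrative and do not come from BlockManagerSafeMode.
class SafeModeCheckSketch {

  enum Next { STAY, EXTENSION, LEAVE }

  // Decides the next safemode step. When there are no blocks at all,
  // there is nothing to wait for, so the extension period is skipped.
  static Next nextStep(long blockTotal, long blockSafe, long blockThreshold,
                       long extensionMs) {
    if (blockSafe < blockThreshold) {
      return Next.STAY;
    }
    if (blockTotal == 0 || extensionMs <= 0) {
      return Next.LEAVE;
    }
    return Next.EXTENSION;
  }

  public static void main(String[] args) {
    // Fresh NameNode with an empty namespace: leave immediately.
    System.out.println(nextStep(0, 0, 0, 30_000));
    // Populated namespace with the threshold reached: wait out the extension.
    System.out.println(nextStep(1_000, 999, 999, 30_000));
  }
}
```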






[jira] [Updated] (HDFS-14952) Skip safemode if blockTotal is 0 in new NN

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14952:
-
Fix Version/s: (was: 2.11.0)
   2.10.1

> Skip safemode if blockTotal is 0 in new NN
> --
>
> Key: HDFS-14952
> URL: https://issues.apache.org/jira/browse/HDFS-14952
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Rajesh Balamohan
>Assignee: Xiaoqiao He
>Priority: Trivial
>  Labels: performance
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14952.001.patch, HDFS-14952.002.patch, 
> HDFS-14952.003.patch
>
>
> When a new NN is installed, it spends 30-45 seconds in safemode. When 
> {{blockTotal}} is 0, it should be possible to short-circuit the safemode 
> check in {{BlockManagerSafeMode::areThresholdsMet}}.
> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java#L571






[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991896#comment-16991896
 ] 

Jonathan Hung commented on HDFS-14884:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Affects Versions: 2.11.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, 
> HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This jira will add a precondition check 
> before setting it.






[jira] [Updated] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14884:
-
Fix Version/s: (was: 2.11.0)

> Add sanity check that zone key equals feinfo key while setting Xattrs
> -
>
> Key: HDFS-14884
> URL: https://issues.apache.org/jira/browse/HDFS-14884
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: encryption, hdfs
>Affects Versions: 2.11.0
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, 
> HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch
>
>
> Currently, it is possible to set an extended attribute where the zone key is 
> not the same as the feinfo key. This jira will add a precondition check 
> before setting it.






[jira] [Commented] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991899#comment-16991899
 ] 

Jonathan Hung commented on HDFS-14979:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> [Observer Node] Balancer should submit getBlocks to Observer Node when 
> possible
> ---
>
> Key: HDFS-14979
> URL: https://issues.apache.org/jira/browse/HDFS-14979
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, hdfs
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14979.000.patch
>
>
> In HDFS-14162, we made it so that the Balancer could function when 
> {{ObserverReadProxyProvider}} was in use. However, the Balancer would still 
> read from the active NameNode, because {{getBlocks}} wasn't annotated as 
> {{@ReadOnly}}. This task is to enable the Balancer to actually read from the 
> Observer Node to alleviate load from the active NameNode.
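
The role of the annotation can be illustrated with a small self-contained sketch. Everything here is a hypothetical stand-in: the nested ReadOnly annotation, the NamenodeCalls interface and the targetFor helper are not Hadoop's classes, and the real proxy provider does more than this check. It only shows how a call without the marker annotation ends up on the active node.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins for illustration; not the Hadoop classes.
class ReadRoutingSketch {

  // Marker for calls that do not modify namespace state.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface ReadOnly {}

  interface NamenodeCalls {
    @ReadOnly
    String getBlocks(String datanode, long size);

    void rename(String src, String dst);
  }

  // Returns "observer" for annotated calls and "active" for everything else.
  static String targetFor(String methodName, Class<?>... paramTypes) {
    try {
      Method m = NamenodeCalls.class.getMethod(methodName, paramTypes);
      return m.isAnnotationPresent(ReadOnly.class) ? "observer" : "active";
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("unknown call: " + methodName, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(targetFor("getBlocks", String.class, long.class));
    System.out.println(targetFor("rename", String.class, String.class));
  }
}
```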






[jira] [Updated] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14979:
-
Fix Version/s: (was: 2.11.0)

> [Observer Node] Balancer should submit getBlocks to Observer Node when 
> possible
> ---
>
> Key: HDFS-14979
> URL: https://issues.apache.org/jira/browse/HDFS-14979
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, hdfs
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14979.000.patch
>
>
> In HDFS-14162, we made it so that the Balancer could function when 
> {{ObserverReadProxyProvider}} was in use. However, the Balancer would still 
> read from the active NameNode, because {{getBlocks}} wasn't annotated as 
> {{@ReadOnly}}. This task is to enable the Balancer to actually read from the 
> Observer Node to alleviate load from the active NameNode.






[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991890#comment-16991890
 ] 

Jonathan Hung commented on HDFS-14590:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
>
>







[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-12-09 Thread Jonathan Hung (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991891#comment-16991891
 ] 

Jonathan Hung commented on HDFS-14958:
--

Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.
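
The precedence problem described above can be sketched without Hadoop on the classpath. This is a simplified model using java.util.Properties: the key strings are assumed to be the values behind the two constants named in the description, and chooseTopology is a hypothetical helper, not the real selection code.

```java
import java.util.Properties;

// Simplified, hypothetical model of how the topology class is chosen.
class TopologySelectionSketch {
  // Assumed string values of the two constants named in the description.
  static final String USE_DFS_TOPOLOGY_KEY = "dfs.use.dfs.network.topology";
  static final String TOPOLOGY_IMPL_KEY = "net.topology.impl";

  // The DFS-specific switch is consulted first and defaults to true,
  // so the implementation key only matters once the switch is turned off.
  static String chooseTopology(Properties conf) {
    boolean useDfsTopology =
        Boolean.parseBoolean(conf.getProperty(USE_DFS_TOPOLOGY_KEY, "true"));
    if (useDfsTopology) {
      return "DFSNetworkTopology";
    }
    return conf.getProperty(TOPOLOGY_IMPL_KEY, "NetworkTopology");
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(TOPOLOGY_IMPL_KEY, "NetworkTopologyWithNodeGroup");
    System.out.println(chooseTopology(conf)); // implementation key is ignored

    conf.setProperty(USE_DFS_TOPOLOGY_KEY, "false");
    System.out.println(chooseTopology(conf)); // implementation key now applies
  }
}
```

In this model a test that wants the node-group topology has to set both keys, which matches the misconfiguration the issue describes.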






[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14958:
-
Fix Version/s: (was: 2.11.0)

> TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
> ---
>
> Key: HDFS-14958
> URL: https://issues.apache.org/jira/browse/HDFS-14958
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Minor
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14958.001.patch
>
>
> TestBalancerWithNodeGroup is intended to test with 
> {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly.  
> Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, 
> {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the 
> test actually uses the default {{DFSNetworkTopology}}.






[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page

2019-12-09 Thread Jonathan Hung (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated HDFS-14590:
-
Fix Version/s: (was: 2.11.0)

> [SBN Read] Add the document link to the top page
> 
>
> Key: HDFS-14590
> URL: https://issues.apache.org/jira/browse/HDFS-14590
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch
>
>







[jira] [Updated] (HDFS-15042) Add more tests for ByteBufferPositionedReadable

2019-12-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-15042:
--
Summary: Add more tests for ByteBufferPositionedReadable   (was: add more 
tests for ByteBufferPositionedReadable )

> Add more tests for ByteBufferPositionedReadable 
> 
>
> Key: HDFS-15042
> URL: https://issues.apache.org/jira/browse/HDFS-15042
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> There are a few corner cases of ByteBufferPositionedReadable that need to be 
> tested, mainly illegal read positions. Add them.






[jira] [Issue Comment Deleted] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-15032:
---
Comment: was deleted

(was: | (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15032 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28486/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

)

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, 
> debugger_with_tostring.png, debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991822#comment-16991822
 ] 

Erik Krogen commented on HDFS-15032:


It looks like Yetus was trying to pick up the image as a patch:
{code}
HDFS-15032 patch is being downloaded at Mon Dec  9 17:47:52 UTC 2019 from
  
https://issues.apache.org/jira/secure/attachment/12988361/debugger_without_tostring.png
 -> Downloaded
{code}
I'm re-attaching v3 as v004 to get Yetus to pick it up (hopefully).

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, 
> debugger_with_tostring.png, debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-15032:
---
Attachment: HDFS-15032.004.patch

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, 
> debugger_with_tostring.png, debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991812#comment-16991812
 ] 

Hadoop QA commented on HDFS-15032:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15032 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28486/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, debugger_with_tostring.png, 
> debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991806#comment-16991806
 ] 

Erik Krogen commented on HDFS-15032:


Hey [~shv], that's a great question. In many cases, only a reference to the 
proxy is kept, so it is the direct {{toString}} method of the proxy that you 
see. For example, this is what a debugger stopped in 
{{ObserverReadProxyProvider}} looks like without this change:
 !debugger_without_tostring.png! 
You can see that the proxy (which is really a combined proxy) is reporting that 
it is a {{NameNodeProtocolTranslatorPB}}, because it is the {{toString()}} 
method of the first proxy which is being used. This was misleading to me when I 
was trying to investigate this issue, as it led me to believe a plain 
{{NameNodeProtocol}} was showing up where I expected a {{BalancerProtocol}}. 
However, with the change, it is more obvious what is going on:
 !debugger_with_tostring.png! 

I see your concern about performance, however. I've added a v003 patch 
which replaces the string comparison with a call to {{Method.equals()}}, which I 
confirmed internally only does a few reference equality checks:
{code}
public boolean equals(Object obj) {
    if (obj != null && obj instanceof Method) {
        Method other = (Method)obj;
        if ((getDeclaringClass() == other.getDeclaringClass())
            && (getName() == other.getName())) {
            if (!returnType.equals(other.getReturnType()))
                return false;
            return equalParamTypes(parameterTypes, other.parameterTypes);
        }
    }
    return false;
}
{code}
Let me know if that addresses your concerns. If you think it's too risky for 
performance, I'm fine with removing it.
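
To make the approach above concrete, here is a minimal sketch of a dynamic proxy whose handler recognises {{toString()}} with {{Method.equals()}} instead of comparing method names. It is not the code from the patch: the Greeter interface, the wrap helper and the label format are all hypothetical, and only standard java.lang.reflect calls are used.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ToStringProxySketch {
  // Hypothetical interface standing in for the proxied protocol.
  interface Greeter {
    String greet(String name);
  }

  // Look up Object#toString once, so each call only needs Method.equals().
  static final Method TO_STRING = lookupToString();

  static Method lookupToString() {
    try {
      return Object.class.getMethod("toString");
    } catch (NoSuchMethodException e) {
      throw new AssertionError(e);
    }
  }

  // Wraps a delegate so that toString() reports the wrapper's label,
  // while every other call is forwarded to the delegate unchanged.
  static Greeter wrap(Greeter delegate, String label) {
    InvocationHandler handler = (proxy, method, args) -> {
      if (TO_STRING.equals(method)) {
        return label + "[" + delegate + "]";
      }
      return method.invoke(delegate, args);
    };
    return (Greeter) Proxy.newProxyInstance(
        Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
  }

  public static void main(String[] args) {
    Greeter g = wrap(name -> "hello " + name, "CombinedProxy");
    System.out.println(g.greet("world"));
    System.out.println(g);
  }
}
```

Because the handler answers {{toString()}} itself, a debugger or log line shows the wrapper's label rather than the description of whichever delegate happens to be first.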

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, debugger_with_tostring.png, 
> debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991801#comment-16991801
 ] 

Hadoop QA commented on HDFS-15032:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} 
| {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15032 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/28485/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, debugger_with_tostring.png, 
> debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-15032:
---
Attachment: debugger_with_tostring.png

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, debugger_with_tostring.png, 
> debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-15032:
---
Attachment: debugger_without_tostring.png

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch, debugger_with_tostring.png, 
> debugger_without_tostring.png
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider

2019-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated HDFS-15032:
---
Attachment: HDFS-15032.003.patch

> Balancer crashes when it fails to contact an unavailable NN via 
> ObserverReadProxyProvider
> -
>
> Key: HDFS-15032
> URL: https://issues.apache.org/jira/browse/HDFS-15032
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Affects Versions: 2.10.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, 
> HDFS-15032.002.patch, HDFS-15032.003.patch
>
>
> When trying to run the Balancer using ObserverReadProxyProvider (to allow it 
> to read from the Observer Node as described in HDFS-14979), if one of the NNs 
> isn't running, the Balancer will crash.






[jira] [Created] (HDFS-15042) add more tests for ByteBufferPositionedReadable

2019-12-09 Thread Steve Loughran (Jira)
Steve Loughran created HDFS-15042:
-

 Summary: add more tests for ByteBufferPositionedReadable 
 Key: HDFS-15042
 URL: https://issues.apache.org/jira/browse/HDFS-15042
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fs, test
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


There are a few corner cases of ByteBufferPositionedReadable that need to be 
tested, mainly illegal read positions. Add them.
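
As an illustration of the kind of corner cases meant here, the following self-contained sketch implements a positioned read into a ByteBuffer over an in-memory array and shows the illegal positions a test could probe. The class is hypothetical and does not use the Hadoop interface. The behaviour chosen for each case (an exception for a negative position, -1 at or past the end of the data) is an assumption for the sketch, not the documented contract.

```java
import java.nio.ByteBuffer;

class PositionedReadSketch {
  private final byte[] data;

  PositionedReadSketch(byte[] data) {
    this.data = data;
  }

  // Copies bytes starting at the given position into buf.
  // Returns the number of bytes copied, or -1 if position is at or past the end.
  // Assumption for this sketch: a negative position is rejected with an exception.
  int read(long position, ByteBuffer buf) {
    if (position < 0) {
      throw new IllegalArgumentException("Negative read position: " + position);
    }
    if (position >= data.length) {
      return -1;
    }
    int n = (int) Math.min(buf.remaining(), data.length - position);
    buf.put(data, (int) position, n);
    return n;
  }

  public static void main(String[] args) {
    PositionedReadSketch in = new PositionedReadSketch(new byte[] {1, 2, 3, 4});
    ByteBuffer buf = ByteBuffer.allocate(8);
    System.out.println(in.read(2, buf)); // reads the last two bytes
    System.out.println(in.read(4, buf)); // position at end of data
  }
}
```

Tests along these lines would cover a negative position, a position exactly at the end of the data, a position past the end, and a buffer with no remaining space.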






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991755#comment-16991755
 ] 

Wei-Chiu Chuang commented on HDFS-15041:


I believe HDFS-14553 does the latter for you.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and queue health. 
> We should make these two parameters configurable.






[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991729#comment-16991729
 ] 

zhuqi commented on HDFS-15041:
--

cc [~daryn], [~weichiu]
Our cluster wants to change these values to strike a better balance between 
latency and RPC queue size growth. What do you think? May I have 
access to assign this issue to myself?

Thanks.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> --
>
> Key: HDFS-15041
> URL: https://issues.apache.org/jira/browse/HDFS-15041
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.2.0
>Reporter: zhuqi
>Priority: Major
>
> Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
> clusters have different needs for latency and queue health. 
> We should make these two parameters configurable.






[jira] [Created] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable

2019-12-09 Thread zhuqi (Jira)
zhuqi created HDFS-15041:


 Summary: Make MAX_LOCK_HOLD_MS and full queue size configurable
 Key: HDFS-15041
 URL: https://issues.apache.org/jira/browse/HDFS-15041
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.2.0
Reporter: zhuqi


Currently MAX_LOCK_HOLD_MS and the full queue size are fixed, but different 
clusters have different needs for latency and queue health. We should 
make these two parameters configurable.
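
A minimal sketch of what "configurable" could look like, under stated assumptions: the property names below are invented for illustration (the issue does not propose any), plain java.util.Properties stands in for the Hadoop configuration, and the caller supplies the existing fixed values as defaults.

```java
import java.util.Properties;

class QueueLimitsSketch {
  // Hypothetical property names; the issue does not specify any.
  static final String MAX_LOCK_HOLD_MS_KEY = "example.max-lock-hold-ms";
  static final String QUEUE_SIZE_KEY = "example.queue-size";

  final long maxLockHoldMs;
  final int queueSize;

  // Reads both limits from conf, falling back to the supplied defaults
  // (the currently hard-coded values) when a property is absent.
  QueueLimitsSketch(Properties conf, long defaultHoldMs, int defaultQueueSize) {
    this.maxLockHoldMs = Long.parseLong(
        conf.getProperty(MAX_LOCK_HOLD_MS_KEY, Long.toString(defaultHoldMs)));
    this.queueSize = Integer.parseInt(
        conf.getProperty(QUEUE_SIZE_KEY, Integer.toString(defaultQueueSize)));
    if (maxLockHoldMs <= 0 || queueSize <= 0) {
      throw new IllegalArgumentException("Limits must be positive");
    }
  }
}
```

Keeping the old constants as defaults means clusters that set nothing see no change in behaviour, while others can trade latency against queue depth.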






[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-09 Thread Tsz-wo Sze (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991695#comment-16991695
 ] 

Tsz-wo Sze commented on HDFS-15012:
---

+1 the 000 patch looks good.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch
>
>
> After applying HDFS-13101, and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as reverting 
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken 
> and fails to parse edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: