[jira] [Updated] (HDFS-15044) [Dynamometer] Show the line of audit log when parsing it unsuccessfully
[ https://issues.apache.org/jira/browse/HDFS-15044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma updated HDFS-15044:
------------------------------------
    Status: Patch Available  (was: Open)

> [Dynamometer] Show the line of audit log when parsing it unsuccessfully
> -----------------------------------------------------------------------
>
>          Key: HDFS-15044
>          URL: https://issues.apache.org/jira/browse/HDFS-15044
>      Project: Hadoop HDFS
>   Issue Type: Sub-task
>   Components: tools
>     Reporter: Takanobu Asanuma
>     Assignee: Takanobu Asanuma
>     Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
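HDFS-15044 asks that the Dynamometer audit log replay report the offending line when parsing fails, instead of failing without context. A minimal sketch of that error-reporting pattern follows; the class name, method, and log field format are hypothetical illustrations, not the actual Dynamometer parser:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the HDFS-15044 idea: include the offending line (and its number)
// in the parse error. Class, method, and log format here are hypothetical,
// not the actual Dynamometer audit log parser.
public class AuditLogParseSketch {

    // Extract the "ugi=" field from one audit log line; throws with the
    // full line attached when the field is missing.
    static String parseUgi(String line, long lineNum) {
        int idx = line.indexOf("ugi=");
        if (idx < 0) {
            throw new IllegalArgumentException(
                "Unable to parse audit log line " + lineNum + ": " + line);
        }
        int end = line.indexOf(' ', idx);
        return line.substring(idx + 4, end < 0 ? line.length() : end);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2019-12-10 00:00:00 INFO audit: ugi=alice cmd=open",
            "a garbled line with no fields");
        for (int i = 0; i < lines.size(); i++) {
            try {
                System.out.println(parseUgi(lines.get(i), i + 1));
            } catch (IllegalArgumentException e) {
                // The message now points straight at the bad record.
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Attaching the raw line to the exception lets the bad record be found in a multi-gigabyte audit log without re-running the parse.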
[jira] [Created] (HDFS-15044) [Dynamometer] Show the line of audit log when parsing it unsuccessfully
Takanobu Asanuma created HDFS-15044:
---------------------------------------

             Summary: [Dynamometer] Show the line of audit log when parsing it unsuccessfully
                 Key: HDFS-15044
                 URL: https://issues.apache.org/jira/browse/HDFS-15044
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: tools
            Reporter: Takanobu Asanuma
            Assignee: Takanobu Asanuma
[jira] [Commented] (HDFS-14953) [Dynamometer] Missing blocks gradually increase after NN starts
[ https://issues.apache.org/jira/browse/HDFS-14953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992267#comment-16992267 ]

Takanobu Asanuma commented on HDFS-14953:
-----------------------------------------
Sorry for the late response. I actually ported your PR and tested with it, but it didn't fix my problem. I will investigate further.

> [Dynamometer] Missing blocks gradually increase after NN starts
> ---------------------------------------------------------------
>
>          Key: HDFS-14953
>          URL: https://issues.apache.org/jira/browse/HDFS-14953
>      Project: Hadoop HDFS
>   Issue Type: Sub-task
>   Components: tools
>     Reporter: Takanobu Asanuma
>     Priority: Major
[jira] [Commented] (HDFS-15040) RBF: Secured Router should not run when SecretManager is not running
[ https://issues.apache.org/jira/browse/HDFS-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992263#comment-16992263 ]

Hudson commented on HDFS-15040:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17746 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17746/])
HDFS-15040. RBF: Secured Router should not run when SecretManager is not (github: rev c4733377d0fa375a8d585f5cb1db79bf20ec6710)
* (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/MockNotRunningSecretManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/security/TestRouterSecurityManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/RouterSecurityManager.java

> RBF: Secured Router should not run when SecretManager is not running
> --------------------------------------------------------------------
>
>          Key: HDFS-15040
>          URL: https://issues.apache.org/jira/browse/HDFS-15040
>      Project: Hadoop HDFS
>   Issue Type: Bug
>     Reporter: Takanobu Asanuma
>     Assignee: Takanobu Asanuma
>     Priority: Major
>      Fix For: 3.3.0
>
> We have faced an issue where the Router keeps running while the SecretManager is not running. HDFS-14835 is a similar fix which checks whether the SecretManager is null, but it didn't cover this case, so we also need to check the running status.
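The description above says a null check alone (from HDFS-14835) is not enough: the SecretManager must also actually be running before a secured Router starts. A minimal sketch of that fail-fast check follows; the class and method names are illustrative stand-ins, not the actual RouterSecurityManager code:

```java
// Sketch of the fail-fast check described in HDFS-15040. Class and method
// names are illustrative; the real change lives in RouterSecurityManager.
// The point: a null check (HDFS-14835) is not enough, the SecretManager
// must also actually be running before a secured Router starts.
public class SecretManagerCheckSketch {

    enum State { RUNNING, STOPPED }

    static class SecretManager {
        private final State state;
        SecretManager(State state) { this.state = state; }
        boolean isRunning() { return state == State.RUNNING; }
    }

    // Throws instead of letting the Router start in a broken state.
    static void checkSecretManager(SecretManager sm) {
        if (sm == null) {
            throw new IllegalStateException("SecretManager is null");
        }
        if (!sm.isRunning()) {
            throw new IllegalStateException("SecretManager is not running");
        }
    }

    public static void main(String[] args) {
        checkSecretManager(new SecretManager(State.RUNNING)); // passes silently
        try {
            checkSecretManager(new SecretManager(State.STOPPED));
        } catch (IllegalStateException e) {
            System.out.println("startup aborted: " + e.getMessage());
        }
    }
}
```

Failing at startup is preferable here because a secured Router that cannot issue delegation tokens would otherwise fail every client request later, which is much harder to diagnose.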
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992261#comment-16992261 ]

zhuqi commented on HDFS-15041:
------------------------------
Thanks [~hexiaoqiao] for helping to cc [~weichiu]. I am now a Hadoop YARN contributor; could you help add me as a Hadoop HDFS contributor? It's my honor to contribute to Hadoop HDFS.

> Make MAX_LOCK_HOLD_MS and full queue size configurable
> ------------------------------------------------------
>
>             Key: HDFS-15041
>             URL: https://issues.apache.org/jira/browse/HDFS-15041
>         Project: Hadoop HDFS
>      Issue Type: Sub-task
>      Components: namenode
> Affects Versions: 3.2.0
>        Reporter: zhuqi
>        Priority: Major
>     Attachments: HDFS-15041.001.patch
>
> Now MAX_LOCK_HOLD_MS and the full queue size are fixed, but different clusters have different needs for latency and for the queue health standard. We'd better make the two parameters configurable.
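The change HDFS-15041 proposes follows a common pattern: replace a hard-coded constant with a value read from configuration, keeping the old constant as the default. A minimal sketch using `java.util.Properties` follows; the key name is hypothetical, not an actual HDFS configuration key, and 4 ms mirrors the current hard-coded default mentioned in the comments:

```java
import java.util.Properties;

// Sketch of the change requested in HDFS-15041: read a formerly hard-coded
// constant from configuration. The key name is hypothetical, not an actual
// HDFS configuration key; 4 ms mirrors the current hard-coded default.
public class ConfigurableLockHoldSketch {

    static final String LOCK_HOLD_KEY = "dfs.namenode.max.lock.hold.ms"; // hypothetical
    static final long LOCK_HOLD_DEFAULT_MS = 4;

    // Falls back to the old constant when the key is absent, so existing
    // clusters see no behavior change.
    static long getMaxLockHoldMs(Properties conf) {
        return Long.parseLong(conf.getProperty(
            LOCK_HOLD_KEY, Long.toString(LOCK_HOLD_DEFAULT_MS)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(getMaxLockHoldMs(conf));   // default: 4
        conf.setProperty(LOCK_HOLD_KEY, "10");        // cluster-specific override
        System.out.println(getMaxLockHoldMs(conf));   // overridden: 10
    }
}
```

The real patch would use Hadoop's `Configuration` class rather than `Properties`, but the shape is the same: a key, a default equal to the old constant, and a getter called where the literal used to be.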
[jira] [Commented] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992257#comment-16992257 ]

Hudson commented on HDFS-15043:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17745 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17745/])
HDFS-15043. RBF: The detail of the Exception is not shown in (github: rev 9f098520517e3adfad0a2721284ccc19af3e6673)
* (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/ZKDelegationTokenSecretManagerImpl.java

> RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
> -----------------------------------------------------------------------------------
>
>          Key: HDFS-15043
>          URL: https://issues.apache.org/jira/browse/HDFS-15043
>      Project: Hadoop HDFS
>   Issue Type: Sub-task
>     Reporter: Akira Ajisaka
>     Assignee: Akira Ajisaka
>     Priority: Major
>      Fix For: 3.3.0
>
> In the constructor of ZKDTSMImpl, when an IOException occurs in super.startThreads(), the message of the exception is not logged.
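The issue above is the classic "swallowed cause" logging bug: a caught exception is reported with a bare message string, dropping its detail and stack trace. A minimal sketch of the fix pattern follows, using `java.util.logging` for self-containedness (the actual ZKDelegationTokenSecretManagerImpl uses the Hadoop/SLF4J logger, and the message text here is hypothetical):

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of the HDFS-15043 fix idea using java.util.logging: pass the caught
// exception to the logger so its message and stack trace appear, instead of
// logging only a bare string. Names and messages are illustrative.
public class LogExceptionDetailSketch {

    private static final Logger LOG =
        Logger.getLogger(LogExceptionDetailSketch.class.getName());

    // Stand-in for super.startThreads() failing, e.g. when ZK is unreachable.
    static void startThreads() throws IOException {
        throw new IOException("Could not connect to ZooKeeper");
    }

    public static void main(String[] args) {
        try {
            startThreads();
        } catch (IOException e) {
            // Before the fix the cause was dropped; passing `e` as the last
            // argument preserves the detail and stack trace for operators.
            LOG.log(Level.SEVERE, "Error starting threads for ZKDTSM", e);
        }
    }
}
```

With SLF4J the equivalent is passing the throwable as the final argument, e.g. `LOG.error("Error starting threads", e)`.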
[jira] [Updated] (HDFS-15040) RBF: Secured Router should not run when SecretManager is not running
[ https://issues.apache.org/jira/browse/HDFS-15040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma updated HDFS-15040:
------------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Merged the PR into trunk.
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992259#comment-16992259 ]

Xieming Li commented on HDFS-14983:
-----------------------------------
[~inigoiri], thank you for your review. I have fixed the checkstyle error.

> RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
> ---------------------------------------------------------------------------
>
>          Key: HDFS-14983
>          URL: https://issues.apache.org/jira/browse/HDFS-14983
>      Project: Hadoop HDFS
>   Issue Type: Sub-task
>   Components: rbf
>     Reporter: Akira Ajisaka
>     Assignee: Xieming Li
>     Priority: Minor
>  Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, HDFS-14983.004.patch, HDFS-14983.draft.001.patch
>
> NameNode can update its proxyuser config with -refreshSuperUserGroupsConfiguration without restarting, but DFSRouter cannot. It would be better for DFSRouter to have the same functionality to be compatible with NameNode.
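The feature described above is a runtime-refresh command: an admin option reloads proxy-user mappings and swaps them in without restarting the daemon. A minimal sketch of that swap pattern follows; the class and method names are illustrative, not the actual Router admin code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the refresh pattern HDFS-14983 asks for (names are illustrative,
// not the actual Router admin code): the admin command atomically swaps in
// newly loaded proxy-user mappings, so no process restart is needed.
public class RefreshSuperUserGroupsSketch {

    // Current proxy-user mapping, replaced atomically on refresh so readers
    // always see either the old or the new complete configuration.
    private final AtomicReference<Map<String, String>> proxyUserGroups =
        new AtomicReference<>(new HashMap<>());

    // Handler behind a -refreshSuperUserGroupsConfiguration style option.
    public void refreshSuperUserGroupsConfiguration(Map<String, String> fresh) {
        proxyUserGroups.set(new HashMap<>(fresh));
    }

    public String groupsFor(String user) {
        return proxyUserGroups.get().get(user);
    }

    public static void main(String[] args) {
        RefreshSuperUserGroupsSketch router = new RefreshSuperUserGroupsSketch();
        Map<String, String> conf = new HashMap<>();
        conf.put("hive", "supergroup");
        router.refreshSuperUserGroupsConfiguration(conf); // no restart needed
        System.out.println(router.groupsFor("hive"));
    }
}
```

The atomic swap matters because RPC handler threads may read the mapping concurrently with the refresh.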
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992256#comment-16992256 ]

Xiaoqiao He commented on HDFS-15041:
------------------------------------
Thanks [~zhuqi] for your quick response; it makes sense to me. cc [~weichiu], could you help add [~zhuqi] as a contributor and assign this JIRA to him? BTW, please correct the code style (such as alignment) following https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute.
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xieming Li updated HDFS-14983:
------------------------------
    Attachment: HDFS-14983.004.patch
        Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xieming Li updated HDFS-14983:
------------------------------
    Status: Open  (was: Patch Available)
[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15043:
---------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Merged the PR into trunk.
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992240#comment-16992240 ]

zhuqi commented on HDFS-15041:
------------------------------
Hi [~hexiaoqiao],
One of our clusters has heavy delete and write loads because our new Hive lifecycle system creates many partitions. In some cases the 4 ms limit is too short to handle them, so I want to make it configurable. Also, some Presto-based realtime workloads that do not read from standby may want a shorter max lock hold time to improve read performance. Thanks.
[jira] [Commented] (HDFS-14997) BPServiceActor process command from NameNode asynchronously
[ https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992225#comment-16992225 ]

Xiaoqiao He commented on HDFS-14997:
------------------------------------
Thanks [~elgoiri] for your comments. I have actually run this feature in our production cluster for weeks, and it works as expected. As for making #CommandProcessingThread a separate class: since it is only used by #BPServiceActor, keeping it as an inner class of #BPServiceActor is fine with me. FYI. Thanks.

> BPServiceActor process command from NameNode asynchronously
> -----------------------------------------------------------
>
>          Key: HDFS-14997
>          URL: https://issues.apache.org/jira/browse/HDFS-14997
>      Project: Hadoop HDFS
>   Issue Type: Improvement
>   Components: datanode
>     Reporter: Xiaoqiao He
>     Assignee: Xiaoqiao He
>     Priority: Major
>  Attachments: HDFS-14997.001.patch, HDFS-14997.002.patch, HDFS-14997.003.patch, HDFS-14997.004.patch, HDFS-14997.005.patch
>
> There are two core functions, report (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand, in the #BPServiceActor main process flow. If processCommand takes a long time, it blocks the report flow. processCommand can take a long time (over 1000s in the worst case I have met) when the IO load of the DataNode is very high. Since some IO operations are under #datasetLock, processing some commands (such as #DNA_INVALIDATE) has to wait a long time to acquire #datasetLock. In that case, #heartbeat is not sent to the NameNode in time and triggers other disasters.
> I propose to run #processCommand asynchronously so it does not block #BPServiceActor from sending heartbeats back to the NameNode under high IO load.
> Notes:
> 1. Lifeline could be one effective solution, but some old branches do not support that feature.
> 2. IO operations under #datasetLock are a separate issue; I think we should solve them in another JIRA.
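The decoupling described in the HDFS-14997 report above (heartbeats must not wait behind slow command processing) can be sketched with a queue and a dedicated consumer thread. The names below, including `CommandProcessingThread`, are illustrative of the approach, not the actual BPServiceActor code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the HDFS-14997 idea (names are illustrative): commands from the
// NameNode are queued and run on a dedicated thread, so a command that waits
// a long time for #datasetLock cannot delay the next heartbeat.
public class AsyncCommandSketch {

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread processor = new Thread(() -> {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take().run(); // slow commands only block this thread
            }
        } catch (InterruptedException e) {
            // shutdown path
        }
    }, "CommandProcessingThread");

    public AsyncCommandSketch() {
        processor.setDaemon(true);
        processor.start();
    }

    // Called from the heartbeat loop; returns immediately.
    public void enqueueCommand(Runnable cmd) {
        queue.offer(cmd);
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncCommandSketch actor = new AsyncCommandSketch();
        CountDownLatch done = new CountDownLatch(1);
        actor.enqueueCommand(done::countDown);
        // The heartbeat loop could continue here while the command runs elsewhere.
        System.out.println("command processed: " + done.await(5, TimeUnit.SECONDS));
    }
}
```

An unbounded queue is the simplest form; a production version would need a bound or backpressure so a flood of slow commands cannot exhaust memory, which is related to the queue-size concern in HDFS-15041.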
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992223#comment-16992223 ]

Xiaoqiao He commented on HDFS-15041:
------------------------------------
Thanks [~weichiu] for your invite. In my experience, the default of 4 ms is suitable and I have not met issues anymore, since this acts only on the #blockReceivedAndDeleted RPC request. I also wonder about the purpose of tuning these parameters; it would help to hear about concrete cases before deciding whether this is needed. FYI. Thanks again.
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ]

zhuqi edited comment on HDFS-15041 at 12/10/19 5:48 AM:
--------------------------------------------------------
Hi [~weichiu],
Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different needs for client latency and to balance the pressure on the RPC queue. Sorry for the mistake; I did not mean the HDFS Balancer, the Balancer pressure I have moved to standby.

was (Author: zhuqi):
Hi [~weichiu],
Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different needs for client latency and to balance the pressure on the RPC queue. Sorry for the mistake; I did not mean the HDFS Balancer, the Balancer pressure I have moved to standby. Also, could we add this queue size to metrics if needed?
[jira] [Updated] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhuqi updated HDFS-15041:
-------------------------
    Attachment: HDFS-15041.001.patch
        Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15043:
---------------------------------
    Assignee: Akira Ajisaka
      Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15043:
---------------------------------
    Target Version/s: 3.3.0
[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15043:
---------------------------------
    Description: In the constructor of ZKDTSMImpl, when an IOException occurs in super.startThreads(), the message of the exception is not logged.
[jira] [Updated] (HDFS-15043) RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl
[ https://issues.apache.org/jira/browse/HDFS-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-15043:
---------------------------------
    Summary: RBF: The detail of the Exception is not shown in ZKDelegationTokenSecretManagerImpl  (was: RBF: The detail of the Exception is not logged in ZKDelegationTokenSecretManagerImpl)
[jira] [Created] (HDFS-15043) RBF: The detail of the Exception is not logged in ZKDelegationTokenSecretManagerImpl
Akira Ajisaka created HDFS-15043:
------------------------------------

             Summary: RBF: The detail of the Exception is not logged in ZKDelegationTokenSecretManagerImpl
                 Key: HDFS-15043
                 URL: https://issues.apache.org/jira/browse/HDFS-15043
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Akira Ajisaka
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992155#comment-16992155 ]

Hadoop QA commented on HDFS-15036:
----------------------------------
| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 47s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 20m 23s | trunk passed |
| +1 | compile | 1m 0s | trunk passed |
| +1 | checkstyle | 0m 44s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | shadedclient | 14m 43s | branch has no errors when building and testing our client artifacts. |
| -1 | findbugs | 2m 17s | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warning. |
| +1 | javadoc | 1m 20s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 10s | the patch passed |
| +1 | compile | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 13m 31s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 2m 20s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 99m 20s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 32s | The patch does not generate ASF License warnings. |
| | | 162m 55s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
| | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport |
| | hadoop.hdfs.TestReconstructStripedFile |
| | hadoop.hdfs.server.namenode.TestFsck |

|| Subsystem || Report/Notes ||
| Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 |
| JIRA Issue | HDFS-15036 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988378/HDFS-15036.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 0e77d17e1e66 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / dc66de7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/28488/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html |
| unit |
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992138#comment-16992138 ]

Konstantin Shvachko commented on HDFS-15036:
--------------------------------------------
Good investigation and findings [~vagarychen].
# Could you add a comment explaining that {{ImageServlet}} should not reject images other than checkpoints?
# I am still concerned about the "silent" part. Should we add some logging, so that next time we can see what happened on both nodes?

> Active NameNode should not silently fail the image transfer
> -----------------------------------------------------------
>
>             Key: HDFS-15036
>             URL: https://issues.apache.org/jira/browse/HDFS-15036
>         Project: Hadoop HDFS
>      Issue Type: Bug
>      Components: namenode
> Affects Versions: 2.10.0
>        Reporter: Konstantin Shvachko
>        Assignee: Chen Liang
>        Priority: Major
>     Attachments: HDFS-15036.001.patch
>
> Image transfer from the Standby NameNode to the Active silently fails on the Active, without any logging and without notifying the receiver side.
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ]

zhuqi edited comment on HDFS-15041 at 12/10/19 2:45 AM:
--------------------------------------------------------
Hi [~weichiu],
Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different needs for client latency and to balance the pressure on the RPC queue. Sorry for the mistake; I did not mean the HDFS Balancer, the Balancer pressure I have moved to standby. Also, could we add this queue size to metrics if needed?

was (Author: zhuqi):
Hi [~weichiu],
Yes, I mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different needs for client latency and to balance the pressure on the RPC queue. The Balancer pressure I have moved to standby. Also, we can add this queue size to metrics if needed.
[jira] [Comment Edited] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi edited comment on HDFS-15041 at 12/10/19 2:26 AM: Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after my change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. The balancer pressure i have changed to standby. Also we can add this queue size to metrics if needed. was (Author: zhuqi): Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after your change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. The balancer pressure i have changed to standby. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992124#comment-16992124 ] zhuqi commented on HDFS-15041: -- Hi [~weichiu] Yeah, i mean to make MAX_LOCK_HOLD_MS configurable after your change in HDFS-14553, because of the different need for client latency and balance the pressure for rpc queue. The balancer pressure i have changed to standby. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
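The change discussed in HDFS-15041 — replacing the hard-coded MAX_LOCK_HOLD_MS and full-queue-size constants with configuration keys — can be sketched as follows. This is a minimal stand-in for Hadoop's Configuration API, and the key names are hypothetical placeholders, not the keys ultimately chosen by the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for org.apache.hadoop.conf.Configuration.
class Conf {
    private final Map<String, String> props = new HashMap<>();
    void set(String k, String v) { props.put(k, v); }
    long getLong(String k, long defaultValue) {
        String v = props.get(k);
        return v == null ? defaultValue : Long.parseLong(v);
    }
}

class BlockOpQueue {
    // Previously fixed constants become defaults for the config keys.
    static final long DEFAULT_MAX_LOCK_HOLD_MS = 4;
    static final int DEFAULT_FULL_QUEUE_SIZE = 64;

    final long maxLockHoldMs;
    final long fullQueueSize;

    BlockOpQueue(Conf conf) {
        // Hypothetical key names, for illustration only.
        maxLockHoldMs = conf.getLong(
            "dfs.namenode.blockreport.queue.max-lock-hold-ms",
            DEFAULT_MAX_LOCK_HOLD_MS);
        fullQueueSize = conf.getLong(
            "dfs.namenode.blockreport.queue.size",
            DEFAULT_FULL_QUEUE_SIZE);
    }
}
```

An operator tuning for lower client latency could then lower the lock-hold limit per cluster instead of recompiling.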
[jira] [Updated] (HDFS-14522) Allow compact property description in xml in httpfs
[ https://issues.apache.org/jira/browse/HDFS-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-14522: - Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thank you [~iwasakims] and [~inigoiri]! > Allow compact property description in xml in httpfs > --- > > Key: HDFS-14522 > URL: https://issues.apache.org/jira/browse/HDFS-14522 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Reporter: Akira Ajisaka >Assignee: Masatake Iwasaki >Priority: Major > Fix For: 3.3.0 > > > HADOOP-6964 allowed compact property description in Hadoop configuration, > however, it is not allowed in httpfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
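For context, the compact form from HADOOP-6964 expresses a property as attributes on a single element instead of nested tags, and this issue makes httpfs's configuration parser accept the same form. A sketch of the two equivalent styles (the property name is illustrative only):

```xml
<configuration>
  <!-- verbose form, accepted everywhere -->
  <property>
    <name>example.key</name>
    <value>example-value</value>
  </property>
  <!-- compact form from HADOOP-6964, now also accepted by httpfs -->
  <property name="example.key" value="example-value"/>
</configuration>
```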
[jira] [Commented] (HDFS-14522) Allow compact property description in xml in httpfs
[ https://issues.apache.org/jira/browse/HDFS-14522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992105#comment-16992105 ] Hudson commented on HDFS-14522: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17744 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17744/]) HDFS-14522. Allow compact property description in xml in httpfs. (#1737) (github: rev 4dffd81bb75efaa5742d2246354ebdc86cbd1aab) * (add) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/resources/test-compact-format-property.xml * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/test/java/org/apache/hadoop/lib/util/TestConfigurationUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/util/ConfigurationUtils.java > Allow compact property description in xml in httpfs > --- > > Key: HDFS-14522 > URL: https://issues.apache.org/jira/browse/HDFS-14522 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Reporter: Akira Ajisaka >Assignee: Masatake Iwasaki >Priority: Major > > HADOOP-6964 allowed compact property description in Hadoop configuration, > however, it is not allowed in httpfs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-15036: -- Target Version/s: 3.3.0, 2.10.1 (was: 2.10.1) Status: Patch Available (was: Open) > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15036.001.patch > > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-15036: -- Attachment: HDFS-15036.001.patch > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Attachments: HDFS-15036.001.patch > > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992069#comment-16992069 ] Konstantin Shvachko commented on HDFS-15032: Hey Erik, I see what you mean now. But it looks like an IntelliJ specific thing. I don't see this in my Eclipse. When I click on the ProxyCombiner variable Eclipse shows what your toString() specifies. Performance with {{Method.equals()}} is better as it compares references, rather than strings, but I don't think it is worth it. Looks like {{TestBalancerWithHANameNodes}} timed out on Jenkins. I saw it timing out on my Linux box once as well. > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, > debugger_with_tostring.png, debugger_without_tostring.png > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang reassigned HDFS-15036: - Assignee: Chen Liang (was: Chao Sun) > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992029#comment-16992029 ] Chen Liang commented on HDFS-15036: --- [~csun] np, sure, thanks for asking :) . Assigning to myself then. > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14852) Remove of LowRedundancyBlocks do NOT remove the block from all queues
[ https://issues.apache.org/jira/browse/HDFS-14852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992025#comment-16992025 ] Stephen O'Donnell commented on HDFS-14852: -- Looking at the original code, in BlockManager.removeBlock(), it does not guess the level to remove from, it always passes LowRedundancyBlocks.LEVEL:
{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL);
{code}
This means the first part of the method is never executed, and it will always iterate all the queues until it finds an entry to remove:
{code}
if (priLevel >= 0 && priLevel < LEVEL // Never executed on block delete as priLevel == LEVEL
    && priorityQueues.get(priLevel).remove(block)) {
  ...
  return true;
} else {
  // Try to remove the block from all queues if the block was
  // not found in the queue for the given priority level.
  for (int i = 0; i < LEVEL; i++) {
    if (i != priLevel && priorityQueues.get(i).remove(block)) {
      NameNode.blockStateChangeLog.debug(
          "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block"
              + " {} from priority queue {}", block, i);
      decrementBlockStat(block, i, oldExpectedReplicas);
      return true;
    }
  }
}
return false;
{code}
In the most common case, for a delete of a given block, there will be no reference in the lowRedundancyQueue (most blocks are perfectly replicated), but based on the above, it has always been checking all 5 queues the majority of the time, so I wonder if the performance concern of deleting all queues is as bad as we think. I wonder if the call to remove I mentioned above should always have been:
{code}
neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL - 1);
{code}
That way it would always attempt to delete from the corrupt list and if it gets nothing, try the other queues. If something is left behind in the other queues it would get deleted anyway later by the redundancy monitor.
Other calls to neededReconstruction.remove() pass a priority, but that is because those calls know the queue the block was taken from (they don't really guess / calculate the priority, they just know where it came from). However, as the write lock is dropped after getting the list of blocks, the block could be moved to another level:
{code}
int computeBlockReconstructionWork(int blocksToProcess) {
  List<List<BlockInfo>> blocksToReconstruct = null;
  namesystem.writeLock();
  try {
    // Choose the blocks to be reconstructed
    blocksToReconstruct = neededReconstruction
        .chooseLowRedundancyBlocks(blocksToProcess);
  } finally {
    namesystem.writeUnlock();
  }
  return computeReconstructionWorkForBlocks(blocksToReconstruct);
}
{code}
I need to check the 005 patch a bit more tomorrow and think on this a bit more. Based on my logic above, where the common case for deletes already checks all queues unless it finds a match, and the other cases pass a priority which is almost always correct and rarely iterate the queues, I do wonder if simply deleting from all queues is the simplest solution.
> Remove of LowRedundancyBlocks do NOT remove the block from all queues > - > > Key: HDFS-14852 > URL: https://issues.apache.org/jira/browse/HDFS-14852 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0 >Reporter: Fei Hui >Assignee: Fei Hui >Priority: Major > Attachments: CorruptBlocksMismatch.png, HDFS-14852.001.patch, > HDFS-14852.002.patch, HDFS-14852.003.patch, HDFS-14852.004.patch, > HDFS-14852.005.patch, screenshot-1.png > > > LowRedundancyBlocks.java > {code:java} > // Some comments here > if(priLevel >= 0 && priLevel < LEVEL > && priorityQueues.get(priLevel).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block {}" > + " from priority queue {}", > block, priLevel); > decrementBlockStat(block, priLevel, oldExpectedReplicas); > return true; > } else { > // Try to remove the block from all queues if the block was > // not found in the queue for the given priority level. > for (int i = 0; i < LEVEL; i++) { > if (i != priLevel && priorityQueues.get(i).remove(block)) { > NameNode.blockStateChangeLog.debug( > "BLOCK* NameSystem.LowRedundancyBlock.remove: Removing block" + > " {} from priority queue {}", block, i); > decrementBlockStat(block, i, oldExpectedReplicas); > return true; > } > } > } > return false; > } > {code} > Source code is above, the comments as follow > {quote} >
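The "simply delete from all queues" option discussed above can be sketched with a toy model of the priority queues. This is a simplified stand-in, not the real LowRedundancyBlocks class:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LowRedundancyQueues {
    static final int LEVEL = 5; // number of priority queues in HDFS
    private final List<Set<Long>> priorityQueues = new ArrayList<>();

    LowRedundancyQueues() {
        for (int i = 0; i < LEVEL; i++) {
            priorityQueues.add(new HashSet<>());
        }
    }

    void add(long blockId, int priLevel) {
        priorityQueues.get(priLevel).add(blockId);
    }

    // Removing from every queue, not only a guessed priority level,
    // guarantees no stale entry is left behind when the block moved
    // levels between computing its priority and removing it.
    boolean removeFromAll(long blockId) {
        boolean removed = false;
        for (int i = 0; i < LEVEL; i++) {
            removed |= priorityQueues.get(i).remove(blockId);
        }
        return removed;
    }
}
```

Since the existing delete path already iterates all levels until a match, always sweeping every level trades at most a constant factor for correctness.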
[jira] [Commented] (HDFS-6874) Add GETFILEBLOCKLOCATIONS operation to HttpFS
[ https://issues.apache.org/jira/browse/HDFS-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992023#comment-16992023 ] Íñigo Goiri commented on HDFS-6874: --- {quote} I think we need to implement getblocklocations in httpfs and call getfileblocklocations . {quote} Yes, I think that's the way to go. > Add GETFILEBLOCKLOCATIONS operation to HttpFS > - > > Key: HDFS-6874 > URL: https://issues.apache.org/jira/browse/HDFS-6874 > Project: Hadoop HDFS > Issue Type: Improvement > Components: httpfs >Affects Versions: 2.4.1, 2.7.3 >Reporter: Gao Zhong Liang >Assignee: Weiwei Yang >Priority: Major > Labels: BB2015-05-TBR > Attachments: HDFS-6874-1.patch, HDFS-6874-branch-2.6.0.patch, > HDFS-6874.02.patch, HDFS-6874.03.patch, HDFS-6874.04.patch, > HDFS-6874.05.patch, HDFS-6874.06.patch, HDFS-6874.07.patch, > HDFS-6874.08.patch, HDFS-6874.09.patch, HDFS-6874.10.patch, HDFS-6874.patch > > > GETFILEBLOCKLOCATIONS operation is missing in HttpFS, which is already > supported in WebHDFS. For the request of GETFILEBLOCKLOCATIONS in > org.apache.hadoop.fs.http.server.HttpFSServer, BAD_REQUEST is returned so far: > ... > case GETFILEBLOCKLOCATIONS: { > response = Response.status(Response.Status.BAD_REQUEST).build(); > break; > } > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
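A sketch of the serialization side of "implement getblocklocations in httpfs and call getfileblocklocations", assuming HttpFS mirrors the WebHDFS JSON shape for block locations. The classes below are simplified stand-ins for the Hadoop types, and the exact JSON field set is an assumption, not taken from the patch:

```java
import java.util.List;

// Stand-in for org.apache.hadoop.fs.BlockLocation.
class BlockLocation {
    final String[] hosts;
    final long offset;
    final long length;
    BlockLocation(String[] hosts, long offset, long length) {
        this.hosts = hosts; this.offset = offset; this.length = length;
    }
}

class BlockLocationsJson {
    // Serialize locations in a WebHDFS-like envelope so the HttpFS
    // response stays wire-compatible with the WebHDFS operation
    // (assumed shape: {"BlockLocations":{"BlockLocation":[...]}}).
    static String toJson(List<BlockLocation> locs) {
        StringBuilder sb =
            new StringBuilder("{\"BlockLocations\":{\"BlockLocation\":[");
        for (int i = 0; i < locs.size(); i++) {
            BlockLocation b = locs.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"offset\":").append(b.offset)
              .append(",\"length\":").append(b.length)
              .append(",\"hosts\":[");
            for (int j = 0; j < b.hosts.length; j++) {
                if (j > 0) sb.append(',');
                sb.append('"').append(b.hosts[j]).append('"');
            }
            sb.append("]}");
        }
        return sb.append("]}}").toString();
    }
}
```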
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992018#comment-16992018 ] Chao Sun commented on HDFS-15036: - [~vagarychen] sorry for grabbing this JIRA too soon :) Since you have done much study on this, do you want to take this JIRA instead? > Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991998#comment-16991998 ] Chen Liang edited comment on HDFS-15036 at 12/9/19 10:36 PM: - Spent some time debugging this issue; I think I found the cause. In HDFS-12979, we introduced logic that rejects an uploaded image if it is not far enough ahead of the previous image. This prevents the scenario where there are multiple SbNs and all of them upload images to the ANN too frequently. This is considered correct behavior, so there is no log indication of any error here (the "silent" part). Both ANN and SbN simply ignore it and proceed. But it now appears that a side effect of this change is that during a rolling upgrade (RU), the rollback image also has to go through this check, and it can also be rejected. If this happens, the SbN proceeds assuming the upload is done, while the ANN proceeds without having received the rollback image; the upload silently fails in this case. The check logic that rejects the upload is in {{ImageServlet}}. In my earlier test, I just commented out the whole block below and the issue seems gone. But I think the fix is probably just adding a new check to ensure this rejection only applies to regular image uploads, not rollback images, like the newly added line in the following code snippet. I haven't actually tested changing it this way:
{code:java}
if (checkRecentImageEnable &&
    NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue, as NameNodeFile.IMAGE_ROLLBACK should bypass this
    timeDelta < checkpointPeriod &&
    txid - lastCheckpointTxid < checkpointTxnCount) {
  // only when at least one of two conditions are met we accept
  // a new fsImage
  // 1. most recent image's txid is too far behind
  // 2. last checkpoint time was too old
  response.sendError(HttpServletResponse.SC_CONFLICT,
      "Most recent checkpoint is neither too far behind in "
          + "txid, nor too old. New txnid cnt is "
          + (txid - lastCheckpointTxid)
          + ", expecting at least " + checkpointTxnCount
          + " unless too long since last upload.");
  return null;
}
{code}
was (Author: vagarychen): Spent some time debugging this issue; I think I found the cause. In HDFS-12979, we introduced logic that rejects an uploaded image if it is not far enough ahead of the previous image. This prevents the scenario where there are multiple SbNs and all of them upload images to the ANN too frequently. This is considered correct behavior, so there is no log indication of any error here (the "silent" part). Both ANN and SbN simply ignore it and proceed. But it now appears that a side effect of this change is that during a rolling upgrade (RU), the rollback image also has to go through this check, and it can also be rejected. If this happens, the SbN proceeds assuming the upload is done, while the ANN proceeds without having received the rollback image; the upload silently fails in this case. The check logic that rejects the upload is in {{ImageServlet}}. In my earlier test, I just commented out the whole block below and the issue seems gone. But I think the fix is probably just adding a new check to ensure this rejection only applies to regular image uploads, like the newly added line in the following code snippet. I haven't actually tested changing it this way:
{code}
if (checkRecentImageEnable &&
    NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue
    timeDelta < checkpointPeriod &&
    txid - lastCheckpointTxid < checkpointTxnCount) {
  // only when at least one of two conditions are met we accept
  // a new fsImage
  // 1. most recent image's txid is too far behind
  // 2. last checkpoint time was too old
  response.sendError(HttpServletResponse.SC_CONFLICT,
      "Most recent checkpoint is neither too far behind in "
          + "txid, nor too old. New txnid cnt is "
          + (txid - lastCheckpointTxid)
          + ", expecting at least " + checkpointTxnCount
          + " unless too long since last upload.");
  return null;
}
{code}
> Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL:
[jira] [Commented] (HDFS-15036) Active NameNode should not silently fail the image transfer
[ https://issues.apache.org/jira/browse/HDFS-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991998#comment-16991998 ] Chen Liang commented on HDFS-15036: --- Spent some time debugging this issue; I think I found the cause. In HDFS-12979, we introduced logic that rejects an uploaded image if it is not far enough ahead of the previous image. This prevents the scenario where there are multiple SbNs and all of them upload images to the ANN too frequently. This is considered correct behavior, so there is no log indication of any error here (the "silent" part). Both ANN and SbN simply ignore it and proceed. But it now appears that a side effect of this change is that during a rolling upgrade (RU), the rollback image also has to go through this check, and it can also be rejected. If this happens, the SbN proceeds assuming the upload is done, while the ANN proceeds without having received the rollback image; the upload silently fails in this case. The check logic that rejects the upload is in {{ImageServlet}}. In my earlier test, I just commented out the whole block below and the issue seems gone. But I think the fix is probably just adding a new check to ensure this rejection only applies to regular image uploads, like the newly added line in the following code snippet. I haven't actually tested changing it this way:
{code}
if (checkRecentImageEnable &&
    NameNodeFile.IMAGE.equals(parsedParams.getNameNodeFile()) && // <--- this should fix the issue
    timeDelta < checkpointPeriod &&
    txid - lastCheckpointTxid < checkpointTxnCount) {
  // only when at least one of two conditions are met we accept
  // a new fsImage
  // 1. most recent image's txid is too far behind
  // 2. last checkpoint time was too old
  response.sendError(HttpServletResponse.SC_CONFLICT,
      "Most recent checkpoint is neither too far behind in "
          + "txid, nor too old. New txnid cnt is "
          + (txid - lastCheckpointTxid)
          + ", expecting at least " + checkpointTxnCount
          + " unless too long since last upload.");
  return null;
}
{code}
> Active NameNode should not silently fail the image transfer > --- > > Key: HDFS-15036 > URL: https://issues.apache.org/jira/browse/HDFS-15036 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chao Sun >Priority: Major > > Image transfer from Standby NameNode to Active silently fails on Active, > without any logging and not notifying the receiver side. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
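The gating condition described above — applying the "checkpoint too recent" rejection only to regular images so that rollback images pass through — can be modeled in isolation. The class below is a simplified stand-in for ImageServlet's check, not the actual patch:

```java
// Simplified stand-in for the NNStorage file types involved.
enum NameNodeFile { IMAGE, IMAGE_ROLLBACK }

class ImageUploadCheck {
    final boolean checkRecentImageEnable;
    final long checkpointPeriod;   // min seconds between accepted checkpoints
    final long checkpointTxnCount; // min txns between accepted checkpoints

    ImageUploadCheck(boolean enable, long period, long txnCount) {
        checkRecentImageEnable = enable;
        checkpointPeriod = period;
        checkpointTxnCount = txnCount;
    }

    // Returns true if the upload should be rejected with HTTP 409.
    boolean shouldReject(NameNodeFile file, long timeDelta, long txid,
                         long lastCheckpointTxid) {
        return checkRecentImageEnable
            && NameNodeFile.IMAGE == file // rollback images bypass the check
            && timeDelta < checkpointPeriod
            && txid - lastCheckpointTxid < checkpointTxnCount;
    }
}
```

With this shape, the multi-SbN throttling behavior is preserved for ordinary checkpoints while a rolling-upgrade rollback image is never silently dropped.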
[jira] [Commented] (HDFS-14983) RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option
[ https://issues.apache.org/jira/browse/HDFS-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991984#comment-16991984 ] Íñigo Goiri commented on HDFS-14983: For the checkstyle warning you can break the line in TestRouterRefreshSuperUserGroupsConfiguration. Other than that this looks good to go. Hopefully the next Yetus run will be clean. > RBF: Add dfsrouteradmin -refreshSuperUserGroupsConfiguration command option > --- > > Key: HDFS-14983 > URL: https://issues.apache.org/jira/browse/HDFS-14983 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Minor > Attachments: HDFS-14983.002.patch, HDFS-14983.003.patch, > HDFS-14983.draft.001.patch > > > NameNode can update proxyuser config by -refreshSuperUserGroupsConfiguration > without restarting but DFSRouter cannot. It would be better for DFSRouter to > have such functionality to be compatible with NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991983#comment-16991983 ] Wei-Chiu Chuang commented on HDFS-15041: [~hexiaoqiao] does it make sense to make MAX_LOCK_HOLD_MS configurable after your change in HDFS-14553? Also, let's understand the use case better: if the purpose is to make balancer not to overwhelm NameNode, there are other solutions that look even more promising, such as HDFS-13183, or HDFS-14162 (if consistent read from standby is enabled). > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > cluster have different need for the latency and the queue health standard. > We'd better to make the two parameter configurable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991971#comment-16991971 ] Hadoop QA commented on HDFS-15032: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 45s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 18m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 25s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 21s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 33s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 34s{color} | {color:orange} root: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 54s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 44s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 7s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}218m 16s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | | hadoop.hdfs.server.namenode.TestFsck | \\ \\ || Subsystem || Report/Notes || | Docker | Client=19.03.5 Server=19.03.5 Image:yetus/hadoop:104ccca9169 | | JIRA Issue | HDFS-15032 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988365/HDFS-15032.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 98be0587895c 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / dc66de7 | | maven |
[jira] [Updated] (HDFS-14667) Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2
[ https://issues.apache.org/jira/browse/HDFS-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14667: - Fix Version/s: (was: 2.11.0) > Backport [HDFS-14403] "Cost-based FairCallQueue" to branch-2 > > > Key: HDFS-14667 > URL: https://issues.apache.org/jira/browse/HDFS-14667 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 2.10.0 > > Attachments: HDFS-14403-branch-2.000.patch > > > We would like to target pulling HDFS-14403, an important operability > enhancement, into branch-2. > It's only present in trunk now so we also need to backport through the 3.x > lines. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-15005: - Fix Version/s: (was: 2.11.0) > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 2.10.1 > > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT-related information is very useful in the audit log. This tracks effort > to backport HDFS-12300 to branch-2.
[jira] [Commented] (HDFS-15005) Backport HDFS-12300 to branch-2
[ https://issues.apache.org/jira/browse/HDFS-15005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991917#comment-16991917 ] Jonathan Hung commented on HDFS-15005: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > Backport HDFS-12300 to branch-2 > --- > > Key: HDFS-15005 > URL: https://issues.apache.org/jira/browse/HDFS-15005 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 2.10.1 > > Attachments: HDFS-15005-branch-2.000.patch, > HDFS-15005-branch-2.001.patch, HDFS-15005-branch-2.002.patch, > HDFS-15005-branch-2.003.patch > > > Having DT-related information is very useful in the audit log. This tracks effort > to backport HDFS-12300 to branch-2.
[jira] [Commented] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991911#comment-16991911 ] Jonathan Hung commented on HDFS-14986: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > ReplicaCachingGetSpaceUsed throws ConcurrentModificationException > -- > > Key: HDFS-14986 > URL: https://issues.apache.org/jira/browse/HDFS-14986 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, performance >Affects Versions: 2.10.0 >Reporter: Ryan Wu >Assignee: Aiphago >Priority: Major > Fix For: 3.3.0, 2.10.1 > > Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, > HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, > HDFS-14986.006.patch > > > Running DU across lots of disks is very expensive. We applied the patch > HDFS-14313 to get used space from ReplicaInfo in memory. However, new du > threads throw the exception > {code:java} > // 2019-11-08 18:07:13,858 ERROR > [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517] > > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: > ReplicaCachingGetSpaceUsed refresh error > java.util.ConcurrentModificationException: Tree has been modified outside of > iterator > at > org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311) > > at > org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256) > > at java.util.AbstractCollection.addAll(AbstractCollection.java:343) > at java.util.HashSet.<init>(HashSet.java:120) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052) > > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73) > > at > org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178) > > at java.lang.Thread.run(Thread.java:748) > 
{code}
[jira] [Updated] (HDFS-14986) ReplicaCachingGetSpaceUsed throws ConcurrentModificationException
[ https://issues.apache.org/jira/browse/HDFS-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14986: - Fix Version/s: (was: 2.11.0) > ReplicaCachingGetSpaceUsed throws ConcurrentModificationException > -- > > Key: HDFS-14986 > URL: https://issues.apache.org/jira/browse/HDFS-14986 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, performance >Affects Versions: 2.10.0 >Reporter: Ryan Wu >Assignee: Aiphago >Priority: Major > Fix For: 3.3.0, 2.10.1 > > Attachments: HDFS-14986.001.patch, HDFS-14986.002.patch, > HDFS-14986.003.patch, HDFS-14986.004.patch, HDFS-14986.005.patch, > HDFS-14986.006.patch > > > Running DU across lots of disks is very expensive. We applied the patch > HDFS-14313 to get used space from ReplicaInfo in memory. However, new du > threads throw the exception > {code:java} > // 2019-11-08 18:07:13,858 ERROR > [refreshUsed-/home/vipshop/hard_disk/7/dfs/dn/current/BP-1203969992--1450855658517] > > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed: > ReplicaCachingGetSpaceUsed refresh error > java.util.ConcurrentModificationException: Tree has been modified outside of > iterator > at > org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.checkForModification(FoldedTreeSet.java:311) > > at > org.apache.hadoop.hdfs.util.FoldedTreeSet$TreeSetIterator.hasNext(FoldedTreeSet.java:256) > > at java.util.AbstractCollection.addAll(AbstractCollection.java:343) > at java.util.HashSet.<init>(HashSet.java:120) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.deepCopyReplica(FsDatasetImpl.java:1052) > > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed.refresh(ReplicaCachingGetSpaceUsed.java:73) > > at > org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:178) > > at java.lang.Thread.run(Thread.java:748) > {code}
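The stack trace in HDFS-14986 above shows a fail-fast iterator (FoldedTreeSet$TreeSetIterator) being invalidated while deepCopyReplica() copies the replica set under a concurrent writer. The failure mode, and the usual copy-under-lock fix, can be sketched with plain java.util collections (an illustrative model only, not the Hadoop code — FoldedTreeSet is HDFS-internal, so a HashSet stands in for it here):

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class ReplicaSnapshotSketch {

    // A fail-fast collection invalidates live iterators on any structural
    // change, which is what the FoldedTreeSet iterator reported above. Here
    // the race is collapsed to a single thread so the failure is deterministic.
    static boolean triggersCme() {
        Set<Integer> live = new HashSet<>();
        live.add(1);
        live.add(2);
        Iterator<Integer> it = live.iterator();
        live.add(3); // structural modification invalidates `it`
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The usual fix: take the snapshot while holding the same lock the
    // writers take, so the copy can never overlap a mutation.
    static Set<Integer> snapshotUnderLock(Object lock, Set<Integer> live) {
        synchronized (lock) {
            return new HashSet<>(live);
        }
    }

    public static void main(String[] args) {
        System.out.println(triggersCme()); // true
        Set<Integer> live = new HashSet<>();
        live.add(7);
        System.out.println(snapshotUnderLock(new Object(), live)); // [7]
    }
}
```

The later patches on this issue take essentially the second shape: make the copy while synchronized on the dataset lock rather than iterating the live set.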
[jira] [Commented] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly
[ https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991910#comment-16991910 ] Jonathan Hung commented on HDFS-14973: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > Balancer getBlocks RPC dispersal does not function properly > --- > > Key: HDFS-14973 > URL: https://issues.apache.org/jira/browse/HDFS-14973 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14973-branch-2.003.patch, > HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, > HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, > HDFS-14973.003.patch, HDFS-14973.test.patch > > > In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls > issued by the Balancer/Mover more dispersed, to alleviate load on the > NameNode, since {{getBlocks}} can be very expensive and the Balancer should > not impact normal cluster operation. > Unfortunately, this functionality does not function as expected, especially > when the dispatcher thread count is low. The primary issue is that the delay > is applied only to the first N threads that are submitted to the dispatcher's > executor, where N is the size of the dispatcher's threadpool, but *not* to > the first R threads, where R is the number of allowed {{getBlocks}} QPS > (currently hardcoded to 20). For example, if the threadpool size is 100 (the > default), threads 0-19 have no delay, 20-99 have increased levels of delay, > and 100+ have no delay. As I understand it, the intent of the logic was that > the delay applied to the first 100 threads would force the dispatcher > executor's threads to all be consumed, thus blocking subsequent (non-delayed) > threads until the delay period has expired. 
However, threads 0-19 can finish > very quickly (their work can often be fulfilled in the time it takes to > execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), > thus opening up 20 new slots in the executor, which are then consumed by > non-delayed threads 100-119, and so on. So, although 80 threads have had a > delay applied, the non-delay threads rush through in the 20 non-delay slots. > This problem gets even worse when the dispatcher threadpool size is less than > the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no > threads ever have a delay applied_, and the feature is not enabled at all. > This problem wasn't surfaced in the original JIRA because the test > incorrectly measured the period across which {{getBlocks}} RPCs were > distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} > were used to track the time over which the {{getBlocks}} calls were made. > However, {{startGetBlocksTime}} was initialized at the time of creation of > the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even > worse, the Balancer in this test takes 2 iterations to complete balancing the > cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} > actually represents: > {code} > (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the > Dispatcher to complete an iteration of moving blocks) > {code} > Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen > during the period of initial block fetching.
[jira] [Updated] (HDFS-14973) Balancer getBlocks RPC dispersal does not function properly
[ https://issues.apache.org/jira/browse/HDFS-14973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14973: - Fix Version/s: (was: 2.11.0) > Balancer getBlocks RPC dispersal does not function properly > --- > > Key: HDFS-14973 > URL: https://issues.apache.org/jira/browse/HDFS-14973 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.9.0, 2.7.4, 2.8.2, 3.0.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14973-branch-2.003.patch, > HDFS-14973-branch-2.004.patch, HDFS-14973-branch-2.005.patch, > HDFS-14973.000.patch, HDFS-14973.001.patch, HDFS-14973.002.patch, > HDFS-14973.003.patch, HDFS-14973.test.patch > > > In HDFS-11384, a mechanism was added to make the {{getBlocks}} RPC calls > issued by the Balancer/Mover more dispersed, to alleviate load on the > NameNode, since {{getBlocks}} can be very expensive and the Balancer should > not impact normal cluster operation. > Unfortunately, this functionality does not function as expected, especially > when the dispatcher thread count is low. The primary issue is that the delay > is applied only to the first N threads that are submitted to the dispatcher's > executor, where N is the size of the dispatcher's threadpool, but *not* to > the first R threads, where R is the number of allowed {{getBlocks}} QPS > (currently hardcoded to 20). For example, if the threadpool size is 100 (the > default), threads 0-19 have no delay, 20-99 have increased levels of delay, > and 100+ have no delay. As I understand it, the intent of the logic was that > the delay applied to the first 100 threads would force the dispatcher > executor's threads to all be consumed, thus blocking subsequent (non-delayed) > threads until the delay period has expired. 
However, threads 0-19 can finish > very quickly (their work can often be fulfilled in the time it takes to > execute a single {{getBlocks}} RPC, on the order of tens of milliseconds), > thus opening up 20 new slots in the executor, which are then consumed by > non-delayed threads 100-119, and so on. So, although 80 threads have had a > delay applied, the non-delay threads rush through in the 20 non-delay slots. > This problem gets even worse when the dispatcher threadpool size is less than > the max {{getBlocks}} QPS. For example, if the threadpool size is 10, _no > threads ever have a delay applied_, and the feature is not enabled at all. > This problem wasn't surfaced in the original JIRA because the test > incorrectly measured the period across which {{getBlocks}} RPCs were > distributed. The variables {{startGetBlocksTime}} and {{endGetBlocksTime}} > were used to track the time over which the {{getBlocks}} calls were made. > However, {{startGetBlocksTime}} was initialized at the time of creation of > the {{FSNamesystem}} spy, which is before the mock DataNodes are started. Even > worse, the Balancer in this test takes 2 iterations to complete balancing the > cluster, so the time period {{endGetBlocksTime - startGetBlocksTime}} > actually represents: > {code} > (time to submit getBlocks RPCs) + (DataNode startup time) + (time for the > Dispatcher to complete an iteration of moving blocks) > {code} > Thus, the RPC QPS reported by the test is much lower than the RPC QPS seen > during the period of initial block fetching.
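The delay-assignment flaw described in HDFS-14973 above can be illustrated with a toy model. The piecewise logic below follows the prose description (the first R = 20 tasks get no delay, tasks R..N-1 get increasing delay, and tasks >= N get no delay at all); it is a simplification for illustration, not the actual Dispatcher code, and the exact delay increments are assumed:

```java
public class GetBlocksDelaySketch {
    static final int MAX_QPS = 20; // allowed getBlocks QPS, hardcoded per the description

    // Delay "steps" assigned to the i-th submitted dispatch task, following
    // the behaviour described above: only tasks R..N-1 (R = MAX_QPS,
    // N = threadpool size) receive a delay; everything else runs immediately.
    static int delaySteps(int taskIndex, int poolSize) {
        if (taskIndex < MAX_QPS) {
            return 0;                            // first R tasks: no delay
        }
        if (taskIndex < poolSize) {
            return 1 + (taskIndex - MAX_QPS) / MAX_QPS; // tasks R..N-1: increasing delay
        }
        return 0;                                // tasks >= N: no delay at all
    }

    public static void main(String[] args) {
        // Pool of 100: tasks 0-19 undelayed, 20-99 delayed, 100+ undelayed again,
        // so non-delayed work keeps rushing through the 20 free slots.
        System.out.println(delaySteps(0, 100));   // 0
        System.out.println(delaySteps(20, 100));  // 1
        System.out.println(delaySteps(99, 100));  // 4
        System.out.println(delaySteps(100, 100)); // 0  <- rushes through
        // Pool of 10 (smaller than MAX_QPS): no task is ever delayed.
        boolean anyDelay = false;
        for (int i = 0; i < 1000; i++) {
            anyDelay |= delaySteps(i, 10) > 0;
        }
        System.out.println(anyDelay);             // false
    }
}
```

The model reproduces both symptoms from the description: with a 100-thread pool the delay covers only indices 20-99, and with a 10-thread pool the delay branch is unreachable, disabling the feature entirely.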
[jira] [Commented] (HDFS-14952) Skip safemode if blockTotal is 0 in new NN
[ https://issues.apache.org/jira/browse/HDFS-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991904#comment-16991904 ] Jonathan Hung commented on HDFS-14952: -- Renaming 2.11.0 fix version to 2.10.1 after branch-2 -> branch-2.10 rename > Skip safemode if blockTotal is 0 in new NN > -- > > Key: HDFS-14952 > URL: https://issues.apache.org/jira/browse/HDFS-14952 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Rajesh Balamohan >Assignee: Xiaoqiao He >Priority: Trivial > Labels: performance > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14952.001.patch, HDFS-14952.002.patch, > HDFS-14952.003.patch > > > When a new NN is installed, it spends 30-45 seconds in Safemode. When > {{blockTotal}} is 0, it should be possible to short-circuit the safemode check in > {{BlockManagerSafeMode::areThresholdsMet}}. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java#L571
[jira] [Updated] (HDFS-14952) Skip safemode if blockTotal is 0 in new NN
[ https://issues.apache.org/jira/browse/HDFS-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14952: - Fix Version/s: (was: 2.11.0) 2.10.1 > Skip safemode if blockTotal is 0 in new NN > -- > > Key: HDFS-14952 > URL: https://issues.apache.org/jira/browse/HDFS-14952 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Rajesh Balamohan >Assignee: Xiaoqiao He >Priority: Trivial > Labels: performance > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14952.001.patch, HDFS-14952.002.patch, > HDFS-14952.003.patch > > > When a new NN is installed, it spends 30-45 seconds in Safemode. When > {{blockTotal}} is 0, it should be possible to short-circuit the safemode check in > {{BlockManagerSafeMode::areThresholdsMet}}. > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerSafeMode.java#L571
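The HDFS-14952 improvement above amounts to a one-line guard in the threshold check. A minimal sketch against a simplified stand-in for {{BlockManagerSafeMode::areThresholdsMet}} (field and parameter names are illustrative, not the exact class members):

```java
public class SafeModeSketch {

    // Simplified stand-in for the safe mode threshold check with the proposed
    // guard added: a freshly formatted NameNode reports blockTotal == 0, so
    // the block threshold is vacuously satisfied and safe mode need not wait.
    static boolean areThresholdsMet(long blockSafe, long blockTotal,
                                    double threshold) {
        if (blockTotal == 0) {
            return true; // proposed short-circuit for a brand-new NN
        }
        long needed = (long) Math.ceil(threshold * blockTotal);
        return blockSafe >= needed;
    }

    public static void main(String[] args) {
        System.out.println(areThresholdsMet(0, 0, 0.999));       // true: new NN exits immediately
        System.out.println(areThresholdsMet(500, 1000, 0.999));  // false: still waiting on reports
        System.out.println(areThresholdsMet(1000, 1000, 0.999)); // true
    }
}
```

Without the guard, a new NameNode with zero blocks still sits out the startup safe mode window even though there is nothing to wait for.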
[jira] [Commented] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs
[ https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991896#comment-16991896 ] Jonathan Hung commented on HDFS-14884: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > Add sanity check that zone key equals feinfo key while setting Xattrs > - > > Key: HDFS-14884 > URL: https://issues.apache.org/jira/browse/HDFS-14884 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Affects Versions: 2.11.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, > HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch > > > Currently, it is possible to set an extended attribute where the zone key is > not the same as the feinfo key. This jira will add a precondition before setting > this.
[jira] [Updated] (HDFS-14884) Add sanity check that zone key equals feinfo key while setting Xattrs
[ https://issues.apache.org/jira/browse/HDFS-14884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14884: - Fix Version/s: (was: 2.11.0) > Add sanity check that zone key equals feinfo key while setting Xattrs > - > > Key: HDFS-14884 > URL: https://issues.apache.org/jira/browse/HDFS-14884 > Project: Hadoop HDFS > Issue Type: Bug > Components: encryption, hdfs >Affects Versions: 2.11.0 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14884-branch-2.001.patch, HDFS-14884.001.patch, > HDFS-14884.002.patch, HDFS-14884.003.patch, hdfs_distcp.patch > > > Currently, it is possible to set an extended attribute where the zone key is > not the same as the feinfo key. This jira will add a precondition before setting > this.
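The sanity check proposed in HDFS-14884 above is a straightforward precondition. A hedged sketch of its shape (method and parameter names are hypothetical, not the actual patch):

```java
public class ZoneKeySanitySketch {

    // The proposed precondition, sketched: before persisting the encryption
    // xattr, require that the file's FileEncryptionInfo names the same key
    // as the enclosing encryption zone. Names here are illustrative only.
    static void checkKeyConsistency(String zoneKeyName, String feInfoKeyName) {
        if (!zoneKeyName.equals(feInfoKeyName)) {
            throw new IllegalArgumentException("zone key " + zoneKeyName
                + " does not match feinfo key " + feInfoKeyName);
        }
    }

    public static void main(String[] args) {
        checkKeyConsistency("key1", "key1"); // consistent: no exception
        try {
            checkKeyConsistency("key1", "key2");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```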
[jira] [Commented] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible
[ https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991899#comment-16991899 ] Jonathan Hung commented on HDFS-14979: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > [Observer Node] Balancer should submit getBlocks to Observer Node when > possible > --- > > Key: HDFS-14979 > URL: https://issues.apache.org/jira/browse/HDFS-14979 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover, hdfs >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14979.000.patch > > > In HDFS-14162, we made it so that the Balancer could function when > {{ObserverReadProxyProvider}} was in use. However, the Balancer would still > read from the active NameNode, because {{getBlocks}} wasn't annotated as > {{@ReadOnly}}. This task is to enable the Balancer to actually read from the > Observer Node to alleviate load from the active NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14979) [Observer Node] Balancer should submit getBlocks to Observer Node when possible
[ https://issues.apache.org/jira/browse/HDFS-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14979: - Fix Version/s: (was: 2.11.0) > [Observer Node] Balancer should submit getBlocks to Observer Node when > possible > --- > > Key: HDFS-14979 > URL: https://issues.apache.org/jira/browse/HDFS-14979 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover, hdfs >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14979.000.patch > > > In HDFS-14162, we made it so that the Balancer could function when > {{ObserverReadProxyProvider}} was in use. However, the Balancer would still > read from the active NameNode, because {{getBlocks}} wasn't annotated as > {{@ReadOnly}}. This task is to enable the Balancer to actually read from the > Observer Node to alleviate load from the active NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991890#comment-16991890 ] Jonathan Hung commented on HDFS-14590: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991891#comment-16991891 ] Jonathan Hung commented on HDFS-14958: -- Removing 2.11.0 fix version after branch-2 -> branch-2.10 rename > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14958) TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup
[ https://issues.apache.org/jira/browse/HDFS-14958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14958: - Fix Version/s: (was: 2.11.0) > TestBalancerWithNodeGroup is not using NetworkTopologyWithNodeGroup > --- > > Key: HDFS-14958 > URL: https://issues.apache.org/jira/browse/HDFS-14958 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Minor > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14958.001.patch > > > TestBalancerWithNodeGroup is intended to test with > {{NetworkTopologyWithNodeGroup}}, but it is not configured correctly. > Because {{DFSConfigKeys.DFS_USE_DFS_NETWORK_TOPOLOGY_KEY}} defaults to true, > {{CommonConfigurationKeysPublic.NET_TOPOLOGY_IMPL_KEY}} is ignored and the > test actually uses the default {{DFSNetworkTopology}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
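The HDFS-14958 bug above comes down to which topology class wins. A toy model of the selection behaviour as described (the class names are the real Hadoop ones, but the selection logic and the key semantics are a simplification of how HDFS picks the topology implementation, assumed from the description):

```java
public class TopologySelectionSketch {

    // Models the behaviour described above: while
    // dfs.use.dfs.network.topology is true (the default), the class named by
    // net.topology.impl is ignored and the DFS-specific topology is used.
    static String chooseTopology(boolean useDfsTopology, String configuredImpl) {
        if (useDfsTopology) {
            return "org.apache.hadoop.hdfs.net.DFSNetworkTopology"; // impl key ignored
        }
        return configuredImpl;
    }

    public static void main(String[] args) {
        String nodeGroup = "org.apache.hadoop.net.NetworkTopologyWithNodeGroup";
        // Buggy test setup: the DFS topology key is left at its default of true,
        // so the node-group topology is silently never used.
        System.out.println(chooseTopology(true, nodeGroup));
        // Fix: disable the DFS topology so the configured impl takes effect.
        System.out.println(chooseTopology(false, nodeGroup));
    }
}
```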
[jira] [Updated] (HDFS-14590) [SBN Read] Add the document link to the top page
[ https://issues.apache.org/jira/browse/HDFS-14590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hung updated HDFS-14590: - Fix Version/s: (was: 2.11.0) > [SBN Read] Add the document link to the top page > > > Key: HDFS-14590 > URL: https://issues.apache.org/jira/browse/HDFS-14590 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-14590.001.patch, HDFS-14590.002.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15042) Add more tests for ByteBufferPositionedReadable
[ https://issues.apache.org/jira/browse/HDFS-15042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-15042: -- Summary: Add more tests for ByteBufferPositionedReadable (was: add more tests for ByteBufferPositionedReadable ) > Add more tests for ByteBufferPositionedReadable > > > Key: HDFS-15042 > URL: https://issues.apache.org/jira/browse/HDFS-15042 > Project: Hadoop HDFS > Issue Type: Improvement > Components: fs, test >Affects Versions: 3.3.0 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > > There are a few corner cases of ByteBufferPositionedReadable which need to be > tested, mainly illegal read positions. Add them.
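The corner cases HDFS-15042 above wants covered can be sketched against a byte[]-backed stand-in rather than a real Hadoop stream. The exact contract (what a real implementation returns at EOF or throws for a negative position) is the subject of the new tests, so the behaviour modelled here is an assumption, not the interface's specification:

```java
import java.nio.ByteBuffer;

public class PositionedReadSketch {

    // Illustrates the position handling the new tests exercise: a negative
    // offset is rejected, a read at or past EOF returns -1, and a read near
    // EOF returns only the remaining bytes. Backed by a byte[] for the sketch.
    static int read(byte[] data, long position, ByteBuffer buf) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be non-negative");
        }
        if (position >= data.length) {
            return -1; // EOF
        }
        int n = Math.min(buf.remaining(), data.length - (int) position);
        buf.put(data, (int) position, n);
        return n;
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        System.out.println(read(data, 3, ByteBuffer.allocate(10))); // 2: partial read at EOF
        System.out.println(read(data, 5, ByteBuffer.allocate(10))); // -1: at EOF
        try {
            read(data, -1, ByteBuffer.allocate(10));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // illegal read position
        }
    }
}
```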
[jira] [Issue Comment Deleted] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-15032: --- Comment: was deleted (was: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15032 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28486/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. ) > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, > debugger_with_tostring.png, debugger_without_tostring.png > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991822#comment-16991822 ] Erik Krogen commented on HDFS-15032: It looks like Yetus was trying to pick up the image as a patch: {code} HDFS-15032 patch is being downloaded at Mon Dec 9 17:47:52 UTC 2019 from https://issues.apache.org/jira/secure/attachment/12988361/debugger_without_tostring.png -> Downloaded {code} I'm re-attaching v3 as v004 to get Yetus to pick it up (hopefully). > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, > debugger_with_tostring.png, debugger_without_tostring.png > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-15032: --- Attachment: HDFS-15032.004.patch > Balancer crashes when it fails to contact an unavailable NN via > ObserverReadProxyProvider > - > > Key: HDFS-15032 > URL: https://issues.apache.org/jira/browse/HDFS-15032 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer mover >Affects Versions: 2.10.0 >Reporter: Erik Krogen >Assignee: Erik Krogen >Priority: Major > Attachments: HDFS-15032.000.patch, HDFS-15032.001.patch, > HDFS-15032.002.patch, HDFS-15032.003.patch, HDFS-15032.004.patch, > debugger_with_tostring.png, debugger_without_tostring.png > > > When trying to run the Balancer using ObserverReadProxyProvider (to allow it > to read from the Observer Node as described in HDFS-14979), if one of the NNs > isn't running, the Balancer will crash. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991812#comment-16991812 ] Hadoop QA commented on HDFS-15032: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 8s{color} | {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15032 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28486/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991806#comment-16991806 ] Erik Krogen commented on HDFS-15032: Hey [~shv], that's a great question. In many cases, only a reference to the proxy is kept, so it is the direct {{toString}} method of the proxy that you see. For example, this is what a debugger stopped in {{ObserverReadProxyProvider}} looks like without this change: !debugger_without_tostring.png! You can see that the proxy (which is really a combined proxy) is reporting that it is a {{NameNodeProtocolTranslatorPB}}, because it is the {{toString()}} method of the first proxy which is being used. This was misleading to me when I was trying to investigate this issue, as it led me to believe a plain {{NameNodeProtocol}} was showing up where I expected a {{BalancerProtocol}}. However, with the change, it is more obvious what is going on: !debugger_with_tostring.png! I see your concern about the performance, however. I've added a v003 patch which replaces the string comparison with a call to {{Method.equals()}}, which I confirmed internally does only a few reference equality checks: {code}
public boolean equals(Object obj) {
    if (obj != null && obj instanceof Method) {
        Method other = (Method)obj;
        if ((getDeclaringClass() == other.getDeclaringClass())
            && (getName() == other.getName())) {
            if (!returnType.equals(other.getReturnType()))
                return false;
            return equalParamTypes(parameterTypes, other.parameterTypes);
        }
    }
    return false;
}
{code} Let me know if that addresses your concerns. If you think it's too risky for performance, I'm fine with removing it.
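The interception discussed in the comment above can be sketched as follows. This is a minimal standalone example with invented names (it is not the actual {{ObserverReadProxyProvider}} code): a dynamic proxy whose handler special-cases {{Object#toString}} by comparing {{Method}} objects with {{Method.equals()}} rather than comparing method-name strings.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Minimal sketch (invented names, not Hadoop code) of intercepting
// toString() on a dynamic proxy so it reports something descriptive
// instead of the first wrapped object's default toString().
public class ToStringProxyDemo {
  public interface Protocol {
    String ping();
  }

  /** Wraps {@code real} in a dynamic proxy with a descriptive toString(). */
  public static Protocol wrap(Protocol real) {
    final Method toStringMethod;
    try {
      toStringMethod = Object.class.getMethod("toString");
    } catch (NoSuchMethodException e) {
      throw new AssertionError(e); // toString always exists on Object
    }
    return (Protocol) Proxy.newProxyInstance(
        Protocol.class.getClassLoader(),
        new Class<?>[] {Protocol.class},
        (proxy, method, args) -> {
          // Method.equals() is cheap: it compares the declaring class and
          // the (interned) name by reference, then the parameter types.
          if (method.equals(toStringMethod)) {
            return "CombinedProxy[" + Protocol.class.getSimpleName() + "]";
          }
          return method.invoke(real, args);
        });
  }

  public static void main(String[] args) {
    Protocol p = wrap(() -> "pong");
    System.out.println(p);        // CombinedProxy[Protocol]
    System.out.println(p.ping()); // pong
  }
}
```

Invocations of {{toString}}, {{hashCode}}, and {{equals}} on a {{java.lang.reflect.Proxy}} instance are dispatched to the handler with a {{Method}} whose declaring class is {{java.lang.Object}}, which is why the single {{Method.equals()}} check suffices.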
[jira] [Commented] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991801#comment-16991801 ] Hadoop QA commented on HDFS-15032: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | {color:red} HDFS-15032 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-15032 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/28485/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-15032: --- Attachment: debugger_with_tostring.png
[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-15032: --- Attachment: debugger_without_tostring.png
[jira] [Updated] (HDFS-15032) Balancer crashes when it fails to contact an unavailable NN via ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-15032: --- Attachment: HDFS-15032.003.patch
[jira] [Created] (HDFS-15042) add more tests for ByteBufferPositionedReadable
Steve Loughran created HDFS-15042: - Summary: add more tests for ByteBufferPositionedReadable Key: HDFS-15042 URL: https://issues.apache.org/jira/browse/HDFS-15042 Project: Hadoop HDFS Issue Type: Improvement Components: fs, test Affects Versions: 3.3.0 Reporter: Steve Loughran Assignee: Steve Loughran There are a few corner cases of ByteBufferPositionedReadable which need to be tested, mainly illegal read positions. Add them.
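To illustrate the corner cases the issue asks to cover, here is a standalone sketch (not Hadoop's implementation) of one plausible reading of the {{ByteBufferPositionedReadable#read(long, ByteBuffer)}} contract: negative positions are illegal, and reads at or past EOF return -1.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Standalone sketch (not Hadoop code) of positional-read semantics over an
// in-memory byte array, showing the boundary cases tests should exercise.
public class PositionedReadSketch {
  private final byte[] data;

  public PositionedReadSketch(byte[] data) {
    this.data = data;
  }

  /**
   * Reads up to buf.remaining() bytes starting at {@code position} without
   * moving any stream cursor; returns the count read, or -1 at/after EOF.
   */
  public int read(long position, ByteBuffer buf) throws IOException {
    if (position < 0) {
      // Illegal read position: one of the corner cases to be tested.
      throw new EOFException("Cannot read from negative position " + position);
    }
    if (position >= data.length) {
      return -1; // at or past end of "file"
    }
    int n = (int) Math.min(buf.remaining(), data.length - position);
    buf.put(data, (int) position, n);
    return n;
  }
}
```

Tests for this kind of interface typically assert a short read near EOF, -1 exactly at EOF, and an exception for a negative position.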
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991755#comment-16991755 ] Wei-Chiu Chuang commented on HDFS-15041: I believe HDFS-14553 does the latter for you. > Make MAX_LOCK_HOLD_MS and full queue size configurable > -- > > Key: HDFS-15041 > URL: https://issues.apache.org/jira/browse/HDFS-15041 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.2.0 >Reporter: zhuqi >Priority: Major > > Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different > clusters have different needs for latency and queue health standards. > We'd better make the two parameters configurable.
[jira] [Commented] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
[ https://issues.apache.org/jira/browse/HDFS-15041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991729#comment-16991729 ] zhuqi commented on HDFS-15041: -- cc [~daryn], [~weichiu] Our cluster wants to change these in order to strike a better balance between latency and RPC queue growth. What do you think? May I have access so I can assign this to myself? Thanks.
[jira] [Created] (HDFS-15041) Make MAX_LOCK_HOLD_MS and full queue size configurable
zhuqi created HDFS-15041: Summary: Make MAX_LOCK_HOLD_MS and full queue size configurable Key: HDFS-15041 URL: https://issues.apache.org/jira/browse/HDFS-15041 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.2.0 Reporter: zhuqi Now the MAX_LOCK_HOLD_MS and the full queue size are fixed. But different clusters have different needs for latency and queue health standards. We'd better make the two parameters configurable.
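The shape of the change the issue requests is simple to sketch: replace a hard-coded constant with a value read from configuration, falling back to the old constant as the default. The key name and default value below are invented for illustration; they are not actual Hadoop configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of making a fixed constant configurable. The key
// "dfs.namenode.write-lock.max-hold-ms" and the default value 4 are
// invented for illustration, not real Hadoop names.
public class ConfigurableLockHold {
  /** The previously hard-coded constant (illustrative value). */
  static final long DEFAULT_MAX_LOCK_HOLD_MS = 4;

  /** Stands in for Hadoop's Configuration#getLong(key, defaultValue). */
  static long getMaxLockHoldMs(Map<String, String> conf) {
    String v = conf.get("dfs.namenode.write-lock.max-hold-ms"); // invented key
    return (v == null) ? DEFAULT_MAX_LOCK_HOLD_MS : Long.parseLong(v);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(getMaxLockHoldMs(conf)); // falls back to the default
    conf.put("dfs.namenode.write-lock.max-hold-ms", "10");
    System.out.println(getMaxLockHoldMs(conf)); // now tunable per cluster
  }
}
```

Keeping the old constant as the default preserves existing behavior for clusters that do not set the new key.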
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991695#comment-16991695 ] Tsz-wo Sze commented on HDFS-15012: --- +1 the 000 patch looks good. > NN fails to parse Edit logs after applying HDFS-13101 > - > > Key: HDFS-15012 > URL: https://issues.apache.org/jira/browse/HDFS-15012 > Project: Hadoop HDFS > Issue Type: Bug > Components: nn >Reporter: Eric Lin >Assignee: Shashikant Banerjee >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-15012.000.patch > > > After applying HDFS-13101, and deleting and creating large number of > snapshots, SNN exited with below error: > > {code:sh} > 2019-11-18 08:28:06,528 ERROR > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception > on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, > snapshotName=distcp-3479-31-old, > RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc > CallId=1] > java.lang.AssertionError: Element already exists: > element=partition_isactive=true, DELETED=[partition_isactive=true] > at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193) > at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239) > at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > 
org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332) > at > org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235) > at > org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259) > at > org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615) > {code} > We confirmed that fsimage and edit files were NOT corrupted, as reverting > HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken > and failed to parse edit log files.