[jira] [Created] (HDFS-15299) Add an option to enable reporting flaky disk
star created HDFS-15299: --- Summary: Add an option to enable reporting flaky disk Key: HDFS-15299 URL: https://issues.apache.org/jira/browse/HDFS-15299 Project: Hadoop HDFS Issue Type: Bug Reporter: star Assignee: star In our production environment, with disks more than 8 years old, many DataNodes are treated as dead because they are only partially broken. The NameNode then rebalances data blocks across the cluster, introducing high disk load. To reduce the impact of flaky disks, we'd like to extend the tolerance mechanism to partial disk failures. As described in HDFS-10777, the du command can still throw an exception on a heavily loaded disk. Simply removing a flaky disk is brittle because the disk may recover later; however, that is a rare case in our production environment. So can we add an option to enable partial-disk-failure tolerance for users who have mostly aging disks and care more about the stability of the cluster? We will replace those old disks eventually, but until then the HDFS cluster will keep running on those servers for a long time. Comments are appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
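As a rough illustration of the requested option, here is a minimal sketch of a DataNode-side policy that tolerates a flaky volume for a few consecutive check failures instead of removing it on the first one. Everything here is hypothetical: the class, the threshold logic, and the config key name (`dfs.datanode.tolerate.flaky.disks`) are invented for illustration and are not actual HDFS code or configuration.

```java
// Hypothetical sketch of a flaky-disk tolerance policy. The config key below
// is invented; HDFS only exposes dfs.datanode.failed.volumes.tolerated today.
class FlakyDiskPolicy {
    static final String TOLERATE_FLAKY_KEY = "dfs.datanode.tolerate.flaky.disks"; // assumed

    private final boolean tolerateFlaky;
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    FlakyDiskPolicy(boolean tolerateFlaky, int maxConsecutiveFailures) {
        this.tolerateFlaky = tolerateFlaky;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Returns true if the volume should be removed after a disk-check failure. */
    boolean onCheckFailure() {
        consecutiveFailures++;
        if (!tolerateFlaky) {
            return true;  // legacy behavior: remove the volume on the first failure
        }
        // Tolerant mode: only give up after several failures in a row,
        // so a disk that recovers (e.g. after high load) is kept.
        return consecutiveFailures >= maxConsecutiveFailures;
    }

    /** A successful disk check resets the failure streak. */
    void onCheckSuccess() {
        consecutiveFailures = 0;
    }
}
```

The point of the sketch is that a transient du failure under load no longer evicts the volume, while a persistently failing disk is still removed.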
[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622 ] star edited comment on HDFS-14378 at 9/9/19 11:51 AM: -- Thanks [~jojochuang] for the review and advice. Section 7.6 of HDFS-1073 emphasizes the problems of multiple NNs. HDFS-6440 didn't take care of edit rolling, and it avoids duplicate fsimage uploads via the 'primary checkpointer' status among multiple SNNs. I'd like to file two sub-JIRAs, for edit rolling and fsimage downloading respectively. The ANN will roll its own edit logs. As for the fsimage, there are two options as far as I can see: # Each SNN does its own checkpoint, and the ANN downloads the fsimage from a randomly selected SNN. # The ANN issues a checkpoint command to the SNNs via a special edit log op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly selected SNN. [~jojochuang], [~tlipcon], what's your opinion? was (Author: starphin): Thanks [~jojochuang] for the review and advice. Section 7.6 of HDFS-1073 emphasizes the problems of multiple NNs. HDFS-6440 didn't take care of edit rolling, and it avoids duplicate fsimage uploads via the 'primary checkpointer' status among multiple SNNs. I'd like to file two sub-JIRAs, for edit rolling and fsimage downloading respectively. The ANN will roll its own edit logs. As for the fsimage, there are two options as far as I can see: # Each SNN does its own checkpoint, and the ANN downloads the fsimage from a randomly selected SNN. # The ANN issues a checkpoint command to the SNNs via a special edit log op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly selected SNN. [~jojochuang], [~tlipcon], what's your opinion?
> Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, > HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' holds the first-writer state, with > which that SNN will provide the fsimage for the ANN next time. We then have three roles in the > NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than one primary SNN shortly after > an exception occurs. That change handles a scenario in which the SNN will not upload the > fsimage on IOException or InterruptedException. Though this will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will > verify it with unit tests, or anyone could point it out.) > Given the above, I'm wondering if we could make this simpler with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN selects an SNN from which to download the checkpoint. > The SNN will just do log tailing and checkpointing, and then provide a servlet for fsimage > downloading as usual. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
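For illustration, the ANN-driven flow proposed in the issue above (ANN rolls its own edit log on a fixed period and picks one standby as the checkpoint source) could be sketched roughly like this. The class, interface, and method names are invented for the sketch and are not HDFS APIs; the log-roll period merely stands in for DFS_HA_LOGROLL_PERIOD_KEY.

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch, not HDFS code: the ANN drives both edit-log rolling
// and checkpoint-source selection, so SNNs never push the fsimage themselves.
class ActiveDrivenCheckpointer {
    interface Standby { String fsImageUrl(); }

    private final long logRollPeriodMs;   // stands in for DFS_HA_LOGROLL_PERIOD_KEY
    private final Random random = new Random();
    private long lastRollTimeMs = 0;

    ActiveDrivenCheckpointer(long logRollPeriodMs) {
        this.logRollPeriodMs = logRollPeriodMs;
    }

    /** True if the ANN itself should roll the edit log now. */
    boolean shouldRollEditLog(long nowMs) {
        if (nowMs - lastRollTimeMs >= logRollPeriodMs) {
            lastRollTimeMs = nowMs;
            return true;
        }
        return false;
    }

    /** ANN picks one SNN at random as the fsimage download source. */
    Standby selectCheckpointSource(List<Standby> standbys) {
        return standbys.get(random.nextInt(standbys.size()));
    }
}
```

With this shape there is no 'primary checkpointer' state to keep consistent across SNNs, which is the simplification the issue argues for.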
[jira] [Comment Edited] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622 ] star edited comment on HDFS-14378 at 9/9/19 11:51 AM: -- Thanks [~jojochuang] for the review and advice. Section 7.6 of HDFS-1073 emphasizes the problems of multiple NNs. HDFS-6440 didn't take care of edit rolling, and it avoids duplicate fsimage uploads via the 'primary checkpointer' status among multiple SNNs. I'd like to file two sub-JIRAs, for edit rolling and fsimage downloading respectively. The ANN will roll its own edit logs. As for the fsimage, there are two options as far as I can see: # Each SNN does its own checkpoint, and the ANN downloads the fsimage from a randomly selected SNN. # The ANN issues a checkpoint command to the SNNs via a special edit log op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly selected SNN. [~jojochuang], [~tlipcon], what's your opinion? was (Author: starphin): Thanks [~jojochuang] for the review and advice. Section 7.6 of HDFS-1073 emphasizes the problems of multiple NNs. HDFS-6440 didn't take care of edit rolling, and it avoids duplicate fsimage uploads via the 'primary checkpointer' status among multiple SNNs. I'd like to file two sub-JIRAs, for edit rolling and fsimage downloading respectively. The ANN will roll its own edit logs. As for the fsimage, there are two options as far as I can see: 1. Each SNN does its own checkpoint, and the ANN downloads the fsimage from a randomly selected SNN. 2. The ANN issues a checkpoint command to the SNNs via a special edit log op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly selected SNN. [~jojochuang], [~tlipcon], what's your opinion?
> Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, > HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' holds the first-writer state, with > which that SNN will provide the fsimage for the ANN next time. We then have three roles in the > NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than one primary SNN shortly after > an exception occurs. That change handles a scenario in which the SNN will not upload the > fsimage on IOException or InterruptedException. Though this will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will > verify it with unit tests, or anyone could point it out.) > Given the above, I'm wondering if we could make this simpler with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN selects an SNN from which to download the checkpoint. > The SNN will just do log tailing and checkpointing, and then provide a servlet for fsimage > downloading as usual. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925622#comment-16925622 ] star commented on HDFS-14378: - Thanks [~jojochuang] for the review and advice. Section 7.6 of HDFS-1073 emphasizes the problems of multiple NNs. HDFS-6440 didn't take care of edit rolling, and it avoids duplicate fsimage uploads via the 'primary checkpointer' status among multiple SNNs. I'd like to file two sub-JIRAs, for edit rolling and fsimage downloading respectively. The ANN will roll its own edit logs. As for the fsimage, there are two options as far as I can see: 1. Each SNN does its own checkpoint, and the ANN downloads the fsimage from a randomly selected SNN. 2. The ANN issues a checkpoint command to the SNNs via a special edit log op like "OP_ROLLING_UPGRADE_START", then downloads the fsimage from a randomly selected SNN. [~jojochuang], [~tlipcon], what's your opinion? > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, > HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' holds the first-writer state, with > which that SNN will provide the fsimage for the ANN next time. We then have three roles in the > NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than one primary SNN shortly after > an exception occurs. That change handles a scenario in which the SNN will not upload the > fsimage on IOException or InterruptedException. Though this will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will > verify it with unit tests, or anyone could point it out.) > Given the above, I'm wondering if we could make this simpler with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN selects an SNN from which to download the checkpoint. > The SNN will just do log tailing and checkpointing, and then provide a servlet for fsimage > downloading as usual. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
[jira] [Commented] (HDFS-14361) SNN will always upload fsimage
[ https://issues.apache.org/jira/browse/HDFS-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879988#comment-16879988 ] star commented on HDFS-14361: - Right. isPrimaryCheckPointer will not be changed when any error/exception occurs. > SNN will always upload fsimage > -- > > Key: HDFS-14361 > URL: https://issues.apache.org/jira/browse/HDFS-14361 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode >Affects Versions: 3.2.0 >Reporter: hunshenshi >Priority: Major > Fix For: 3.2.0 > > > Related to -HDFS-12248.- > {code:java} > boolean sendRequest = isPrimaryCheckPointer > || secsSinceLastUpload >= checkpointConf.getQuietPeriod(); > doCheckpoint(sendRequest); > {code} > If sendRequest is true, the SNN will upload the fsimage. But isPrimaryCheckPointer > is always true: > {code:java} > if (ie == null && ioe == null) { > //Update only when response from remote about success or > lastUploadTime = monotonicNow(); > // we are primary if we successfully updated the ANN > this.isPrimaryCheckPointer = success; > } > {code} > The isPrimaryCheckPointer assignment should be outside the if condition. > If the ANN update was not successful, then isPrimaryCheckPointer should be > set to false.
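A hedged sketch of the fix being discussed: reset the primary-checkpointer flag whenever the upload to the ANN did not succeed, instead of only updating it on the exception-free path. This mirrors the shape of the snippet quoted in the issue but is not the actual patch; the class and method here are invented for illustration.

```java
// Illustrative sketch, not the real StandbyCheckpointer code.
class CheckpointUploadState {
    boolean isPrimaryCheckPointer = false;
    long lastUploadTime = 0;

    /**
     * success: whether the ANN acknowledged the upload;
     * ioe/ie: the IOException/InterruptedException seen, or null.
     */
    void afterUploadAttempt(boolean success, Exception ioe, Exception ie, long nowMs) {
        if (ie == null && ioe == null) {
            // Only trust the upload timestamp on a clean response.
            lastUploadTime = nowMs;
        }
        // The fix: update the flag unconditionally. A failed or interrupted
        // upload means this SNN is no longer the primary checkpointer.
        isPrimaryCheckPointer = (ie == null && ioe == null) && success;
    }
}
```

The key difference from the quoted snippet is that an exception path now clears the flag rather than leaving the previous (possibly stale) value in place.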
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877439#comment-16877439 ] star commented on HDFS-12914: - Yes, it is broken in branch-3.0 because of the 'private' modifier on 'processReport'. There's no such issue on other branches thanks to HDFS-11673, in which the 'private' modifier was removed. {quote} Collection<Block> processReport( final DatanodeStorageInfo storageInfo, final BlockListAsLongs report, BlockReportContext context) throws IOException { {quote} I am not sure whether branch-3.0 should be covered by this issue. [~jojochuang], what do you think? > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-2.patch, > HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, > HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
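A minimal sketch of the failure mode described in this issue: a rejected block report lease produces no exception, only a boolean that the caller folds into "no stale storages", so the re-registered DataNode stays active with no blocks. All names here are illustrative stand-ins, not the real BlockManager code.

```java
// Illustrative sketch of the silent-rejection path, not actual HDFS code.
class LeaseRejectionSketch {
    /** Stands in for BlockReportLeaseManager#checkLease: rejects by returning false. */
    static boolean checkLease(long leaseId, long expectedLeaseId) {
        // The real method also rejects unknown datanodes, expired leases, etc.
        return leaseId == expectedLeaseId;
    }

    /**
     * Mirrors the shape of the processReport -> blockReport result handling:
     * false means the report was dropped, yet the caller only sees a boolean
     * it interprets as "no stale storages" and carries on.
     */
    static boolean blockReport(long leaseId, long expectedLeaseId) {
        boolean applied = checkLease(leaseId, expectedLeaseId);
        return applied;  // no exception is ever raised on rejection
    }
}
```

Because the rejection is invisible to the DataNode, nothing triggers a re-send, which is why the missing blocks persist until the next scheduled or forced FBR.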
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877439#comment-16877439 ] star edited comment on HDFS-12914 at 7/3/19 2:43 AM: - Yes, it is broken in branch-3.0 because of the 'private' modifier on 'processReport'. There's no such issue on other branches thanks to HDFS-11673, in which the 'private' modifier was removed. {quote} Collection<Block> processReport( final DatanodeStorageInfo storageInfo, final BlockListAsLongs report, BlockReportContext context) throws IOException { {quote} I am not sure whether branch-3.0 should be covered by this issue. [~jojochuang], what do you think? was (Author: starphin): Yes, it is broken in branch-3.0 because of the 'private' modifier on 'processReport'. There's no such issue on other branches thanks to HDFS-11673, in which the 'private' modifier was removed. {quote} Collection<Block> processReport( final DatanodeStorageInfo storageInfo, final BlockListAsLongs report, BlockReportContext context) throws IOException { {quote} I am not sure whether branch-3.0 should be covered by this issue. [~jojochuang], what do you think? > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Fix For: 3.0.4, 3.3.0, 3.2.1, 3.1.3 > > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch, HDFS-12914.008.patch, > HDFS-12914.009.patch, HDFS-12914.branch-2.patch, > HDFS-12914.branch-3.1.001.patch, HDFS-12914.branch-3.1.002.patch, > HDFS-12914.branch-3.2.patch, HDFS-12914.utfix.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
[jira] [Resolved] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star resolved HDFS-14424. - Resolution: Duplicate > NN failover failed because of losing paxos directory > - > > Key: HDFS-14424 > URL: https://issues.apache.org/jira/browse/HDFS-14424 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > Recently, our Hadoop NameNode shut down when switching the active NameNode, just > because of a missing paxos directory. It is created under the default /tmp path > and was deleted by the OS after 7 days of no operation. We can avoid this by moving the > journal directory to a non-tmp directory, but it's better to make sure the NameNode > works well with the default config. > The issue throws an exception similar to HDFS-10659, also caused by a missing > paxos directory.
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859347#comment-16859347 ] star commented on HDFS-12914: - LGTM. Thanks [~hexiaoqiao] for answering. The failing tests may need to be checked. > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859272#comment-16859272 ] star commented on HDFS-12914: - Yes, 'client' refers to the DataNode. The DN gets a REGISTER command after the NN starts up and makes a full block report every 6 hours. It's OK to send other RPC requests to the NameNode after its first REGISTER command; there is no need to register again just because the full block report lease id expired. Feel free to ignore this if it doesn't make sense to you. {quote} I think `client` you mentioned is Datanode, right? If one datanode not register and send some other RPC request to NameNode, it should get RegisterCommand.REGISTER and try to re-register. It seems a normal flow. Thanks Íñigo Goiri and star again. HDFS-12914.007.patch please take another kindly review. {quote} > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch, HDFS-12914.007.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
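For illustration, the alternative suggested in this thread (surface an IOException for an invalid lease and let the DataNode retry the block report a bounded number of times, rather than fall back to a full re-registration) could be sketched like this. The interface and method names are invented for the sketch and are not the actual DataNode code.

```java
import java.io.IOException;

// Illustrative sketch of bounded retry on lease rejection, not HDFS code.
class BlockReportRetrySketch {
    interface Reporter {
        void blockReport(long leaseId) throws IOException;
    }

    /** Returns true if the report succeeded within maxAttempts tries. */
    static boolean reportWithRetry(Reporter reporter, long leaseId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                reporter.blockReport(leaseId);
                return true;
            } catch (IOException e) {
                // Invalid/expired lease surfaced as an exception: retry,
                // instead of issuing a heavyweight RegisterCommand.REGISTER.
            }
        }
        return false;
    }
}
```

The trade-off sketched here is exactly the one debated in the comments: retrying is cheaper than re-registration, at the cost of the NameNode having to signal the rejection explicitly.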
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250 ] star edited comment on HDFS-12914 at 6/8/19 4:10 PM: - A few comments about your unit tests. # The following code bypasses the lease-expiration checking logic by removing a valid lease id. It would be better to keep the lease as it is at runtime. {code:java} // Remove full block report lease about dn spyBlockManager.getBlockReportLeaseManager() .removeLease(datanodeDescriptor);{code} 2. Do we really need to respond with a RegisterCommand.REGISTER command to the client? It's a somewhat heavy command. Should we just let the client know its block report failed (such as with an IOException response, as in my code) and have it retry a few more times? was (Author: starphin): A few comments about your unit tests. # The following code bypasses the lease-expiration checking logic by removing a valid lease id. It would be better to keep the lease as it is at runtime. {code:java} // Remove full block report lease about dn spyBlockManager.getBlockReportLeaseManager() .removeLease(datanodeDescriptor);{code} 2. Do we really need to respond with a RegisterCommand.REGISTER command to the client? It's a somewhat heavy command. Should we just let the client know its block report failed and have it retry a few more times? > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250 ] star edited comment on HDFS-12914 at 6/8/19 4:09 PM: - A few comments about your unit tests. # The following code bypasses the lease-expiration checking logic by removing a valid lease id. It would be better to keep the lease as it is at runtime. {code:java} // Remove full block report lease about dn spyBlockManager.getBlockReportLeaseManager() .removeLease(datanodeDescriptor);{code} 2. Do we really need to respond with a RegisterCommand.REGISTER command to the client? It's a somewhat heavy command. Should we just let the client know its block report failed and have it retry a few more times? was (Author: starphin): A few comments about your unit tests. The following code bypasses the lease-expiration checking logic by removing a valid lease id. It would be better to keep the lease as it is at runtime. {code:java} // Remove full block report lease about dn spyBlockManager.getBlockReportLeaseManager() .removeLease(datanodeDescriptor); {code} > Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch, > HDFS-12914.006.patch > > > {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for > conditions such as "unknown datanode", "not in pending set", "lease has > expired", wrong lease id, etc. Lease rejection does not throw an exception. > It returns false, which bubbles up to {{NameNodeRpcServer#blockReport}} and is > interpreted as {{noStaleStorages}}. > A re-registering node whose FBR is rejected due to an invalid lease becomes > active with _no blocks_. A replication storm ensues, possibly causing DNs to > temporarily go dead (HDFS-12645), leading to more FBR lease rejections on > re-registration. The cluster will have many "missing blocks" until the DNs' > next FBR is sent and/or forced.
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859246#comment-16859246 ] star edited comment on HDFS-12914 at 6/8/19 3:56 PM: - [~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to yours. Pasted here just for reference. Apart from the test code, one piece of code is changed: BlockManager#processReport will throw an IOException to indicate an invalid lease id, so the client will get the exception.

{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException(
        "Invalid block report lease id '" + context.getLeaseId() + "'");
  }
}
{code}

{code:java}
// Before the test starts
conf.setLong(DFSConfigKeys.DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS, 500L);

@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();
  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());
  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class),
      Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);
  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);
  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();
  Map<DatanodeStorage, BlockListAsLongs> report = cluster.getBlockReport(bpid, 0);
  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }
  // it will throw an IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}

was (Author: starphin): [~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to yours. Pasted here just for reference. Apart from the test code, one piece of code is changed: BlockManager#processReport will throw an IOException to indicate an invalid lease id, so the client will get the exception.

{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException(
        "Invalid block report lease id '" + context.getLeaseId() + "'");
  }
}
{code}

{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();
  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());
  Mockito.doAnswer(new Answer<Boolean>() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class),
      Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class),
      Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);
  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);
  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();
  Map<DatanodeStorage, BlockListAsLongs> report = cluster.getBlockReport(bpid, 0);
  List<StorageBlockReport> reportList = new ArrayList<>();
  for (Map.Entry<DatanodeStorage, BlockListAsLongs> en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }
  // it will throw an IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}

> Block report leases cause missing blocks until next report > -- > > Key: HDFS-12914 > URL: https://issues.apache.org/jira/browse/HDFS-12914 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 2.9.2 >Reporter: Daryn Sharp >Assignee: Santosh Marella >Priority: Critical > Attachments: HDFS-12914-branch-2.001.patch, > HDFS-12914-trunk.00.patch,
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859250#comment-16859250 ] star commented on HDFS-12914: -

A few comments about your unit tests. The following code bypasses the lease-expiration checking logic by removing the valid lease id. Better to keep it as it would be at run time.

{code:java}
// Remove full block report lease about dn
spyBlockManager.getBlockReportLeaseManager()
    .removeLease(datanodeDescriptor);
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch,
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16859246#comment-16859246 ] star commented on HDFS-12914: -

[~hexiaoqiao], I also wrote a unit test for this issue, mostly similar to yours. Pasted here just for reference. Besides the test code, one piece of production code changed: BlockManager#processReport will throw an IOException to indicate an invalid lease id, so the client gets the exception.

{code:java}
if (context != null) {
  if (!blockReportLeaseManager.checkLease(node, startTime,
      context.getLeaseId())) {
    throw new IOException("Invalid block report lease id '"
        + context.getLeaseId() + "'");
  }
}
{code}
{code:java}
@Test
public void testDelayedBlockReport() throws IOException {
  FSNamesystem namesystem = cluster.getNameNode(0).getNamesystem();
  BlockManager testBlockManager = Mockito.spy(namesystem.getBlockManager());
  Mockito.doAnswer(new Answer() {
    @Override
    public Boolean answer(InvocationOnMock invocationOnMock) throws Throwable {
      // sleep 1000 ms to delay processing of the current report
      Thread.sleep(1000);
      return (Boolean) invocationOnMock.callRealMethod();
    }
  }).when(testBlockManager).processReport(
      Mockito.any(DatanodeID.class), Mockito.any(DatanodeStorage.class),
      Mockito.any(BlockListAsLongs.class), Mockito.any(BlockReportContext.class));
  namesystem.setBlockManagerForTesting(testBlockManager);
  String bpid = namesystem.getBlockPoolId();
  DataNode dn = cluster.getDataNodes().get(0);
  DatanodeRegistration dnReg = dn.getDNRegistrationForBP(bpid);
  namesystem.readLock();
  long leaseId = testBlockManager.requestBlockReportLeaseId(dnReg);
  namesystem.readUnlock();
  Map report = cluster.getBlockReport(bpid, 0);
  List reportList = new ArrayList<>();
  for (Map.Entry en : report.entrySet()) {
    reportList.add(new StorageBlockReport(en.getKey(), en.getValue()));
  }
  // it will throw IOException if the lease id is invalid
  cluster.getNameNode().getRpcServer().blockReport(
      dnReg, bpid, reportList.toArray(new StorageBlockReport[]{}),
      new BlockReportContext(1, 0, System.nanoTime(), leaseId, true));
}
{code}

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch, HDFS-12914.005.patch,
> HDFS-12914.006.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
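To make the failure mode and the proposed fix above concrete, here is a self-contained sketch (class and method names are hypothetical, not the actual HDFS API): returning false on a rejected lease is indistinguishable from a benign result at the caller, while throwing makes the rejection visible to the DataNode so it can re-register and request a fresh lease.

```java
import java.io.IOException;

/** Simplified model (hypothetical names) of block report lease rejection. */
public class LeaseRejectionModel {

    /** Current behavior: a rejected lease returns false, which the RPC layer
     *  cannot distinguish from a benign "no stale storages" result. */
    static boolean processReport(long grantedLeaseId, long reportLeaseId) {
        if (grantedLeaseId != reportLeaseId) {
            return false; // report silently dropped
        }
        return true; // report processed
    }

    /** Proposed change: fail loudly so the client sees the invalid lease. */
    static void processReportStrict(long grantedLeaseId, long reportLeaseId)
            throws IOException {
        if (grantedLeaseId != reportLeaseId) {
            throw new IOException(
                "Invalid block report lease id '" + reportLeaseId + "'");
        }
    }
}
```

The point of the sketch is only the contrast in signatures: the boolean version folds rejection into the success path, the throwing version does not.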
[jira] [Assigned] (HDFS-5853) Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star reassigned HDFS-5853: -- Assignee: (was: star) > Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml > - > > Key: HDFS-5853 > URL: https://issues.apache.org/jira/browse/HDFS-5853 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, namenode >Affects Versions: 2.3.0 >Reporter: Akira Ajisaka >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-5853.patch > > > "hadoop.user.group.metrics.percentiles.intervals" was added in HDFS-5220, but > the parameter is not written in hdfs-default.xml. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-5853) Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star reassigned HDFS-5853: -- Assignee: star (was: Akira Ajisaka) > Add "hadoop.user.group.metrics.percentiles.intervals" to hdfs-default.xml > - > > Key: HDFS-5853 > URL: https://issues.apache.org/jira/browse/HDFS-5853 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, namenode >Affects Versions: 2.3.0 >Reporter: Akira Ajisaka >Assignee: star >Priority: Minor > Fix For: 2.7.0 > > Attachments: HDFS-5853.patch > > > "hadoop.user.group.metrics.percentiles.intervals" was added in HDFS-5220, but > the parameter is not written in hdfs-default.xml. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845414#comment-16845414 ] star edited comment on HDFS-12914 at 5/22/19 2:58 AM: -

Thanks [~smarella]. A few questions, for reference only.
1. I guess it's a mistake in the protobuf version update:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?
3. Should we log more about the full block report lease at INFO level so that we can inspect issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].

was (Author: starphin):
Thanks [~smarella]. A few questions, for reference only.
1. I guess it's a mistake in the protobuf version update:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?
3. Should we log more about the full block report lease so that we can inspect issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch,
> HDFS-12914-trunk.00.patch, HDFS-12914-trunk.01.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845414#comment-16845414 ] star commented on HDFS-12914: -

Thanks [~smarella]. A few questions, for reference only.
1. I guess it's a mistake in the protobuf version update:
{code:java}
2.5.0.t02
{code}
2. Do we need a test case?
3. Should we log more about the full block report lease so that we can inspect issues more easily? [~smarella], [~hexiaoqiao], [~jojochuang].

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 2.9.2
> Reporter: Daryn Sharp
> Assignee: Santosh Marella
> Priority: Critical
> Attachments: HDFS-12914-branch-2.001.patch, HDFS-12914-trunk.00.patch
>
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844056#comment-16844056 ] star edited comment on HDFS-12914 at 5/20/19 3:47 PM: -

[~hexiaoqiao] proposed a good solution to this issue. Besides the RPC call queue, there's also a queue for processing block reports. Maybe we should consider this and check the lease id before putting a report into the block report queue. I think this queue mainly causes the delay in block report processing.

was (Author: starphin):
[~hexiaoqiao] proposed a good solution to this issue. Besides the RPC call queue, there's also a queue for processing block reports. Maybe we should consider this and check the lease id before putting a report into the block report queue.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844056#comment-16844056 ] star commented on HDFS-12914: -

[~hexiaoqiao] proposed a good solution to this issue. Besides the RPC call queue, there's also a queue for processing block reports. Maybe we should consider this and check the lease id before putting a report into the block report queue.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
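The check-before-enqueue idea in the comment above can be sketched with a simplified, self-contained model (the class and method names are hypothetical, not the actual BlockManager API): validate the lease id when the report arrives, so a stale or unknown lease is rejected up front instead of after the report has waited in the processing queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Simplified model (hypothetical names): reject an invalid or unknown lease
 *  before the block report enters the processing queue. */
public class LeaseCheckedReportQueue {
    private final Map<String, Long> grantedLeases = new HashMap<>();
    private final Queue<String> reportQueue = new ArrayDeque<>();

    /** Records the lease id granted to a datanode. */
    public void grantLease(String datanodeId, long leaseId) {
        grantedLeases.put(datanodeId, leaseId);
    }

    /** Returns true only if the lease matches; the report is queued in that case. */
    public boolean offerReport(String datanodeId, long leaseId) {
        Long granted = grantedLeases.get(datanodeId);
        if (granted == null || granted != leaseId) {
            return false; // rejected up front, not after waiting in the queue
        }
        reportQueue.add(datanodeId);
        return true;
    }

    public int queuedReports() {
        return reportQueue.size();
    }
}
```

With the check at offer time, a datanode learns about a bad lease immediately and the queue only ever holds reports that are still eligible for processing.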
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829 ] star edited comment on HDFS-12914 at 5/20/19 10:08 AM: ---

[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high CPU load on the SNN delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|{color:#d04437}process Report (failed){color}|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

was (Author: starphin):
[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high CPU load on the SNN delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829 ] star edited comment on HDFS-12914 at 5/20/19 10:07 AM: ---

[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high CPU load on the SNN delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

was (Author: starphin):
[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high load delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829 ] star edited comment on HDFS-12914 at 5/20/19 10:07 AM: ---

[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high CPU load on the SNN delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Report| |
|...|request Lease|
|process Report|{color:#707070}_more than 5 minutes_{color}|
|...|process Report|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

was (Author: starphin):
[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high CPU load on the SNN delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12914) Block report leases cause missing blocks until next report
[ https://issues.apache.org/jira/browse/HDFS-12914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843829#comment-16843829 ] star commented on HDFS-12914: -

[~smarella] how many DNs do you have? According to the limited logs, I think it is caused by the following case: a high load delayed the processing of full block reports.

||DN1...||DN2||
|register|register|
|request Lease| |
|process Request| |
|...|request Lease|
|process Request|{color:#707070}_more than 5 minutes_{color}|
|...|process Request|

There are no logs between 2019-05-16 15:15:35 and 2019-05-16 15:31:11. Logs unrelated to 10.54.63.120:50010 are filtered out, right [~smarella]? In that time, I think the SNN was processing block reports from other DNs. Not until 2019-05-16 15:31:11 did the SNN begin to process block reports from that DN. That is 6 minutes after the full block report lease id was requested, beyond the default expiry of 5 minutes (DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT). We don't know when the full block report lease id was granted by the server, since there's no info log about it. I guess it was about 5 minutes before the first failed report, say 15:26:29.

> Block report leases cause missing blocks until next report
> --
>
> Key: HDFS-12914
> URL: https://issues.apache.org/jira/browse/HDFS-12914
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
> Priority: Critical
>
> {{BlockReportLeaseManager#checkLease}} will reject FBRs from DNs for
> conditions such as "unknown datanode", "not in pending set", "lease has
> expired", wrong lease id, etc. Lease rejection does not throw an exception.
> It returns false which bubbles up to {{NameNodeRpcServer#blockReport}} and
> interpreted as {{noStaleStorages}}.
> A re-registering node whose FBR is rejected from an invalid lease becomes
> active with _no blocks_. A replication storm ensues possibly causing DNs to
> temporarily go dead (HDFS-12645), leading to more FBR lease rejections on
> re-registration. The cluster will have many "missing blocks" until the DNs
> next FBR is sent and/or forced.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
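The timing analysis in the comment above reduces to a single expiry predicate. A minimal self-contained model (hypothetical class; the 5-minute constant mirrors the default of DFS_NAMENODE_FULL_BLOCK_REPORT_LEASE_LENGTH_MS_DEFAULT, which is an assumption here, not a quote of the HDFS source):

```java
/** Minimal model (hypothetical class) of full block report lease expiry. */
public class BlockReportLease {
    /** Assumed 5-minute default lease length. */
    static final long LEASE_LENGTH_MS = 5 * 60 * 1000L;

    private final long issuedAtMs;

    public BlockReportLease(long issuedAtMs) {
        this.issuedAtMs = issuedAtMs;
    }

    /** A report processed after the lease length has elapsed is rejected. */
    public boolean isValidAt(long nowMs) {
        return nowMs - issuedAtMs <= LEASE_LENGTH_MS;
    }
}
```

Under this model, a report queued behind other DNs' reports for 6 minutes, as in the logs above, arrives with an already-expired lease even though the DataNode did nothing wrong.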
[jira] [Updated] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient
[ https://issues.apache.org/jira/browse/HDFS-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14489: Component/s: (was: fs/ozone) ozone > fix naming issue for ScmBlockLocationTestingClient > -- > > Key: HDFS-14489 > URL: https://issues.apache.org/jira/browse/HDFS-14489 > Project: Hadoop HDFS > Issue Type: Bug > Components: ozone >Affects Versions: HDFS-7240 >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14489.patch > > > class 'ScmBlockLocationTestIngClient' is not named in Camel-Case form. Rename > it to ScmBlockLocationTestingClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient
[ https://issues.apache.org/jira/browse/HDFS-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14489: Attachment: HDFS-14489.patch > fix naming issue for ScmBlockLocationTestingClient > -- > > Key: HDFS-14489 > URL: https://issues.apache.org/jira/browse/HDFS-14489 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs/ozone >Affects Versions: HDFS-7240 >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14489.patch > > > class 'ScmBlockLocationTestIngClient' is not named in Camel-Case form. Rename > it to ScmBlockLocationTestingClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14489) fix naming issue for ScmBlockLocationTestingClient
star created HDFS-14489: --- Summary: fix naming issue for ScmBlockLocationTestingClient Key: HDFS-14489 URL: https://issues.apache.org/jira/browse/HDFS-14489 Project: Hadoop HDFS Issue Type: Bug Components: fs/ozone Affects Versions: HDFS-7240 Reporter: star Assignee: star class 'ScmBlockLocationTestIngClient' is not named in Camel-Case form. Rename it to ScmBlockLocationTestingClient. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14476) lock too long when fix inconsistent blocks between disk and in-memory
[ https://issues.apache.org/jira/browse/HDFS-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834675#comment-16834675 ] star commented on HDFS-14476: - HDFS-10477 may be a useful reference. Seems like a ubiquitous problem with a large number of blocks. > lock too long when fix inconsistent blocks between disk and in-memory > - > > Key: HDFS-14476 > URL: https://issues.apache.org/jira/browse/HDFS-14476 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.6.0, 2.7.0 >Reporter: Sean Chow >Priority: Major > > When directoryScanner have the results of differences between disk and > in-memory blocks. it will try to run `checkAndUpdate` to fix it. However > `FsDatasetImpl.checkAndUpdate` is a synchronized call > As I have about 6millions blocks for every datanodes and every 6hours' scan > will have about 25000 abnormal blocks to fix. That leads to a long lock > holding FsDatasetImpl object. > let's assume every block need 10ms to fix(because of latency of SAS disk), > that will cost 250 seconds to finish. That means all reads and writes will be > blocked for 3mins for that datanode. > > {code:java} > 2019-05-06 08:06:51,704 INFO > org.apache.hadoop.hdfs.server.datanode.DirectoryScanner: BlockPool > BP-1644920766-10.223.143.220-1450099987967 Total blocks: 6850197, missing > metadata files:23574, missing block files:23574, missing blocks in > memory:47625, mismatched blocks:0 > ... > 2019-05-06 08:16:41,625 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: > Took 588402ms to process 1 commands from NN > {code} > Take long time to process command from nn because threads are blocked. And > namenode will see long lastContact time for this datanode. > Maybe this affect all hdfs versions. 
> *To fix:* > Just as invalidate commands from the namenode are processed with a batch size of 1000, these > abnormal blocks should be fixed in batches too, sleeping 2 seconds > between batches to allow normal block reads and writes.
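The batching proposed above can be sketched as follows. This is a minimal illustration, not DataNode code: `BatchedScanFixer`, `fixAbnormalBlock`, and the plain string block IDs are hypothetical stand-ins for `FsDatasetImpl#checkAndUpdate` and the scanner's diff records; only the batch-then-sleep shape mirrors the proposal.

```java
import java.util.List;

public class BatchedScanFixer {
    private static final int BATCH_SIZE = 1000;      // same batch size used for invalidate commands
    private static final long PAUSE_MS = 2000;       // sleep between batches, as proposed above

    private final Object datasetLock = new Object(); // stand-in for the FsDatasetImpl monitor
    private int fixedCount = 0;

    /** Fix the scanner's diff entries in small locked batches instead of one long critical section. */
    public void checkAndUpdateBatched(List<String> diffBlockIds) throws InterruptedException {
        for (int start = 0; start < diffBlockIds.size(); start += BATCH_SIZE) {
            int end = Math.min(start + BATCH_SIZE, diffBlockIds.size());
            synchronized (datasetLock) {             // hold the lock only for one batch
                for (String blockId : diffBlockIds.subList(start, end)) {
                    fixAbnormalBlock(blockId);
                }
            }
            if (end < diffBlockIds.size()) {
                Thread.sleep(PAUSE_MS);              // lock released; normal reads/writes can proceed
            }
        }
    }

    public int getFixedCount() { return fixedCount; }

    private void fixAbnormalBlock(String blockId) {
        // placeholder for the per-block reconciliation done by checkAndUpdate
        fixedCount++;
    }
}
```

Holding the dataset lock per batch instead of across the whole diff list bounds each blocking window to roughly `BATCH_SIZE` fixes, at the cost of a longer overall scan.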
[jira] [Comment Edited] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes
[ https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795553#comment-16795553 ] star edited comment on HDFS-14349 at 5/2/19 3:41 PM: - It has been verified that edit log rolls are triggered by multiple SNNs. I've proposed an improvement issue, HDFS-14378, to put things right once and for all. The main idea is to have the ANN roll its own edit log and download the fsimage from a randomly selected SNN; the SNNs will just do checkpointing and tail edit logs. Reviews and contributions are welcome. was (Author: starphin): Yes, it seems that normal edit log rolls are triggered by multiple SNNs. I am running unit tests to verify this behavior. > Edit log may be rolled more frequently than necessary with multiple Standby > nodes > - > > Key: HDFS-14349 > URL: https://issues.apache.org/jira/browse/HDFS-14349 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs, qjm >Reporter: Erik Krogen >Assignee: Ekanth Sethuramalingam >Priority: Major > > When HDFS-14317 was fixed, we tackled the problem that in a cluster with > in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the > edit logs, which can eventually cause data loss. > Unfortunately, in the process, it was made so that if there are multiple > Standby NameNodes, they will all roll the edit logs at their specified > frequency, so the edit log will be rolled X times more frequently than it > should be (where X is the number of Standby NNs). This is not as bad as the > original bug since rolling frequently does not affect correctness or data > availability, but may degrade performance by creating more edit log segments > than necessary.
[jira] [Commented] (HDFS-13189) Standby NameNode should roll active edit log when checkpointing
[ https://issues.apache.org/jira/browse/HDFS-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831694#comment-16831694 ] star commented on HDFS-13189: - I've proposed an improvement issue, HDFS-14378, to put things right once and for all. The main idea is to have the ANN roll its own edit log and download the fsimage from a randomly selected SNN; the SNNs will just do checkpointing and tail edit logs. Reviews and contributions are welcome. > Standby NameNode should roll active edit log when checkpointing > --- > > Key: HDFS-13189 > URL: https://issues.apache.org/jira/browse/HDFS-13189 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Reporter: Chao Sun >Priority: Minor > > When the SBN is doing checkpointing, it will hold the {{cpLock}}. In the > current implementation of the edit log tailer thread, it will first check and > roll the active edit log, and then tail and apply edits. In the case of > checkpointing, it will be blocked on the {{cpLock}} and will not roll the > edit log. > There seems to be no dependency between the edit log roll and tailing edits, > so a better way may be to do these in separate threads. This will be helpful for > people who use the observer feature without in-progress edit log tailing. > An alternative is to configure > {{dfs.namenode.edit.log.autoroll.multiplier.threshold}} and > {{dfs.namenode.edit.log.autoroll.check.interval.ms}} to let the ANN roll its own > log more frequently in case the SBN is stuck on the lock.
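The alternative configuration mentioned at the end of the quoted description could look like this in hdfs-site.xml. The values shown are illustrative only (2.0 and 300000 ms are the usual defaults, so an operator would typically tune them downward to make the ANN roll its own log more often):

```xml
<!-- hdfs-site.xml: let the ANN roll its own edit log even when SBNs are busy.
     Values are illustrative, not recommendations. -->
<property>
  <name>dfs.namenode.edit.log.autoroll.multiplier.threshold</name>
  <!-- roll when the open segment reaches this multiple of dfs.namenode.checkpoint.txns -->
  <value>2.0</value>
</property>
<property>
  <name>dfs.namenode.edit.log.autoroll.check.interval.ms</name>
  <!-- how often the ANN checks whether it should auto-roll (5 minutes here) -->
  <value>300000</value>
</property>
```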
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826604#comment-16826604 ] star edited comment on HDFS-14437 at 4/26/19 3:21 AM: -- As [~kihwal] said in HDFS-10943, '{{rollEditLog()}} does not and cannot solely depend on {{FSEditLog}} synchronization', so it is not enough to reproduce this issue at the FSEditLog level. We should reproduce it at the FSNamesystem level or even the RPC level. I've tried that but failed. As far as I know, all methods of FSEditLog are called from FSNamesystem under either the read lock or the write lock. was (Author: starphin): As [~kihwal] said in HDFS-10943, '{{rollEditLog()}} does not and cannot solely depend on {{FSEditLog}} synchronization', so it is not enough to reproduce such an issue at the FSEditLog level. We should reproduce it at the FSNamesystem level or even the RPC level. As far as I know, all methods of FSEditLog are called from FSNamesystem under either the read lock or the write lock. > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943, > I have sorted out the process of writing and flushing the EditLog and some important > functions. I found that in the FSEditLog class, the close() function > runs the following sequence: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > Since we have gained the object lock in close(), when > waitForSyncToFinish() returns it means all logSync work has been done and all > data in bufReady has been flushed out; and since the current thread holds the > lock on this object, when endCurrentLogSegment() is called no other thread can > take the lock, so they can't write new edit log entries into bufCurrent.
> But if we don't call waitForSyncToFinish() before endCurrentLogSegment(), > some auto-scheduled logSync() flush may still be in progress. That step doesn't need > synchronization, as mentioned in the comment of the logSync() method: > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. 
> long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new
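The three-step sync described in the javadoc quoted above can be condensed into a small sketch. This is an illustration of the pattern, not Hadoop's `FSEditLog`: the names `bufCurrent`, `bufReady`, and `isSyncRunning` mirror the quoted comment, while `storage` is a hypothetical stand-in for the on-disk journal.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of the double-buffered sync pattern from the logSync() javadoc. */
public class DoubleBufferSketch {
    private List<String> bufCurrent = new ArrayList<>();     // written under the monitor
    private final List<String> storage = new ArrayList<>();  // stand-in for the journal
    private boolean isSyncRunning = false;                   // tracks an in-flight flush

    /** Writers append under the monitor (the "synchronized in-memory buffer" above). */
    public synchronized void logEdit(String op) {
        bufCurrent.add(op);
    }

    public void logSync() throws InterruptedException {
        List<String> bufReady;
        synchronized (this) {
            while (isSyncRunning) { wait(); }  // only one flusher at a time
            isSyncRunning = true;
            bufReady = bufCurrent;             // step 1: swap the double buffer
            bufCurrent = new ArrayList<>();
        }
        storage.addAll(bufReady);              // step 2: flush OUTSIDE the monitor
        synchronized (this) {
            isSyncRunning = false;             // step 3: reset the flag, wake waiters
            notifyAll();
        }
    }

    public synchronized int flushedCount() { return storage.size(); }
}
```

Step 2 runs outside the monitor, so other threads can keep appending to the fresh `bufCurrent` while a flush is in flight; this is exactly why callers that must run alone need `waitForSyncToFinish()` first.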
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823750#comment-16823750 ] star commented on HDFS-14437: - OK, I'll take a look later.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823714#comment-16823714 ] star edited comment on HDFS-14437 at 4/23/19 6:55 AM: -- [~angerszhuuu], I think I've got it:
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logEdit*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|

was (Author: starphin): [~angerszhuuu], I think I've got it:
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|true|false|
|{color:#d04437}finalize error{color}| | |true|false|
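The interleaving in the table above can be re-enacted deterministically. This is a hypothetical sketch of the failure shape, not FSEditLog code: `logSyncAll` plays the role of endCurrentLogSegment's final sync, and the check in `finalizeSegment` mirrors the "expects empty EditsDoubleBuffer.bufCurrent" assertion from the issue title.

```java
import java.util.ArrayList;
import java.util.List;

/** Deterministic re-enactment of the race in the table above (illustrative names only). */
public class RollRaceSketch {
    private final List<String> bufCurrent = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    /** Thread2's write: slips an edit into bufCurrent between the last sync and finalize. */
    public void logEdit(String op) {
        bufCurrent.add(op);
    }

    /** Stands in for endCurrentSegment#logSyncAll: drains everything written so far. */
    public void logSyncAll() {
        flushed.addAll(bufCurrent);
        bufCurrent.clear();
    }

    /** Mirrors the "expects empty EditsDoubleBuffer.bufCurrent" check that threw. */
    public void finalizeSegment() {
        if (!bufCurrent.isEmpty()) {
            throw new IllegalStateException("bufCurrent is not empty: " + bufCurrent);
        }
    }
}
```

Running the table's sequence (flush done, then a late logEdit, then finalize) triggers the exception; the fix discussed in this thread is to prevent the late write from entering the buffer once the segment is being closed.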
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823742#comment-16823742 ] star commented on HDFS-14437: - [~angerszhuuu] Right, the lock state is fixed. {quote}But logAppend also needs the lock.{quote}
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823726#comment-16823726 ] star commented on HDFS-14437: - 'locked' refers to the object lock (monitor) of FSEditLog.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823726#comment-16823726 ] star edited comment on HDFS-14437 at 4/23/19 6:24 AM: -- locked is the state of object lock for FSEditLog. was (Author: starphin): locked is object lock of FSEditLog.
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823714#comment-16823714 ] star commented on HDFS-14437: - [~angerszhuuu], I think I've got that.
||Thread0||Thread1||Thread2||locked||isSyncRunning||
| |flush| |false|true|
|endCurrentSegment#logSyncAll| | |true|true|
|wait| | |false|true|
| |flush done| |false|false|
| | |*logAppend*|false|false|
|finalize error| | |true|false|
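The interleaving in star's table can be replayed sequentially in a toy model (illustrative names, not Hadoop code): because the flush step is unsynchronized, a logAppend can land in bufCurrent after the flush finishes but before the segment is finalized, so finalization sees a non-empty buffer:

```java
import java.util.ArrayList;
import java.util.List;

// Sequential replay of the race: Thread1 is mid-flush, Thread0 ends the
// segment, Thread2 appends, and finalization finds bufCurrent non-empty.
class RollRaceSketch {
    final List<String> bufCurrent = new ArrayList<>();
    boolean isSyncRunning = false;

    void logAppend(String op) { bufCurrent.add(op); }   // Thread2
    void beginFlush() { isSyncRunning = true; }         // Thread1
    void finishFlush() { isSyncRunning = false; }       // Thread1

    // Thread0: finalizing the segment expects bufCurrent to be empty.
    boolean finalizeSegment() { return bufCurrent.isEmpty(); }

    public static void main(String[] args) {
        RollRaceSketch log = new RollRaceSketch();
        log.beginFlush();        // row 1: flush in progress
        // rows 2-3: endCurrentSegment#logSyncAll waits for the flush
        log.finishFlush();       // row 4: flush done
        log.logAppend("OP_X");   // row 5: logAppend sneaks in, lock is free
        System.out.println("finalize ok: " + log.finalizeSegment()); // row 6
    }
}
```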
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822296#comment-16822296 ] star commented on HDFS-14378: - The failed tests pass locally. > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch, > HDFS-14378-trunk.005.patch, HDFS-14378-trunk.006.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than one primary SNN shortly after > an exception occurs. It handles a scenario in which the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby name nodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could make it simpler with the following > changes: > * There are only two roles: ANN and SNN > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN from which to download the checkpoint. > The SNN will just do log tailing and checkpointing, and provide a servlet for fsimage > downloading as usual. The SNN will not try to roll the edit log or send a checkpoint > request to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
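The proposed two-role flow could be sketched roughly as follows (a minimal sketch under that proposal; the names rollEditLog, downloadFsimage, and fetchCheckpoint are hypothetical and only illustrate the control flow, not actual Hadoop APIs):

```java
import java.util.List;
import java.util.Random;

// Sketch of the proposal: the active NN rolls its own edit log on a timer
// and pulls the fsimage from one randomly selected standby, so standbys
// never initiate rolls or checkpoint uploads.
class ActiveNnSketch {
    interface Standby { String downloadFsimage(); }

    private final List<Standby> standbys;
    private final Random rnd = new Random();
    int rollCount = 0;

    ActiveNnSketch(List<Standby> standbys) { this.standbys = standbys; }

    // ANN rolls its own edit log every DFS_HA_LOGROLL_PERIOD_KEY period.
    void rollEditLog() { rollCount++; }

    // ANN picks one SNN at random and downloads the checkpoint from it.
    String fetchCheckpoint() {
        Standby chosen = standbys.get(rnd.nextInt(standbys.size()));
        return chosen.downloadFsimage();
    }

    public static void main(String[] args) {
        ActiveNnSketch ann = new ActiveNnSketch(List.of(
            () -> "fsimage-from-snn-0",
            () -> "fsimage-from-snn-1"));
        ann.rollEditLog();
        System.out.println(ann.fetchCheckpoint().startsWith("fsimage-from-snn-"));
    }
}
```

The point of the design is that all coordination originates at the ANN, so the primary-checkpointer election and per-SNN roll requests disappear.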
[jira] [Resolved] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation
[ https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star resolved HDFS-14439. - Resolution: Not A Problem > FileNotFoundException thrown in TestEditLogRace when just add a setPermission > operation > --- > > Key: HDFS-14439 > URL: https://issues.apache.org/jira/browse/HDFS-14439 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Reporter: star >Assignee: star >Priority: Minor > > In TestEditLogRace.Transactions#run, add the following code between mkdirs and > delete: > > {panel} > fs.setPermission(dirnamePath, p); > {panel} > It will look like: > {panel} > fs.mkdirs(dirnamePath); > *fs.setPermission(dirnamePath, p);* > fs.delete(dirnamePath, true);{panel} > Then run TestEditLogRace#testEditLogRolling; it will throw a > FileNotFoundException. > {code:java} > java.io.FileNotFoundException: cannot find /thr-291-dir-2 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293) > at > org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258) > {code} > It happens while verifying the edit logs, when the target dir is found not to exist. > Could anyone help figure out whether this makes sense, or what makes it > behave that way? 
[jira] [Commented] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation
[ https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16822295#comment-16822295 ] star commented on HDFS-14439: - Not an issue. verifyEditLogs loads a single edit log segment, which is why it throws this exception: the mkdir op is in the previous segment, while the segment being verified contains only the setPermission op.
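star's explanation can be illustrated with a toy model (not Hadoop code; names are illustrative) of why loading a single segment fails: the OP_MKDIR lives in the previous segment, so replaying only the current segment's setPermission op cannot find the directory:

```java
import java.util.HashMap;
import java.util.Map;

// Toy replay of edit log ops against an in-memory namespace. Verifying one
// segment in isolation starts from an empty namespace, so a setPermission
// whose mkdir happened in an earlier segment has no target.
class SegmentReplaySketch {
    private final Map<String, String> dirs = new HashMap<>(); // path -> perms

    void apply(String op, String path) {
        if (op.equals("OP_MKDIR")) {
            dirs.put(path, "rwxr-xr-x");
        } else if (op.equals("OP_SET_PERMISSIONS")) {
            if (!dirs.containsKey(path)) {
                // Mirrors the FileNotFoundException from resolveLastINode.
                throw new IllegalStateException("cannot find " + path);
            }
            dirs.put(path, "rwxrwxrwx");
        } else if (op.equals("OP_DELETE")) {
            dirs.remove(path);
        }
    }

    public static void main(String[] args) {
        // Segment 1 held the mkdir; verifying segment 2 alone starts empty.
        SegmentReplaySketch replay = new SegmentReplaySketch();
        try {
            replay.apply("OP_SET_PERMISSIONS", "/thr-291-dir-2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```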
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.006.patch
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821862#comment-16821862 ] star commented on HDFS-14437: - [~angerszhuuu], it would be better to upload your unit test code so that others can reproduce the issue.
[jira] [Assigned] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation
[ https://issues.apache.org/jira/browse/HDFS-14439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star reassigned HDFS-14439: --- Assignee: star
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.005.patch
[jira] [Created] (HDFS-14439) FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation
star created HDFS-14439: --- Summary: FileNotFoundException thrown in TestEditLogRace when just add a setPermission operation Key: HDFS-14439 URL: https://issues.apache.org/jira/browse/HDFS-14439 Project: Hadoop HDFS Issue Type: Bug Components: fs Reporter: star In TestEditLogRace.Transactions#run, add the following code between mkdirs and delete: {panel} fs.setPermission(dirnamePath, p); {panel} It will look like: {panel} fs.mkdirs(dirnamePath); *fs.setPermission(dirnamePath, p);* fs.delete(dirnamePath, true);{panel} Then run TestEditLogRace#testEditLogRolling; it will throw a FileNotFoundException. {code:java} java.io.FileNotFoundException: cannot find /thr-291-dir-2 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolveLastINode(FSDirectory.java:1524) at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetPermission(FSDirAttrOp.java:264) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:656) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:287) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:182) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:159) at org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.verifyEditLogs(TestEditLogRace.java:293) at org.apache.hadoop.hdfs.server.namenode.TestEditLogRace.testEditLogRolling(TestEditLogRace.java:258) {code} It happens while verifying the edit logs, when the target dir is found not to exist. Could anyone help figure out whether this makes sense, or what makes it behave that way?
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821092#comment-16821092 ] star commented on HDFS-14437: - [~angerszhuuu]. If fsLock ignored, the senario He Xiaoqiao listed could occur because flush doesn't need object lock of FSEditlog. > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog class, the close() function > will call such process like below: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > since we have gain the object lock in the function close(), so when > waitForSyncToFish() method return, it mean all logSync job has done and all > data in bufReady has been flushed out, and since current thread has the lock > of this object, when call endCurrentLogSegment(), no other thread will gain > the lock so they can't write new editlog into currentBuf. > But when we don't call waitForSyncToFish() before endCurrentLogSegment(), > there may be some autoScheduled logSync()'s flush process is doing, since > this process don't need > synchronization since it has mention in the comment of logSync() method : > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. 
> * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. > long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } > editLogStream.setReadyToFlush(); > } catch (IOException e) { > final String msg = > "Could not sync enough journals to persistent storage " + > "due to " + e.getMessage() + ". 
" + > "Unsynced transactions: " + (txid - synctxid); > LOG.fatal(msg, new Exception()); > synchronized(journalSetLock) { > IOUtils.cleanup(LOG, journalSet); > } > terminate(1, msg); > } > } finally { > // Prevent RuntimeException from blocking other log edit write > doneWithAutoSyncScheduling(); > } >
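The three-step protocol described in the quoted logSync() comment can be condensed into a small self-contained sketch. This is a hypothetical simplification for illustration, not Hadoop's actual FSEditLog or EditsDoubleBuffer (the real code also tracks transaction IDs, journal sets and metrics): writers append to bufCurrent under the lock, a syncing thread swaps the buffers while synchronized, flushes the ready buffer outside the lock, then clears the isSyncRunning flag under the lock again.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the three-step sync protocol quoted above; a hypothetical
// simplification, not Hadoop's actual FSEditLog/EditsDoubleBuffer.
class DoubleBufferSketch {
    private List<String> bufCurrent = new ArrayList<>(); // writers append here
    private List<String> bufReady = new ArrayList<>();   // buffer being flushed
    private boolean isSyncRunning = false;
    private final List<String> storage = new ArrayList<>(); // stands in for disk

    // Writes are synchronized: edits go into the in-memory buffer.
    public synchronized void logEdit(String op) {
        bufCurrent.add(op);
    }

    public void logSync() {
        synchronized (this) {
            // If somebody is already syncing, wait (step 1 is serialized).
            while (isSyncRunning) {
                try {
                    wait();
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            // Step 1, synchronized: swap the double buffer, set the flag.
            List<String> tmp = bufReady;
            bufReady = bufCurrent;
            bufCurrent = tmp;
            isSyncRunning = true;
        }
        // Step 2, unsynchronized: flush to "disk" while other threads keep
        // writing into the (now empty) bufCurrent. This unlocked window is
        // why callers that must run alone need waitForSyncToFinish().
        storage.addAll(bufReady);
        bufReady.clear();
        synchronized (this) {
            // Step 3, synchronized: reset the flag, notify waiters.
            isSyncRunning = false;
            notifyAll();
        }
    }

    public synchronized int flushedCount() {
        return storage.size();
    }

    public static void main(String[] args) {
        DoubleBufferSketch log = new DoubleBufferSketch();
        log.logEdit("OP_MKDIR");
        log.logEdit("OP_SET_PERMISSIONS");
        log.logSync();
        System.out.println("flushed=" + log.flushedCount()); // prints flushed=2
    }
}
```

The lack of synchronization in step 2 is exactly the property the comment calls out: concurrent writers are never blocked by an in-flight flush, so any caller that assumes an empty buffer must first call waitForSyncToFinish().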
[jira] [Commented] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821045#comment-16821045 ] star commented on HDFS-14436: - Maybe I should move it to the hadoop-common project. > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14436.001.patch > > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, > TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > If "nn.interval" is set manually or configured in an xml file, > 10000 will be returned. > If not, 10 will be returned while 10000 is expected. > The logic is not consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821044#comment-16821044 ] star commented on HDFS-14436: - It is a little weird to return only the raw literal number when both a default time unit and a literal number are provided. It may mislead us into unexpected behavior if we just go by the method's name and parameters. Though changing the long 10 to the string '10s' returns the expected result, I think it should return the same result when the long 10 and the default time unit SECONDS are given. > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14436.001.patch > > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, > TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > If "nn.interval" is set manually or configured in an xml file, > 10000 will be returned. > If not, 10 will be returned while 10000 is expected. > The logic is not consistent.
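A minimal, self-contained re-implementation of the suspected conversion logic (hypothetical; this is NOT Hadoop's Configuration class) makes the inconsistency concrete: a configured value is converted from the default unit to the requested unit, but an absent key returns the raw default without any conversion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the inconsistent behavior under discussion;
// not Hadoop's actual Configuration#getTimeDuration implementation.
class TimeDurationSketch {
    static Map<String, String> props = new HashMap<>();

    static long getTimeDuration(String key, long defaultValue,
                                TimeUnit defaultUnit, TimeUnit returnUnit) {
        String v = props.get(key);
        if (v == null) {
            // The inconsistency: defaultUnit is ignored for the default value.
            return defaultValue;
        }
        // A configured value without a unit suffix is assumed to be in
        // defaultUnit and converted to returnUnit.
        return returnUnit.convert(Long.parseLong(v), defaultUnit);
    }

    public static void main(String[] args) {
        // Key absent: returns 10, although 10 s == 10000 ms was expected.
        System.out.println(getTimeDuration("nn.interval", 10,
            TimeUnit.SECONDS, TimeUnit.MILLISECONDS)); // prints 10

        // Same numeric value configured: now the unit conversion happens.
        props.put("nn.interval", "10");
        System.out.println(getTimeDuration("nn.interval", 10,
            TimeUnit.SECONDS, TimeUnit.MILLISECONDS)); // prints 10000
    }
}
```

The fix suggested in the comment amounts to converting the default value through the same path, i.e. `returnUnit.convert(defaultValue, defaultUnit)` when the key is absent.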
[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14436: Attachment: HDFS-14436.001.patch > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > Attachments: HDFS-14436.001.patch > > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, > TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > If "nn.interval" is set manually or configured in an xml file, > 10000 will be returned. > If not, 10 will be returned while 10000 is expected. > The logic is not consistent.
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820950#comment-16820950 ] star edited comment on HDFS-14437 at 4/18/19 10:43 AM: --- [~hexiaoqiao], agreed. There may be more edit log records written into bufCurrent before the flush. {quote}In my opinion, The following code segment in FSEditLog#logSync which is out of {{synchronized}} is core reason. {quote} But the scenario in the table you presented will hardly ever occur, since rollEditLog and FSEditLog#logEdit are almost always called with FSNamesystem.fsLock held, as in mkdir and setAcl. I'm still working on finding that racing case. was (Author: starphin): [~hexiaoqiao], agreed. {quote}In my opinion, The following code segment in FSEditLog#logSync which is out of {{synchronized}} is core reason. try { if (logStream != null) { logStream.flush(); } }{quote} > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog class, the close() function > will call such process like below: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > since we have gain the object lock in the function close(), so when > waitForSyncToFish() method return, it mean all logSync job has done and all > data in bufReady has been flushed out, and since current thread has the lock > of this object, when call endCurrentLogSegment(), no other thread will gain > the lock so they can't write new editlog into currentBuf. 
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), > there may be some autoScheduled logSync()'s flush process is doing, since > this process don't need > synchronization since it has mention in the comment of logSync() method : > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. 
> long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } >
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820950#comment-16820950 ] star commented on HDFS-14437: - [~hexiaoqiao], agreed. {quote}In my opinion, The following code segment in FSEditLog#logSync which is out of {{synchronized}} is core reason. try { if (logStream != null) { logStream.flush(); } }{quote} > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog class, the close() function > will call such process like below: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > since we have gain the object lock in the function close(), so when > waitForSyncToFish() method return, it mean all logSync job has done and all > data in bufReady has been flushed out, and since current thread has the lock > of this object, when call endCurrentLogSegment(), no other thread will gain > the lock so they can't write new editlog into currentBuf. > But when we don't call waitForSyncToFish() before endCurrentLogSegment(), > there may be some autoScheduled logSync()'s flush process is doing, since > this process don't need > synchronization since it has mention in the comment of logSync() method : > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. 
> * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. > * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync() should be synchronized and also call > * waitForSyncToFinish() before assuming they are running alone. > */ > public void logSync() { > long syncStart = 0; > // Fetch the transactionId of this thread. 
> long mytxid = myTransactionId.get().txid; > > boolean sync = false; > try { > EditLogOutputStream logStream = null; > synchronized (this) { > try { > printStatistics(false); > // if somebody is already syncing, then wait > while (mytxid > synctxid && isSyncRunning) { > try { > wait(1000); > } catch (InterruptedException ie) { > } > } > // > // If this transaction was already flushed, then nothing to do > // > if (mytxid <= synctxid) { > numTransactionsBatchedInSync++; > if (metrics != null) { > // Metrics is non-null only when used inside name node > metrics.incrTransactionsBatchedInSync(); > } > return; > } > > // now, this thread will do the sync > syncStart = txid; > isSyncRunning = true; > sync = true; > // swap buffers > try { > if (journalSet.isEmpty()) { > throw new IOException("No journals available to flush"); > } > editLogStream.setReadyToFlush(); > } catch (IOException e) { > final String msg = > "Could not sync enough journals to persistent storage " + > "due to " + e.getMessage() + ". " + > "Unsynced transactions: " + (txid - synctxid); > LOG.fatal(msg, new Exception()); > synchronized(journalSetLock) { > IOUtils.cleanup(LOG, journalSet); > } > terminate(1, msg); > } > } finally { > // Prevent
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886 ] star edited comment on HDFS-14437 at 4/18/19 9:29 AM: -- Thanks [~hexiaoqiao]. The table really makes the issue clearer. I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog is always called holding fsLock, e.g. from rollEditLog, startRollingUpgrade and finalizeRollingUpgrade in FSNamesystem. {quote}CheckpointSignature rollEditLog() throws IOException { writeLock(); try { result = getFSImage().rollEditLog(getEffectiveLayoutVersion()); } finally { writeUnlock(operationName); } return result; }{quote} FSEditLog#logSync is also called holding fsLock in most file/dir related operations, such as mkdir and delete. But permission related operations are called outside fsLock, such as setPermission, setOwner and setAcl. {quote}void setPermission(String src, FsPermission permission) throws IOException { writeLock(); try { auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission); } catch (AccessControlException e) { } finally { writeUnlock(operationName); } getEditLog().logSync(); }{quote} Theoretically this race could cause the issue. The sequence of operations in the table below would reproduce it, though it seems very difficult to hit. 
||Time||Thread1(rollEditlog)||Thread2(setPermission)|| |t0| |writeUnlock| |t1|logEdit|-| |t2|logSyncAll|-| |t3|-|logSync| |t4|finalize|-| |t5|exception and terminate| | was (Author: starphin): Thanks [~hexiaoqiao]. The table really makes the issue clearer. I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog is always called holding fsLock, e.g. from rollEditLog, startRollingUpgrade and finalizeRollingUpgrade in FSNamesystem. {quote}CheckpointSignature rollEditLog() throws IOException { writeLock(); try { result = getFSImage().rollEditLog(getEffectiveLayoutVersion()); } finally { writeUnlock(operationName); } return result; }{quote} FSEditLog#logSync is also called holding fsLock in most file/dir related operations, such as mkdir and delete. But permission related operations are called outside fsLock, such as setPermission, setOwner and setAcl. {quote}void setPermission(String src, FsPermission permission) throws IOException { writeLock(); try { auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission); } catch (AccessControlException e) { } finally { writeUnlock(operationName); } getEditLog().logSync(); }{quote} Theoretically this race could cause the issue. The sequence of operations in the table below would reproduce it, though it seems very difficult to hit. 
||Time||Thread1(rollEditlog)||Thread2(somewhat write)|| |t0| |writeUnlock| |t1|logEdit|-| |t2|logSyncAll|-| |t3|-|logSync| |t4|finalize|-| |t5|exception and terminate| | > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog
[jira] [Comment Edited] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886 ] star edited comment on HDFS-14437 at 4/18/19 9:25 AM: -- Thanks [~hexiaoqiao]. The table really makes the issue clearer. I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog is always called holding fsLock, e.g. from rollEditLog, startRollingUpgrade and finalizeRollingUpgrade in FSNamesystem. {quote}CheckpointSignature rollEditLog() throws IOException { writeLock(); try { result = getFSImage().rollEditLog(getEffectiveLayoutVersion()); } finally { writeUnlock(operationName); } return result; }{quote} FSEditLog#logSync is also called holding fsLock in most file/dir related operations, such as mkdir and delete. But permission related operations are called outside fsLock, such as setPermission, setOwner and setAcl. {quote}void setPermission(String src, FsPermission permission) throws IOException { writeLock(); try { auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission); } catch (AccessControlException e) { } finally { writeUnlock(operationName); } getEditLog().logSync(); }{quote} Theoretically this race could cause the issue. The sequence of operations in the table below would reproduce it, though it seems very difficult to hit. 
||Time||Thread1(rollEditlog)||Thread2(somewhat write)|| |t0| |writeUnlock| |t1|logEdit|-| |t2|logSyncAll|-| |t3|-|logSync| |t4|finalize|-| |t5|exception and terminate| | was (Author: starphin): Thanks [~hexiaoqiao]. The table really makes the issue clearer. I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog is always called holding fsLock, e.g. from rollEditLog, startRollingUpgrade and finalizeRollingUpgrade in FSNamesystem. {quote}CheckpointSignature rollEditLog() throws IOException { writeLock(); try { result = getFSImage().rollEditLog(getEffectiveLayoutVersion()); } finally { writeUnlock(operationName); } return result; }{quote} FSEditLog#logSync is also called holding fsLock in most file/dir related operations, such as mkdir and delete. But permission related operations are called outside fsLock, such as setPermission, setOwner and setAcl. {quote}void setPermission(String src, FsPermission permission) throws IOException { writeLock(); try { auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission); } catch (AccessControlException e) { } finally { writeUnlock(operationName); } getEditLog().logSync(); }{quote} Theoretically this race could cause the issue. The sequence of operations in the table below would reproduce it, though it seems very difficult to hit. 
||Time||Thread1(rollEditlog)||Thread2(somewhat write)|| |t0| |writeUnlock| |t1|logEdit|-| |t2|logSyncAll|-| |t3|-|logSync| |t4|finalize|-| |t5|exception and terminate| | > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush
[jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
[ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820886#comment-16820886 ] star commented on HDFS-14437: - Thanks [~hexiaoqiao]. The table really makes the issue clearer. I've been tracking FSNamesystem and FSEditLog and found that FSEditLog#rollEditLog is always called holding fsLock, e.g. from rollEditLog, startRollingUpgrade and finalizeRollingUpgrade in FSNamesystem. {quote}CheckpointSignature rollEditLog() throws IOException { writeLock(); try { result = getFSImage().rollEditLog(getEffectiveLayoutVersion()); } finally { writeUnlock(operationName); } return result; }{quote} FSEditLog#logSync is also called holding fsLock in most file/dir related operations, such as mkdir and delete. But permission related operations are called outside fsLock, such as setPermission, setOwner and setAcl. {quote}void setPermission(String src, FsPermission permission) throws IOException { writeLock(); try { auditStat = FSDirAttrOp.setPermission(dir, pc, src, permission); } catch (AccessControlException e) { } finally { writeUnlock(operationName); } getEditLog().logSync(); }{quote} Theoretically this race could cause the issue. The sequence of operations in the table below would reproduce it, though it seems very difficult to hit. 
||Time||Thread1(rollEditlog)||Thread2(somewhat write)|| |t0| |writeUnlock| |t1|logEdit|-| |t2|logSyncAll|-| |t3|-|logSync| |t4|finalize|-| |t5|exception and terminate| | > Exception happened when rollEditLog expects empty > EditsDoubleBuffer.bufCurrent but not > - > > Key: HDFS-14437 > URL: https://issues.apache.org/jira/browse/HDFS-14437 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode, qjm >Reporter: angerszhu >Priority: Major > > For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 > , I have sort the process of write and flush EditLog and some important > function, I found the in the class FSEditLog class, the close() function > will call such process like below: > > {code:java} > waitForSyncToFinish(); > endCurrentLogSegment(true);{code} > since we have gain the object lock in the function close(), so when > waitForSyncToFish() method return, it mean all logSync job has done and all > data in bufReady has been flushed out, and since current thread has the lock > of this object, when call endCurrentLogSegment(), no other thread will gain > the lock so they can't write new editlog into currentBuf. > But when we don't call waitForSyncToFish() before endCurrentLogSegment(), > there may be some autoScheduled logSync()'s flush process is doing, since > this process don't need > synchronization since it has mention in the comment of logSync() method : > > {code:java} > /** > * Sync all modifications done by this thread. > * > * The internal concurrency design of this class is as follows: > * - Log items are written synchronized into an in-memory buffer, > * and each assigned a transaction ID. > * - When a thread (client) would like to sync all of its edits, logSync() > * uses a ThreadLocal transaction ID to determine what edit number must > * be synced to. > * - The isSyncRunning volatile boolean tracks whether a sync is currently > * under progress. 
> * > * The data is double-buffered within each edit log implementation so that > * in-memory writing can occur in parallel with the on-disk writing. > * > * Each sync occurs in three steps: > * 1. synchronized, it swaps the double buffer and sets the isSyncRunning > * flag. > * 2. unsynchronized, it flushes the data to storage > * 3. synchronized, it resets the flag and notifies anyone waiting on the > * sync. > * > * The lack of synchronization on step 2 allows other threads to continue > * to write into the memory buffer while the sync is in progress. > * Because this step is unsynchronized, actions that need to avoid > * concurrency with sync()
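The t0-t5 timeline in the table above can be re-enacted sequentially in a small sketch. This is a hypothetical, deliberately single-threaded simplification (the real race needs a second thread whose logEdit/logSync runs outside FSNamesystem.fsLock): an edit that lands after logSyncAll() but before the finalize check leaves bufCurrent non-empty, which is exactly the state the EditsDoubleBuffer precondition rejects.

```java
import java.util.ArrayList;
import java.util.List;

// Sequential re-enactment of the t0-t5 timeline (hypothetical sketch,
// not Hadoop's actual FSEditLog).
class RollRaceSketch {
    static List<String> bufCurrent = new ArrayList<>();
    static List<String> flushed = new ArrayList<>();

    static void logEdit(String op) {
        bufCurrent.add(op);
    }

    static void logSyncAll() {
        // t2: the rolling thread syncs everything written so far.
        flushed.addAll(bufCurrent);
        bufCurrent.clear();
    }

    public static void main(String[] args) {
        logEdit("OP_END_LOG_SEGMENT"); // t1: roller writes the end-segment op
        logSyncAll();                  // t2: bufCurrent is now empty
        logEdit("OP_SET_PERMISSIONS"); // t3: a writer thread sneaks in an edit
        // t4: endCurrentLogSegment/finalize expects an empty bufCurrent.
        System.out.println("bufCurrent empty at finalize? "
            + bufCurrent.isEmpty()); // prints false
        // t5: with the precondition violated, the real code throws the
        // "expects empty EditsDoubleBuffer.bufCurrent" error and terminates.
    }
}
```

This also illustrates why operations like setPermission matter here: since their logSync() path does not hold fsLock, their edits can interleave with the roll at exactly the t3 window shown above.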
[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14436: Description: When calling getTimeDuration like this: {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);{quote} If "nn.interval" is set manually or configured in an xml file, 10000 will be returned. If not, 10 will be returned while 10000 is expected. The logic is not consistent. was: When calling getTimeDuration like this: {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);{quote} If "nn.interval" is set manually or configured in an xml file, 10000 will be returned. If not, 10 will be returned while 10000 is expected. The logic is not consistent. > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, > TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > If "nn.interval" is set manually or configured in an xml file, > 10000 will be returned. > If not, 10 will be returned while 10000 is expected. > The logic is not consistent.
[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14436: Description: When calling getTimeDuration like this: {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);{quote} If "nn.interval" is set manually or configured in an xml file, 10000 will be returned. If not, 10 will be returned while 10000 is expected. The logic is not consistent. was: When calling getTimeDuration like this: {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);{quote} If "nn.interval" is set manually or configured in an xml file, 10000 will be returned. If not, 10 will be returned while 10000 is expected. The logic is not consistent. > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, > TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > If "nn.interval" is set manually or configured in an xml file, > 10000 will be returned. > If not, 10 will be returned while 10000 is expected. > The logic is not consistent.
[jira] [Updated] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
[ https://issues.apache.org/jira/browse/HDFS-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14436: Description: When calling getTimeDuration like this: {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS); {quote} If "nn.interval" is set manually or configured in an xml file, 1 will be returned. If not, 10 will be returned while 1 is expected. The logic is not consistent. was: When calling getTimeDuration like this: {quote}conf.getTimeDuration("property", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS); {quote} > Configuration#getTimeDuration is not consistent between default value and > manual settings. > -- > > Key: HDFS-14436 > URL: https://issues.apache.org/jira/browse/HDFS-14436 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > When calling getTimeDuration like this: > {quote}conf.getTimeDuration("nn.interval", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS); > {quote} > > If "nn.interval" is set manually or configured in an xml file, 1 will be > returned. > If not, 10 will be returned while 1 is expected. > The logic is not consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-14436) Configuration#getTimeDuration is not consistent between default value and manual settings.
star created HDFS-14436: --- Summary: Configuration#getTimeDuration is not consistent between default value and manual settings. Key: HDFS-14436 URL: https://issues.apache.org/jira/browse/HDFS-14436 Project: Hadoop HDFS Issue Type: Bug Reporter: star Assignee: star When calling getTimeDuration like this: {quote}conf.getTimeDuration("property", 10, TimeUnit.SECONDS, TimeUnit.MILLISECONDS); {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
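The exact values in the report are hard to recover from the garbled formatting, but the inconsistency being described — an explicitly set value gets unit-converted while the absent-key default is returned raw — can be sketched with a hypothetical, stripped-down TimeConf class (illustrative only, not Hadoop's actual Configuration; the numbers below are assumptions):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Simplified reproduction of the kind of inconsistency reported in
// HDFS-14436 (hypothetical TimeConf class, not Hadoop's Configuration):
// a value present in the config is converted from the declared unit to
// the return unit, but the default is handed back without conversion.
public class TimeConf {
    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);
    }

    public long getTimeDuration(String key, long def, TimeUnit unit, TimeUnit returnUnit) {
        String raw = props.get(key);
        if (raw == null) {
            return def;  // bug: default returned raw, no unit conversion
        }
        return returnUnit.convert(Long.parseLong(raw), unit);  // converted
    }

    public static void main(String[] args) {
        TimeConf conf = new TimeConf();
        // Key unset: the raw default 10 leaks through (10000 ms expected).
        System.out.println(conf.getTimeDuration("nn.interval", 10,
                TimeUnit.SECONDS, TimeUnit.MILLISECONDS));
        // Key set to the same nominal value: converted to 10000 ms.
        conf.set("nn.interval", "10");
        System.out.println(conf.getTimeDuration("nn.interval", 10,
                TimeUnit.SECONDS, TimeUnit.MILLISECONDS));
    }
}
```

The same nominal value thus yields different results depending on whether it comes from the config file or the default argument, which is the inconsistency the issue complains about.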
[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820206#comment-16820206 ] star commented on HDFS-10659: - [~jojochuang], would you like to review the patch? > Namenode crashes after Journalnode re-installation in an HA cluster due to > missing paxos directory > -- > > Key: HDFS-10659 > URL: https://issues.apache.org/jira/browse/HDFS-10659 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, journal-node >Affects Versions: 2.7.0 >Reporter: Amit Anand >Assignee: star >Priority: Major > Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, > HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, > HDFS-10659.005.patch, HDFS-10659.006.patch > > > In my environment I am seeing {{Namenodes}} crashing after a majority of > {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling > upgrades followed by rolling re-installs of each node, including master (NN, JN, > RM, ZK) nodes. When a journal node is re-installed or moved to a new > disk/host, instead of running the {{"initializeSharedEdits"}} command, I copy the > {{VERSION}} file from one of the other {{Journalnodes}}, which allows my > {{NN}} to start writing data to the newly installed {{Journalnode}}. > To achieve quorum for JN and recover unfinalized segments, the NN during startup > creates .tmp files under the {{"/jn/current/paxos"}} directory. In the > current implementation the "paxos" directory is only created during the > {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" > directory is not created upon JN startup or by the NN while writing .tmp > files, which causes the NN to crash with the following error message: > {code} > 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No > such file or directory) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:221) > at java.io.FileOutputStream.(FileOutputStream.java:171) > at > org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205) > at > org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249) > at > org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > {code} > The current > [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130] > method simply returns a path to a file under the "paxos" directory without > verifying its existence. Since the "paxos" directory holds files that are > required for NN recovery and achieving JN quorum, my proposed solution is to > add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if > it is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
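The proposed fix reads naturally as a guard inside getPaxosFile. The sketch below is illustrative only — a hypothetical JNStorageSketch class, not the actual HDFS-10659 patch or the real JNStorage: it creates the "paxos" directory on demand before handing out a path beneath it.

```java
import java.io.File;

// Illustrative sketch of the proposal: create the "paxos" directory if it
// is missing before returning a file path under it. Hypothetical class,
// not the real org.apache.hadoop.hdfs.qjournal.server.JNStorage.
public class JNStorageSketch {
    private final File currentDir;

    public JNStorageSketch(File currentDir) {
        this.currentDir = currentDir;
    }

    // Mirrors getPaxosFile, with the proposed existence check added.
    public File getPaxosFile(long segmentTxId) {
        File paxosDir = new File(currentDir, "paxos");
        // A re-installed JN has no paxos dir; recreate it here instead of
        // letting recovery fail later on the .tmp file write.
        if (!paxosDir.exists() && !paxosDir.mkdirs()) {
            throw new IllegalStateException("Could not create " + paxosDir);
        }
        return new File(paxosDir, String.valueOf(segmentTxId));
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "jn-sketch");
        File f = new JNStorageSketch(base).getPaxosFile(64044);
        System.out.println(f + " parent exists: " + f.getParentFile().isDirectory());
    }
}
```

The directory creation is cheap and idempotent, so doing it on every call costs little compared to a NN crash during recovery.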
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820196#comment-16820196 ] star commented on HDFS-14378: - Would anyone like to review the patch? Appreciated. > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby namenodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could simplify it with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download the checkpoint from. > The SNN will just do log tailing and checkpointing, then provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
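The two-role proposal above can be sketched in a few lines. This is a hedged, illustrative-only sketch — the class and method names below are hypothetical, not Hadoop's real API: the ANN rolls its own edit log on a fixed period and picks a single SNN to fetch the fsimage from, so no 'primary checkpointer' state is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch of the HDFS-14378 proposal (hypothetical names):
// the ANN drives both edit-log rolling and checkpoint-source selection,
// leaving SNNs to only tail logs, checkpoint, and serve fsimages.
public class AnnDrivenCheckpoint {
    private final List<String> standbys;
    private final Random random = new Random();

    public AnnDrivenCheckpoint(List<String> standbys) {
        this.standbys = standbys;
    }

    // Called by the ANN itself every DFS_HA_LOGROLL_PERIOD_KEY period,
    // instead of each SNN racing to trigger the roll remotely.
    public String rollAndPickCheckpointSource() {
        rollEditLog();
        // Exactly one SNN is chosen per cycle, so duplicate fsimage
        // uploads cannot happen by construction.
        return standbys.get(random.nextInt(standbys.size()));
    }

    private void rollEditLog() {
        System.out.println("ANN rolled its edit log");
    }

    public static void main(String[] args) {
        AnnDrivenCheckpoint ann =
            new AnnDrivenCheckpoint(Arrays.asList("snn1:50070", "snn2:50070"));
        System.out.println("download fsimage from " + ann.rollAndPickCheckpointSource());
    }
}
```

The design choice: pulling coordination into the ANN trades a little extra ANN work for the removal of the first-writer-wins race entirely.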
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.004.patch > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch, HDFS-14378-trunk.004.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby namenodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could simplify it with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download the checkpoint from. > The SNN will just do log tailing and checkpointing, then provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.003.patch > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.003.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby namenodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could simplify it with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download the checkpoint from. > The SNN will just do log tailing and checkpointing, then provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: (was: HDFS-14378-trunk.002.patch) > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby namenodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could simplify it with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download the checkpoint from. > The SNN will just do log tailing and checkpointing, then provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome.
> -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Labels: patch (was: ) Affects Version/s: 3.1.2 Attachment: HDFS-14378-trunk.002.patch Status: Patch Available (was: Open) > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.2 >Reporter: star >Assignee: star >Priority: Minor > Labels: patch > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch, > HDFS-14378-trunk.002.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in > the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby namenodes, HDFS-14349. (I'm not so sure about this; I will > verify it by unit tests, or anyone could point it out.) > Above all, I'm wondering if we could simplify it with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download the checkpoint from. > The SNN will just do log tailing and checkpointing, then provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14424: External issue ID: (was: HDFS-10659) > NN failover failed because of losing paxos directory > - > > Key: HDFS-14424 > URL: https://issues.apache.org/jira/browse/HDFS-14424 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > Recently, our hadoop namenode shut down when switching the active namenode, just > because of a missing paxos directory. It is created in the default /tmp path > and deleted by the OS after 7 days with no operation. We can avoid this by moving the > journal directory to a non-tmp dir, but it's better to make sure the namenode > works well with the default config. > The issue throws an exception similar to HDFS-10659, also caused by a missing > paxos directory. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
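One mitigation for the /tmp cleanup described above is pointing the JournalNode storage (which contains the paxos directory) at a persistent path via dfs.journalnode.edits.dir in hdfs-site.xml; the default location lives under /tmp. The path below is an example only — pick a persistent local disk:

```xml
<!-- hdfs-site.xml: keep JournalNode data, including the paxos directory,
     out of /tmp so OS tmp-cleaners cannot delete it after inactivity.
     The value is an example path, not a recommended default. -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/var/hadoop/dfs/journalnode</value>
</property>
```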
[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14424: External issue ID: HDFS-10659 > NN failover failed because of losing paxos directory > - > > Key: HDFS-14424 > URL: https://issues.apache.org/jira/browse/HDFS-14424 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > Recently, our hadoop namenode shut down when switching the active namenode, just > because of a missing paxos directory. It is created in the default /tmp path > and deleted by the OS after 7 days with no operation. We can avoid this by moving the > journal directory to a non-tmp dir, but it's better to make sure the namenode > works well with the default config. > The issue throws an exception similar to HDFS-10659, also caused by a missing > paxos directory. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820206#comment-16818634 ] star commented on HDFS-10659: - The failed test is unrelated. The checkstyle warning about the hidden variable 'sd' is not newly introduced. > Namenode crashes after Journalnode re-installation in an HA cluster due to > missing paxos directory > -- > > Key: HDFS-10659 > URL: https://issues.apache.org/jira/browse/HDFS-10659 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, journal-node >Affects Versions: 2.7.0 >Reporter: Amit Anand >Assignee: star >Priority: Major > Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, > HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, > HDFS-10659.005.patch, HDFS-10659.006.patch > > > In my environment I am seeing {{Namenodes}} crashing after a majority of > {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling > upgrades followed by rolling re-installs of each node, including master (NN, JN, > RM, ZK) nodes. When a journal node is re-installed or moved to a new > disk/host, instead of running the {{"initializeSharedEdits"}} command, I copy the > {{VERSION}} file from one of the other {{Journalnodes}}, which allows my > {{NN}} to start writing data to the newly installed {{Journalnode}}. > To achieve quorum for JN and recover unfinalized segments, the NN during startup > creates .tmp files under the {{"/jn/current/paxos"}} directory. In the > current implementation the "paxos" directory is only created during the > {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" > directory is not created upon JN startup or by the NN while writing .tmp > files, which causes the NN to crash with the following error message: > {code} > 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No > such file or directory) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:221) > at java.io.FileOutputStream.(FileOutputStream.java:171) > at > org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205) > at > org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249) > at > org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > {code} > The current > [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130] > method simply returns a path to a file under the "paxos" directory without > verifying its existence. Since the "paxos" directory holds files that are > required for NN recovery and achieving JN quorum, my proposed solution is to > add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if > it is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818480#comment-16818480 ] star commented on HDFS-10659: - Fixed checkstyle. It seems the function JNStorage#findFinalizedEditsFile is never used. Should it be removed? > Namenode crashes after Journalnode re-installation in an HA cluster due to > missing paxos directory > -- > > Key: HDFS-10659 > URL: https://issues.apache.org/jira/browse/HDFS-10659 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, journal-node >Affects Versions: 2.7.0 >Reporter: Amit Anand >Assignee: star >Priority: Major > Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, > HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, > HDFS-10659.005.patch, HDFS-10659.006.patch > > > In my environment I am seeing {{Namenodes}} crashing after a majority of > {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling > upgrades followed by rolling re-installs of each node, including master (NN, JN, > RM, ZK) nodes. When a journal node is re-installed or moved to a new > disk/host, instead of running the {{"initializeSharedEdits"}} command, I copy the > {{VERSION}} file from one of the other {{Journalnodes}}, which allows my > {{NN}} to start writing data to the newly installed {{Journalnode}}. > To achieve quorum for JN and recover unfinalized segments, the NN during startup > creates .tmp files under the {{"/jn/current/paxos"}} directory. In the > current implementation the "paxos" directory is only created during the > {{"initializeSharedEdits"}} command, and if a JN is re-installed the "paxos" > directory is not created upon JN startup or by the NN while writing .tmp > files, which causes the NN to crash with the following error message: > {code} > 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No > such file or directory) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:221) > at java.io.FileOutputStream.(FileOutputStream.java:171) > at > org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205) > at > org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249) > at > org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > {code} > The current > [getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130] > method simply returns a path to a file under the "paxos" directory without > verifying its existence. Since the "paxos" directory holds files that are > required for NN recovery and achieving JN quorum, my proposed solution is to > add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if > it is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-10659: Attachment: HDFS-10659.006.patch > Namenode crashes after Journalnode re-installation in an HA cluster due to > missing paxos directory > -- > > Key: HDFS-10659 > URL: https://issues.apache.org/jira/browse/HDFS-10659 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, journal-node >Affects Versions: 2.7.0 >Reporter: Amit Anand >Assignee: star >Priority: Major > Attachments: HDFS-10659.000.patch, HDFS-10659.001.patch, > HDFS-10659.002.patch, HDFS-10659.003.patch, HDFS-10659.004.patch, > HDFS-10659.005.patch, HDFS-10659.006.patch > > > In my environment I am seeing {{Namenodes}} crashing down after majority of > {{Journalnodes}} are re-installed. We manage multiple clusters and do rolling > upgrades followed by rolling re-install of each node including master(NN, JN, > RM, ZK) nodes. When a journal node is re-installed or moved to a new > disk/host, instead of running {{"initializeSharedEdits"}} command, I copy > {{VERSION}} file from one of the other {{Journalnode}} and that allows my > {{NN}} to start writing data to the newly installed {{Journalnode}}. > To acheive quorum for JN and recover unfinalized segments NN during starupt > creates .tmp files under {{"/jn/current/paxos"}} directory . 
In > current implementation "paxos" directry is only created during > {{"initializeSharedEdits"}} command and if a JN is re-installed the "paxos" > directory is not created upon JN startup or by NN while writing .tmp > files which causes NN to crash with following error message: > {code} > 192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No > such file or directory) > at java.io.FileOutputStream.open(Native Method) > at java.io.FileOutputStream.(FileOutputStream.java:221) > at java.io.FileOutputStream.(FileOutputStream.java:171) > at > org.apache.hadoop.hdfs.util.AtomicFileOutputStream.(AtomicFileOutputStream.java:58) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971) > at > org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846) > at > org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205) > at > org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249) > at > org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) > {code} > The current > 
[getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130] > method simply returns a path to a file under the "paxos" directory without > verifying its existence. Since the "paxos" directory holds files that are > required for NN recovery and achieving JN quorum, my proposed solution is to > add a check to the "getPaxosFile" method and create the {{"paxos"}} directory if > it is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
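The proposed check could look like the following minimal sketch. The directory layout follows the linked JNStorage source, but `JNStorageSketch` and its members are illustrative stand-ins, not the actual Hadoop class:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of the proposed fix: create the "paxos" directory on
// demand inside getPaxosFile() instead of assuming initializeSharedEdits
// already created it. Not the real JNStorage class.
class JNStorageSketch {
    private final File currentDir;

    JNStorageSketch(File storageRoot) {
        this.currentDir = new File(storageRoot, "current");
    }

    File getPaxosDir() {
        return new File(currentDir, "paxos");
    }

    // Proposed behavior: ensure the directory exists before handing out a
    // path under it, so AtomicFileOutputStream never hits ENOENT.
    File getPaxosFile(long segmentTxId) throws IOException {
        File paxosDir = getPaxosDir();
        if (!paxosDir.exists() && !paxosDir.mkdirs()) {
            throw new IOException("Could not create paxos dir: " + paxosDir);
        }
        return new File(paxosDir, String.valueOf(segmentTxId));
    }
}
```

With this guard, a re-installed JN that only received a copied VERSION file would recreate the directory on the first recovery attempt instead of crashing the NN.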
[jira] [Commented] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817936#comment-16817936 ] star commented on HDFS-10659: - [~hanishakoneru] thanks. [~jojochuang], a patch for trunk is uploaded. Would you mind reviewing it?
[jira] [Updated] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-10659: Attachment: HDFS-10659.005.patch
[jira] [Assigned] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster due to missing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star reassigned HDFS-10659: --- Assignee: star (was: Hanisha Koneru)
[jira] [Commented] (HDFS-10477) Stopping decommission of a rack of DataNodes caused NameNode to fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817908#comment-16817908 ] star commented on HDFS-10477: - [~jojochuang], [^HDFS-10477.branch-2.8.patch] is for branch 2.8. Is there anything else I should do to get the patch committed? > Stopping decommission of a rack of DataNodes caused NameNode to fail over to standby > -- > > Key: HDFS-10477 > URL: https://issues.apache.org/jira/browse/HDFS-10477 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.2 >Reporter: yunjiong zhao >Assignee: yunjiong zhao >Priority: Major > Fix For: 2.10.0, 3.0.4, 3.3.0, 3.2.1, 2.9.3, 3.1.3 > > Attachments: HDFS-10477.002.patch, HDFS-10477.003.patch, > HDFS-10477.004.patch, HDFS-10477.005.patch, HDFS-10477.006.patch, > HDFS-10477.007.patch, HDFS-10477.branch-2.8.patch, HDFS-10477.branch-2.patch, > HDFS-10477.patch > > > In our cluster, when we stop decommissioning a rack which has 46 DataNodes, > it locked the Namesystem for about 7 minutes, as the log below shows: > {code} > 2016-05-26 20:11:41,697 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.27:1004 > 2016-05-26 20:11:51,171 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 285258 over-replicated blocks on 10.142.27.27:1004 during recommissioning > 2016-05-26 20:11:51,171 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.118:1004 > 2016-05-26 20:11:59,972 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 279923 over-replicated blocks on 10.142.27.118:1004 during recommissioning > 2016-05-26 20:11:59,972 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.113:1004 > 2016-05-26 20:12:09,007 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 294307 over-replicated blocks on 10.142.27.113:1004 during 
recommissioning > 2016-05-26 20:12:09,008 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.117:1004 > 2016-05-26 20:12:18,055 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 314381 over-replicated blocks on 10.142.27.117:1004 during recommissioning > 2016-05-26 20:12:18,056 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.130:1004 > 2016-05-26 20:12:25,938 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 272779 over-replicated blocks on 10.142.27.130:1004 during recommissioning > 2016-05-26 20:12:25,939 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.121:1004 > 2016-05-26 20:12:34,134 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 287248 over-replicated blocks on 10.142.27.121:1004 during recommissioning > 2016-05-26 20:12:34,134 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.33:1004 > 2016-05-26 20:12:43,020 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 299868 over-replicated blocks on 10.142.27.33:1004 during recommissioning > 2016-05-26 20:12:43,020 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.137:1004 > 2016-05-26 20:12:52,220 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 303914 over-replicated blocks on 10.142.27.137:1004 during recommissioning > 2016-05-26 20:12:52,220 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.51:1004 > 2016-05-26 20:13:00,362 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 281175 over-replicated blocks on 10.142.27.51:1004 during recommissioning > 2016-05-26 20:13:00,362 INFO > 
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.12:1004 > 2016-05-26 20:13:08,756 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 274880 over-replicated blocks on 10.142.27.12:1004 during recommissioning > 2016-05-26 20:13:08,757 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.15:1004 > 2016-05-26 20:13:17,185 INFO > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Invalidated > 286334 over-replicated blocks on 10.142.27.15:1004 during recommissioning > 2016-05-26 20:13:17,185 INFO > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Stop > Decommissioning 10.142.27.14:1004 > 2016-05-26
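The log shows the lock being held across the whole invalidation of each node's over-replicated blocks, which is what stalls the Namesystem. A generic sketch of the usual remedy for this pattern, processing blocks in bounded chunks and releasing the lock between chunks, is shown below; all class and member names are hypothetical illustrations, not code from the actual HDFS-10477 patches:

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of bounding lock hold time while invalidating
// many blocks. The real fix lives in BlockManager/DatanodeManager; names
// here are made up for the sketch.
class ChunkedInvalidator {
    private static final int BLOCKS_PER_LOCK_ACQUIRE = 1000;
    private final ReentrantLock namesystemLock = new ReentrantLock();
    int invalidated = 0;

    // Process blocks in bounded chunks, dropping the lock between chunks so
    // other handlers (heartbeats, client RPCs) can make progress in between.
    void invalidateOverReplicated(List<String> blocks) {
        for (int i = 0; i < blocks.size(); i += BLOCKS_PER_LOCK_ACQUIRE) {
            int end = Math.min(i + BLOCKS_PER_LOCK_ACQUIRE, blocks.size());
            namesystemLock.lock();
            try {
                for (String b : blocks.subList(i, end)) {
                    invalidated++; // placeholder for the real invalidation work
                }
            } finally {
                namesystemLock.unlock();
            }
        }
    }
}
```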
[jira] [Assigned] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star reassigned HDFS-14424: --- Assignee: star > NN failover failed because of losing paxos directory > - > > Key: HDFS-14424 > URL: https://issues.apache.org/jira/browse/HDFS-14424 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Assignee: star >Priority: Major > > Recently, our Hadoop namenode shut down when switching the active namenode, just > because of a missing paxos directory. It is created under the default /tmp path > and was deleted by the OS after 7 days without access. We can avoid this by moving the > journal directory to a non-tmp directory, but it's better to make sure the namenode > works well with the default config. > The issue throws an exception similar to HDFS-10659, also caused by a missing > paxos directory. >
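The mitigation mentioned above is moving the journal directory off /tmp. JournalNode storage is controlled by the `dfs.journalnode.edits.dir` property (whose default lives under /tmp); an example override in hdfs-site.xml, with a placeholder path:

```xml
<!-- hdfs-site.xml: keep JournalNode data out of /tmp so OS tmp cleaners
     cannot delete the paxos directory. The path is an example. -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/1/dfs/jn</value>
</property>
```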
[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14424: Description: Recently, our Hadoop namenode shut down when switching the active namenode, just because of a missing paxos directory. It is created under the default /tmp path and was deleted by the OS after 7 days without access. We can avoid this by moving the journal directory to a non-tmp directory, but it's better to make sure the namenode works well with the default config. The issue throws an exception similar to HDFS-10659, also caused by a missing paxos directory.
[jira] [Created] (HDFS-14424) NN start because of losing paxos directory
star created HDFS-14424: --- Summary: NN start because of losing paxos directory Key: HDFS-14424 URL: https://issues.apache.org/jira/browse/HDFS-14424 Project: Hadoop HDFS Issue Type: Bug Reporter: star
[jira] [Updated] (HDFS-14424) NN failover failed because of losing paxos directory
[ https://issues.apache.org/jira/browse/HDFS-14424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14424: Summary: NN failover failed because of losing paxos directory (was: NN start because of losing paxos directory) > NN failover failed because of losing paxos directory > - > > Key: HDFS-14424 > URL: https://issues.apache.org/jira/browse/HDFS-14424 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: star >Priority: Major >
[jira] [Commented] (HDFS-10477) Stopping decommission of a rack of DataNodes caused NameNode to fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815013#comment-16815013 ] star commented on HDFS-10477: - [~jojochuang], uploaded a patch for branch-2.8.
[jira] [Updated] (HDFS-10477) Stopping decommission of a rack of DataNodes caused NameNode to fail over to standby
[ https://issues.apache.org/jira/browse/HDFS-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-10477: Attachment: HDFS-10477.branch-2.8.patch
[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814975#comment-16814975 ] star commented on HDFS-13596: - Sorry, my mistake. Indeed, the length field is not behaving the way I expected; the loader just skips the 4 bytes of checksum. {quote}IOUtils.skipFully(in, 4 + 8); // skip length and txid op.readFields(in, logVersion); // skip over the checksum, which we validated above. IOUtils.skipFully(in, CHECKSUM_LENGTH);{quote} > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Reporter: Hanisha Koneru >Assignee: Fei Hui >Priority: Critical > Attachments: HDFS-13596.001.patch, HDFS-13596.002.patch, > HDFS-13596.003.patch, HDFS-13596.004.patch > > > After a rollingUpgrade of the NN from 2.x to 3.x, if the NN is restarted, it fails > while replaying edit logs. > * After the NN is started with rollingUpgrade, the layoutVersion written to > editLogs (before finalizing the upgrade) is the pre-upgrade layout version > (so as to support downgrade). > * When writing transactions to the log, the NN writes as per the current layout > version. In 3.x, erasureCoding bits are added to the editLog transactions. > * So any edit log written after the upgrade and before finalizing the > upgrade will have the old layout version but the new format of transactions. > * When the NN is restarted and the edit logs are replayed, the NN reads the old > layout version from the editLog file. When parsing the transactions, it > assumes that the transactions are also from the previous layout and hence > skips parsing the erasureCoding bits. > * This cascades into reading the wrong set of bits for other fields and > leads to the NN shutting down. 
> Sample error output: > {code:java} > java.lang.IllegalArgumentException: Invalid clientId - length is 0 expected > length 16 > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:74) > at org.apache.hadoop.ipc.RetryCache$CacheEntry.(RetryCache.java:86) > at > org.apache.hadoop.ipc.RetryCache$CacheEntryWithPayload.(RetryCache.java:163) > at > org.apache.hadoop.ipc.RetryCache.addCacheEntryWithPayload(RetryCache.java:322) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntryWithPayload(FSNamesystem.java:960) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:397) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:632) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:694) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:937) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:910) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1643) > at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1710) > 2018-05-17 19:10:06,522 WARN > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception > loading fsimage > 
java.io.IOException: java.lang.IllegalStateException: Cannot skip to less > than the current value (=16389), where newValue=16388 > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.resetLastInodeId(FSDirectory.java:1945) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:298) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:745) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:323) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1086) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) > at >
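The skip logic star quotes above -- read past the 4-byte length and 8-byte txid, decode the op body, then skip the checksum -- can be sketched against a simplified record framing. This is only an illustration of the alignment problem; the [length:4][txid:8][payload][checksum:4] layout here is hypothetical, not the actual FSEditLogOp wire format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class EditRecordSketch {
    static final int CHECKSUM_LENGTH = 4; // assumed 4-byte checksum, as in the quoted snippet

    // Writes one record as [payload length:4][txid:8][payload][checksum:4].
    static byte[] writeRecord(long txid, byte[] payload) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(payload.length);  // length field
            out.writeLong(txid);           // transaction id
            out.write(payload);            // op body
            out.writeInt(0);               // checksum placeholder, assumed validated elsewhere
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
    }

    // Reads one record back: consume length and txid, read the body,
    // then skip the checksum -- mirroring the IOUtils.skipFully calls above.
    static byte[] readRecord(DataInputStream in) {
        try {
            int length = in.readInt();
            in.readLong();                 // txid, unused here
            byte[] payload = new byte[length];
            in.readFully(payload);         // op.readFields(...) would parse this in HDFS
            in.skipBytes(CHECKSUM_LENGTH); // skip over the checksum
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a reader mis-tracks any of these boundaries (e.g. treats the checksum or an unknown field as payload), every subsequent record is misread -- the same cascade described in the issue.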
[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813327#comment-16813327 ] star commented on HDFS-13596: - New properties will be ignored, as I checked in the code: there is a length field in every edit log record. Writing new properties after the older ones is not just about this issue, but a rule for future extension. If not, there is a high risk of HDFS metadata being broken after finalizing, because only then are new properties written to edit logs, and that path is not verified during the rolling upgrade process. Metadata would not be recoverable at that point. Maybe I'm wrong on this; feel free to correct me. > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596
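star's point -- that a per-record length field lets an old reader ignore properties appended after the fields it knows -- can be illustrated with a small sketch. The field names mirror the serialization calls discussed in this thread, but the length-prefixed framing is hypothetical, not the actual HDFS edit-log format:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TrailingFieldSketch {
    // "New" writer: fields known to old readers first, the new
    // erasureCodingPolicyId appended last, behind a length prefix.
    static byte[] write(byte storagePolicyId, byte erasureCodingPolicyId) {
        try {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            DataOutputStream bodyOut = new DataOutputStream(body);
            bodyOut.writeByte(storagePolicyId);       // field old readers know
            bodyOut.writeByte(erasureCodingPolicyId); // new trailing field
            ByteArrayOutputStream rec = new ByteArrayOutputStream();
            DataOutputStream recOut = new DataOutputStream(rec);
            recOut.writeInt(body.size());             // length prefix covers the whole body
            recOut.write(body.toByteArray());
            return rec.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // in-memory streams do not throw
        }
    }

    // "Old" reader: consumes only the field it knows, then skips to the record
    // boundary given by the length prefix, so the trailing field is ignored.
    static byte readOld(DataInputStream in) {
        try {
            int length = in.readInt();
            byte storagePolicyId = in.readByte();
            in.skipBytes(length - 1);                 // skip unknown trailing bytes
            return storagePolicyId;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the old reader always resynchronizes at the length boundary, it stays aligned across records even though it never parses the new field -- which is why writing new properties last would avoid the cascading misread in this issue.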
[jira] [Commented] (HDFS-14403) Cost-Based RPC FairCallQueue
[ https://issues.apache.org/jira/browse/HDFS-14403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811877#comment-16811877 ] star commented on HDFS-14403: - [~xkrogen], impressive results. Could you give more detail about how many listStatus calls on a directory with one subdirectory versus listStatus calls on a directory with 1000 subdirectories were in your benchmark tests? Is it possible that the scheduler with LockCostProvider schedules more low-cost operations, like listStatus on a directory with one subdirectory, which results in a lower queue time? Or does it simply make sense for LockCostProvider? > Cost-Based RPC FairCallQueue > > > Key: HDFS-14403 > URL: https://issues.apache.org/jira/browse/HDFS-14403 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc, namenode > Reporter: Erik Krogen > Assignee: Christopher Gregorian > Priority: Major > Labels: qos, rpc > Attachments: CostBasedFairCallQueueDesign_v0.pdf, HDFS-14403.001.patch > > > HADOOP-15016 initially described extensions to the Hadoop FairCallQueue > encompassing both cost-based analysis of incoming RPCs, as well as support > for reservations of RPC capacity for system/platform users. This JIRA intends > to track the former, as HADOOP-15016 was repurposed to more specifically > focus on the reservation portion of the work. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
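The idea behind a cost-based FairCallQueue -- weighting callers by how expensive their calls are rather than by call count -- can be sketched as a cost provider that charges lock-hold time per user. The class, weighting factor, and method names below are illustrative, not the actual Hadoop CostProvider API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LockCostSketch {
    // Accumulated cost per caller; heavier users would be decayed and
    // placed into lower-priority sub-queues by the scheduler.
    private final Map<String, Long> costByUser = new ConcurrentHashMap<>();

    // Charge a completed call: lock time dominates the cost, and exclusive
    // (write) lock time is weighted higher than shared (read) lock time.
    void charge(String user, long sharedLockNanos, long exclusiveLockNanos) {
        long cost = sharedLockNanos + 10 * exclusiveLockNanos; // factor 10 is an illustrative choice
        costByUser.merge(user, cost, Long::sum);
    }

    long costOf(String user) {
        return costByUser.getOrDefault(user, 0L);
    }
}
```

Under such a provider, and assuming lock-hold time scales with the number of entries listed, a caller doing listStatus on a 1000-subdirectory directory accrues roughly 1000x the cost of one on a single-subdirectory directory -- so queue placement tracks actual NameNode load rather than raw call counts, which is exactly the distinction star asks about.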
[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584 ] star edited comment on HDFS-13596 at 4/6/19 2:35 PM: - Maybe another option: we can write new properties after the older ones, so that when restarted they can be ignored. EC requests will still be accepted and persisted in the edit log before finalizing the rolling upgrade. In this case, it would change from {quote}FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out); writeRpcIds(rpcClientId, rpcCallId, out);{quote} to {quote}writeRpcIds(rpcClientId, rpcCallId, out); FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out);{quote} was (Author: starphin): Maybe another option: we can write new properties after the older ones, so that when restarted they can be ignored. EC requests will still be accepted and persisted in the edit log. In this case, it would change from {quote}FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out); writeRpcIds(rpcClientId, rpcCallId, out);{quote} to {quote}writeRpcIds(rpcClientId, rpcCallId, out); FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out);{quote} > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596
[jira] [Comment Edited] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584 ] star edited comment on HDFS-13596 at 4/6/19 2:34 PM: - Maybe another option: we can write new properties after the older ones, so that when restarted they can be ignored. EC requests will still be accepted and persisted in the edit log. In this case, it would change from {quote}FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out); writeRpcIds(rpcClientId, rpcCallId, out);{quote} to {quote}writeRpcIds(rpcClientId, rpcCallId, out); FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out);{quote} was (Author: starphin): Maybe another option: we can write new properties after the older ones, so that when restarted they can be ignored. EC requests will still be accepted. In this case, it would change from {quote}FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out); writeRpcIds(rpcClientId, rpcCallId, out);{quote} to {quote}writeRpcIds(rpcClientId, rpcCallId, out); FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out);{quote} > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596
[jira] [Commented] (HDFS-13596) NN restart fails after RollingUpgrade from 2.x to 3.x
[ https://issues.apache.org/jira/browse/HDFS-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811584#comment-16811584 ] star commented on HDFS-13596: - Maybe another option: we can write new properties after the older ones, so that when restarted they can be ignored. EC requests will still be accepted. In this case, it would change from {quote}FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out); writeRpcIds(rpcClientId, rpcCallId, out);{quote} to {quote}writeRpcIds(rpcClientId, rpcCallId, out); FSImageSerialization.writeByte(storagePolicyId, out); FSImageSerialization.writeByte(erasureCodingPolicyId, out);{quote} > NN restart fails after RollingUpgrade from 2.x to 3.x > - > > Key: HDFS-13596 > URL: https://issues.apache.org/jira/browse/HDFS-13596
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807719#comment-16807719 ] star commented on HDFS-14378: - Initial patch for review. [~xkrogen], would you like to review the patch? I don't have much time these days, and there may be better solutions than this patch; I'd like to get some suggestions on it. > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode > Reporter: star > Assignee: star > Priority: Minor > Attachments: HDFS-14378-trunk.001.patch, HDFS-14378-trunk.002.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' is used to hold the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. Then we have three roles in the > NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than two primary SNNs shortly after > an exception occurs. It handles a scenario where the SNN will not upload the > fsimage on IOE and Interrupted exceptions. Though it will not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple standby NameNodes (HDFS-14349). (I'm not so sure about this; I will > verify it with unit tests, or anyone could point it out.) > Above all, I'm wondering if we could make this simple with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN will roll its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN will select an SNN to download a checkpoint from. > The SNN will just do log tailing and checkpointing, and provide a servlet for fsimage > downloading as normal. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcomed.
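The control flow proposed above -- the ANN rolls its own edit log on a fixed period and picks one SNN to fetch the fsimage from, so no 'primary checkpointer' state is needed -- could be driven by something like the following sketch. All names (rollEditLog, chooseCheckpointSource) are illustrative, not NameNode APIs:

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ActiveDrivenSketch {
    // Pick a random standby to download the fsimage from; because the ANN
    // makes the choice, at most one SNN uploads per checkpoint cycle.
    static String chooseCheckpointSource(List<String> standbys, Random rng) {
        return standbys.get(rng.nextInt(standbys.size()));
    }

    // ANN-side schedule: run the roll task every rollPeriodMs
    // (the DFS_HA_LOGROLL_PERIOD_KEY period mentioned in the issue).
    static ScheduledExecutorService start(Runnable rollEditLog, long rollPeriodMs) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(rollEditLog, rollPeriodMs, rollPeriodMs, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

With the selection made on the ANN side, the SNNs stay passive (tail, checkpoint, serve the fsimage servlet), matching the two-role design sketched in the description.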
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.002.patch > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378
[jira] [Commented] (HDFS-14370) Edit log tailing fast-path should allow for backoff
[ https://issues.apache.org/jira/browse/HDFS-14370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807341#comment-16807341 ] star commented on HDFS-14370: - [~xkrogen], agreed. Server-side handlers are a limited resource and should not be blocked. {quote}This is similar to how the NN, when it wants a client to backoff due to load, throws a backoff exception and expects the client to act accordingly. {quote} > Edit log tailing fast-path should allow for backoff > --- > > Key: HDFS-14370 > URL: https://issues.apache.org/jira/browse/HDFS-14370 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode, qjm > Affects Versions: 3.3.0 > Reporter: Erik Krogen > Assignee: Erik Krogen > Priority: Major > > As part of HDFS-13150, in-progress edit log tailing was changed to use an > RPC-based mechanism, thus allowing the edit log tailing frequency to be > turned way down, and allowing standby/observer NameNodes to be only a few > milliseconds stale as compared to the Active NameNode. > When there is a high volume of transactions on the system, each RPC fetches > transactions and takes some time to process them, self-rate-limiting how > frequently an RPC is submitted. In a lightly loaded cluster, however, most of > these RPCs return an empty set of transactions, consuming a high > (de)serialization overhead for very little benefit. This was reported by > [~jojochuang] in HDFS-14276 and I have also seen it on a test cluster where > the SbNN was submitting 8000 RPCs per second that returned empty. > I propose we add some sort of backoff to the tailing, so that if an empty > response is received, it will wait a longer period of time before submitting > a new RPC.
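The backoff proposed in this issue -- wait longer after each empty tail response, reset as soon as edits arrive -- is commonly implemented as a capped exponential delay tracked on the client (tailer) side, so no server handler thread ever sleeps. A minimal sketch; the parameter names are illustrative, not actual dfs.ha.tail-edits.* keys:

```java
public class TailBackoffSketch {
    private final long initialMs;
    private final long maxMs;
    private long currentMs = 0; // no delay while edits keep arriving

    TailBackoffSketch(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    // Called after each tail RPC; returns how long to sleep before the next one.
    long onResponse(int numEditsReceived) {
        if (numEditsReceived > 0) {
            currentMs = 0;                              // busy: tail again immediately
        } else if (currentMs == 0) {
            currentMs = initialMs;                      // first empty response: start backing off
        } else {
            currentMs = Math.min(currentMs * 2, maxMs); // double the delay, capped
        }
        return currentMs;
    }
}
```

An idle standby thus converges to one RPC per maxMs instead of thousands per second, while a busy one pays no extra staleness -- the same trade-off as the NN's client-backoff exception quoted above.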
[jira] [Commented] (HDFS-14276) [SBN read] Reduce tailing overhead
[ https://issues.apache.org/jira/browse/HDFS-14276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806918#comment-16806918 ] star commented on HDFS-14276: - Maybe we can just sleep 10 ms on the RPC server side if there are no new edit logs. As a consequence, that RPC handler thread will be blocked for 10 ms. > [SBN read] Reduce tailing overhead > -- > > Key: HDFS-14276 > URL: https://issues.apache.org/jira/browse/HDFS-14276 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha, namenode > Affects Versions: 3.3.0 > Environment: Hardware: 4-node cluster, each node has 4 core, Xeon > 2.5Ghz, 25GB memory. > Software: CentOS 7.4, CDH 6.0 + Consistent Reads from Standby, Kerberos, SSL, > RPC encryption + Data Transfer Encryption. > Reporter: Wei-Chiu Chuang > Assignee: Wei-Chiu Chuang > Priority: Major > Attachments: HDFS-14276.000.patch, Screen Shot 2019-02-12 at 10.51.41 > PM.png, Screen Shot 2019-02-14 at 11.50.37 AM.png > > > When Observer sets {{dfs.ha.tail-edits.period}} = {{0ms}}, it tails edit log > continuously in order to fetch the latest edits, but there is a lot of > overhead in doing so. > Critically, edit log tailer should _not_ update NameDirSize metric every > time. It has nothing to do with fetching edits, and it involves lots of > directory space calculation. > Profiler suggests a non-trivial chunk of time is spent for nothing. > Other than this, the biggest overhead is in the communication to > serialize/deserialize messages to/from JNs. I am looking for ways to reduce > the cost because it's burning 30% of my CPU time even when the cluster is > idle.
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: (was: HDFS-14378-trunk.002.patch) > Simplify the design of multiple NN and both logic of edit log roll and > checkpoint > - > > Key: HDFS-14378 > URL: https://issues.apache.org/jira/browse/HDFS-14378 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: star >Assignee: star >Priority: Minor > Attachments: HDFS-14378-trunk.001.patch > > > HDFS-6440 introduced a mechanism to support more than 2 NNs. It > implements a first-writer-wins policy to avoid duplicated fsimage downloading. > The variable 'isPrimaryCheckPointer' holds the first-writer state, with > which the SNN will provide the fsimage for the ANN next time. We then have three roles > in the NN cluster: the ANN, one primary SNN, and one or more normal SNNs. > Since HDFS-12248, there may be more than one primary SNN shortly after > an exception occurs: it handles a scenario where the SNN will not upload the > fsimage on IOException and InterruptedException. Though this does not cause any > further functional issues, it is inconsistent. > Furthermore, the edit log may be rolled more frequently than necessary with > multiple Standby NameNodes (HDFS-14349). (I'm not sure about this; I will > verify it with unit tests, or anyone is welcome to point it out.) > Above all, I'm wondering if we could simplify this with the following > changes: > * There are only two roles: ANN and SNN. > * The ANN rolls its edit log every DFS_HA_LOGROLL_PERIOD_KEY period. > * The ANN selects an SNN from which to download the checkpoint. > The SNN will just tail the log and checkpoint, and provide a servlet for fsimage > downloading as usual. The SNN will not try to roll the edit log or send checkpoint > requests to the ANN. > In a word, the ANN will be more active. Suggestions are welcome. 
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.002.patch
[jira] [Commented] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806045#comment-16806045 ] star commented on HDFS-14378: - First step: make the ANN roll its own edit log. Last step: make the ANN download the fsimage from a randomly chosen SNN. More details will be added later.
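The "download from a randomly chosen SNN" step could look like the following sketch. Everything here is a hypothetical placeholder (the class, the address strings, the selection policy); the actual transfer would presumably reuse the existing TransferFsImage-style HTTP servlet:

```java
import java.util.List;
import java.util.Random;

// Sketch of the ANN picking one standby NameNode to fetch the fsimage from.
// Hypothetical names; not the real HDFS API.
public class CheckpointSource {
    private final Random random;

    public CheckpointSource(Random random) {
        this.random = random;
    }

    /** Pick one standby NameNode address to download the fsimage from. */
    public String select(List<String> standbyAddrs) {
        if (standbyAddrs.isEmpty()) {
            throw new IllegalStateException("no standby NameNode available");
        }
        return standbyAddrs.get(random.nextInt(standbyAddrs.size()));
    }
}
```

Random selection spreads the download load across SNNs over time; a refinement could prefer the SNN with the most recent checkpoint, but that requires the ANN to know each SNN's last checkpoint txid.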
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Attachment: HDFS-14378-trunk.001.patch
[jira] [Created] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
star created HDFS-14378: --- Summary: Simplify the design of multiple NN and both logic of edit log roll and checkpoint Key: HDFS-14378 URL: https://issues.apache.org/jira/browse/HDFS-14378 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: star Assignee: star
[jira] [Updated] (HDFS-14378) Simplify the design of multiple NN and both logic of edit log roll and checkpoint
[ https://issues.apache.org/jira/browse/HDFS-14378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] star updated HDFS-14378: Description: (minor wording edit to the issue description)
[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes
[ https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795553#comment-16795553 ] star commented on HDFS-14349: - Yes, it seems that the normal edit log roll will be triggered by multiple SNNs. I am writing unit tests to verify this behavior. > Edit log may be rolled more frequently than necessary with multiple Standby > nodes > - > > Key: HDFS-14349 > URL: https://issues.apache.org/jira/browse/HDFS-14349 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs, qjm >Reporter: Erik Krogen >Assignee: Ekanth Sethuramalingam >Priority: Major > > When HDFS-14317 was fixed, we tackled the problem that in a cluster with > in-progress edit log tailing enabled, a Standby NameNode may _never_ roll the > edit logs, which can eventually cause data loss. > Unfortunately, in the process, it was made so that if there are multiple > Standby NameNodes, they will all roll the edit logs at their specified > frequency, so the edit log will be rolled X times more frequently than it > should be (where X is the number of Standby NNs). This is not as bad as the > original bug since rolling frequently does not affect correctness or data > availability, but may degrade performance by creating more edit log segments > than necessary.
[jira] [Commented] (HDFS-14349) Edit log may be rolled more frequently than necessary with multiple Standby nodes
[ https://issues.apache.org/jira/browse/HDFS-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794201#comment-16794201 ] star commented on HDFS-14349: - The autoroll operation is executed by the ANN's active service, not triggered by the SNN, so it will not degrade NN performance as the number of SNNs grows. Related code in FSNamesystem:
{code:java}
// Active services: the edit log roller daemon is started on the ANN only.
void startActiveServices() throws IOException {
  ...
  nnEditLogRoller = new Daemon(new NameNodeEditLogRoller(
      editLogRollerThreshold, editLogRollerInterval));
  nnEditLogRoller.start();
  ...
}

// NameNodeEditLogRoller: auto-roll once the open segment exceeds the threshold.
long numEdits = getCorrectTransactionsSinceLastLogRoll();
if (numEdits > rollThreshold) {
  FSNamesystem.LOG.info("NameNode rolling its own edit log because"
      + " number of edits in open segment exceeds threshold of "
      + rollThreshold);
  rollEditLog();
}
{code}