[jira] [Updated] (HDFS-8869) Don't mark storages as failed before first block report
[ https://issues.apache.org/jira/browse/HDFS-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-8869:
--
Target Version/s:   (was: 2.7.6)

> Don't mark storages as failed before first block report
> ---
>
> Key: HDFS-8869
> URL: https://issues.apache.org/jira/browse/HDFS-8869
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Rushabh S Shah
> Assignee: Daryn Sharp
> Priority: Major
>
> Creating this ticket on behalf of [~daryn].
> Heartbeat processing performs the failed-storage check. The DN reports its
> storages, and any previously known storages missing from that report (e.g.,
> after the unique storage ID upgrade) are marked failed. The heartbeat monitor
> then removes all blocks associated with the failed storages, and a
> replication storm ensues for all blocks on the node.
> Eventually the DN sends block reports for the new storages, up to 15 minutes
> later on large clusters. The NN now has many excess blocks to invalidate. If
> the cluster has failed over in the past 24 hours (e.g., during a rolling
> upgrade), the standby that has become active will queue the block
> invalidations, which triggers the severe performance degradation of
> HDFS-8674. That degradation has been greatly lessened but is still an issue.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
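The failure sequence in the issue description can be illustrated with a toy model of the NameNode's per-DataNode storage bookkeeping. This is a minimal sketch under stated assumptions, not the HDFS implementation or the eventual patch: the class and method names (`FailedStorageCheckSketch`, `DatanodeView`, `StorageInfo`, `processHeartbeat`, `processFirstBlockReport`), the `guardUntilBlockReport` flag, the storage IDs and the block counts are all hypothetical. It shows that when a restarted DN heartbeats with its storages under new IDs, failing the missing old storages immediately strands every block on them, whereas deferring the check until the first block report does not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of the NameNode's per-DataNode storage bookkeeping.
 * All names here are hypothetical; this is not the HDFS implementation.
 */
public class FailedStorageCheckSketch {

    enum State { NORMAL, FAILED }

    static final class StorageInfo {
        final String storageId;
        final int blockCount;          // blocks the NN believes live on this storage
        State state = State.NORMAL;

        StorageInfo(String storageId, int blockCount) {
            this.storageId = storageId;
            this.blockCount = blockCount;
        }
    }

    static final class DatanodeView {
        final Map<String, StorageInfo> storages = new HashMap<>();
        boolean receivedFirstBlockReport = false;

        /** Hypothetical hook: called once the DN's first block report is processed. */
        void processFirstBlockReport() {
            receivedFirstBlockReport = true;
        }

        /**
         * Failed-storage check run on each heartbeat; returns the IDs of storages
         * newly marked FAILED. With the guard enabled the check is skipped until
         * the first block report, because a storage missing from the heartbeat
         * may only have changed ID (e.g. after an upgrade). Newly reported IDs
         * are not registered in this simplified model.
         */
        Set<String> processHeartbeat(Set<String> reportedIds, boolean guardUntilBlockReport) {
            Set<String> newlyFailed = new HashSet<>();
            if (guardUntilBlockReport && !receivedFirstBlockReport) {
                return newlyFailed;
            }
            for (StorageInfo s : storages.values()) {
                if (s.state != State.FAILED && !reportedIds.contains(s.storageId)) {
                    s.state = State.FAILED;
                    newlyFailed.add(s.storageId);
                }
            }
            return newlyFailed;
        }
    }

    /** A DN that the NN still knows under its old storage IDs. */
    static DatanodeView restartedNode() {
        DatanodeView dn = new DatanodeView();
        dn.storages.put("DS-old-1", new StorageInfo("DS-old-1", 1000));
        dn.storages.put("DS-old-2", new StorageInfo("DS-old-2", 2000));
        return dn;
    }

    /** Blocks the heartbeat monitor would drop from the given failed storages. */
    static int blocksToReReplicate(DatanodeView dn, Set<String> failedIds) {
        int total = 0;
        for (String id : failedIds) {
            total += dn.storages.get(id).blockCount;
        }
        return total;
    }

    public static void main(String[] args) {
        // After the upgrade, the DN heartbeats with its storages under new IDs.
        Set<String> reported = Set.of("DS-new-1", "DS-new-2");

        DatanodeView unguarded = restartedNode();
        System.out.println("unguarded: "
            + blocksToReReplicate(unguarded, unguarded.processHeartbeat(reported, false))
            + " blocks to re-replicate");   // prints "unguarded: 3000 blocks to re-replicate"

        DatanodeView guarded = restartedNode();
        System.out.println("guarded: "
            + blocksToReReplicate(guarded, guarded.processHeartbeat(reported, true))
            + " blocks to re-replicate");   // prints "guarded: 0 blocks to re-replicate"
    }
}
```

In this model the guard trades a window of possibly stale storage state (until the first block report arrives) for avoiding a spurious replication storm; once `processFirstBlockReport` has run, a storage that is genuinely missing is still marked failed on the next heartbeat.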
[jira] [Updated] (HDFS-8869) Don't mark storages as failed before first block report
[ https://issues.apache.org/jira/browse/HDFS-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-8869:
--
Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is in the release process; changing the target version to 2.7.4.
[jira] [Updated] (HDFS-8869) Don't mark storages as failed before first block report
[ https://issues.apache.org/jira/browse/HDFS-8869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-8869:
--
Target Version/s: 2.7.3  (was: 2.7.2)

Moving all non-critical, non-blocker issues that didn't make it into 2.7.2 out to 2.7.3.