[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: (was: HDFS-7859.008.patch) > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
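The description's goal is to keep policies in the NameNode so that EC zones can refer to them by name. That only works if the policy definitions survive a restart. The following is a minimal sketch of such a save/load round-trip under stated assumptions. The `PolicyRecord` fields and the flat binary layout are hypothetical. The real NameNode persists its state through the fsimage and edit log, not through anything shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy description; the fields are illustrative, not HDFS's ErasureCodingPolicy.
final class PolicyRecord {
  final String name;
  final int dataUnits;
  final int parityUnits;
  final int cellSize;

  PolicyRecord(String name, int dataUnits, int parityUnits, int cellSize) {
    this.name = name;
    this.dataUnits = dataUnits;
    this.parityUnits = parityUnits;
    this.cellSize = cellSize;
  }
}

final class PolicyPersistence {
  // Serialize: policy count first, then each policy's fields in a fixed order.
  static byte[] save(List<PolicyRecord> policies) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(policies.size());
      for (PolicyRecord p : policies) {
        out.writeUTF(p.name);
        out.writeInt(p.dataUnits);
        out.writeInt(p.parityUnits);
        out.writeInt(p.cellSize);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Deserialize in exactly the order save() wrote the fields.
  static List<PolicyRecord> load(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int count = in.readInt();
      List<PolicyRecord> policies = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        int dataUnits = in.readInt();
        int parityUnits = in.readInt();
        int cellSize = in.readInt();
        policies.add(new PolicyRecord(name, dataUnits, parityUnits, cellSize));
      }
      return policies;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because lookups go by name, the name is the one field that must round-trip exactly. The numeric fields only describe the coding layout.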
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435188#comment-15435188 ] Xinwei Qin commented on HDFS-7859: --- Sorry, I attached the wrong patch (not the latest one); I will correct it tomorrow morning. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, > HDFS-7859.008.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435092#comment-15435092 ] Xinwei Qin commented on HDFS-7859: --- Attached a new patch to fix the TestOfflineEditsViewer failure, which was the only remaining one. The Checkstyle and Findbugs results are not related to this issue. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, > HDFS-7859.008.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.008.patch > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch, > HDFS-7859.008.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-10786) Erasure Coding: Add removeErasureCodingPolicy API
[ https://issues.apache.org/jira/browse/HDFS-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-10786: --- Description: HDFS-7859 has developed an addErasureCodingPolicy API to add user-added Erasure Coding policies, and as discussed in HDFS-7859, we should also add a removeErasureCodingPolicy API to support removing user-added Erasure Coding Policies. (was: HDFS-7859 has developed an addErasureCodingPolicy API to add user-added Erasure Coding policies; we should also add a removeErasureCodingPolicy API to support removing user-added Erasure Coding Policies.) > Erasure Coding: Add removeErasureCodingPolicy API > - > > Key: HDFS-10786 > URL: https://issues.apache.org/jira/browse/HDFS-10786 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xinwei Qin > > HDFS-7859 has developed an addErasureCodingPolicy API to add user-added > Erasure Coding policies, and as discussed in HDFS-7859, we should also add a > removeErasureCodingPolicy API to support removing user-added Erasure > Coding Policies.
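The add/remove pair described for HDFS-10786 can be pictured as a registry keyed by policy name. This is a hypothetical sketch, not the Hadoop API. `EcPolicyRegistry`, `addPolicy`, `removePolicy` and `contains` are invented names, and a policy is reduced to its name. It also assumes that built-in policies must not be removable, so only user-added ones can be removed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical registry keyed by policy name; not the HDFS API.
final class EcPolicyRegistry {
  // Policy name -> true if the policy was added by a user, false if built in.
  private final Map<String, Boolean> userAdded = new HashMap<>();

  EcPolicyRegistry(Set<String> builtInNames) {
    for (String name : builtInNames) {
      userAdded.put(name, Boolean.FALSE);
    }
  }

  // Registers a user-defined policy; a name can only be registered once.
  synchronized void addPolicy(String name) {
    if (userAdded.putIfAbsent(name, Boolean.TRUE) != null) {
      throw new IllegalArgumentException("Policy already exists: " + name);
    }
  }

  // Removes a user-added policy. Assumption: built-in policies cannot be removed.
  synchronized void removePolicy(String name) {
    Boolean isUserAdded = userAdded.get(name);
    if (isUserAdded == null) {
      throw new IllegalArgumentException("No such policy: " + name);
    }
    if (!isUserAdded) {
      throw new IllegalArgumentException("Cannot remove built-in policy: " + name);
    }
    userAdded.remove(name);
  }

  // Looks a policy up by name, as an EC zone would when referencing it.
  synchronized boolean contains(String name) {
    return userAdded.containsKey(name);
  }
}
```

The registry synchronizes on its own monitor. It does not model how the NameNode locks its namespace.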
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15432993#comment-15432993 ] Xinwei Qin commented on HDFS-7859: --- Hi [~zhz], I have updated the patch to address your comments and fixed some unit test failures; please review. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Created] (HDFS-10786) Erasure Coding: Add removeErasureCodingPolicy API
Xinwei Qin created HDFS-10786: -- Summary: Erasure Coding: Add removeErasureCodingPolicy API Key: HDFS-10786 URL: https://issues.apache.org/jira/browse/HDFS-10786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin HDFS-7859 has developed an addErasureCodingPolicy API to add user-added Erasure Coding policies; we should also add a removeErasureCodingPolicy API to support removing user-added Erasure Coding Policies.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.007.patch > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch, HDFS-7859.007.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424776#comment-15424776 ] Xinwei Qin commented on HDFS-7859: --- Hi [~zhz], I have updated the patch. Of your comments, I have addressed 1, 2, 5, 6 and 7. For 3 and 4, I think the current method name and usage may be more suitable, because {{addErasureCodingPolicy}} never needs to lock the {{FSDirectory}}, similar to {{addCacheDirective}}. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Issue Comment Deleted] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Comment: was deleted (was: Hi [~zhz], I have updated the patch. Of your comments, I have addressed 1, 2, 5, 6 and 7. For 3 and 4, I think the current method name and usage may be more suitable, because {{addErasureCodingPolicy}} never needs to lock the {{FSDirectory}}, similar to {{addCacheDirective}}.) > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15424774#comment-15424774 ] Xinwei Qin commented on HDFS-7859: --- Hi [~zhz], I have updated the patch. Of your comments, I have addressed 1, 2, 5, 6 and 7. For 3 and 4, I think the current method name and usage may be more suitable, because {{addErasureCodingPolicy}} never needs to lock the {{FSDirectory}}, similar to {{addCacheDirective}}. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.006.patch > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch, HDFS-7859.006.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421090#comment-15421090 ] Xinwei Qin commented on HDFS-7859: --- Thanks for your comments, [~zhz]; I will update the patch shortly. {{removeErasureCodingPolicy}} is similar to this, and I'm glad to work on it as well; I will file another JIRA for it. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.005.patch Rebased the patch (005.patch) and added a unit test case. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch, > HDFS-7859.005.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350146#comment-15350146 ] Xinwei Qin commented on HDFS-7859: --- The rebased patch still has some test failures; I'm working on fixing them now. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341759#comment-15341759 ] Xinwei Qin commented on HDFS-7859: --- [~zhz], it would be better to have this in 3.0; I will rebase and finish this patch as soon as possible this week. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR, hdfs-ec-3.0-must-do > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253497#comment-15253497 ] Xinwei Qin commented on HDFS-7859: --- Need to update the patch to fix the checkstyle warnings and the related unit test failures. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.004.patch Rebased the patch on the latest trunk. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch, HDFS-7859.004.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245677#comment-15245677 ] Xinwei Qin commented on HDFS-7859: --- [~rakeshr], [~drankye], and [~zhz], thanks for your comments and clarifications. Now is a good time to update this patch, though we should get clearer on the details of custom policies. I am glad to rebase the patch on the latest code and will probably attach it tomorrow. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist erasure coding policies in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15104111#comment-15104111 ] Xinwei Qin commented on HDFS-7859: --- Hi [~drankye], I will continue working on this JIRA soon. > Erasure Coding: Persist erasure coding policies in NameNode > --- > > Key: HDFS-7859 > URL: https://issues.apache.org/jira/browse/HDFS-7859 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Xinwei Qin > Labels: BB2015-05-TBR > Attachments: HDFS-7859-HDFS-7285.002.patch, > HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, > HDFS-7859.001.patch, HDFS-7859.002.patch > > > In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we > persist EC schemas in NameNode centrally and reliably, so that EC zones can > reference them by name efficiently.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.006.patch Thanks [~zhz] for the comments. Uploaded the 006 patch for review. bq. TestWriteStripedFileWithFailure actually fails, could you debug the issue? This issue is addressed by HDFS-8704; all the tests now pass with the latest 003 patch in HDFS-8704. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202-HDFS-7285.006.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content and compare it, to see whether there is any recovery issue and verify that erasure recovery works.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.005.patch Thanks [~zhz] and [~walter.k.su] for the comments and advice. Uploaded the 005 patch. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202-HDFS-7285.005.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content and compare it, to see whether there is any recovery issue and verify that erasure recovery works.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.004.patch Thanks [~zhz] for the comments. Uploaded the 004 patch to address them. bq. Can we write a loop instead of manually adding all possibilities for testReadWithDNFailure and testReadCorruptedData? {{testReadCorruptedData}} could be converted to a loop as long as the corrupted files are different, but we cannot do this for {{testReadWithDNFailure}}, since each of its test cases needs a new cluster with no dead DataNode. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202-HDFS-7285.004.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content and compare it, to see whether there is any recovery issue and verify that erasure recovery works.
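The loop suggested for {{testReadCorruptedData}} can be sketched outside HDFS. Each iteration corrupts one cell of a stripe, rebuilds it, and compares the result with the original. To stay self-contained, this sketch swaps in single-parity XOR coding instead of the Reed-Solomon codec HDFS uses, so it can only recover one erased cell per stripe. `XorStripeCheck` and its methods are hypothetical names. Like the real {{testReadWithDNFailure}} cases, a cluster-based version would still need a fresh cluster per case; this in-memory sketch does not model that.

```java
import java.util.Arrays;

final class XorStripeCheck {
  // Parity cell = byte-wise XOR of all data cells.
  static byte[] parity(byte[][] cells) {
    byte[] p = new byte[cells[0].length];
    for (byte[] cell : cells) {
      for (int i = 0; i < p.length; i++) {
        p[i] ^= cell[i];
      }
    }
    return p;
  }

  // Rebuild the erased data cell by XOR-ing the parity with every surviving data cell.
  static byte[] recover(byte[][] cells, byte[] parity, int erased) {
    byte[] out = parity.clone();
    for (int c = 0; c < cells.length; c++) {
      if (c == erased) {
        continue;
      }
      for (int i = 0; i < out.length; i++) {
        out[i] ^= cells[c][i];
      }
    }
    return out;
  }

  // One loop replaces a hand-written test case per corrupted cell index.
  static int countRecovered(byte[][] cells) {
    byte[] p = parity(cells);
    int recovered = 0;
    for (int erased = 0; erased < cells.length; erased++) {
      byte[][] damaged = new byte[cells.length][];
      for (int c = 0; c < cells.length; c++) {
        damaged[c] = cells[c].clone();
      }
      Arrays.fill(damaged[erased], (byte) 0); // simulate a corrupted cell
      if (Arrays.equals(recover(damaged, p, erased), cells[erased])) {
        recovered++;
      }
    }
    return recovered;
  }
}
```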
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636721#comment-14636721 ] Xinwei Qin commented on HDFS-8704: --- Hi [~libo-intel], even if the file is smaller than a block group (i.e. {{filelength = blocksize * dataBlocks - 123}}), {{testDatanodeFailure0}} also fails when the index of the failed DN is 0. Error log: {code}
java.lang.AssertionError: org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): commitBlock length is less than the stored one 524165 vs. 1045504
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitBlock(BlockManager.java:635)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.commitOrCompleteLastBlock(BlockManager.java:665)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitOrCompleteLastBlock(FSNamesystem.java:3672)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:773)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:720)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:3084)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:771)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:541)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
{code} Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch I tested the current code on a 5-node cluster using RS(3,2).
When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this JIRA will add more test scenarios.
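The block-group arithmetic behind these cases can be sketched as follows. The sketch assumes that the striped layout writes fixed-size cells round-robin across the data blocks of a group. `StripeMath` and its methods are hypothetical names for illustration, not Hadoop's own utility code.

```java
final class StripeMath {
  // Bytes of user data held by internal data block `index` (0-based) of a block group
  // containing `groupLength` bytes, striped in cells of `cellSize` over `dataBlocks` blocks.
  static long internalBlockLength(long groupLength, int cellSize, int dataBlocks, int index) {
    long stripeSize = (long) cellSize * dataBlocks;
    long fullStripes = groupLength / stripeSize;
    long remainder = groupLength - fullStripes * stripeSize; // bytes in the last, partial stripe
    long lastCell = Math.min(Math.max(remainder - (long) index * cellSize, 0), cellSize);
    return fullStripes * cellSize + lastCell;
  }

  // Number of block groups a file needs: each group holds dataBlocks * blockSize data bytes.
  static long blockGroupCount(long fileLength, long blockSize, int dataBlocks) {
    long groupCapacity = blockSize * dataBlocks;
    return (fileLength + groupCapacity - 1) / groupCapacity;
  }
}
```

With 3 data blocks, a file of `blockSize * 3 - 123` bytes fits in a single block group. A file one byte longer than `blockSize * 3` needs two groups, which is the "large file" case this JIRA is about.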
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.003.patch [~zhz], I updated the patch to include tests for reading and writing EC files with failures; please help review. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content and compare it, to see whether there is any recovery issue and verify that erasure recovery works.
[jira] [Commented] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629094#comment-14629094 ] Xinwei Qin commented on HDFS-8202: --- Hi [~zhz], thanks for the clarification. I will move the HDFS-8259 and HDFS-8260 patches here. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content and compare it, to see whether there is any recovery issue and verify that erasure recovery works.
[jira] [Updated] (HDFS-8260) Erasure Coding: system test of writing EC file
[ https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8260: -- Attachment: HDFS-8260-HDFS-7285.001.patch Attached the system test patch for writing an EC file with some datanodes failing. Some tests cannot pass yet because the issues HDFS-8704 and HDFS-8383 have not been fixed. Erasure Coding: system test of writing EC file --- Key: HDFS-8260 URL: https://issues.apache.org/jira/browse/HDFS-8260 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8260-HDFS-7285.001.patch 1. Writing an EC file normally (without any datanode failure). 2. Writing an EC file with a tolerable number of datanodes failing. 3. Writing an EC file with an intolerable number of datanodes failing.
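The three scenarios above hinge on how many datanode failures a striped write can tolerate. The following is a simplified model under a stated assumption: with k data and m parity internal blocks per block group, a write is treated as recoverable while no more than m of its k + m datanodes fail. The class, the enum and the RS(6,3) example are hypothetical and do not reproduce the DFSStripedOutputStream logic.

```java
/**
 * Simplified model (an assumption, not the DFSStripedOutputStream logic) of
 * the three HDFS-8260 scenarios: with dataBlocks data and parityBlocks parity
 * internal blocks per block group, the write is assumed to survive as long as
 * no more than parityBlocks of the target datanodes fail.
 */
class StripedWriteToleranceSketch {

    enum Scenario { NORMAL, TOLERABLE_FAILURES, INTOLERABLE_FAILURES }

    static Scenario classify(int dataBlocks, int parityBlocks, int failedDatanodes) {
        if (dataBlocks <= 0 || parityBlocks < 0
                || failedDatanodes < 0 || failedDatanodes > dataBlocks + parityBlocks) {
            throw new IllegalArgumentException("invalid schema or failure count");
        }
        if (failedDatanodes == 0) {
            return Scenario.NORMAL;
        }
        // Equivalent to requiring at least dataBlocks healthy streamers,
        // so that every stripe written remains decodable.
        return failedDatanodes <= parityBlocks
                ? Scenario.TOLERABLE_FAILURES
                : Scenario.INTOLERABLE_FAILURES;
    }

    public static void main(String[] args) {
        // Hypothetical RS(6,3) schema: 9 datanodes per block group.
        for (int failed = 0; failed <= 9; failed++) {
            System.out.println(failed + " failed -> " + classify(6, 3, failed));
        }
    }
}
```

For an RS(6,3) layout this labels 0 failures as normal, 1 to 3 as tolerable and 4 or more as intolerable, which suggests one boundary test case per scenario.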
[jira] [Commented] (HDFS-8260) Erasure Coding: system test of writing EC file
[ https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627488#comment-14627488 ] Xinwei Qin commented on HDFS-8260: --- Hi [~demongaorui], maybe I misunderstood what you meant. Do these jiras under the umbrella (HDFS-8197) need system test results from a real cluster rather than a code patch? If so, the attached patch does not match your intention, right? Erasure Coding: system test of writing EC file --- Key: HDFS-8260 URL: https://issues.apache.org/jira/browse/HDFS-8260 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8260-HDFS-7285.001.patch 1. Writing an EC file normally (without any datanode failure). 2. Writing an EC file with a tolerable number of datanodes failing. 3. Writing an EC file with an intolerable number of datanodes failing.
[jira] [Commented] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621593#comment-14621593 ] Xinwei Qin commented on HDFS-8732: --- Yes, [~jingzhao], the test passes with the patch in HDFS-8669. I think this jira can be closed now. Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Resolved] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin resolved HDFS-8732. --- Resolution: Fixed Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Reopened] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reopened HDFS-8732: --- Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Resolved] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin resolved HDFS-8732. --- Resolution: Duplicate Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Created] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
Xinwei Qin created HDFS-8732: - Summary: Erasure Coding: Fail to read a file with corrupted blocks Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
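One way to frame the behaviour these corrupted-block tests expect is that a cell failing checksum verification is handled like a missing block: it is counted as an erasure and decoded from parity, rather than aborting the read. The sketch below illustrates only that classification step. It is a hypothetical model, not the DFSStripedInputStream code or the eventual fix, and java.util.zip.CRC32 stands in for the HDFS chunk checksums.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch: cells that are missing or fail checksum verification
 * are reported as erasures, and a stripe is considered decodable while the
 * number of erasures does not exceed the number of parity blocks.
 */
class CorruptCellDetectionSketch {

    static long crc(byte[] cell) {
        CRC32 c = new CRC32();
        c.update(cell, 0, cell.length);
        return c.getValue();
    }

    /** Indices of cells that are missing or fail checksum verification. */
    static List<Integer> erasedCells(byte[][] cells, long[] storedCrcs) {
        List<Integer> erased = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            if (cells[i] == null || crc(cells[i]) != storedCrcs[i]) {
                erased.add(i);
            }
        }
        return erased;
    }

    /** A stripe stays readable if the erasures fit within the parity budget. */
    static boolean decodable(List<Integer> erased, int parityBlocks) {
        return erased.size() <= parityBlocks;
    }

    public static void main(String[] args) {
        byte[][] cells = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        long[] stored = new long[cells.length];
        for (int i = 0; i < cells.length; i++) {
            stored[i] = crc(cells[i]);
        }
        cells[1][0] ^= 0x7f; // overwrite some data in one cell, as the test does
        List<Integer> erased = erasedCells(cells, stored);
        System.out.println("erased=" + erased + " decodable=" + decodable(erased, 1));
    }
}
```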
[jira] [Commented] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618199#comment-14618199 ] Xinwei Qin commented on HDFS-8732: --- Hi [~hitliuyi], I noticed that HDFS-8602 resolved a similar problem, but it cannot fix the issue in this jira. Thanks to [~walter.k.su] for the clarification. The error log in HDFS-8602: {code} 2015-07-08 16:19:04,742 ERROR datanode.DataNode (BlockSender.java:sendPacket(615)) - BlockSender.sendChunks() exception: java.io.EOFException: EOF Reached. file size is 10 and 65526 more bytes left to be transfered. at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:228) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:585) at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:765) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:712) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:556) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:256) at java.lang.Thread.run(Thread.java:722) {code} and the error/warning log of this jira: {code} 2015-07-08 15:05:13,455 WARN hdfs.DFSClient (DFSInputStream.java:actualGetFromOneDataNode(1203)) - fetchBlockByteRange(). Got a checksum exception for /partially_corrupted_1_0 at BP-1928182115-9.96.1.31-1436339108502:blk_-9223372036854775792_1001:13824 from DatanodeInfoWithStorage[127.0.0.1:36871,DS-cfab070a-8983-4c61-8647-eb0526df31c9,DISK] 2015-07-08 15:05:13,457 WARN hdfs.DFSClient (StripedBlockUtil.java:getNextCompletedStripedRead(215)) - ExecutionException java.util.concurrent.ExecutionException: java.io.IOException: fetchBlockByteRange(). 
Got a checksum exception for /partially_corrupted_1_0 at BP-1928182115-9.96.1.31-1436339108502:blk_-9223372036854775792_1001:13824 from DatanodeInfoWithStorage[127.0.0.1:36871,DS-cfab070a-8983-4c61-8647-eb0526df31c9,DISK] 2015-07-08 15:05:13,560 INFO hdfs.StateChange (FSNamesystem.java:reportBadBlocks(5783)) - *DIR* reportBadBlocks 2015-07-08 15:05:13,561 INFO BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(76)) - BLOCK NameSystem.addToCorruptReplicasMap: blk_-9223372036854775792 added as corrupt on 127.0.0.1:36871 by /127.0.0.1 because client machine reported it 2015-07-08 15:05:13,690 WARN hdfs.DFSClient (DFSInputStream.java:actualGetFromOneDataNode(1203)) - fetchBlockByteRange(). Got a checksum exception for /partially_corrupted_1_0 at BP-1928182115-9.96.1.31-1436339108502:blk_-9223372036854775792_1001:13824 from DatanodeInfoWithStorage[127.0.0.1:36871,DS-cfab070a-8983-4c61-8647-eb0526df31c9,DISK] 2015-07-08 15:05:13,693 WARN hdfs.DFSClient (StripedBlockUtil.java:getNextCompletedStripedRead(215)) - ExecutionException java.util.concurrent.ExecutionException: java.io.IOException: fetchBlockByteRange(). 
Got a checksum exception for /partially_corrupted_1_0 at BP-1928182115-9.96.1.31-1436339108502:blk_-9223372036854775792_1001:13824 from DatanodeInfoWithStorage[127.0.0.1:36871,DS-cfab070a-8983-4c61-8647-eb0526df31c9,DISK] 2015-07-08 15:05:13,705 INFO hdfs.StateChange (FSNamesystem.java:reportBadBlocks(5783)) - *DIR* reportBadBlocks 2015-07-08 15:05:13,706 INFO BlockStateChange (CorruptReplicasMap.java:addToCorruptReplicasMap(81)) - BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_-9223372036854775792 to add as corrupt on 127.0.0.1:36871 by /127.0.0.1 because client machine reported it 2015-07-08 15:05:14,033 INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(7816)) - allowed=true ugi=root (auth:SIMPLE) ip=/127.0.0.1 cmd=open src=/partially_corrupted_1_0 dst=null perm=null proto=rpc 2015-07-08 15:05:14,049 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1728)) - Shutting down the Mini HDFS Cluster {code} Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at
[jira] [Updated] (HDFS-8259) Erasure Coding: System Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8259: -- Status: Patch Available (was: Open) Erasure Coding: System Test of reading EC file -- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8259-HDFS-7285.001.patch 1. Reading an EC file normally (without any datanode failure, so no recovery is needed). 2. Reading an EC file with a datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks.
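Whether a read in scenarios 2 and 3 above actually exercises decoding depends on where the lost cells live. The sketch below assumes the round-robin layout of a striped block group, where cell c is stored on internal data block c mod k; it maps a logical offset to an internal block and reports whether a read range touches a failed data block. The names and the 64 KB / RS(6,3) parameters are illustrative assumptions, and parity-block failures are not modelled.

```java
/**
 * Sketch of the round-robin cell layout assumed for a striped block group:
 * cell c of the group is stored on internal data block (c % dataBlocks), in
 * stripe (c / dataBlocks). Offsets are relative to the start of the block group.
 */
class StripedLayoutSketch {

    static int dataBlockIndex(long offset, int cellSize, int dataBlocks) {
        long cell = offset / cellSize;
        return (int) (cell % dataBlocks);
    }

    static long offsetInInternalBlock(long offset, int cellSize, int dataBlocks) {
        long cell = offset / cellSize;
        long stripe = cell / dataBlocks;
        return stripe * cellSize + offset % cellSize;
    }

    /** True if reading [start, start + length) touches a cell on failedIndex. */
    static boolean readNeedsDecoding(long start, long length, int cellSize,
                                     int dataBlocks, int failedIndex) {
        if (length <= 0) {
            return false;
        }
        long firstCell = start / cellSize;
        long lastCell = (start + length - 1) / cellSize;
        if (lastCell - firstCell + 1 >= dataBlocks) {
            return true; // the range covers at least one cell on every data block
        }
        for (long c = firstCell; c <= lastCell; c++) {
            if (c % dataBlocks == failedIndex) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int cellSize = 64 * 1024; // hypothetical 64 KB cells
        int dataBlocks = 6;       // hypothetical RS(6,3) schema
        long offset = 6L * cellSize + 10;
        System.out.println("block " + dataBlockIndex(offset, cellSize, dataBlocks)
                + ", offset " + offsetInInternalBlock(offset, cellSize, dataBlocks));
    }
}
```

Under these assumptions, killing the datanode of internal block 1 leaves a read of the first cell untouched, while a read one byte longer already needs decoding.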
[jira] [Commented] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618246#comment-14618246 ] Xinwei Qin commented on HDFS-8732: --- A simple patch. Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Updated] (HDFS-8732) Erasure Coding: Fail to read a file with corrupted blocks
[ https://issues.apache.org/jira/browse/HDFS-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8732: -- Attachment: testReadCorruptedData.patch Attached a simple patch including a test method to reproduce the exception and to verify the fix in the next step. The detailed and comprehensive tests are in HDFS-8259. Erasure Coding: Fail to read a file with corrupted blocks - Key: HDFS-8732 URL: https://issues.apache.org/jira/browse/HDFS-8732 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Walter Su Attachments: testReadCorruptedData.patch In the system test of reading EC files (HDFS-8259), the {{testReadCorruptedData*()}} methods failed to read an EC file with corrupted blocks (some data was overwritten in several blocks, which makes the client get a checksum exception). Exception logs: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSStripedInputStream$StatefulStripeReader.readChunk(DFSStripedInputStream.java:771) at org.apache.hadoop.hdfs.DFSStripedInputStream$StripeReader.readStripe(DFSStripedInputStream.java:623) at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:335) at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:465) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:946) at java.io.DataInputStream.read(DataInputStream.java:149) at org.apache.hadoop.hdfs.StripedFileTestUtil.verifyStatefulRead(StripedFileTestUtil.java:98) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.verifyRead(TestReadStripedFileWithDecoding.java:196) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testOneFileWithBlockCorrupted(TestReadStripedFileWithDecoding.java:246) at org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding.testReadCorruptedData11(TestReadStripedFileWithDecoding.java:114) {code}
[jira] [Commented] (HDFS-8259) Erasure Coding: System Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618181#comment-14618181 ] Xinwei Qin commented on HDFS-8259: --- In this patch, the test methods {{testReadCorruptedData*()}} fail to read an EC file with corrupted blocks; I have created HDFS-8732 to address it. Erasure Coding: System Test of reading EC file -- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8259-HDFS-7285.001.patch 1. Reading an EC file normally (without any datanode failure, so no recovery is needed). 2. Reading an EC file with a datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks.
[jira] [Commented] (HDFS-8259) Erasure Coding: System Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618082#comment-14618082 ] Xinwei Qin commented on HDFS-8259: --- Attached the patch for review. It covers reading a file with some datanodes failed, some blocks corrupted, or some blocks deleted. Erasure Coding: System Test of reading EC file -- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8259-HDFS-7285.001.patch 1. Reading an EC file normally (without any datanode failure, so no recovery is needed). 2. Reading an EC file with a datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks.
[jira] [Commented] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618097#comment-14618097 ] Xinwei Qin commented on HDFS-8202: --- Maybe this jira duplicates HDFS-8259. I have attached a patch there, so we can review and discuss it in that jira, and I feel this jira can be closed now. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch, HDFS-8202.002.patch This is to follow up on HDFS-8201 by adding an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it with what was written, to find any recovery issue and verify that erasure recovery works.
[jira] [Updated] (HDFS-8259) Erasure Coding: System Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8259: -- Attachment: HDFS-8259-HDFS-7285.001.patch Erasure Coding: System Test of reading EC file -- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin Attachments: HDFS-8259-HDFS-7285.001.patch 1. Reading an EC file normally (without any datanode failure, so no recovery is needed). 2. Reading an EC file with a datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks.
[jira] [Commented] (HDFS-8710) Always read DU value from the cached dfsUsed file on datanode startup
[ https://issues.apache.org/jira/browse/HDFS-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613014#comment-14613014 ] Xinwei Qin commented on HDFS-8710: --- [~aw], thanks for your comment. The du value will be recalculated after 600 seconds anyway, so we don't need to calculate a precise value on startup. Note that the current code also does not recalculate the du value when the disk structure has changed, as long as the {{dfsUsed}} value is less than 600 seconds old. In a large cluster, DU can take several or even tens of minutes, which slows down the startup of the whole cluster, so a quick startup is important. Always skipping DU may be too radical; adding a quick-restart configuration for the datanode (default true, i.e. skip DU) seems more reasonable. When the disk structure has changed, the user can turn off quick-restart to trigger DU and recalculate the dfsUsed value. Any thoughts? Always read DU value from the cached dfsUsed file on datanode startup --- Key: HDFS-8710 URL: https://issues.apache.org/jira/browse/HDFS-8710 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xinwei Qin Assignee: Xinwei Qin Attachments: HDFS-8710.001.patch Currently, the DataNode periodically caches the DU value in the dfsUsed file. When the DataNode starts or restarts, it reads the cached DU value from the dfsUsed file if the value is less than 600 seconds old; otherwise, it runs the DU command, which is a very time-consuming operation (it may take up to dozens of minutes) when the DataNode has a huge number of blocks. Since slight imprecision of dfsUsed is not critical, and the DU value will be updated every 600 seconds (the default DU interval) after the DataNode starts, we can always read the DU value from the cached file (regardless of whether it is less than 600 seconds old) and skip the DU operation on DataNode startup to significantly shorten the startup time.
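The startup decision being discussed can be sketched as follows. This is a minimal sketch under stated assumptions, not the DataNode implementation: the cached file is assumed to contain the used byte count followed by the save time in milliseconds, and the quickRestart flag models the proposed setting, which is hypothetical and not an existing Hadoop configuration key.

```java
import java.util.Scanner;

/**
 * Sketch of the startup decision discussed in HDFS-8710. Assumptions: the
 * cached file holds two whitespace-separated longs (used bytes, then the save
 * time in milliseconds), and quickRestart models the proposed, hypothetical
 * datanode setting; it is not an existing Hadoop configuration key.
 */
class CachedDfsUsedSketch {

    static final long MAX_AGE_MS = 600_000L; // 600 seconds, as described above

    /** Returns the cached dfsUsed value, or -1 if the caller should run du. */
    static long loadCachedDfsUsed(String fileContents, long nowMs, boolean quickRestart) {
        try (Scanner sc = new Scanner(fileContents)) {
            if (!sc.hasNextLong()) {
                return -1;
            }
            long used = sc.nextLong();
            if (!sc.hasNextLong()) {
                return -1;
            }
            long savedAtMs = sc.nextLong();
            if (quickRestart) {
                return used; // proposed behaviour: trust the cache regardless of age
            }
            // Current behaviour: only trust a value saved less than 600 seconds ago.
            return (savedAtMs > 0 && nowMs - savedAtMs < MAX_AGE_MS) ? used : -1;
        }
    }

    public static void main(String[] args) {
        String cached = "123456789 1000000";
        System.out.println(loadCachedDfsUsed(cached, 1_000_000L + 700_000L, false));
        System.out.println(loadCachedDfsUsed(cached, 1_000_000L + 700_000L, true));
    }
}
```

Returning -1 signals that du should run, so turning quick-restart off after a disk layout change falls back to the 600-second freshness check.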
[jira] [Commented] (HDFS-8260) Erasure Coding: test of writing EC file
[ https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612934#comment-14612934 ] Xinwei Qin commented on HDFS-8260: --- Hi [~demongaorui], sorry, I have been busy with other work for the last several weeks. I will work on these jiras in the next several days and upload the patch ASAP. Erasure Coding: test of writing EC file Key: HDFS-8260 URL: https://issues.apache.org/jira/browse/HDFS-8260 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin 1. Writing an EC file normally (without any datanode failure). 2. Writing an EC file with a tolerable number of datanodes failing. 3. Writing an EC file with an intolerable number of datanodes failing.
[jira] [Updated] (HDFS-8710) Always read DU value from the cached dfsUsed file on datanode startup
[ https://issues.apache.org/jira/browse/HDFS-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8710: -- Attachment: HDFS-8710.001.patch Attached the patch for review. Always read DU value from the cached dfsUsed file on datanode startup --- Key: HDFS-8710 URL: https://issues.apache.org/jira/browse/HDFS-8710 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xinwei Qin Assignee: Xinwei Qin Attachments: HDFS-8710.001.patch Currently, the DataNode periodically caches the DU value in the dfsUsed file. When the DataNode starts or restarts, it reads the cached DU value from the dfsUsed file if the value is less than 600 seconds old; otherwise, it runs the DU command, which is a very time-consuming operation (it may take up to dozens of minutes) when the DataNode has a huge number of blocks. Since slight imprecision of dfsUsed is not critical, and the DU value will be updated every 600 seconds (the default DU interval) after the DataNode starts, we can always read the DU value from the cached file (regardless of whether it is less than 600 seconds old) and skip the DU operation on DataNode startup to significantly shorten the startup time.
[jira] [Created] (HDFS-8710) Always read DU value from the cached dfsUsed file on datanode startup
Xinwei Qin created HDFS-8710: - Summary: Always read DU value from the cached dfsUsed file on datanode startup Key: HDFS-8710 URL: https://issues.apache.org/jira/browse/HDFS-8710 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xinwei Qin Assignee: Xinwei Qin Currently, the DataNode periodically caches the DU value in the dfsUsed file. When the DataNode starts or restarts, it reads the cached DU value from the dfsUsed file if the value is less than 600 seconds old; otherwise, it runs the DU command, which is a very time-consuming operation (it may take up to dozens of minutes) when the DataNode has a huge number of blocks. Since slight imprecision of dfsUsed is not critical, and the DU value will be updated every 600 seconds (the default DU interval) after the DataNode starts, we can always read the DU value from the cached file (regardless of whether it is less than 600 seconds old) and skip the DU operation on DataNode startup to significantly shorten the startup time.
[jira] [Updated] (HDFS-8710) Always read DU value from the cached dfsUsed file on datanode startup
[ https://issues.apache.org/jira/browse/HDFS-8710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8710: -- Status: Patch Available (was: Open) Always read DU value from the cached dfsUsed file on datanode startup --- Key: HDFS-8710 URL: https://issues.apache.org/jira/browse/HDFS-8710 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xinwei Qin Assignee: Xinwei Qin Attachments: HDFS-8710.001.patch Currently, the DataNode periodically caches the DU value in the dfsUsed file. When the DataNode starts or restarts, it reads the cached DU value from the dfsUsed file if the value is less than 600 seconds old; otherwise, it runs the DU command, which is a very time-consuming operation (it may take up to dozens of minutes) when the DataNode has a huge number of blocks. Since slight imprecision of dfsUsed is not critical, and the DU value will be updated every 600 seconds (the default DU interval) after the DataNode starts, we can always read the DU value from the cached file (regardless of whether it is less than 600 seconds old) and skip the DU operation on DataNode startup to significantly shorten the startup time.
[jira] [Updated] (HDFS-8201) Refactor the end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Summary: Refactor the end to end test for stripping file writing and reading (was: Add an end to end test for stripping file writing and reading) Refactor the end to end test for stripping file writing and reading --- Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striping file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file, possibly one test method per size; * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
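The "various typical sizes" step can be made concrete by testing sizes around each layout boundary. The sketch below is illustrative only: the boundary choice, the helper names and the example schema are assumptions, not what TestWriteReadStripedFile actually uses. A firstMismatch helper gives a more useful failure message than a plain equality check when comparing written and read content.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Sketch of "various typical sizes" for a striped write/read test: sizes just
 * below, at and just above each layout boundary (cell, full stripe, full block
 * group), plus one size spanning several block groups. The boundaries chosen
 * here are an illustration, not the list used by TestWriteReadStripedFile.
 */
class StripedTestSizesSketch {

    static SortedSet<Long> boundarySizes(int cellSize, int dataBlocks, long blockSize) {
        long stripeSize = (long) cellSize * dataBlocks;
        long blockGroupSize = blockSize * dataBlocks;
        SortedSet<Long> sizes = new TreeSet<>();
        sizes.add(1L);
        for (long boundary : new long[] {cellSize, stripeSize, blockGroupSize}) {
            sizes.add(boundary - 1);
            sizes.add(boundary);
            sizes.add(boundary + 1);
        }
        sizes.add(2 * blockGroupSize + 1); // spills into a third block group
        return sizes;
    }

    /** Index of the first differing byte, or -1 if the arrays are identical. */
    static int firstMismatch(byte[] written, byte[] read) {
        int n = Math.min(written.length, read.length);
        for (int i = 0; i < n; i++) {
            if (written[i] != read[i]) {
                return i;
            }
        }
        return written.length == read.length ? -1 : n;
    }

    public static void main(String[] args) {
        // Hypothetical schema: 64 KB cells, 6 data blocks, 1 MB internal blocks.
        System.out.println(boundarySizes(64 * 1024, 6, 1024 * 1024));
    }
}
```

Each size in the returned set could back one test method, matching the "each size maybe a test method" idea in the description.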
[jira] [Commented] (HDFS-8201) Refactor the end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554239#comment-14554239 ] Xinwei Qin commented on HDFS-8201: --- Thanks [~zhz] for the review. 1. The JIRA description and summary have been updated. 2. I don't think it is necessary to eventually subclass {{TestWriteRead}}. Non-EC files can be tested in {{TestWriteRead}}, while the {{TestWriteReadStripedFile}} class can focus solely on striped files. What do you think? Refactor the end to end test for stripping file writing and reading --- Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. This JIRA aims to refactor the end-to-end test class (TestWriteReadStripedFile) so that it can be conveniently reused in the next test step for erasure encoding and recovery, which will be covered by a separate issue.
[jira] [Updated] (HDFS-8201) Refactor the end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Attachment: HDFS-8201-HDFS-7285.005.patch Refactor the end to end test for stripping file writing and reading --- Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201-HDFS-7285.005.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. This JIRA aims to refactor the end-to-end test class (TestWriteReadStripedFile) so that it can be conveniently reused in the next test step for erasure encoding and recovery, which will be covered by a separate issue.
[jira] [Updated] (HDFS-8201) Refactor the end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Description: According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. This JIRA aims to refactor the end-to-end test class (TestWriteReadStripedFile) so that it can be conveniently reused in the next test step for erasure encoding and recovery, which will be covered by a separate issue. was: According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
Refactor the end to end test for stripping file writing and reading --- Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. This JIRA aims to refactor the end-to-end test class (TestWriteReadStripedFile) so that it can be conveniently reused in the next test step for erasure encoding and recovery, which will be covered by a separate issue.
[jira] [Updated] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Attachment: HDFS-8201-HDFS-7285.004.patch Updated the patch against the latest branch-7285. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202.002.patch Updated the patch against the latest branch-7285. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch, HDFS-8202.002.patch This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works.
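The invariant this recovery test checks can be illustrated without a cluster: stripe data into cells, "delete" one of them, rebuild it and compare. This sketch uses a single XOR parity cell as a simplified stand-in for the Reed-Solomon codec used by HDFS erasure coding, so it tolerates only one lost cell. All names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: XOR parity as a simplified stand-in for RS decoding. */
final class XorRecoverySketch {
  private XorRecoverySketch() {}

  /** XOR of all given cells (all cells must have the same length). */
  static byte[] xorCells(byte[][] cells) {
    byte[] out = new byte[cells[0].length];
    for (byte[] cell : cells) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= cell[i];
      }
    }
    return out;
  }

  /**
   * Rebuilds the data cell at index {@code lost} from the surviving data cells
   * and the parity cell; the entry at {@code lost} is ignored (treated as deleted).
   */
  static byte[] recover(byte[][] dataCells, byte[] parity, int lost) {
    byte[][] survivors = new byte[dataCells.length][];
    int n = 0;
    for (int i = 0; i < dataCells.length; i++) {
      if (i != lost) {
        survivors[n++] = dataCells[i];
      }
    }
    survivors[n++] = parity;
    return xorCells(Arrays.copyOf(survivors, n));
  }
}
```

In the real test the decode would go through the striped read path. The sketch shows only the check itself: content read after a block deletion must equal what was written.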
[jira] [Updated] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Status: Patch Available (was: In Progress) Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201-HDFS-7285.004.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Attachment: HDFS-8201-HDFS-7285.003.patch Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537478#comment-14537478 ] Xinwei Qin commented on HDFS-8201: --- Attached the patch with the name {{HDFS-8201-HDFS-7285.003.patch}}. [~rakeshr], thanks for your advice. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201-HDFS-7285.003.patch, HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Attachment: HDFS-8201.002.patch Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202.001.patch Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works.
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534230#comment-14534230 ] Xinwei Qin commented on HDFS-8201: --- {{TestWriteReadStripedFile}} already implements a comprehensive end-to-end test for writing and reading. The 002 patch mainly separates the write and read test methods so that they can be reused in follow-up tests such as HDFS-8202. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch, HDFS-8201.002.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Commented] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534236#comment-14534236 ] Xinwei Qin commented on HDFS-8202: --- Initial patch for review. This patch is based on HDFS-8201. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859-HDFS-7285.003.patch Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we persist EC schemas in the NameNode centrally and reliably, so that EC zones can efficiently reference them by name.
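To show what "persist centrally so that EC zones can reference schemas by name" involves, here is a sketch of a name-keyed schema table that is serialized to a byte stream and reloaded from it. This is roughly the role an fsimage section would play. The {{Schema}} fields (name, data units, parity units), the serialized layout and all names are assumptions for illustration, not the format used by the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a name-keyed EC schema table that can be saved and reloaded. */
final class SchemaTable {
  /** Minimal stand-in for an EC schema; a real schema carries more options. */
  static final class Schema {
    final String name;
    final int dataUnits;
    final int parityUnits;

    Schema(String name, int dataUnits, int parityUnits) {
      this.name = name;
      this.dataUnits = dataUnits;
      this.parityUnits = parityUnits;
    }
  }

  private final Map<String, Schema> byName = new TreeMap<>();

  void add(Schema s) {
    byName.put(s.name, s);
  }

  /** EC zones store only the schema name and resolve it here. */
  Schema lookup(String name) {
    return byName.get(name);
  }

  /** Serializes all schemas: a count followed by (name, data, parity) triples. */
  byte[] save() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(byName.size());
      for (Schema s : byName.values()) {
        out.writeUTF(s.name);
        out.writeInt(s.dataUnits);
        out.writeInt(s.parityUnits);
      }
    }
    return bytes.toByteArray();
  }

  /** Rebuilds a table from bytes produced by save(). */
  static SchemaTable load(byte[] image) throws IOException {
    SchemaTable table = new SchemaTable();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(image))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        // Java evaluates arguments left to right, matching the write order above.
        table.add(new Schema(in.readUTF(), in.readInt(), in.readInt()));
      }
    }
    return table;
  }
}
```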
[jira] [Updated] (HDFS-8295) Add MODIFY and REMOVE ECSchema editlog operations
[ https://issues.apache.org/jira/browse/HDFS-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8295: -- Issue Type: Sub-task (was: Task) Parent: HDFS-8031 Add MODIFY and REMOVE ECSchema editlog operations - Key: HDFS-8295 URL: https://issues.apache.org/jira/browse/HDFS-8295 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Xinwei Qin Once MODIFY and REMOVE ECSchema operations are supported, add the corresponding editlog operations to persist them.
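Here is a sketch of why MODIFY and REMOVE need editlog operations of their own. On restart, changes made since the last checkpoint survive only if they were written to the edit log and are replayed in order. The {{Op}} enum, the entry layout and the replay method are hypothetical. Schemas are modelled as plain name-to-definition strings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of replaying ECSchema editlog operations in order. */
final class SchemaEditLogReplay {
  enum Op { ADD, MODIFY, REMOVE }

  /** One logged operation; {@code definition} is unused for REMOVE. */
  static final class Entry {
    final Op op;
    final String name;
    final String definition;

    Entry(Op op, String name, String definition) {
      this.op = op;
      this.name = name;
      this.definition = definition;
    }
  }

  private SchemaEditLogReplay() {}

  /** Applies entries in log order and returns the resulting schema table. */
  static Map<String, String> replay(List<Entry> log) {
    Map<String, String> schemas = new LinkedHashMap<>();
    for (Entry e : log) {
      switch (e.op) {
        case ADD:
          schemas.putIfAbsent(e.name, e.definition);
          break;
        case MODIFY:
          // Only an existing schema is modified; a real implementation would report an error.
          schemas.computeIfPresent(e.name, (k, old) -> e.definition);
          break;
        case REMOVE:
          schemas.remove(e.name);
          break;
        default:
          throw new IllegalStateException("unknown op " + e.op);
      }
    }
    return schemas;
  }
}
```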
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14521029#comment-14521029 ] Xinwei Qin commented on HDFS-7859: --- The 003 patch removes the MODIFY and REMOVE ECSchema editlog operations. These operations will be added later in another JIRA (HDFS-8295) once they are supported. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.003.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we persist EC schemas in the NameNode centrally and reliably, so that EC zones can efficiently reference them by name.
[jira] [Created] (HDFS-8295) Add MODIFY and REMOVE ECSchema editlog operations
Xinwei Qin created HDFS-8295: - Summary: Add MODIFY and REMOVE ECSchema editlog operations Key: HDFS-8295 URL: https://issues.apache.org/jira/browse/HDFS-8295 Project: Hadoop HDFS Issue Type: Task Reporter: Xinwei Qin Assignee: Xinwei Qin Once MODIFY and REMOVE ECSchema operations are supported, add the corresponding editlog operations to persist them.
[jira] [Updated] (HDFS-8295) Add MODIFY and REMOVE ECSchema editlog operations
[ https://issues.apache.org/jira/browse/HDFS-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8295: -- Attachment: HDFS-8295.001.patch An initial patch based on HDFS-7859. Add MODIFY and REMOVE ECSchema editlog operations - Key: HDFS-8295 URL: https://issues.apache.org/jira/browse/HDFS-8295 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xinwei Qin Assignee: Xinwei Qin Attachments: HDFS-8295.001.patch Once MODIFY and REMOVE ECSchema operations are supported, add the corresponding editlog operations to persist them.
[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518976#comment-14518976 ] Xinwei Qin commented on HDFS-7836: --- Hi [~cmccabe], [~clamb], this is a very meaningful improvement. Is there any update or plan for the next steps on this JIRA? Could you post a summary of the meeting held on March 11th? BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Attachments: BlockManagerScalabilityImprovementsDesign.pdf Improvements to BlockManager scalability.
[jira] [Assigned] (HDFS-8259) Erasure Coding: Test of reading EC file
[ https://issues.apache.org/jira/browse/HDFS-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8259: - Assignee: Xinwei Qin Erasure Coding: Test of reading EC file --- Key: HDFS-8259 URL: https://issues.apache.org/jira/browse/HDFS-8259 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin 1. Reading an EC file normally (no datanode failure and no recovery needed). 2. Reading an EC file with datanode failure. 3. Reading an EC file with data block recovery by decoding from parity blocks.
[jira] [Assigned] (HDFS-8262) Erasure Coding: Test of datanode decommission which EC blocks are stored
[ https://issues.apache.org/jira/browse/HDFS-8262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8262: - Assignee: Xinwei Qin Erasure Coding: Test of datanode decommission which EC blocks are stored -- Key: HDFS-8262 URL: https://issues.apache.org/jira/browse/HDFS-8262 Project: Hadoop HDFS Issue Type: Test Reporter: GAO Rui Assignee: Xinwei Qin
[jira] [Assigned] (HDFS-8260) Erasure Coding: test of writing EC file
[ https://issues.apache.org/jira/browse/HDFS-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-8260: - Assignee: Xinwei Qin Erasure Coding: test of writing EC file Key: HDFS-8260 URL: https://issues.apache.org/jira/browse/HDFS-8260 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Assignee: Xinwei Qin 1. Writing an EC file normally (no datanode failure). 2. Writing an EC file with a tolerable number of datanodes failing. 3. Writing an EC file with an intolerable number of datanodes failing.
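The split between "tolerable" and "intolerable" failures in cases 2 and 3 can be written as a simple check for a maximum-distance-separable code such as Reed-Solomon. A stripe of k data blocks and m parity blocks can be decoded from any k of its k+m blocks, so up to m losses are survivable. This hypothetical test helper, which is not an HDFS API, assumes each failed datanode holds at most one block of the stripe. It also assumes the writer tolerates the same number of failures as the reader.

```java
/** Hypothetical helper for choosing failure counts in the EC write/read tests. */
final class FailureTolerance {
  private FailureTolerance() {}

  /** True if a k+m stripe can still be decoded after {@code failedNodes} losses. */
  static boolean isTolerable(int dataUnits, int parityUnits, int failedNodes) {
    if (dataUnits <= 0 || parityUnits < 0 || failedNodes < 0) {
      throw new IllegalArgumentException("invalid schema or failure count");
    }
    // Any k of the k+m blocks suffice to decode, so at most m losses are survivable.
    return failedNodes <= parityUnits;
  }
}
```

For a 6+3 layout, a test would use 1 to 3 failed datanodes for case 2 and 4 or more for case 3.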
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518545#comment-14518545 ] Xinwei Qin commented on HDFS-8201: --- [~zhz] I have noticed that JIRA. I think your suggestion is very good. Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Status: Patch Available (was: In Progress) Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we persist EC schemas in the NameNode centrally and reliably, so that EC zones can efficiently reference them by name.
[jira] [Commented] (HDFS-7060) Avoid taking locks when sending heartbeats from the DataNode
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516265#comment-14516265 ] Xinwei Qin commented on HDFS-7060: --- [~jnp] Thanks for your comment. This test was done before applying HDFS-7999. Avoid taking locks when sending heartbeats from the DataNode Key: HDFS-7060 URL: https://issues.apache.org/jira/browse/HDFS-7060 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Xinwei Qin Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch We're seeing the heartbeat blocked by the monitor of {{FsDatasetImpl}} when the DN is under a heavy write load:
{noformat}
java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115)
    - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91)
    - locked 0x000780612fd8 (a java.lang.Object)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827)
    at java.lang.Thread.run(Thread.java:744)

java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743)
    - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:744)

java.lang.Thread.State: RUNNABLE
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:1006)
    at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753)
    - locked 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
    at java.lang.Thread.run(Thread.java:744)
{noformat}
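Here is a minimal sketch of one generic way to keep the heartbeat off the {{FsDatasetImpl}} monitor. Per-volume usage is tracked in an {{AtomicLong}} that writers update, so the heartbeat thread can read it without synchronizing. This technique was chosen for illustration only; it is not necessarily what the attached patches do. The class and method names are hypothetical, and the private lock merely stands in for the dataset monitor.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: lock-free usage counter for heartbeat reporting. */
final class VolumeUsage {
  private final Object datasetLock = new Object(); // stands in for the dataset monitor
  private final AtomicLong dfsUsed = new AtomicLong();

  /** Write path: bookkeeping that genuinely needs the lock still takes it. */
  void finalizeBlock(long blockBytes) {
    synchronized (datasetLock) {
      // ... block bookkeeping that requires the dataset lock ...
      dfsUsed.addAndGet(blockBytes);
    }
  }

  /** Heartbeat path: reads the counter without touching the dataset lock. */
  long getDfsUsed() {
    return dfsUsed.get();
  }

  /** Exposed only so the example can hold the lock while a reader runs. */
  Object lockForTesting() {
    return datasetLock;
  }
}
```

With this design, a writer stuck in slow file creation while holding the lock no longer delays the heartbeat. The cost is that the reported value may briefly lag a write in progress.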
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.002.patch Updated the patch based on the latest HDFS-7285 branch and Kai's comments. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch In a meetup discussion with [~zhz] and [~jingzhao], it was suggested that we persist EC schemas in the NameNode centrally and reliably, so that EC zones can efficiently reference them by name.
[jira] [Work started] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8201 started by Xinwei Qin. - Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Work started] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8202 started by Xinwei Qin. - Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works.
[jira] [Updated] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8201: -- Attachment: HDFS-8201.001.patch Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch According to an offline discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write content of various typical sizes to the file (each size may be a separate test method); * Read the written content back; * Compare the written and read content to ensure they match. The test facility may later be extended with more steps for erasure encoding and recovery; a separate issue will be opened for that.
[jira] [Updated] (HDFS-8202) Improve end to end stirpping file test to add erasure recovering test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Description: This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works. was: This follows on from HDFS-8021 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works. Improve end to end stirpping file test to add erasure recovering test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin This follows on from HDFS-8201 to add an erasure recovery test to the end-to-end striped file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare it, to check for recovery issues and verify that erasure recovery works.
[jira] [Commented] (HDFS-8156) Add/implement necessary APIs even we just have the system default schema
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506221#comment-14506221 ] Xinwei Qin commented on HDFS-8156: --- A minor comment: the {{toString()}} method outputs a string in the following format:
{code}
ECSchema=[Name=xxx,option1=xxx,option2=xxx,option3=xxx,]
{code}
The last option is followed by a trailing comma. It would be better to remove the trailing comma and add a space between options.

Add/implement necessary APIs even we just have the system default schema Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, HDFS-8156-v3.patch, HDFS-8156-v4.patch, HDFS-8156-v5.patch, HDFS-8156-v6.patch, HDFS-8156-v7.patch, HDFS-8156-v8.patch According to the discussion here, this issue was repurposed and modified. It is now to add and implement some necessary APIs even though we only have the system default schema, and to resolve some TODOs left for HDFS-7859 and HDFS-7866 as they're still subject to further discussion.
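A minimal sketch of the suggested output format, assuming the schema is just a name plus an ordered option map (the class and method names are hypothetical, not the actual `ECSchema` code). `java.util.StringJoiner` inserts the delimiter only between elements, so it removes the trailing comma and adds the space in one step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SchemaToStringSketch {
    /** Builds "ECSchema=[Name=..., k1=v1, k2=v2]" with no trailing comma. */
    static String format(String name, Map<String, String> options) {
        StringJoiner joiner = new StringJoiner(", ", "ECSchema=[", "]");
        joiner.add("Name=" + name);
        for (Map.Entry<String, String> e : options.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Illustrative option keys; LinkedHashMap keeps insertion order in the output.
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("numDataUnits", "6");
        opts.put("numParityUnits", "3");
        System.out.println(format("RS-6-3", opts));
        // prints: ECSchema=[Name=RS-6-3, numDataUnits=6, numParityUnits=3]
    }
}
```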
[jira] [Commented] (HDFS-8125) Erasure Coding: Expose refreshECSchemas command to reload predefined schemas
[ https://issues.apache.org/jira/browse/HDFS-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499371#comment-14499371 ] Xinwei Qin commented on HDFS-8125: --- Since HDFS-7859 and HDFS-7866 have both been moved to HDFS-8031, this JIRA can (or should) be moved to HDFS-8031 as well. Erasure Coding: Expose refreshECSchemas command to reload predefined schemas Key: HDFS-8125 URL: https://issues.apache.org/jira/browse/HDFS-8125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R This is to expose a {{refreshECSchemas}} command to administrators. When invoked, this command reloads the predefined schemas from the configuration file and dynamically updates the schema definitions maintained in the NameNode. Note: for more details, please refer to the [discussion|https://issues.apache.org/jira/browse/HDFS-7866?focusedCommentId=14489387page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14489387] with [~drankye]
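One way to make such a reload safe while clients are reading schemas is to build the new set completely and then swap it in atomically. This is a minimal sketch assuming a hypothetical `name -> "dataUnits,parityUnits"` definition format (the real configuration file format is not described here); it leaves out persistence and admin authorization, and none of the names come from the HDFS-8125 patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SchemaRegistrySketch {
    /** Hypothetical minimal schema: name plus data/parity unit counts. */
    static final class Schema {
        final String name;
        final int dataUnits;
        final int parityUnits;

        Schema(String name, int dataUnits, int parityUnits) {
            this.name = name;
            this.dataUnits = dataUnits;
            this.parityUnits = parityUnits;
        }
    }

    private final AtomicReference<Map<String, Schema>> schemas =
            new AtomicReference<>(Collections.emptyMap());

    /**
     * Reload from definitions of the form name -> "dataUnits,parityUnits".
     * The new map is fully built and validated before being swapped in, so
     * concurrent readers see either the old set or the new one, never a mix.
     */
    void refresh(Map<String, String> definitions) {
        Map<String, Schema> next = new HashMap<>();
        for (Map.Entry<String, String> e : definitions.entrySet()) {
            String[] parts = e.getValue().split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad schema definition: " + e.getKey());
            }
            next.put(e.getKey(), new Schema(e.getKey(),
                    Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim())));
        }
        schemas.set(Collections.unmodifiableMap(next));
    }

    Schema get(String name) {
        return schemas.get().get(name);
    }

    int size() {
        return schemas.get().size();
    }
}
```

Because validation happens before `schemas.set(...)`, a malformed definition makes `refresh` throw and leaves the previously loaded schemas in place.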
[jira] [Commented] (HDFS-8156) Define some system schemas in codes
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499359#comment-14499359 ] Xinwei Qin commented on HDFS-8156: --- The patch looks fine. Just a minor comment:
{code}
- FSNDNCacheOp.removeCachePool(this, cacheManager, cachePoolName,
- logRetryCache);
+ FSNDNCacheOp.removeCachePool(this, cacheManager, cachePoolName, logRetryCache);
{code}
{code}
- createEncryptionZoneInt(src, metadata.getCipher(),
- keyName, logRetryCache);
+ createEncryptionZoneInt(src, metadata.getCipher(), keyName, logRetryCache);
{code}
These two modifications are unrelated to this issue; it would be better to remove them from the patch.

Define some system schemas in codes --- Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch, HDFS-8156-v3.patch This is to define and add some system schemas in code, and also to resolve some TODOs left for HDFS-7859 and HDFS-7866 as they're still subject to further discussion.
[jira] [Updated] (HDFS-8140) ECSchema supports for offline EditsVistor over an OEV XML file
[ https://issues.apache.org/jira/browse/HDFS-8140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8140: -- Issue Type: Sub-task (was: Task) Parent: HDFS-8031 ECSchema supports for offline EditsVistor over an OEV XML file -- Key: HDFS-8140 URL: https://issues.apache.org/jira/browse/HDFS-8140 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Xinwei Qin Assignee: Xinwei Qin Add support for the ECSchema information in the edit log to the offline EditsVisitor for OEV XML files, which HDFS-7859 did not implement.
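The offline edits viewer converts between binary edit logs and XML, so every op it supports needs to be written to XML and parsed back without loss. This is a minimal round-trip sketch for schema information using only the JDK's DOM and `javax.xml.transform` APIs. The element names (`EC_SCHEMA`, `NAME`, `OPTION`) and the class are hypothetical and do not reflect Hadoop's actual OEV XML format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SchemaXmlSketch {
    // Hypothetical element names; not the tags used by Hadoop's OEV.
    static final String ROOT = "EC_SCHEMA";
    static final String NAME = "NAME";
    static final String OPTION = "OPTION";

    /** Serialize: write the schema name and each option as XML elements. */
    static String toXml(String name, Map<String, String> options) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement(ROOT);
        doc.appendChild(root);
        Element nameEl = doc.createElement(NAME);
        nameEl.setTextContent(name);
        root.appendChild(nameEl);
        for (Map.Entry<String, String> e : options.entrySet()) {
            Element opt = doc.createElement(OPTION);
            opt.setAttribute("key", e.getKey());
            opt.setTextContent(e.getValue()); // special characters are escaped on output
            root.appendChild(opt);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    /** Deserialize: recover the schema name. */
    static String nameFromXml(String xml) throws Exception {
        return parseRoot(xml).getElementsByTagName(NAME).item(0).getTextContent();
    }

    /** Deserialize: recover the options in document order. */
    static Map<String, String> optionsFromXml(String xml) throws Exception {
        NodeList nodes = parseRoot(xml).getElementsByTagName(OPTION);
        Map<String, String> options = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element opt = (Element) nodes.item(i);
            options.put(opt.getAttribute("key"), opt.getTextContent());
        }
        return options;
    }
}
```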
[jira] [Commented] (HDFS-8156) Define some system schemas in codes
[ https://issues.apache.org/jira/browse/HDFS-8156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499161#comment-14499161 ] Xinwei Qin commented on HDFS-8156: --- Hi, [~drankye] Does the map field {{options}} of {{ECSchema}} contain {{NUM_DATA_UNITS_KEY}}, {{NUM_PARITY_UNITS_KEY}} and {{CODEC_NAME_KEY}}? In the method {{initWith()}} you remove them from {{options}}, but in the method {{toString()}} you assume they are contained in {{options}} and try to skip them.
{code}
for (String opt : options.keySet()) {
- boolean skip = (opt.equals(NUM_DATA_UNITS_KEY) ||
+ boolean skip = (opt.equals(CODEC_NAME_KEY) ||
+ opt.equals(NUM_DATA_UNITS_KEY) ||
 opt.equals(NUM_PARITY_UNITS_KEY) ||
 opt.equals(CHUNK_SIZE_KEY));
{code}
IMO, {{options}} does not need to contain the other fields. Given that, renaming it to {{extraOptions}} may be better and would avoid confusion, as the field {{options}} in {{ECSchema}} and the parameter {{options}} of the constructor {{ECSchema(String schemaName, Map<String, String> options)}} are two different things. What do you think?

Define some system schemas in codes --- Key: HDFS-8156 URL: https://issues.apache.org/jira/browse/HDFS-8156 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-8156-v1.patch, HDFS-8156-v2.patch
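A sketch of the separation proposed above, using hypothetical key strings in place of `CODEC_NAME_KEY`, `NUM_DATA_UNITS_KEY` and `NUM_PARITY_UNITS_KEY`. The constructor copies the caller's map, moves the known keys into typed fields, and keeps only the remainder as `extraOptions`. A `toString()` can then print `extraOptions` without any skip list, and the constructor parameter is never mutated. This is an illustration of the idea, not the `ECSchema` class.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SchemaOptionsSketch {
    // Hypothetical key names standing in for the real constants.
    static final String CODEC_NAME_KEY = "codec";
    static final String NUM_DATA_UNITS_KEY = "numDataUnits";
    static final String NUM_PARITY_UNITS_KEY = "numParityUnits";

    final String schemaName;
    final String codecName;
    final int numDataUnits;
    final int numParityUnits;
    /** Only the options not already represented by a typed field. */
    final Map<String, String> extraOptions;

    SchemaOptionsSketch(String schemaName, Map<String, String> allOptions) {
        Map<String, String> copy = new HashMap<>(allOptions); // never mutate the caller's map
        this.schemaName = schemaName;
        this.codecName = require(copy.remove(CODEC_NAME_KEY), CODEC_NAME_KEY);
        this.numDataUnits = Integer.parseInt(require(copy.remove(NUM_DATA_UNITS_KEY), NUM_DATA_UNITS_KEY));
        this.numParityUnits = Integer.parseInt(require(copy.remove(NUM_PARITY_UNITS_KEY), NUM_PARITY_UNITS_KEY));
        this.extraOptions = Collections.unmodifiableMap(copy);
    }

    private static String require(String value, String key) {
        if (value == null) {
            throw new IllegalArgumentException("Missing option: " + key);
        }
        return value;
    }
}
```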
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493702#comment-14493702 ] Xinwei Qin commented on HDFS-7859: --- OK, I will track it. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859.001.patch
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493699#comment-14493699 ] Xinwei Qin commented on HDFS-7859: --- [~drankye], thanks for your comments.
{quote}
1. Looks like this couples with HDFS-7866. Maybe I could commit HDFS-7866 first and then this gets all the left work done. Will it work for you this way?
{quote}
Yes, committing HDFS-7866 first is better.
bq. 2. What methods can ECSchemaManager call to make it happen?
Some methods, such as {{logAddECSchema()}} in {{FSEditLog.java}}, are missing; I will add them in the next patch.
bq. 3. In ECSchemaManager, new methods like addECSchema are not necessarily public.
I will change them to package-private (default) visibility.
bq. 4. Are we supporting the two formats? Please add Javadoc to explain them, thanks.
Yes, both formats are supported. These methods are only called during NameNode startup or checkpointing, and which one is called depends on the FSImage format. I will add detailed Javadoc to them.
bq. 5. Would you have separate issue(s) for the following?
I will create a new issue for it.

Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859.001.patch
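The persistence model discussed above (schemas recorded through edit-log ops such as `logAddECSchema()`, and loaded from the fsimage plus the edit log at NameNode startup) can be sketched as a checkpoint plus a log of the ops recorded since it. This in-memory sketch is hypothetical throughout: `AddSchemaOp` and the string schema definitions stand in for the real edit-log op and `ECSchema`, and nothing is written to disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaPersistenceSketch {
    /** Hypothetical record standing in for an "add EC schema" edit-log op. */
    static final class AddSchemaOp {
        final String name;
        final String definition;

        AddSchemaOp(String name, String definition) {
            this.name = name;
            this.definition = definition;
        }
    }

    // Plays the role of the fsimage: schemas as of the last checkpoint.
    private Map<String, String> checkpoint = new LinkedHashMap<>();
    // Plays the role of the edit log: ops recorded since the last checkpoint.
    private final List<AddSchemaOp> editLog = new ArrayList<>();
    // In-memory state served to clients.
    private final Map<String, String> live = new LinkedHashMap<>();

    /** Record the change in the log first, then apply it in memory. */
    void addSchema(String name, String definition) {
        editLog.add(new AddSchemaOp(name, definition));
        live.put(name, definition);
    }

    /** Fold the current state into a new checkpoint and truncate the log. */
    void saveCheckpoint() {
        checkpoint = new LinkedHashMap<>(live);
        editLog.clear();
    }

    /** Startup: load the checkpoint, then replay the ops logged after it. */
    Map<String, String> loadOnStartup() {
        Map<String, String> state = new LinkedHashMap<>(checkpoint);
        for (AddSchemaOp op : editLog) {
            state.put(op.name, op.definition);
        }
        return state;
    }
}
```

Startup replays only the ops logged after the last checkpoint, which is why a schema change has to reach the log before it is treated as added.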
[jira] [Created] (HDFS-8140) ECSchema supports for offline EditsVistor over an OEV XML file
Xinwei Qin created HDFS-8140: - Summary: ECSchema supports for offline EditsVistor over an OEV XML file Key: HDFS-8140 URL: https://issues.apache.org/jira/browse/HDFS-8140 Project: Hadoop HDFS Issue Type: Task Affects Versions: HDFS-7285 Reporter: Xinwei Qin Assignee: Xinwei Qin
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages EC schemas
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493727#comment-14493727 ] Xinwei Qin commented on HDFS-7866: --- OK, that sounds good. Erasure coding: NameNode manages EC schemas --- Key: HDFS-7866 URL: https://issues.apache.org/jira/browse/HDFS-7866 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch This is to extend the NameNode to load, list and sync predefined EC schemas in an authorized and controlled way. The provided facilities will be used to implement DFSAdmin commands, so that administrators can list the available EC schemas and then choose some of them for target EC zones.
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages EC schemas
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493720#comment-14493720 ] Xinwei Qin commented on HDFS-7866: --- Hi, [~drankye]
{code}
+/**
+ * TODO: HDFS-7859 persist into NameNode
+ * load persistent schemas from image and editlog, which is done only once
+ * during NameNode startup. This can be done here or in a separate method.
+ */
{code}
This comment can be removed. Loading persistent schemas from the fsimage and edit log is now done in the {{loadECSchemas()}} or {{loadState()}} method, both of which are called during NameNode startup.

Erasure coding: NameNode manages EC schemas --- Key: HDFS-7866 URL: https://issues.apache.org/jira/browse/HDFS-7866 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Attachment: HDFS-7859.001.patch Post the patch for review. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859.001.patch
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492394#comment-14492394 ] Xinwei Qin commented on HDFS-7859: --- Hi [~drankye] The patch is complete, but it is a little large. I will post it from home in about half an hour. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487291#comment-14487291 ] Xinwei Qin commented on HDFS-7859: --- Hi [~drankye], Thanks for your clarification and suggestion. The issue is clearer to me now, and I will post the patch ASAP. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin
[jira] [Work started] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7859 started by Xinwei Qin. - Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393895#comment-14393895 ] Xinwei Qin commented on HDFS-7859: --- Hi, [~drankye], I'm interested in this issue. If you don't have time to work on it, you can reassign it to me. Thanks. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng
[jira] [Assigned] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin reassigned HDFS-7859: - Assignee: Xinwei Qin (was: Kai Zheng) Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin
[jira] [Commented] (HDFS-7060) Avoid taking locks when sending heartbeats from the DataNode
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392672#comment-14392672 ] Xinwei Qin commented on HDFS-7060: --- [~cmccabe], [~wheat9] Thanks a lot for your comments. We used 400 threads to write 20 million files to a Hadoop cluster with 3 DNs, each of which already held about 90 million blocks. Without the HDFS-7060 patch, the heartbeat was always delayed by several hundred seconds or even longer, enough for the DN to be marked dead. With the HDFS-7060 patch, the heartbeat was only occasionally delayed, by about 50 seconds, and the DN was never marked dead. This shows that making the DN heartbeat lockless can significantly reduce heartbeat delay, especially under large-scale concurrent reads and writes. So, if the lock is unnecessary, removing it would be a good idea.

Avoid taking locks when sending heartbeats from the DataNode Key: HDFS-7060 URL: https://issues.apache.org/jira/browse/HDFS-7060 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Xinwei Qin Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch We're seeing the heartbeat blocked by the monitor of {{FsDatasetImpl}} when the DN is under a heavy write load:
{noformat}
java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115)
	- waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91)
	- locked 0x000780612fd8 (a java.lang.Object)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827)
	at java.lang.Thread.run(Thread.java:744)

java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743)
	- waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
	at java.lang.Thread.run(Thread.java:744)

java.lang.Thread.State: RUNNABLE
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createNewFile(File.java:1006)
	at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753)
	- locked 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
	at java.lang.Thread.run(Thread.java:744)
{noformat}
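One general way to let the heartbeat report usage without contending on the dataset monitor is to keep per-volume usage in atomic counters that writers update and the heartbeat reads without locking. The sketch below illustrates that idea only. It is not the HDFS-7060 patch, and `Volume`, `recordBlockWritten` and `totalDfsUsedForHeartbeat` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

public class LocklessUsageSketch {
    /** Hypothetical volume whose usage counter can be read without the dataset lock. */
    static final class Volume {
        final String path;
        final AtomicLong dfsUsed = new AtomicLong();

        Volume(String path) {
            this.path = path;
        }
    }

    // Stands in for the FsDatasetImpl monitor that writers hold.
    final Object datasetLock = new Object();
    // Copy-on-write list: iteration needs no lock and never throws on concurrent adds.
    private final List<Volume> volumes = new CopyOnWriteArrayList<>();

    void addVolume(Volume v) {
        volumes.add(v);
    }

    /** Writer path: may hold the dataset lock while doing slow work. */
    void recordBlockWritten(Volume v, long bytes) {
        synchronized (datasetLock) {
            // ... slow, lock-protected bookkeeping (e.g. creating files) would happen here ...
            v.dfsUsed.addAndGet(bytes);
        }
    }

    /** Heartbeat path: sums the counters without ever taking datasetLock. */
    long totalDfsUsedForHeartbeat() {
        long total = 0;
        for (Volume v : volumes) {
            total += v.dfsUsed.get();
        }
        return total;
    }
}
```

The trade-off is that the heartbeat may report a value that is momentarily stale relative to in-progress writes, which is usually acceptable for periodic capacity reporting.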
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388273#comment-14388273 ] Xinwei Qin commented on HDFS-7999: --- Yeah, it's a good and necessary idea to avoid having the createTemporary() method hold the lock for a long time. FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch I'm using 2.6.0 and noticed that the DN's heartbeat was sometimes delayed for a very long time, say more than 100 seconds. I took jstack dumps twice, and it looks like the heartbeats were all blocked (at getStorageReports) by the dataset lock, which is held by a thread calling createTemporary, which in turn is blocked waiting for an earlier incarnation of the writer to exit.
The heartbeat thread stack:
java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152)
	- waiting to lock 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144)
	- locked 0x0007b0140ed0 (a java.lang.Object)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680)
	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850)
	at java.lang.Thread.run(Thread.java:662)

The DataXceiver thread holds the dataset lock:
DataXceiver for client at X daemon prio=10 tid=0x7f14041e6480 nid=0x52bc in Object.wait() [0x7f11d78f7000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1194)
	- locked 0x0007a33b85d8 (a org.apache.hadoop.util.Daemon)
	at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1231)
	- locked 0x0007b01428c0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:114)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:179)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
	at java.lang.Thread.run(Thread.java:662)
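The DataXceiver stack above shows `createTemporary` calling `stopWriter`, which joins the previous writer thread while the `FsDatasetImpl` monitor is still held. A common way to avoid that is to look up the old writer under the lock, release the lock before waiting for it, and then re-check. This is a minimal sketch of that pattern with hypothetical names; it is not the HDFS-7999 patch.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateTemporarySketch {
    private final Object datasetLock = new Object();
    // Hypothetical map from block ID to the thread currently writing that replica.
    private final Map<Long, Thread> writers = new HashMap<>();

    /**
     * Stop any earlier writer of the block without holding datasetLock while
     * joining it, then re-check and register the new writer under the lock.
     */
    void createTemporary(long blockId, Thread newWriter, long joinTimeoutMs)
            throws InterruptedException {
        while (true) {
            Thread previous;
            synchronized (datasetLock) {
                previous = writers.get(blockId);
                if (previous == null || !previous.isAlive()) {
                    writers.put(blockId, newWriter);
                    return;
                }
            }
            // Lock released here: heartbeats and other readers can proceed meanwhile.
            previous.interrupt();
            previous.join(joinTimeoutMs);
            if (previous.isAlive()) {
                throw new IllegalStateException(
                        "Earlier writer of block " + blockId + " did not stop in time");
            }
        }
    }

    Thread writerOf(long blockId) {
        synchronized (datasetLock) {
            return writers.get(blockId);
        }
    }
}
```

The loop re-checks under the lock after every wait because another writer may have registered in the meantime, so the new writer is only registered once no live writer remains; the timeout bounds how long a caller waits for a writer that refuses to stop.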
[jira] [Commented] (HDFS-7999) FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time
[ https://issues.apache.org/jira/browse/HDFS-7999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388309#comment-14388309 ] Xinwei Qin commented on HDFS-7999: --- Hi [~cmccabe] Thanks for your comment.
{quote}
even if we made the heartbeat lockless, there are still many other problems associated with having FsDatasetImpl#createTemporary hold the FSDatasetImpl lock for a very long time. Any thread that needs to read or write from the datanode will be blocked.
{quote}
Making the heartbeat lockless can prevent DataNodes from being marked dead, and I think it is a necessary patch ([https://issues.apache.org/jira/browse/HDFS-7060]). The FSDatasetImpl lock being held for a long time is a separate problem, which the patch in this JIRA may alleviate.

FsDatasetImpl#createTemporary sometimes holds the FSDatasetImpl lock for a very long time - Key: HDFS-7999 URL: https://issues.apache.org/jira/browse/HDFS-7999 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-7999-001.patch
[jira] [Commented] (HDFS-7060) Contentions of the monitor of FsDatasetImpl block DN's heartbeat
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386110#comment-14386110 ] Xinwei Qin commented on HDFS-7060: --- [~brahmareddy] Thanks for your review.
bq. some minor nits : intends are missed ( 2 spaces + 2 tabs)
The indent in the Hadoop code format is just 2 spaces, so I think the formatting of the 002 patch is fine. If not, please correct me.

Contentions of the monitor of FsDatasetImpl block DN's heartbeat Key: HDFS-7060 URL: https://issues.apache.org/jira/browse/HDFS-7060 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Xinwei Qin Priority: Critical Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch
[jira] [Commented] (HDFS-7060) Contentions of the monitor of FsDatasetImpl block DN's heartbeat
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386122#comment-14386122 ]

Xinwei Qin commented on HDFS-7060:
---

2nd patch
[jira] [Commented] (HDFS-7060) Contentions of the monitor of FsDatasetImpl block DN's heartbeat
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386121#comment-14386121 ]

Xinwei Qin commented on HDFS-7060:
---

Sorry, my mistake. I meant the 2nd patch, which should be 001.patch.