[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383063#comment-16383063 ] Huafeng Wang commented on HDFS-12257: - Hi [~Sammi], I'm not sure about that. The patch hasn't been reviewed, and it looks like it now conflicts with trunk, so it has to be revised. I can try to update the patch, but I'm afraid it will take a few days. > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang >Priority: Major > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237274#comment-16237274 ] Huafeng Wang commented on HDFS-11467: - The failed tests are unrelated to this patch, and they all passed locally. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch, > HDFS-11467.003.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237146#comment-16237146 ] Huafeng Wang commented on HDFS-11467: - Hi [~xiaochen], I just uploaded a new patch against the latest trunk. Please help to review it. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch, > HDFS-11467.003.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Updated] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-11467: Attachment: HDFS-11467.003.patch > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch, > HDFS-11467.003.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235339#comment-16235339 ] Huafeng Wang commented on HDFS-11467: - Hi Andrew, I'm working on it and I'll post an updated patch ASAP. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216339#comment-16216339 ] Huafeng Wang commented on HDFS-11467: - Thanks [~xiaochen] for the clarification. I'll update my patch once HDFS-12682 is merged. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216250#comment-16216250 ] Huafeng Wang commented on HDFS-11467: - Hi [~xiaochen], thanks for your review. Sorry, I don't fully get your second point. AFAIK the fsimage is a snapshot of the namespace, so we only need one disabled, one enabled and one removed EC policy to test serialization/deserialization of the section. The combinations of state transitions make no difference here. Please correct me if I am wrong. Also, I think this issue is now effectively blocked by HDFS-12682, since the serialized EC policy will not have the correct state, so the test cannot pass. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
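The argument that one policy per state is enough can be illustrated with a minimal round-trip sketch. Everything here is hypothetical: `PolicyState`, `PolicyEntry`, `EcSectionRoundTrip` and the XML element names are stand-ins, not the Hadoop classes or the real OIV XML schema. The policy names are just example strings. The point is that the section stores only each policy's final state, so serializing and parsing back three entries (enabled, disabled, removed) already exercises every state value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for EC policy entries in the fsimage; these are not
// the Hadoop classes and the XML shape below is illustrative only.
enum PolicyState { DISABLED, ENABLED, REMOVED }

final class PolicyEntry {
  final String name;
  final PolicyState state;

  PolicyEntry(String name, PolicyState state) {
    this.name = name;
    this.state = state;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PolicyEntry)) {
      return false;
    }
    PolicyEntry other = (PolicyEntry) o;
    return name.equals(other.name) && state == other.state;
  }

  @Override
  public int hashCode() {
    return 31 * name.hashCode() + state.hashCode();
  }
}

final class EcSectionRoundTrip {
  // Write each policy's name and final state as a small XML element.
  static String toXml(List<PolicyEntry> policies) {
    StringBuilder sb = new StringBuilder("<ErasureCodingSection>");
    for (PolicyEntry p : policies) {
      sb.append("<policy><name>").append(p.name)
          .append("</name><state>").append(p.state.name())
          .append("</state></policy>");
    }
    return sb.append("</ErasureCodingSection>").toString();
  }

  // Read back the format produced by toXml (the "ReverseXML" direction).
  static List<PolicyEntry> fromXml(String xml) {
    List<PolicyEntry> out = new ArrayList<>();
    int pos = 0;
    while ((pos = xml.indexOf("<policy>", pos)) >= 0) {
      String name = between(xml, "<name>", "</name>", pos);
      String state = between(xml, "<state>", "</state>", pos);
      out.add(new PolicyEntry(name, PolicyState.valueOf(state)));
      pos = xml.indexOf("</policy>", pos) + "</policy>".length();
    }
    return out;
  }

  private static String between(String s, String open, String close, int from) {
    int start = s.indexOf(open, from) + open.length();
    return s.substring(start, s.indexOf(close, start));
  }

  public static void main(String[] args) {
    // One policy per state: the image records only each policy's final
    // state, not the sequence of transitions that led to it.
    List<PolicyEntry> image = Arrays.asList(
        new PolicyEntry("RS-6-3-1024k", PolicyState.ENABLED),
        new PolicyEntry("RS-3-2-1024k", PolicyState.DISABLED),
        new PolicyEntry("XOR-2-1-1024k", PolicyState.REMOVED));
    System.out.println(fromXml(toXml(image)).equals(image));
  }
}
```

A real test would go through the OIV tool's XML and ReverseXML processors instead of these toy methods. The state bug referenced via HDFS-12682 would show up here as `equals` returning false.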
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205654#comment-16205654 ] Huafeng Wang commented on HDFS-11467: - Thanks [~jojochuang] and [~Sammi] for your reviews. I just updated my patch according to your comments. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Updated] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-11467: Attachment: HDFS-11467.002.patch > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch, HDFS-11467.002.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Updated] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-11467: Attachment: HDFS-11467.001.patch > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Updated] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-11467: Status: Patch Available (was: Open) > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-11467.001.patch > > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202993#comment-16202993 ] Huafeng Wang commented on HDFS-11467: - As discussed with Wei offline, I'll take this one. > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Wei Zhou > Labels: hdfs-ec-3.0-must-do > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Assigned] (HDFS-11467) Support ErasureCoding section in OIV XML/ReverseXML
[ https://issues.apache.org/jira/browse/HDFS-11467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-11467: --- Assignee: Huafeng Wang (was: Wei Zhou) > Support ErasureCoding section in OIV XML/ReverseXML > --- > > Key: HDFS-11467 > URL: https://issues.apache.org/jira/browse/HDFS-11467 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > > As discussed in HDFS-7859, after ErasureCoding section is added into fsimage, > we would like to also support exporting this section into an XML back and > forth using the OIV tool.
[jira] [Commented] (HDFS-12635) Unnecessary exception declaration of the CellBuffers constructor
[ https://issues.apache.org/jira/browse/HDFS-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199943#comment-16199943 ] Huafeng Wang commented on HDFS-12635: - Thanks Kai for your review! > Unnecessary exception declaration of the CellBuffers constructor > > > Key: HDFS-12635 > URL: https://issues.apache.org/jira/browse/HDFS-12635 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-12635.001.patch > >
[jira] [Resolved] (HDFS-12633) Unnecessary exception declaration of the CellBuffers constructor
[ https://issues.apache.org/jira/browse/HDFS-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang resolved HDFS-12633. - Resolution: Duplicate > Unnecessary exception declaration of the CellBuffers constructor > > > Key: HDFS-12633 > URL: https://issues.apache.org/jira/browse/HDFS-12633 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor >
[jira] [Updated] (HDFS-12635) Unnecessary exception declaration of the CellBuffers constructor
[ https://issues.apache.org/jira/browse/HDFS-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12635: Status: Patch Available (was: Open) > Unnecessary exception declaration of the CellBuffers constructor > > > Key: HDFS-12635 > URL: https://issues.apache.org/jira/browse/HDFS-12635 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Attachments: HDFS-12635.001.patch > >
[jira] [Updated] (HDFS-12635) Unnecessary exception declaration of the CellBuffers constructor
[ https://issues.apache.org/jira/browse/HDFS-12635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12635: Attachment: HDFS-12635.001.patch > Unnecessary exception declaration of the CellBuffers constructor > > > Key: HDFS-12635 > URL: https://issues.apache.org/jira/browse/HDFS-12635 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Attachments: HDFS-12635.001.patch > >
[jira] [Created] (HDFS-12635) Unnecessary exception declaration of the CellBuffers constructor
Huafeng Wang created HDFS-12635: --- Summary: Unnecessary exception declaration of the CellBuffers constructor Key: HDFS-12635 URL: https://issues.apache.org/jira/browse/HDFS-12635 Project: Hadoop HDFS Issue Type: Bug Reporter: Huafeng Wang Assignee: Huafeng Wang Priority: Minor
[jira] [Created] (HDFS-12633) Unnecessary exception declaration of the CellBuffers constructor
Huafeng Wang created HDFS-12633: --- Summary: Unnecessary exception declaration of the CellBuffers constructor Key: HDFS-12633 URL: https://issues.apache.org/jira/browse/HDFS-12633 Project: Hadoop HDFS Issue Type: Bug Reporter: Huafeng Wang Assignee: Huafeng Wang Priority: Minor
[jira] [Updated] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12497: Attachment: HDFS-12497.004.patch > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch, > HDFS-12497.003.patch, HDFS-12497.004.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198299#comment-16198299 ] Huafeng Wang commented on HDFS-12257: - Hi [~asuresh], by "next major release", do you mean the 3.0 release? I'm OK with backporting to 2.x versions, since the patch basically just adds a new API, so it won't be much trouble. [~andrew.wang], any comments on this one? I'll fix the checkstyle issues along with the changes requested in later comments. > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198069#comment-16198069 ] Huafeng Wang commented on HDFS-12497: - Hi Andrew, {quote} The exception being thrown in DFSStripedOutputStream {quote} The constructor of the inner class {{CellBuffers}} in {{DFSStripedOutputStream}} declares that it throws InterruptedException, but the code never actually throws it, so I removed the declaration. {quote} Removing logging from TestDFSStripedOutputStreamWithFailureWithRandomECPolicy {quote} The constructor is no longer needed, so I removed the logging along with it. I can add it back if the log is necessary. > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch, > HDFS-12497.003.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
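A simplified sketch shows why dropping an unnecessary `throws` clause is safe. `CellBuffersSketch` is a hypothetical stand-in, not the real inner class: its constructor parameters and buffer allocation are assumptions made only for illustration. The constructor body only allocates buffers, so nothing in it can raise `InterruptedException`, and the old declaration only forced callers to handle an exception that could never occur.

```java
import java.nio.ByteBuffer;

// Simplified, hypothetical stand-in for the CellBuffers inner class of
// DFSStripedOutputStream; the constructor parameters are assumptions.
final class CellBuffersSketch {
  private final ByteBuffer[] buffers;

  // Previously declared "throws InterruptedException", although nothing in
  // the body can throw it; the clause only forced callers to handle it.
  CellBuffersSketch(int numAllBlocks, int cellSize) {
    if (numAllBlocks <= 0 || cellSize <= 0) {
      throw new IllegalArgumentException("sizes must be positive");
    }
    buffers = new ByteBuffer[numAllBlocks];
    for (int i = 0; i < numAllBlocks; i++) {
      buffers[i] = ByteBuffer.allocate(cellSize);
    }
  }

  int numBuffers() {
    return buffers.length;
  }

  int cellSize() {
    return buffers[0].capacity();
  }

  public static void main(String[] args) {
    // No try/catch is needed around construction any more.
    CellBuffersSketch cells = new CellBuffersSketch(9, 64 * 1024);
    System.out.println(cells.numBuffers() + " buffers of " + cells.cellSize() + " bytes");
  }
}
```

One side effect to watch for: any caller that wrapped only this constructor in `try { ... } catch (InterruptedException e)` must drop that catch. javac rejects a catch clause for a checked exception that the try body cannot throw.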
[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196428#comment-16196428 ] Huafeng Wang commented on HDFS-12497: - Hi [~andrew.wang], any comment on this one? > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch, > HDFS-12497.003.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Updated] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12497: Attachment: HDFS-12497.003.patch > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch, > HDFS-12497.003.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Commented] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185412#comment-16185412 ] Huafeng Wang commented on HDFS-12497: - Hi [~Sammi] [~andrew.wang], I found that decreasing the number of stripes per block cannot completely solve the timeout issue, and it also affects the original test cases. The original {{TestDFSStripedOutputStreamWithFailure}} generates a list of 216 (3 * 4 * 6 * 3) different file lengths, and each subclass picks a different file length to run the test with. Setting the number of stripes per block to 2 would make some of these test cases invalid. So I propose decreasing the cell size in these test cases instead. I'll give it a try and see whether the timeout issue can be eliminated. > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
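The trade-off between fewer stripes per block and a smaller cell size can be sketched with some hypothetical numbers. The comment doesn't say how the 216 lengths are built. This sketch assumes only that some cases are defined structurally, for example "three full stripes inside one block group". `StripeGeometry` and all values in `main` are illustrative, not the test's real parameters. Under that assumption, cutting stripes per block to 2 pushes such a case across a block-group boundary, which changes what it tests. Shrinking the cell size keeps the case's shape and writes 16x fewer bytes.

```java
// Hypothetical striping geometry; the values in main are examples, not the
// parameters TestDFSStripedOutputStreamWithFailure actually uses.
final class StripeGeometry {
  final int dataBlocks;      // data cells per stripe, e.g. 6 for RS-6-3
  final int cellSize;        // bytes per cell
  final int stripesPerBlock; // cells stored in each internal block

  StripeGeometry(int dataBlocks, int cellSize, int stripesPerBlock) {
    this.dataBlocks = dataBlocks;
    this.cellSize = cellSize;
    this.stripesPerBlock = stripesPerBlock;
  }

  long stripeSize() {
    return (long) cellSize * dataBlocks;
  }

  long blockGroupSize() {
    return stripeSize() * stripesPerBlock;
  }

  // A file length described structurally: full stripes, then extra cells,
  // then extra bytes.
  long length(int fullStripes, int extraCells, int extraBytes) {
    return fullStripes * stripeSize() + (long) extraCells * cellSize + extraBytes;
  }

  boolean fitsInOneBlockGroup(long fileLength) {
    return fileLength <= blockGroupSize();
  }

  public static void main(String[] args) {
    StripeGeometry base = new StripeGeometry(6, 64 * 1024, 4);
    StripeGeometry fewerStripes = new StripeGeometry(6, 64 * 1024, 2);
    StripeGeometry smallerCells = new StripeGeometry(6, 4 * 1024, 4);
    for (StripeGeometry g : new StripeGeometry[] {base, fewerStripes, smallerCells}) {
      long len = g.length(3, 0, 0); // "three full stripes"
      System.out.println(len + " bytes, fits in one block group: " + g.fitsInOneBlockGroup(len));
    }
  }
}
```

The design point is that stripe count and cell size trade off differently. Test runtime scales roughly with bytes written. Case validity depends on the stripe and block-group structure, which only a smaller cell size leaves unchanged.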
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16185286#comment-16185286 ] Huafeng Wang commented on HDFS-12257: - Hi Andrew, I just added a new API as you proposed. Please help to review the new patch. > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12257: Attachment: HDFS-12257.003.patch > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch, > HDFS-12257.003.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12497: Attachment: HDFS-12497.002.patch > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch, HDFS-12497.002.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Updated] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12497: Status: Patch Available (was: Open) > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Assigned] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12497: --- Assignee: Huafeng Wang (was: SammiChen) > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Updated] (HDFS-12497) Re-enable TestDFSStripedOutputStreamWithFailure tests
[ https://issues.apache.org/jira/browse/HDFS-12497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12497: Attachment: HDFS-12497.001.patch > Re-enable TestDFSStripedOutputStreamWithFailure tests > - > > Key: HDFS-12497 > URL: https://issues.apache.org/jira/browse/HDFS-12497 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: flaky-test, hdfs-ec-3.0-must-do > Attachments: HDFS-12497.001.patch > > > We disabled this suite of tests in HDFS-12417 since they were very flaky. We > should fix these tests and re-enable them.
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181865#comment-16181865 ] Huafeng Wang commented on HDFS-12257: - Hi Andrew, right now DistributedFileSystem and DFSClient both have an API that returns an array. I agree with you. If we proceed with your idea, should we deprecate these old APIs? > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Commented] (HDFS-12534) Provide logical BlockLocations for EC files for better split calculation
[ https://issues.apache.org/jira/browse/HDFS-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178479#comment-16178479 ] Huafeng Wang commented on HDFS-12534: - Hi [~andrew.wang], I have a question here. {quote} Applications depend on HDFS BlockLocation to understand where the split points are. {quote} I think the logical BlockLocation currently returned for each block group contains the locations of all of its data and parity blocks. Isn't this information enough? What is the difference between splitting a single block group and splitting multiple logical block locations here? > Provide logical BlockLocations for EC files for better split calculation > > > Key: HDFS-12534 > URL: https://issues.apache.org/jira/browse/HDFS-12534 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-beta1 >Reporter: Andrew Wang > Labels: hdfs-ec-3.0-must-do > > I talked to [~vanzin] and [~alex.behm] some more about split calculation with > EC. It turns out HDFS-12222 was resolved prematurely. Applications depend on > HDFS BlockLocation to understand where the split points are. The current > scheme of returning one BlockLocation per block group loses this information. > We should change this to provide logical blocks. Divide the file length by > the block size and provide suitable BlockLocations to match, with virtual > offsets and lengths too. > I'm not marking this as incompatible, since changing it this way would in > fact make it more compatible from the perspective of applications that are > scheduling against replicated files. Thus, it'd be good for beta1 if > possible, but okay for later too.
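For illustration, the proposal in the description above (divide the file length by the block size and report BlockLocations with virtual offsets and lengths) can be sketched as follows. This is a minimal sketch under stated assumptions: `LogicalBlocks` and `LogicalBlock` are hypothetical names, not Hadoop's `BlockLocation` API, and mapping each logical range to the hosts that store its data is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cut a file into logical (offset, length) ranges. */
public class LogicalBlocks {

    /** Illustrative stand-in for one logical block; not Hadoop's BlockLocation. */
    public static final class LogicalBlock {
        public final long offset;
        public final long length;

        public LogicalBlock(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "(" + offset + ", " + length + ")";
        }
    }

    /**
     * Divides fileLength by blockSize and returns one range per logical
     * block; the last range holds the remainder and may be shorter.
     */
    public static List<LogicalBlock> split(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("invalid length or block size");
        }
        List<LogicalBlock> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            long length = Math.min(blockSize, fileLength - offset);
            blocks.add(new LogicalBlock(offset, length));
        }
        return blocks;
    }
}
```

For example, `split(300, 128)` yields the ranges (0, 128), (128, 128) and (256, 44), i.e. the same split points a replicated file of that length would expose to FileInputFormat-style split calculation.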
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178467#comment-16178467 ] Huafeng Wang commented on HDFS-12257: - Hi [~andrew.wang], could you help take a look at this one? > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Commented] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175941#comment-16175941 ] Huafeng Wang commented on HDFS-12257: - Hi [~msingh], thanks for your review. I have just updated the patch. > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12257: Attachment: HDFS-12257.002.patch > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch, HDFS-12257.002.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12257: Status: Patch Available (was: Open) > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12257: Attachment: HDFS-12257.001.patch > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12257.001.patch > > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Assigned] (HDFS-12257) Expose getSnapshottableDirListing as a public API in HdfsAdmin
[ https://issues.apache.org/jira/browse/HDFS-12257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12257: --- Assignee: Huafeng Wang > Expose getSnapshottableDirListing as a public API in HdfsAdmin > -- > > Key: HDFS-12257 > URL: https://issues.apache.org/jira/browse/HDFS-12257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.6.5 >Reporter: Andrew Wang >Assignee: Huafeng Wang > > Found at HIVE-16294. We have a CLI API for listing snapshottable dirs, but no > programmatic API. Other snapshot APIs are exposed in HdfsAdmin, I think we > should expose listing there as well.
[jira] [Updated] (HDFS-12523) Thread pools in ErasureCodingWorker do not shutdown
[ https://issues.apache.org/jira/browse/HDFS-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12523: Attachment: HDFS-12523.002.patch > Thread pools in ErasureCodingWorker do not shutdown > --- > > Key: HDFS-12523 > URL: https://issues.apache.org/jira/browse/HDFS-12523 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Huafeng Wang > Attachments: HDFS-12523.001.patch, HDFS-12523.002.patch > > > There is no code path in {{ErasureCodingWorker}} to shut down its two thread > pools: {{stripedReconstructionPool}} and {{stripedReadPool}}.
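A minimal sketch of a shutdown path for the two pools named in the description, following the shutdown-then-await pattern described in the `java.util.concurrent.ExecutorService` Javadoc. The enclosing class `StripedPoolsWorker`, the pool types and sizes, and the 10-second timeout are assumptions for illustration; this is not the code in the attached patches.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical worker owning two pools; field names follow the issue text. */
public class StripedPoolsWorker {
    private final ExecutorService stripedReconstructionPool =
        Executors.newFixedThreadPool(2);   // pool size is an assumption
    private final ExecutorService stripedReadPool =
        Executors.newCachedThreadPool();

    public void submitRead(Runnable task) {
        stripedReadPool.execute(task);
    }

    /** Rejects new tasks, then waits for running ones, forcing if needed. */
    public void shutDown() {
        stripedReconstructionPool.shutdown();
        stripedReadPool.shutdown();
        awaitOrForce(stripedReconstructionPool);
        awaitOrForce(stripedReadPool);
    }

    private static void awaitOrForce(ExecutorService pool) {
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // interrupt tasks that are still running
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    public boolean isTerminated() {
        return stripedReconstructionPool.isTerminated()
            && stripedReadPool.isTerminated();
    }
}
```

Calling `shutdown()` on both pools before waiting on either lets them drain in parallel; tasks submitted before `shutDown()` still run to completion unless the timeout forces `shutdownNow()`.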
[jira] [Updated] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12448: Attachment: HDFS-12448.002.patch > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12448.001.patch, HDFS-12448.002.patch > > > The current policy ID is of type "byte". IDs 1~63 are reserved for built-in erasure > coding policies, and IDs 64 and above are for user-defined erasure coding policies. Make sure > user policy IDs will not overflow when the addErasureCodingPolicy API is called.
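One possible way to guard against the overflow described above, sketched under assumptions: `PolicyIdAllocator` is a hypothetical class that hands out IDs sequentially and never reuses them. Because Java's `byte` is signed, the highest ID that fits is `Byte.MAX_VALUE` (127), and incrementing a `byte` past it wraps to -128; the sketch keeps the counter in an `int` and refuses to allocate instead.

```java
/**
 * Hypothetical allocator for user-defined erasure coding policy IDs.
 * Built-in policies use 1-63; user-defined ones start at 64 (per the issue).
 */
public class PolicyIdAllocator {
    static final byte FIRST_USER_POLICY_ID = 64;

    // Kept as an int so the bounds check itself cannot wrap around.
    private int nextId = FIRST_USER_POLICY_ID;

    public byte allocate() {
        if (nextId > Byte.MAX_VALUE) {
            throw new IllegalStateException(
                "No user-defined erasure coding policy IDs left (max "
                    + Byte.MAX_VALUE + ")");
        }
        return (byte) nextId++;
    }
}
```

Under these assumptions there are exactly 64 user-defined IDs (64 through 127); the 65th call to `allocate()` throws rather than silently producing a negative ID.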
[jira] [Updated] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12448: Attachment: HDFS-12448.001.patch > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding > Environment: The current policy ID is of type "byte". IDs 1~63 are reserved > for built-in erasure coding policies, and IDs 64 and above are for user-defined erasure > coding policies. Make sure user policy IDs will not overflow when the > addErasureCodingPolicy API is called. >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12448.001.patch > >
[jira] [Updated] (HDFS-12448) Make sure user defined erasure coding policy ID will not overflow
[ https://issues.apache.org/jira/browse/HDFS-12448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12448: Status: Patch Available (was: Open) > Make sure user defined erasure coding policy ID will not overflow > - > > Key: HDFS-12448 > URL: https://issues.apache.org/jira/browse/HDFS-12448 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding > Environment: The current policy ID is of type "byte". IDs 1~63 are reserved > for built-in erasure coding policies, and IDs 64 and above are for user-defined erasure > coding policies. Make sure user policy IDs will not overflow when the > addErasureCodingPolicy API is called. >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12448.001.patch > >
[jira] [Updated] (HDFS-12523) Thread pools in ErasureCodingWorker do not shutdown
[ https://issues.apache.org/jira/browse/HDFS-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12523: Status: Patch Available (was: Open) > Thread pools in ErasureCodingWorker do not shutdown > --- > > Key: HDFS-12523 > URL: https://issues.apache.org/jira/browse/HDFS-12523 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Huafeng Wang > Attachments: HDFS-12523.001.patch > > > There is no code path in {{ErasureCodingWorker}} to shut down its two thread > pools: {{stripedReconstructionPool}} and {{stripedReadPool}}.
[jira] [Updated] (HDFS-12523) Thread pools in ErasureCodingWorker do not shutdown
[ https://issues.apache.org/jira/browse/HDFS-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12523: Attachment: HDFS-12523.001.patch > Thread pools in ErasureCodingWorker do not shutdown > --- > > Key: HDFS-12523 > URL: https://issues.apache.org/jira/browse/HDFS-12523 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Huafeng Wang > Attachments: HDFS-12523.001.patch > > > There is no code path in {{ErasureCodingWorker}} to shut down its two thread > pools: {{stripedReconstructionPool}} and {{stripedReadPool}}.
[jira] [Assigned] (HDFS-12523) Thread pools in ErasureCodingWorker do not shutdown
[ https://issues.apache.org/jira/browse/HDFS-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12523: --- Assignee: Huafeng Wang > Thread pools in ErasureCodingWorker do not shutdown > --- > > Key: HDFS-12523 > URL: https://issues.apache.org/jira/browse/HDFS-12523 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha4 >Reporter: Lei (Eddy) Xu >Assignee: Huafeng Wang > > There is no code path in {{ErasureCodingWorker}} to shut down its two thread > pools: {{stripedReconstructionPool}} and {{stripedReadPool}}.
[jira] [Commented] (HDFS-12479) Some misuses of lock in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169861#comment-16169861 ] Huafeng Wang commented on HDFS-12479: - Hi [~drankye], could you help review this patch? Thanks! > Some misuses of lock in DFSStripedOutputStream > -- > > Key: HDFS-12479 > URL: https://issues.apache.org/jira/browse/HDFS-12479 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Attachments: HDFS-12479.001.patch > >
[jira] [Resolved] (HDFS-12413) Inotify should support erasure coding policy op as replica meta change
[ https://issues.apache.org/jira/browse/HDFS-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang resolved HDFS-12413. - Resolution: Not A Problem > Inotify should support erasure coding policy op as replica meta change > -- > > Key: HDFS-12413 > URL: https://issues.apache.org/jira/browse/HDFS-12413 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Kai Zheng >Assignee: Huafeng Wang > > HDFS Inotify already supports metadata changes, such as replication, for a file. > We should support setting/unsetting the erasure coding policy of a file > similarly.
[jira] [Commented] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169768#comment-16169768 ] Huafeng Wang commented on HDFS-12398: - Hi [~andrew.wang], sorry, I don't fully understand your idea. Splitting the tests into subclasses would not reduce the duplication, since most of it comes from the different file sizes. Did I miss anything? I also noticed the same duplication in TestDFSStripedOutputStream. > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > Labels: flaky-test > Attachments: HDFS-12398.001.patch, HDFS-12398.002.patch > > > TestWriteReadStripedFile basically tests the full product of file sizes > with DataNode failure or not. It would be better to use a JUnit Parameterized test > suite.
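To make the "full product" in the description concrete, here is a sketch of the parameter collection that a JUnit 4 `@Parameters` method (on a class run with `@RunWith(Parameterized.class)`) could return. The file sizes, the 64 KB cell size and the RS(3,2) data-block count are assumptions chosen for illustration, not the values used in TestWriteReadStripedFile; the point is that listing the sizes once replaces one hand-written test method per size.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Hypothetical parameter source for a JUnit 4 Parameterized test: every
 * file size is paired with "no DataNode failure" and "DataNode failure".
 */
public class StripedFileTestParams {
    static final int CELL_SIZE = 64 * 1024; // assumed cell size
    static final int DATA_BLOCKS = 3;       // assumed RS(3,2) policy

    /** In a real test class this method would carry @Parameters. */
    public static Collection<Object[]> parameters() {
        int stripeSize = CELL_SIZE * DATA_BLOCKS;
        int[] fileSizes = {
            1, CELL_SIZE - 1, CELL_SIZE, stripeSize, stripeSize + 123
        };
        List<Object[]> params = new ArrayList<>();
        for (int fileSize : fileSizes) {
            for (boolean withFailure : new boolean[] {false, true}) {
                params.add(new Object[] {fileSize, withFailure});
            }
        }
        return params;
    }
}
```

With five sizes and two failure modes, the runner would instantiate the test class ten times, once per `{fileSize, withFailure}` pair.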
[jira] [Updated] (HDFS-12479) Some misuses of lock in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12479: Status: Patch Available (was: Open) > Some misuses of lock in DFSStripedOutputStream > -- > > Key: HDFS-12479 > URL: https://issues.apache.org/jira/browse/HDFS-12479 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Attachments: HDFS-12479.001.patch > >
[jira] [Updated] (HDFS-12479) Some misuses of lock in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12479: Attachment: HDFS-12479.001.patch > Some misuses of lock in DFSStripedOutputStream > -- > > Key: HDFS-12479 > URL: https://issues.apache.org/jira/browse/HDFS-12479 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor > Attachments: HDFS-12479.001.patch > >
[jira] [Commented] (HDFS-12479) Some misuses of lock in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169758#comment-16169758 ] Huafeng Wang commented on HDFS-12479: - # In {{MultipleBlockingQueue}}, the underlying list is immutable, so it cannot be modified concurrently and the lock there is not needed. # In {{Coordinator}}, {{ConcurrentHashMap}} should perform better than {{Collections.synchronizedMap}}, since it does not serialize every access on a single lock. > Some misuses of lock in DFSStripedOutputStream > -- > > Key: HDFS-12479 > URL: https://issues.apache.org/jira/browse/HDFS-12479 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Minor >
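The two points in the comment above can be illustrated with a small sketch. The nested class names echo those in DFSStripedOutputStream, but the bodies, the `streamerFlags` map and its key/value types are hypothetical. The list of queues is built once, wrapped as unmodifiable and published through a `final` field, so reading it needs no lock, and each `LinkedBlockingQueue` is thread-safe on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch; class names echo DFSStripedOutputStream's. */
public class LockSketch {

    /** A fixed set of queues: the list is never modified after construction. */
    public static final class MultipleBlockingQueue<T> {
        private final List<BlockingQueue<T>> queues;

        public MultipleBlockingQueue(int numQueues) {
            List<BlockingQueue<T>> list = new ArrayList<>(numQueues);
            for (int i = 0; i < numQueues; i++) {
                list.add(new LinkedBlockingQueue<>());
            }
            // Unmodifiable and published through a final field, so reading
            // the list needs no lock; each queue is thread-safe by itself.
            this.queues = Collections.unmodifiableList(list);
        }

        public void offer(int i, T item) {
            queues.get(i).offer(item);
        }

        public T poll(int i) {
            return queues.get(i).poll();
        }
    }

    /** Per-streamer flags kept in a ConcurrentHashMap. */
    public static final class Coordinator {
        // Hypothetical map; ConcurrentHashMap does not funnel every call
        // through one monitor the way Collections.synchronizedMap does.
        private final Map<Integer, Boolean> streamerFlags = new ConcurrentHashMap<>();

        public void setFlag(int streamerIndex, boolean value) {
            streamerFlags.put(streamerIndex, value);
        }

        public boolean getFlag(int streamerIndex) {
            return streamerFlags.getOrDefault(streamerIndex, false);
        }
    }
}
```

One caveat with the switch: unlike `Collections.synchronizedMap`, `ConcurrentHashMap` rejects `null` keys and values, and any check-then-act sequence that relied on synchronizing on the map object should use atomic methods such as `putIfAbsent` or `compute` instead.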
[jira] [Created] (HDFS-12479) Some misuses of lock in DFSStripedOutputStream
Huafeng Wang created HDFS-12479: --- Summary: Some misuses of lock in DFSStripedOutputStream Key: HDFS-12479 URL: https://issues.apache.org/jira/browse/HDFS-12479 Project: Hadoop HDFS Issue Type: Improvement Reporter: Huafeng Wang Assignee: Huafeng Wang Priority: Minor
[jira] [Commented] (HDFS-12444) Reduce runtime of TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169508#comment-16169508 ] Huafeng Wang commented on HDFS-12444: - [~drankye], the TODO marker has already been removed. > Reduce runtime of TestWriteReadStripedFile > -- > > Key: HDFS-12444 > URL: https://issues.apache.org/jira/browse/HDFS-12444 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, test >Affects Versions: 3.0.0-alpha4 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Attachments: HDFS-12444.001.patch, HDFS-12444.002.patch, > HDFS-12444.003.patch > > > This test takes a long time to run since it writes a lot of data, and > frequently times out during precommit testing. If we change the EC policy > from RS(6,3) to RS(3,2) then it will run a lot faster.
[jira] [Updated] (HDFS-12444) Reduce runtime of TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12444: Attachment: HDFS-12444.003.patch > Reduce runtime of TestWriteReadStripedFile > -- > > Key: HDFS-12444 > URL: https://issues.apache.org/jira/browse/HDFS-12444 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, test >Affects Versions: 3.0.0-alpha4 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-12444.001.patch, HDFS-12444.002.patch, > HDFS-12444.003.patch > > > This test takes a long time to run since it writes a lot of data, and > frequently times out during precommit testing. If we change the EC policy > from RS(6,3) to RS(3,2) then it will run a lot faster.
[jira] [Commented] (HDFS-12444) Reduce runtime of TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165971#comment-16165971 ] Huafeng Wang commented on HDFS-12444: - Updated the patch according to Kai's suggestion. > Reduce runtime of TestWriteReadStripedFile > -- > > Key: HDFS-12444 > URL: https://issues.apache.org/jira/browse/HDFS-12444 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, test >Affects Versions: 3.0.0-alpha4 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-12444.001.patch, HDFS-12444.002.patch > > > This test takes a long time to run since it writes a lot of data, and > frequently times out during precommit testing. If we change the EC policy > from RS(6,3) to RS(3,2) then it will run a lot faster.
[jira] [Updated] (HDFS-12444) Reduce runtime of TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12444: Attachment: HDFS-12444.002.patch > Reduce runtime of TestWriteReadStripedFile > -- > > Key: HDFS-12444 > URL: https://issues.apache.org/jira/browse/HDFS-12444 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, test >Affects Versions: 3.0.0-alpha4 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: HDFS-12444.001.patch, HDFS-12444.002.patch > > > This test takes a long time to run since it writes a lot of data, and > frequently times out during precommit testing. If we change the EC policy > from RS(6,3) to RS(3,2) then it will run a lot faster.
[jira] [Commented] (HDFS-12413) Inotify should support erasure coding policy op as replica meta change
[ https://issues.apache.org/jira/browse/HDFS-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165688#comment-16165688 ] Huafeng Wang commented on HDFS-12413: - I looked into the code: setting and unsetting the erasure coding policy of a file are in fact already reported in inotify streams. They are represented as a {{MetadataUpdateEvent}} whose MetadataType is {{XATTRS}}. > Inotify should support erasure coding policy op as replica meta change > -- > > Key: HDFS-12413 > URL: https://issues.apache.org/jira/browse/HDFS-12413 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Kai Zheng >Assignee: Huafeng Wang > > HDFS Inotify already supports metadata changes, such as replication, for a file. > We should support setting/unsetting the erasure coding policy of a file > similarly.
[jira] [Commented] (HDFS-12414) Ensure to use CLI command to enable/disable erasure coding policy
[ https://issues.apache.org/jira/browse/HDFS-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164188#comment-16164188 ] Huafeng Wang commented on HDFS-12414: - non-binding +1 > Ensure to use CLI command to enable/disable erasure coding policy > - > > Key: HDFS-12414 > URL: https://issues.apache.org/jira/browse/HDFS-12414 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: SammiChen > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12414.001.patch, HDFS-12414.002.patch, > HDFS-12414.003.patch > > > Currently, there are two ways for a user to enable/disable an erasure coding > policy. One is the "dfs.namenode.ec.policies.enabled" property, which is a > static way to configure the enabled erasure coding policies. The other is > the "enableErasureCodingPolicy" or "disableErasureCodingPolicy" API, > which can enable or disable an erasure coding policy at runtime. > When the NameNode restarts, there are potential state conflicts between the policies > defined in "dfs.namenode.ec.policies.enabled" and the policies saved in the fsImage. To > resolve the conflict and simplify the operation, it's better to use just one > way and remove the old method of configuring the property.
[jira] [Commented] (HDFS-12222) Document and test BlockLocation for erasure-coded files
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163998#comment-16163998 ] Huafeng Wang commented on HDFS-12222: - Thanks [~andrew.wang] for your advice and help! > Document and test BlockLocation for erasure-coded files > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Fix For: 3.0.0-beta1 > > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch, HDFS-12222.005.patch, > HDFS-12222.006.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Commented] (HDFS-12405) Clean up removed erasure coding policies from namenode
[ https://issues.apache.org/jira/browse/HDFS-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162668#comment-16162668 ] Huafeng Wang commented on HDFS-12405: - I have a few questions about this issue. Why do we have to clean up the removed policies? I think NameNode restarts are not frequent enough, so a cleanup at that time can only cover a small portion of the policies. So would cleaning them up when the NameNode restarts suffice? > Clean up removed erasure coding policies from namenode > -- > > Key: HDFS-12405 > URL: https://issues.apache.org/jira/browse/HDFS-12405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > > Currently, when an erasure coding policy is removed, it is transitioned to the > "removed" state. Users can no longer apply a policy in the "removed" state to a > file/directory. The policy cannot be safely removed from the system > unless we know there are no existing files or directories that use this > "removed" policy. Finding out whether any files or directories > are using the policy is time-consuming at runtime and might impact > NameNode performance. So a better choice is to do the work when the NameNode > restarts and loads inodes. Collecting the information at that time will not > introduce much extra overhead.
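A sketch of the approach in the description, under assumptions: while inodes are loaded at startup, the policy IDs they reference are collected, and any policy in the "removed" state that no inode references becomes safe to discard. `RemovedPolicyCleanup`, `PolicyState` and the input shapes are hypothetical; real NameNode code would gather the IDs during FSImage loading rather than receive them as a collection.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of cleaning up removed EC policies at startup. */
public class RemovedPolicyCleanup {
    public enum PolicyState { ENABLED, DISABLED, REMOVED }

    /**
     * @param policies       policy ID -> state, as known to the NameNode
     * @param inodePolicyIds policy IDs referenced by loaded inodes (in the
     *                       proposal, gathered while the inodes are loaded)
     * @return IDs of removed policies that no inode references
     */
    public static Set<Byte> removablePolicies(Map<Byte, PolicyState> policies,
                                              Collection<Byte> inodePolicyIds) {
        Set<Byte> inUse = new HashSet<>(inodePolicyIds);
        Set<Byte> removable = new HashSet<>();
        for (Map.Entry<Byte, PolicyState> e : policies.entrySet()) {
            if (e.getValue() == PolicyState.REMOVED && !inUse.contains(e.getKey())) {
                removable.add(e.getKey());
            }
        }
        return removable;
    }
}
```

A removed policy that some inode still references (ID 65 in the test) stays in place; only unreferenced removed policies are reported as removable.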
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.006.patch > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch, HDFS-12222.005.patch, > HDFS-12222.006.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Assigned] (HDFS-12413) Inotify should support erasure coding policy op as replica meta change
[ https://issues.apache.org/jira/browse/HDFS-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12413: --- Assignee: Huafeng Wang > Inotify should support erasure coding policy op as replica meta change > -- > > Key: HDFS-12413 > URL: https://issues.apache.org/jira/browse/HDFS-12413 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Reporter: Kai Zheng >Assignee: Huafeng Wang > > Currently HDFS Inotify already supports metadata changes such as the replication of a file. > We should also support setting/unsetting the erasure coding policy of a file > in the same way.
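One way to picture the proposal: treat setting or unsetting an EC policy like the existing replication change, emitting a metadata-update event that carries the new policy name. The sketch below is a self-contained model, not the HDFS inotify API; `MetadataType.ERASURE_CODING_POLICY`, the event fields, and the `on...` hook methods are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EcInotifySketch {

  /** Hypothetical metadata-update kinds; ERASURE_CODING_POLICY is the proposed addition. */
  enum MetadataType { REPLICATION, ERASURE_CODING_POLICY }

  /** Hypothetical, simplified metadata-update event. */
  static final class MetadataUpdateEvent {
    final String path;
    final MetadataType type;
    final int replication;     // meaningful only for REPLICATION
    final String ecPolicyName; // meaningful only for ERASURE_CODING_POLICY; null means "unset"

    MetadataUpdateEvent(String path, MetadataType type, int replication, String ecPolicyName) {
      this.path = path;
      this.type = type;
      this.replication = replication;
      this.ecPolicyName = ecPolicyName;
    }
  }

  private final List<MetadataUpdateEvent> events = new ArrayList<>();

  /** Behavior being mirrored: a replication change produces an event. */
  void onSetReplication(String path, int replication) {
    events.add(new MetadataUpdateEvent(path, MetadataType.REPLICATION, replication, null));
  }

  /** Proposed: setting an EC policy produces an analogous event. */
  void onSetErasureCodingPolicy(String path, String policyName) {
    events.add(new MetadataUpdateEvent(path, MetadataType.ERASURE_CODING_POLICY, 0, policyName));
  }

  /** Proposed: unsetting is reported as the same event type with no policy name. */
  void onUnsetErasureCodingPolicy(String path) {
    events.add(new MetadataUpdateEvent(path, MetadataType.ERASURE_CODING_POLICY, 0, null));
  }

  /** Read-only view of the events emitted so far. */
  List<MetadataUpdateEvent> events() {
    return Collections.unmodifiableList(events);
  }
}
```

Reusing one event type for both set and unset (with a null name for unset) keeps consumers simple; a separate unset type would be an equally valid design choice.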
[jira] [Commented] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160561#comment-16160561 ] Huafeng Wang commented on HDFS-12398: - Hi [~drankye], many thanks for your review. {quote} 1. The current way of having many test methods are much better readable; {quote} That's true; I can add some comments on the parameters if you wish. But I think the EC file names already tell what each test is doing. {quote} 2. It's also easier to debug if some of them are failed; {quote} That's also true, and it's a limitation of JUnit Parameterized. {quote} 3. More important, every test case (contained in a test method) needs a brand new cluster to start with; {quote} That's intended: each test randomly kills a DataNode, so starting with a new cluster is needed. {quote} 4. Timeout can be fine-tuned for each test method in current way. {quote} That's not true. Before the refactor, the timeout was controlled by {code} @Rule public Timeout globalTimeout = new Timeout(30) {code} which applies the same timeout to all test methods in the class. > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > Attachments: HDFS-12398.001.patch, HDFS-12398.002.patch > > > TestWriteReadStripedFile basically tests the full product of file sizes > with and without a DataNode failure. It would be better to use a JUnit Parameterized test > suite.
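For reference, JUnit 4's `Parameterized` runner takes its cases from a `public static` method annotated with `@Parameterized.Parameters`, creates a fresh test-class instance per case, and runs `@Before`/`@After` and any `@Rule` (such as `Timeout`) around every test method of every case, so a brand new cluster per case remains possible. The self-contained sketch below shows only the data method for the product described in the issue; the file sizes are hypothetical, and it deliberately avoids a JUnit dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class StripedFileParamsSketch {

  // Hypothetical sizes for illustration; the real test picks sizes relative to its EC policy.
  static final int[] FILE_SIZES = {0, 1, 1024 * 1024, 6 * 1024 * 1024 + 123};

  /**
   * Same shape as a JUnit 4 @Parameterized.Parameters method: one Object[] per
   * test run, here the full product of file size x (no failure, DataNode failure).
   */
  public static Collection<Object[]> data() {
    List<Object[]> params = new ArrayList<>();
    for (int fileSize : FILE_SIZES) {
      for (boolean killDatanode : new boolean[] {false, true}) {
        params.add(new Object[] {fileSize, killDatanode});
      }
    }
    return params;
  }

  public static void main(String[] args) {
    for (Object[] p : data()) {
      System.out.println(Arrays.toString(p));
    }
  }
}
```

On the debugging concern, `@Parameterized.Parameters(name = "{index}: size={0}, kill={1}")` (available since JUnit 4.11) puts the parameter values into each reported test name, which makes a failing case easier to identify.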
[jira] [Updated] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12398: Attachment: HDFS-12398.002.patch > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > Attachments: HDFS-12398.001.patch, HDFS-12398.002.patch > > > TestWriteReadStripedFile basically tests the full product of file sizes > with and without a DataNode failure. It would be better to use a JUnit Parameterized test > suite.
[jira] [Assigned] (HDFS-12405) Clean up removed erasure coding policies from namenode
[ https://issues.apache.org/jira/browse/HDFS-12405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12405: --- Assignee: Huafeng Wang > Clean up removed erasure coding policies from namenode > -- > > Key: HDFS-12405 > URL: https://issues.apache.org/jira/browse/HDFS-12405 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Reporter: SammiChen >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > > Currently, when an erasure coding policy is removed, it is transitioned to the > "removed" state. Users can no longer apply a policy in the "removed" state to a > file or directory. The policy cannot be safely removed from the system > unless we know there are no existing files or directories that use this > "removed" policy. Finding out whether there are files or directories which > are using the policy is time-consuming at runtime and might impact > NameNode performance. So a better choice is to do the work when the NameNode > restarts and loads INodes. Collecting the information at that time will not > introduce much extra overhead.
[jira] [Updated] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12398: Attachment: HDFS-12398.001.patch > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > Attachments: HDFS-12398.001.patch > > > TestWriteReadStripedFile basically tests the full product of file sizes > with and without a DataNode failure. It would be better to use a JUnit Parameterized test > suite.
[jira] [Updated] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12398: Status: Patch Available (was: Open) > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > Attachments: HDFS-12398.001.patch > > > TestWriteReadStripedFile basically tests the full product of file sizes > with and without a DataNode failure. It would be better to use a JUnit Parameterized test > suite.
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.005.patch > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch, HDFS-12222.005.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158013#comment-16158013 ] Huafeng Wang commented on HDFS-12222: - Hi Andrew, I agree with you. I'll update the patch soon. > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156634#comment-16156634 ] Huafeng Wang commented on HDFS-12222: - Hi [~andrew.wang], thanks for your review! I just uploaded a new patch. In this patch I mainly: * Removed the getECBlockLocation function and the ECBlockLocation class. * Fixed {{getFileBlockLocation}} of DFSClient. * Added comments about {{getFileBlockLocation}}, {{listFiles}} and {{listLocatedStatus}} in {{FileSystem}}, {{DistributedFileSystem}} and {{FileContext}}. * Added comments about {{makeQualifiedLocated}} in {{HdfsLocatedFileStatus}}. * Added tests for {{DistributedFileSystem.getFileBlockLocation}}, {{DistributedFileSystem.listFiles}}, {{FileContext.getFileBlockLocation}} and {{FileContext.listFiles}} on EC files of various sizes. And about {quote} Could you verify that fsck -files -blocks -locations still returns parity blocks? {quote} I checked the output of {{fsck -files -blocks -locations}}; it does not contain very detailed block location info for an erasure-coded file. An example output for a 6+3 erasure-coded file looks like this: {code} 0. 
BP-417570284-10.239.160.132-1504687036886:blk_-9223372036854775792_1001 len=6291456 Live_repl=9 [blk_-9223372036854775792:DatanodeInfoWithStorage[127.0.0.1:54859,DS-09a24593-5cbc-444c-ad43-ab1b39c65887,DISK](LIVE), blk_-9223372036854775791:DatanodeInfoWithStorage[127.0.0.1:54863,DS-80d7a2bb-5acc-437c-936a-bd28314e2a8c,DISK](LIVE), blk_-9223372036854775790:DatanodeInfoWithStorage[127.0.0.1:54883,DS-05a880c7-0fa2-4683-a382-06ec7d975fd3,DISK](LIVE), blk_-9223372036854775789:DatanodeInfoWithStorage[127.0.0.1:54854,DS-8a5cf2da-1c7e-4942-b57c-8755ddb3cfcb,DISK](LIVE), blk_-9223372036854775788:DatanodeInfoWithStorage[127.0.0.1:54871,DS-95c64656-3131-413c-b400-0f14612b387d,DISK](LIVE), blk_-9223372036854775787:DatanodeInfoWithStorage[127.0.0.1:54867,DS-fbf6ea90-8829-44ce-8681-b5f53be726c1,DISK](STALE_BLOCK_CONTENT), blk_-9223372036854775786:DatanodeInfoWithStorage[127.0.0.1:54875,DS-d40bfede-c5c9-4cb0-8b5e-92ead1bbb4da,DISK](LIVE), blk_-9223372036854775785:DatanodeInfoWithStorage[127.0.0.1:54879,DS-c999124f-3d0e-4f6c-bd31-5f0fdff86fca,DISK](STALE_BLOCK_CONTENT), blk_-9223372036854775784:DatanodeInfoWithStorage[127.0.0.1:54850,DS-7ff8f0ed-b62a-40a9-8966-b16f71532712,DISK](LIVE)] {code} Do you mean we should also remove the parity block info? > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch > > > HDFS applications query block location information to compute splits. 
One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
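The nine entries in the fsck output above can be classified without extra metadata if one assumes HDFS's striped block-ID layout: an internal block's ID is its block group's ID plus its index in the group, kept in the low 4 bits. Under that assumption, with a 6+3 policy, indices 0 to 5 are data and 6 to 8 are parity, so blk_-9223372036854775786 through blk_-9223372036854775784 would be the parity blocks. A small sketch under that assumption (the mask constant and helper names here are this sketch's own):

```java
public class StripedBlockIndexSketch {

  // Assumption: the low 4 bits of an internal block ID hold its index in the block group.
  static final long BLOCK_GROUP_INDEX_MASK = 15;

  /** Position of an internal block within its block group. */
  static int indexInGroup(long internalBlockId) {
    return (int) (internalBlockId & BLOCK_GROUP_INDEX_MASK);
  }

  /** ID of the block group the internal block belongs to (index bits cleared). */
  static long groupId(long internalBlockId) {
    return internalBlockId & ~BLOCK_GROUP_INDEX_MASK;
  }

  /** With k data units, indices 0..k-1 are data and the rest are parity. */
  static boolean isParity(int indexInGroup, int numDataUnits) {
    return indexInGroup >= numDataUnits;
  }

  public static void main(String[] args) {
    long first = -9223372036854775792L; // first block ID in the fsck output above
    int numDataUnits = 6;               // the 6+3 policy from the example
    for (long id = first; id < first + 9; id++) {
      int idx = indexInGroup(id);
      System.out.println("blk_" + id + " index=" + idx
          + (isParity(idx, numDataUnits) ? " parity" : " data"));
    }
  }
}
```

If this layout holds, fsck (or any client) could tag parity entries from the block ID alone, given only the policy's number of data units.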
[jira] [Issue Comment Deleted] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Comment: was deleted (was: Hi [~andrew.wang], thanks for your review! I just uploaded a new patch. In this patch I mainly: * Removed the getECBlockLocation function and ECBlockLocation class. * Fixed getFileBlockLocation of DFSClient. * Add comments for {{getFileBlockLocation}}, {{listFiles}} and {{listLocatedStatus}}) > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156621#comment-16156621 ] Huafeng Wang commented on HDFS-12222: - Hi [~andrew.wang], thanks for your review! I just uploaded a new patch. In this patch I mainly: * Removed the getECBlockLocation function and the ECBlockLocation class. * Fixed getFileBlockLocation of DFSClient. * Added comments for {{getFileBlockLocation}}, {{listFiles}} and {{listLocatedStatus}}. > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.004.patch > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, > HDFS-12222.003.patch, HDFS-12222.004.patch > > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
[jira] [Created] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
Huafeng Wang created HDFS-12398: --- Summary: Use JUnit Paramaterized test suite in TestWriteReadStripedFile Key: HDFS-12398 URL: https://issues.apache.org/jira/browse/HDFS-12398 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Huafeng Wang Priority: Trivial TestWriteReadStripedFile basically tests the full product of file sizes with and without a DataNode failure. It would be better to use a JUnit Parameterized test suite.
[jira] [Assigned] (HDFS-12398) Use JUnit Paramaterized test suite in TestWriteReadStripedFile
[ https://issues.apache.org/jira/browse/HDFS-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12398: --- Assignee: Huafeng Wang > Use JUnit Paramaterized test suite in TestWriteReadStripedFile > -- > > Key: HDFS-12398 > URL: https://issues.apache.org/jira/browse/HDFS-12398 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Huafeng Wang >Assignee: Huafeng Wang >Priority: Trivial > > TestWriteReadStripedFile basically tests the full product of file sizes > with and without a DataNode failure. It would be better to use a JUnit Parameterized test > suite.
[jira] [Updated] (HDFS-12388) A bad error message in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12388: Attachment: HDFS-12388.001.patch > A bad error message in DFSStripedOutputStream > - > > Key: HDFS-12388 > URL: https://issues.apache.org/jira/browse/HDFS-12388 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kai Zheng >Assignee: Huafeng Wang > Attachments: HDFS-12388.001.patch > > > Noticed a failure reported by Jenkins in HDFS-11882. The reported error > message wasn't correct, it should be: {{the number of failed blocks = 4 > the > number of data blocks = 3}} => {{the number of failed blocks = 4 > the > number of parity blocks = 3}} > {noformat} > Regression > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030.testBlockTokenExpired > Failing for the past 1 build (Since Failed#20973 ) > Took 6.4 sec. > Error Message > Failed at i=6294527 > Stacktrace > java.io.IOException: Failed at i=6294527 > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.write(TestDFSStripedOutputStreamWithFailure.java:559) > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:534) > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testBlockTokenExpired(TestDFSStripedOutputStreamWithFailure.java:273) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: java.io.IOException: Failed: the number of failed blocks = 4 > the > number of data blocks = 3 > at > org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamers(DFSStripedOutputStream.java:392) > at > org.apache.hadoop.hdfs.DFSStripedOutputStream.handleStreamerFailure(DFSStripedOutputStream.java:410) > at > org.apache.hadoop.hdfs.DFSStripedOutputStream.flushAllInternals(DFSStripedOutputStream.java:1262) > at > org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:627) > at > org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:563) > at > org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217) > at > org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164) > at > org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145) > at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:79) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48) > at java.io.DataOutputStream.write(DataOutputStream.java:88) > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.write(TestDFSStripedOutputStreamWithFailure.java:557) > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:534) > at > org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testBlockTokenExpired(TestDFSStripedOutputStreamWithFailure.java:273) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat}
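The corrected message follows from the tolerance rule: with a Reed-Solomon policy, a block group survives as long as no more internal blocks fail than there are parity blocks, so both the bound and the wording should refer to parity blocks. The helper below is a hypothetical sketch of such a check, not the actual `DFSStripedOutputStream.checkStreamers` code or the HDFS-12388 patch.

```java
import java.io.IOException;

public class StripedFailureCheckSketch {

  /**
   * Hypothetical helper: throws once more internal blocks have failed than
   * the policy has parity blocks, naming the parity count in the message.
   */
  static void checkFailedBlocks(int failedBlocks, int numParityBlocks) throws IOException {
    if (failedBlocks > numParityBlocks) {
      throw new IOException("Failed: the number of failed blocks = " + failedBlocks
          + " > the number of parity blocks = " + numParityBlocks);
    }
  }
}
```

With the numbers from the report (4 failed blocks, 3 parity blocks) this produces "Failed: the number of failed blocks = 4 > the number of parity blocks = 3", while 3 failures would still be tolerated.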
[jira] [Updated] (HDFS-12388) A bad error message in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12388: Status: Patch Available (was: Open)
> A bad error message in DFSStripedOutputStream
> ---------------------------------------------
> Key: HDFS-12388
> URL: https://issues.apache.org/jira/browse/HDFS-12388
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kai Zheng
> Assignee: Huafeng Wang
> Attachments: HDFS-12388.001.patch
>
> Noticed a failure reported by Jenkins in HDFS-11882. The reported error message is wrong: {{the number of failed blocks = 4 > the number of data blocks = 3}} should read {{the number of failed blocks = 4 > the number of parity blocks = 3}}.
> {noformat}
> Regression
> org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030.testBlockTokenExpired
> Failing for the past 1 build (Since Failed#20973 )
> Took 6.4 sec.
> Error Message
> Failed at i=6294527
> Stacktrace
> java.io.IOException: Failed at i=6294527
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.write(TestDFSStripedOutputStreamWithFailure.java:559)
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:534)
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testBlockTokenExpired(TestDFSStripedOutputStreamWithFailure.java:273)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: java.io.IOException: Failed: the number of failed blocks = 4 > the number of data blocks = 3
>   at org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamers(DFSStripedOutputStream.java:392)
>   at org.apache.hadoop.hdfs.DFSStripedOutputStream.handleStreamerFailure(DFSStripedOutputStream.java:410)
>   at org.apache.hadoop.hdfs.DFSStripedOutputStream.flushAllInternals(DFSStripedOutputStream.java:1262)
>   at org.apache.hadoop.hdfs.DFSStripedOutputStream.checkStreamerFailures(DFSStripedOutputStream.java:627)
>   at org.apache.hadoop.hdfs.DFSStripedOutputStream.writeChunk(DFSStripedOutputStream.java:563)
>   at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:217)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:164)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:145)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:79)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:48)
>   at java.io.DataOutputStream.write(DataOutputStream.java:88)
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.write(TestDFSStripedOutputStreamWithFailure.java:557)
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:534)
>   at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.testBlockTokenExpired(TestDFSStripedOutputStreamWithFailure.java:273)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
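The value in the reported message (3) is already the parity count, so only the label is wrong. A striped write can tolerate at most as many failed internal blocks as the group has parity blocks. The sketch below illustrates the intended check and wording under stated assumptions: StripedFailureCheck and checkFailedBlocks are hypothetical names, and unlike the real DFSStripedOutputStream.checkStreamers it takes plain counts instead of inspecting live streamers.

```java
import java.io.IOException;

// Hypothetical stand-in for the check in DFSStripedOutputStream.checkStreamers;
// the names and signature are illustrative, not the actual Hadoop code.
final class StripedFailureCheck {
    private StripedFailureCheck() {}

    /**
     * Fails if more internal blocks have failed than the group can tolerate.
     * The tolerance is numAllBlocks - numDataBlocks, i.e. the parity count.
     */
    static void checkFailedBlocks(int failCount, int numAllBlocks, int numDataBlocks)
            throws IOException {
        int numParityBlocks = numAllBlocks - numDataBlocks;
        if (failCount > numParityBlocks) {
            // Label the threshold as what it is: the number of parity blocks.
            throw new IOException("Failed: the number of failed blocks = " + failCount
                + " > the number of parity blocks = " + numParityBlocks);
        }
    }
}
```

For an RS-6-3 group (nine internal blocks, six of them data), three failures pass the check and a fourth produces "Failed: the number of failed blocks = 4 > the number of parity blocks = 3".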
[jira] [Assigned] (HDFS-12388) A bad error message in DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-12388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12388: --- Assignee: Huafeng Wang
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.003.patch
> Add EC information to BlockLocation
> -----------------------------------
> Key: HDFS-12222
> URL: https://issues.apache.org/jira/browse/HDFS-12222
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Assignee: Huafeng Wang
> Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-12222.001.patch, HDFS-12222.002.patch, HDFS-12222.003.patch
>
> HDFS applications query block location information to compute splits. One example of this is FileInputFormat:
> https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346
> You see bits of code like this that calculate offsets as follows:
> {noformat}
> long bytesInThisBlock = blkLocations[startIndex].getOffset() +
>     blkLocations[startIndex].getLength() - offset;
> {noformat}
> EC confuses this since the block locations include parity block locations as well, which are not part of the logical file length. This messes up the offset calculation and thus topology/caching information too.
> Applications can figure out what's a parity block by reading the EC policy and then parsing the schema, but it'd be a lot better if we exposed this more generically in BlockLocation instead.
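To see why parity locations are not part of the logical file length, the sketch below computes the length of each internal block in one striped block group. It is a simplified model, not the HDFS implementation, and it assumes: cells of cellSize bytes are written round-robin across numDataUnits data blocks (internal indices 0 to numDataUnits - 1), the remaining indices are parity blocks, and each parity block is as long as data block 0, the longest data block. StripedLengths and its methods are hypothetical names.

```java
// Hypothetical helper; a simplified model of one striped block group.
final class StripedLengths {
    private StripedLengths() {}

    /** Length of internal block {@code index} in a group holding {@code groupLength} logical bytes. */
    static long internalBlockLength(long groupLength, int cellSize,
                                    int numDataUnits, int index) {
        if (index >= numDataUnits) {
            // Parity block: each parity cell matches the first (longest) cell of its stripe.
            return internalBlockLength(groupLength, cellSize, numDataUnits, 0);
        }
        long stripeSize = (long) cellSize * numDataUnits;
        long fullStripes = groupLength / stripeSize;
        long lastStripeBytes = groupLength % stripeSize;
        // Bytes of the last, partial stripe that land in this data block's cell.
        long tail = Math.min(cellSize, Math.max(0, lastStripeBytes - (long) index * cellSize));
        return fullStripes * cellSize + tail;
    }

    /** Sum of internal block lengths for indices in [from, to). */
    static long sum(long groupLength, int cellSize, int numDataUnits, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += internalBlockLength(groupLength, cellSize, numDataUnits, i);
        }
        return total;
    }
}
```

For an RS-6-3 group with 1 MiB cells holding 10,000,000 logical bytes, the six data blocks sum to exactly 10,000,000 bytes, while the three parity blocks add another 6,291,456 bytes that a split calculation must not count toward the file.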
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Status: Patch Available (was: Open)
[jira] [Assigned] (HDFS-11066) Improve test coverage for ISA-L native coder
[ https://issues.apache.org/jira/browse/HDFS-11066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-11066: --- Assignee: Huafeng Wang > Improve test coverage for ISA-L native coder > > > Key: HDFS-11066 > URL: https://issues.apache.org/jira/browse/HDFS-11066 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha1 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > > Some issues were introduced but not caught in time because Jenkins lacked support for the ISA-L related build options. We should re-enable the ISA-L related build options in the Jenkins system to ensure the quality of the related native code.
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143291#comment-16143291 ] Huafeng Wang commented on HDFS-12222: - Hi [~andrew.wang], any comment on my latest update?
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136375#comment-16136375 ] Huafeng Wang commented on HDFS-12222: - I just tweaked the patch according to your suggestions. Is it on the right track? As for the new API that returns both data and parity blocks, I'm inclined to place it in DFSClient and DistributedFileSystem, something like:
{code}
public ErasureCodedBlockLocation getECBlockLocation(Path p);
{code}
Is that a proper way to do it?
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.002.patch
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130050#comment-16130050 ] Huafeng Wang commented on HDFS-12222: - Hi guys, I just uploaded an initial patch that only sketches the basic idea. In the current implementation, the LocatedFileStatus fetched by FileInputFormat (FIF) is converted from an HdfsLocatedFileStatus when the underlying file system is HDFS, and in the erasure coding case each BlockLocation actually represents a block group. My first patch adds an ErasureCodedBlockLocation to LocatedFileStatus; this property is set when the HdfsLocatedFileStatus is erasure coded.
[jira] [Updated] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12222: Attachment: HDFS-12222.001.patch
[jira] [Commented] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128582#comment-16128582 ] Huafeng Wang commented on HDFS-12269: - Hi [~ajisakaa], thanks for your review. I just uploaded a new patch.
> Better to return a Map rather than HashMap in getErasureCodingCodecs
> --------------------------------------------------------------------
> Key: HDFS-12269
> URL: https://issues.apache.org/jira/browse/HDFS-12269
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: erasure-coding
> Reporter: Huafeng Wang
> Assignee: Huafeng Wang
> Priority: Minor
> Attachments: HDFS-12269.001.patch, HDFS-12269.002.patch, HDFS-12269.003.patch
>
> Currently the getErasureCodingCodecs function defined in ClientProtocol returns a HashMap:
> {code:java}
> HashMap<String, String> getErasureCodingCodecs() throws IOException;
> {code}
> It's better to return a Map.
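The motivation is to program to the interface: callers of getErasureCodingCodecs only need Map semantics, and declaring Map leaves the implementation free to pick its backing map or hand out a read-only view. A minimal sketch of that pattern follows; CodecLister, InMemoryCodecLister and the sample codec entry are hypothetical names for illustration, not the ClientProtocol implementation or its RPC translation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: the signature exposes the abstract Map type, not HashMap.
interface CodecLister {
    Map<String, String> getErasureCodingCodecs();
}

// Hypothetical implementation that keeps a HashMap as a private detail.
final class InMemoryCodecLister implements CodecLister {
    private final Map<String, String> codecs = new HashMap<>();

    void register(String codec, String coders) {
        codecs.put(codec, coders);
    }

    @Override
    public Map<String, String> getErasureCodingCodecs() {
        // Callers get a read-only view; the backing map can change type later
        // without touching the interface.
        return Collections.unmodifiableMap(codecs);
    }
}
```

Returning an unmodifiable view is an extra design choice on top of the issue's request; the essential change is only the declared return type.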
[jira] [Updated] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12269: Attachment: HDFS-12269.003.patch
[jira] [Updated] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12269: Attachment: HDFS-12269.002.patch
[jira] [Updated] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12269: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12269: Attachment: HDFS-12269.001.patch
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122627#comment-16122627 ] Huafeng Wang commented on HDFS-12222: - Hi [~drankye], you're right. I think it's a better way and I'll give it a try.
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121223#comment-16121223 ] Huafeng Wang commented on HDFS-12222: - I've checked the related code and found that it is not easy to provide other functions to get parity or data blocks. The problem is that LocatedFileStatus is a subclass of FileStatus, and both live in the hadoop-common module, which has no per-file erasure coding policy information. Without that policy, LocatedFileStatus cannot tell which BlockLocation is actually a parity block. After discussing it with Kai offline, one approach is to add an ECSchema to LocatedFileStatus so that we can determine which blocks are parity blocks when erasure coding is enabled. Any suggestions? Thanks.
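The approach of carrying schema information on the status object can be sketched as follows. If the status knows the data and parity unit counts, a caller can classify an internal block by its index within the block group, since in the striped layout indices below the number of data units hold data. All names here (SchemaUnits, EcAwareStatus, isParityBlock) are hypothetical and are not the LocatedFileStatus or ECSchema API; in HDFS the counts could come from ErasureCodingPolicy#getNumDataUnits() and getNumParityUnits().

```java
import java.util.Optional;

// Hypothetical subset of an EC schema: only the unit counts.
final class SchemaUnits {
    final int numDataUnits;
    final int numParityUnits;

    SchemaUnits(int numDataUnits, int numParityUnits) {
        if (numDataUnits <= 0 || numParityUnits < 0) {
            throw new IllegalArgumentException("invalid schema");
        }
        this.numDataUnits = numDataUnits;
        this.numParityUnits = numParityUnits;
    }
}

// Hypothetical status object that optionally carries the file's schema.
final class EcAwareStatus {
    private final Optional<SchemaUnits> schema; // empty for replicated files

    EcAwareStatus(SchemaUnits schemaOrNull) {
        this.schema = Optional.ofNullable(schemaOrNull);
    }

    boolean isErasureCoded() {
        return schema.isPresent();
    }

    /** True if the internal block at this index within its block group holds parity. */
    boolean isParityBlock(int indexInGroup) {
        return schema.map(s -> indexInGroup >= s.numDataUnits).orElse(false);
    }
}
```

A replicated file has no schema, so every block is reported as data; that keeps existing callers unaffected.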
[jira] [Commented] (HDFS-12036) Add audit log for getErasureCodingPolicy, getErasureCodingPolicies, getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116381#comment-16116381 ] Huafeng Wang commented on HDFS-12036: - Hi Kai, the function is defined in ClientProtocol, so I think it should be fixed in a separate issue. I just created one: https://issues.apache.org/jira/browse/HDFS-12269 > Add audit log for getErasureCodingPolicy, getErasureCodingPolicies, > getErasureCodingCodecs > -- > > Key: HDFS-12036 > URL: https://issues.apache.org/jira/browse/HDFS-12036 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0-alpha4 >Reporter: Wei-Chiu Chuang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12036.001.patch, HDFS-12036.002.patch, > HDFS-12036.003.patch > > > These three FSNamesystem operations do not yet record audit logs. I am not sure how useful these audit logs would be, but thought I should file them so that they don't get dropped if they turn out to be needed.
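The usual shape of such a change is to record exactly one audit entry per call, noting whether it succeeded. The sketch below is hypothetical: it writes to an in-memory list instead of the NameNode's audit logger, and AuditingOps, audited and the allowed=/cmd=/src= line format are illustrative names, not copied from FSNamesystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical audit helper; not the FSNamesystem API.
final class AuditingOps {
    final List<String> auditLog = new ArrayList<>();

    private void logAuditEvent(boolean allowed, String cmd, String src) {
        auditLog.add("allowed=" + allowed + "\tcmd=" + cmd + "\tsrc=" + src);
    }

    /**
     * Runs a read-only operation and records exactly one audit entry for it.
     * Simplification: any exception is recorded as allowed=false; a real
     * implementation may want to treat only permission failures that way.
     */
    <T> T audited(String cmd, String src, Callable<T> op) throws Exception {
        boolean allowed = false;
        try {
            T result = op.call();
            allowed = true;
            return result;
        } finally {
            logAuditEvent(allowed, cmd, src);
        }
    }
}
```

Using finally guarantees the entry is written on both the success and failure paths without duplicating the logging call.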
[jira] [Created] (HDFS-12269) Better to return a Map rather than HashMap in getErasureCodingCodecs
Huafeng Wang created HDFS-12269: --- Summary: Better to return a Map rather than HashMap in getErasureCodingCodecs Key: HDFS-12269 URL: https://issues.apache.org/jira/browse/HDFS-12269 Project: Hadoop HDFS Issue Type: Improvement Components: erasure-coding Reporter: Huafeng Wang Assignee: Huafeng Wang Priority: Minor Currently the getErasureCodingCodecs function defined in ClientProtocol returns a HashMap:
{code:java}
HashMap<String, String> getErasureCodingCodecs() throws IOException;
{code}
It's better to return a Map.
[jira] [Commented] (HDFS-12036) Add audit log for getErasureCodingPolicy, getErasureCodingPolicies, getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16116190#comment-16116190 ] Huafeng Wang commented on HDFS-12036: - Thanks [~drankye] for your review, I just updated the patch.
[jira] [Updated] (HDFS-12036) Add audit log for getErasureCodingPolicy, getErasureCodingPolicies, getErasureCodingCodecs
[ https://issues.apache.org/jira/browse/HDFS-12036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang updated HDFS-12036: Attachment: HDFS-12036.003.patch
[jira] [Commented] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110440#comment-16110440 ] Huafeng Wang commented on HDFS-12222: - Hi guys, I'd like to take this one. > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang > Labels: hdfs-ec-3.0-nice-to-have > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.
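The offset arithmetic quoted in the issue can be illustrated with a small, self-contained sketch. It does not use Hadoop's real `BlockLocation` class: `BlockLoc` is a hypothetical, simplified record, and its `isParity` flag is an assumption standing in for the kind of generic EC metadata the issue proposes exposing. The placement of the parity entry in the list is likewise assumed purely for illustration. The sketch shows how counting a parity entry as if it held file data inflates the apparent file length, and how an explicit flag would let a caller filter it out before running the FileInputFormat-style `bytesInThisBlock` calculation. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOffsetSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation.
    // The isParity flag is an assumption modelling the proposed EC metadata.
    record BlockLoc(long offset, long length, boolean isParity) {}

    // Same arithmetic as the quoted FileInputFormat snippet: bytes left in the
    // block at startIndex when reading from the given logical file offset.
    static long bytesInThisBlock(List<BlockLoc> locs, int startIndex, long offset) {
        BlockLoc b = locs.get(startIndex);
        return b.offset() + b.length() - offset;
    }

    // Sum of all entry lengths, i.e. what a naive caller would treat as file length.
    static long totalLength(List<BlockLoc> locs) {
        long total = 0;
        for (BlockLoc b : locs) {
            total += b.length();
        }
        return total;
    }

    // Keep only the entries that carry logical file data.
    static List<BlockLoc> dataOnly(List<BlockLoc> locs) {
        List<BlockLoc> out = new ArrayList<>();
        for (BlockLoc b : locs) {
            if (!b.isParity()) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        // Assumed layout: a 200 MiB file stored as two data entries, plus one
        // parity entry mixed into the same list.
        List<BlockLoc> locs = List.of(
            new BlockLoc(0, 128 * mib, false),
            new BlockLoc(128 * mib, 72 * mib, false),
            new BlockLoc(0, 128 * mib, true));

        System.out.println(totalLength(locs));           // 343932928: parity length wrongly counted
        System.out.println(totalLength(dataOnly(locs))); // 209715200 (200 MiB)
        System.out.println(bytesInThisBlock(dataOnly(locs), 0, 64 * mib)); // 67108864
    }
}
```

Filtering on an explicit flag keeps split computation independent of any particular EC schema, which is the motivation the issue gives for exposing this in BlockLocation rather than making each application parse the EC policy.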
[jira] [Assigned] (HDFS-12222) Add EC information to BlockLocation
[ https://issues.apache.org/jira/browse/HDFS-12222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huafeng Wang reassigned HDFS-12222: --- Assignee: Huafeng Wang > Add EC information to BlockLocation > --- > > Key: HDFS-12222 > URL: https://issues.apache.org/jira/browse/HDFS-12222 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Huafeng Wang > Labels: hdfs-ec-3.0-nice-to-have > > HDFS applications query block location information to compute splits. One > example of this is FileInputFormat: > https://github.com/apache/hadoop/blob/d4015f8628dd973c7433639451a9acc3e741d2a2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L346 > You see bits of code like this that calculate offsets as follows: > {noformat} > long bytesInThisBlock = blkLocations[startIndex].getOffset() + > blkLocations[startIndex].getLength() - offset; > {noformat} > EC confuses this since the block locations include parity block locations as > well, which are not part of the logical file length. This messes up the > offset calculation and thus topology/caching information too. > Applications can figure out what's a parity block by reading the EC policy > and then parsing the schema, but it'd be a lot better if we exposed this more > generically in BlockLocation instead.