[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018603#comment-16018603 ]

Rashmi Vinayak commented on HADOOP-11828:

Hi [~drankye],

Does [HDFS-7337] address the above issue?

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: 3.0.0-alpha1
> Reporter: Zhe Zhang
> Assignee: jack liuquan
> Fix For: 3.0.0-alpha1
>
> Attachments: 7715-hitchhikerXOR-v2.patch,
> 7715-hitchhikerXOR-v2-testcode.patch, HADOOP-11828-hitchhikerXOR-V3.patch,
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch,
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch,
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch,
> HDFS-7715-hhxor-encoder.patch
>
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf]
> is a new erasure coding algorithm developed as a research project at UC
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45%
> during data reconstruction while retaining the same storage capacity and
> failure tolerance capability as RS codes. This JIRA aims to introduce
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488649#comment-15488649 ]

Kai Zheng commented on HADOOP-11828:

bq. I want to know whether Hitchhiker is attached to Hadoop or not, and if it is attached I want to know more about its commands.

Good question. Yes and no. The Hitchhiker coder is already in the codebase on the Hadoop Common side, but it is not attached to the HDFS side yet. Currently HDFS uses the raw erasure coder API to do all the work, whereas the HH coder is implemented against the erasure coder API. Evolving HDFS towards using erasure coders is still ongoing, but it is a low priority on my side. I'd be glad to help if anyone wants to push this, though.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488623#comment-15488623 ]

Kai Zheng commented on HADOOP-11828:

Hi [~rashmikv],

Sorry for missing your question and for the late reply. By compatible or interoperable, I mean that data encoded by one coder can be decoded by another coder. The new Java coder is compatible with the native ISA-L coder in this sense. Some of your clients may run in an environment without the Hadoop native setup, so only a pure Java solution will work there, while your other clients and the cluster use Hadoop native, so the native ISA-L based coder can be used for better performance. In other words, it's possible to use more than one coder implementation across a cluster and its clients, so each such implementation must be compatible with the others; otherwise data encoded by one coder could not be decoded by another.
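To make the compatibility requirement above concrete, here is a minimal, self-contained sketch. It is not Hadoop code: the interface and class names ({{XorParityCoder}}, {{ByteWiseXorCoder}}, {{WordWiseXorCoder}}, {{CoderInteropDemo}}) are hypothetical, and plain XOR parity stands in for the RS and ISA-L coders discussed here. It only illustrates the property itself: two independently written implementations are interchangeable when one can decode what the other encoded.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical interface for this sketch; not the Hadoop raw coder API. */
interface XorParityCoder {
  /** Returns the XOR of all given equal-length units. */
  byte[] xorAll(byte[][] units);
}

/** Byte-at-a-time implementation (stands in for a plain Java coder). */
class ByteWiseXorCoder implements XorParityCoder {
  public byte[] xorAll(byte[][] units) {
    byte[] out = new byte[units[0].length];
    for (byte[] unit : units) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= unit[i];
      }
    }
    return out;
  }
}

/** Eight-bytes-at-a-time implementation (stands in for an optimized coder). */
class WordWiseXorCoder implements XorParityCoder {
  public byte[] xorAll(byte[][] units) {
    int len = units[0].length;
    byte[] out = new byte[len];
    ByteBuffer acc = ByteBuffer.wrap(out); // shares storage with 'out'
    for (byte[] unit : units) {
      ByteBuffer in = ByteBuffer.wrap(unit);
      int i = 0;
      for (; i + 8 <= len; i += 8) {
        acc.putLong(i, acc.getLong(i) ^ in.getLong(i));
      }
      for (; i < len; i++) { // tail bytes that do not fill a whole long
        out[i] ^= unit[i];
      }
    }
    return out;
  }
}

class CoderInteropDemo {
  /** Encodes with one coder, erases unit 1, and recovers it with the other. */
  static boolean roundTrip(XorParityCoder encoder, XorParityCoder decoder) {
    byte[][] data = {
      {1, 2, 3, 4, 5, 6, 7, 8, 9},
      {10, 20, 30, 40, 50, 60, 70, 80, 90},
      {-1, -2, -3, -4, -5, -6, -7, -8, -9}
    };
    byte[] parity = encoder.xorAll(data);
    // The erased unit is the XOR of the surviving units and the parity.
    byte[] recovered = decoder.xorAll(new byte[][] {data[0], data[2], parity});
    return Arrays.equals(recovered, data[1]);
  }

  public static void main(String[] args) {
    System.out.println(roundTrip(new ByteWiseXorCoder(), new WordWiseXorCoder()));
    System.out.println(roundTrip(new WordWiseXorCoder(), new ByteWiseXorCoder()));
  }
}
```

If the two implementations disagreed on even one output byte, a client using one of them could not reconstruct data written through the other, which is the failure mode the comment warns about.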
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456547#comment-15456547 ]

Mehran Mahbuti commented on HADOOP-11828:

I want to know whether Hitchhiker is attached to Hadoop or not, and if it is attached I want to know more about its commands.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154603#comment-15154603 ]

Rashmi Vinayak commented on HADOOP-11828:

Thanks a lot for the clarifications, [~jack_liuquan] and [~drankye]! I was concerned by the name containing the term 'Raw', as in my understanding the 'xRawCoders' stand for the pure Java ones and the plain 'xCoders' stand for the ISA-L ones.

bq. RSRawDecoder2 is the new Java one (compatible with ISA-L coder)

Could you please clarify what 'compatible with ISA-L coder' means? Does it mean that when you use RSRawDecoder2, it uses the ISA-L implementation if the hardware allows and otherwise uses the Java one? Or is this selection done through Configuration too?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153512#comment-15153512 ]

jack liuquan commented on HADOOP-11828:

Hi [~rashmikv],

Kai is right, the current Hitchhiker coders are not coupled with a specific RS implementation. They pick up the concrete implementation from configuration. You can see {{HHXORErasureDecoder.java}} for more details.

  // Lazily create the RS raw decoder selected by the configuration.
  private RawErasureDecoder checkCreateRSRawDecoder() {
    if (rsRawDecoder == null) {
      rsRawDecoder = CodecUtil.createRSRawDecoder(getConf(),
          getNumDataUnits(), getNumParityUnits());
    }
    return rsRawDecoder;
  }

Thanks,
Jack
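The pattern in the snippet above (create the underlying coder on first use, choosing the implementation from configuration) can be sketched in isolation. Everything below is an assumption made for illustration: the configuration key {{demo.rawcoder.impl}}, the registry, and all type names are hypothetical and are not part of Hadoop's {{CodecUtil}} or its configuration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a raw decoder; not the Hadoop interface. */
interface DemoRawDecoder {
  String name();
}

class LegacyJavaDecoder implements DemoRawDecoder {
  public String name() { return "legacy-java"; }
}

class NewJavaDecoder implements DemoRawDecoder {
  public String name() { return "new-java"; }
}

/** Holds a decoder that is created lazily from a configuration map. */
class LazyDecoderHolder {
  /** Hypothetical configuration key, used only in this sketch. */
  static final String CODER_KEY = "demo.rawcoder.impl";

  private static final Map<String, Supplier<DemoRawDecoder>> REGISTRY = new HashMap<>();
  static {
    REGISTRY.put("legacy-java", LegacyJavaDecoder::new);
    REGISTRY.put("new-java", NewJavaDecoder::new);
  }

  private final Map<String, String> conf;
  private DemoRawDecoder rawDecoder;
  int creations = 0; // counts how often a decoder was actually constructed

  LazyDecoderHolder(Map<String, String> conf) {
    this.conf = conf;
  }

  /** Creates the decoder on first call and reuses it afterwards. */
  DemoRawDecoder checkCreateRawDecoder() {
    if (rawDecoder == null) {
      String impl = conf.getOrDefault(CODER_KEY, "new-java");
      Supplier<DemoRawDecoder> factory = REGISTRY.get(impl);
      if (factory == null) {
        throw new IllegalArgumentException("Unknown coder implementation: " + impl);
      }
      rawDecoder = factory.get();
      creations++;
    }
    return rawDecoder;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(CODER_KEY, "legacy-java");
    LazyDecoderHolder holder = new LazyDecoderHolder(conf);
    System.out.println(holder.checkCreateRawDecoder().name());
  }
}
```

The point of the design is that the calling code never names a concrete decoder class, so swapping implementations is a configuration change rather than a code change.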
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153501#comment-15153501 ]

Kai Zheng commented on HADOOP-11828:

Hi Rashmi,

bq. When looking at HHXORErasureDecodingStep, I see that 'RSRawDecoder' is being used.

I don't see this, but I'm sure the HH coders should not be coupled with a specific RS implementation. They pick up the concrete implementation from configuration.

bq. Isn't 'RSRawDecoder' the older java implementation borrowed from Facebook's HDFS-RAID? I was under the impression that RSRawDecoder is the older version and RSRawDecoder2 is the newer version. Is it not so?

You're almost right. Right now RSRawDecoder is the old one (from HDFS-RAID) and RSRawDecoder2 is the new Java one (compatible with the ISA-L coder). The two coders will be renamed very soon; see HADOOP-12808.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153339#comment-15153339 ]

Rashmi Vinayak commented on HADOOP-11828:

Hi [~jack_liuquan], [~drankye],

Is there a way to check whether the encoders and decoders are using the ISA-L based implementation and not the Java ones? When looking at HHXORErasureDecodingStep, I see that 'RSRawDecoder' is being used. Isn't 'RSRawDecoder' the older Java implementation borrowed from Facebook's HDFS-RAID? I was under the impression that RSRawDecoder is the older version and RSRawDecoder2 is the newer version. Is it not so?

Thanks,
Rashmi
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140359#comment-15140359 ]

Rashmi Vinayak commented on HADOOP-11828:

Hi [~jack_liuquan], [~drankye], [~zhz],

I am super excited to see this being resolved! Thank you all for the effort that you put in.

I agree with [~zhz] that it would be good to get some performance results comparing RS and Hitchhiker based on the new implementation. This would guide enterprises that are considering erasure coding, and thus lead to a greater impact from this effort and from HDFS-EC in general, as they will come to know about this more efficient EC option.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110481#comment-15110481 ]

jack liuquan commented on HADOOP-11828:

Hi [~zhz],

Thanks for your review!

bq. For a util class with all static methods, we don't need a constructor.

I added a private constructor to fix a checkstyle issue, following the code of the {{DumpUtil}} class in {{rawcoder}}.

bq. The getPiggyBacksFromInput method is fairly complex and deserves a unit test. An ASCII illustration would also be very helpful, similar to Figure 4 in the Hitchhiker paper.

Although {{getPiggyBacksFromInput}} is fairly complex, there is only one execution branch in it. I think the current unit test cases cover it well. I will add an ASCII illustration for it.

bq. 3. Maybe I'm missing something, but how do we guarantee the length of inputs passed to performCoding is always numDataUnits * subPacketSize?

As Rashmi said, subPacketSize is always 2 in Hitchhiker. I think we can guarantee it when we prepare the block chunks.
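As a rough illustration of the last point, the length requirement could also be checked explicitly at the point where the inputs are grouped into sub-packets. This is only a sketch under stated assumptions: the class {{SubPacketLayout}}, the method {{split}}, and the flat layout (all units of sub-packet 0 first, then all units of sub-packet 1) are hypothetical and are not taken from the patch.

```java
/** Hypothetical helper; groups a flat input array into Hitchhiker sub-packets. */
class SubPacketLayout {
  /** The sub-packet size is fixed at 2 in Hitchhiker, per the discussion. */
  static final int SUB_PACKET_SIZE = 2;

  /**
   * Splits inputs into [subPacket][dataUnit] form, assuming the flat array
   * stores sub-packet 0 first and sub-packet 1 second (an assumption of
   * this sketch). Rejects input arrays of the wrong length.
   */
  static byte[][][] split(byte[][] inputs, int numDataUnits) {
    int expected = numDataUnits * SUB_PACKET_SIZE;
    if (inputs.length != expected) {
      throw new IllegalArgumentException(
          "Expected " + expected + " inputs but got " + inputs.length);
    }
    byte[][][] out = new byte[SUB_PACKET_SIZE][numDataUnits][];
    for (int i = 0; i < SUB_PACKET_SIZE; i++) {
      for (int j = 0; j < numDataUnits; j++) {
        // Share the caller's buffers; nothing is copied.
        out[i][j] = inputs[i * numDataUnits + j];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[][] inputs = new byte[12][];
    for (int i = 0; i < inputs.length; i++) {
      inputs[i] = new byte[] {(byte) i};
    }
    byte[][][] grouped = split(inputs, 6);
    System.out.println(grouped.length + " sub-packets of " + grouped[0].length + " units");
  }
}
```

Failing fast here turns a violated assumption into an immediate error instead of silently mis-grouped buffers, which complements guaranteeing the length when the block chunks are prepared.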
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111085#comment-15111085 ]

Hudson commented on HADOOP-11828:

FAILURE: Integrated in Hadoop-trunk-Commit #9153 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9153/])
HADOOP-11828. Implement the Hitchhiker erasure coding algorithm. (zhz: rev 1bb31fb22e6f8e6df8e9ff4e94adf20308b4c743)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/util/HHUtil.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/AbstractErasureDecoder.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureDecodingStep.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestHHErasureCoderBase.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureDecoder.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/AbstractHHErasureCodingStep.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestErasureCoderBase.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureEncodingStep.java
* hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestHHXORErasureCoder.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureEncoder.java
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108208#comment-15108208 ]

Rashmi Vinayak commented on HADOOP-11828:

Hi [~zhz],

Regarding your second question:

bq. 2. Could SUB_PACKET_SIZE be other than 2? If so, should we still keep it as a variable?

The sub-packet size is always 2 in Hitchhiker, so perhaps we could make it a constant?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107878#comment-15107878 ]

jack liuquan commented on HADOOP-11828:

Thanks, Kai. I'm just confused about where it needs to be added and where it doesn't. I find that not all packages have a package-info.java file.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107861#comment-15107861 ]

jack liuquan commented on HADOOP-11828:

Hi [~zhz] and [~walter.k.su], have you reviewed the code? Let me know when you have finished the review. Thanks!

Hi Kai, the last checkstyle issue is below:

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/util/HHUtil.java:0:: Missing package-info.java file.

How can I ignore this issue, or do I really need to add a package-info.java file?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107869#comment-15107869 ]

Kai Zheng commented on HADOOP-11828:

bq. Do I really need to add a package-info.java file?

I thought it's easy to have one :). Perhaps you can add it in the next update.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108115#comment-15108115 ]

Zhe Zhang commented on HADOOP-11828:

Thanks Jack for the great work, and Rashmi / Kai for the reviews! The patch LGTM overall. I am about to finish and post my review.

Meanwhile, could you add the Apache license header to {{HHXORErasureDecodingStep}} (see this [complaint | https://builds.apache.org/job/PreCommit-HADOOP-Build/8425/artifact/patchprocess/patch-asflicense-problems.txt]), and add a trivial {{package-info}} for now (as Kai suggested)? The purpose of that file is Javadoc, and I do think we will add Javadoc for the new package later.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107951#comment-15107951 ] Kai Zheng commented on HADOOP-11828: bq. I find not all the package added the package-info.java file. That's right. You probably won't find it in some existing packages. That's fine and people won't complain. But if someone provided it for the existing packages, it would be welcome, and I guess the checkstyle counter would be decremented. bq. I'm just confused about that where need to add, where not need. Well, if you introduce a new package, then you probably need to add it, because a new package without it introduces a new checkstyle issue (counter +1). IMO, {{package-info}} makes sense for an important package, but for an initial package it would be OK to delay adding it. But since the check requires it for now, maybe simply add it. :( > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
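For readers unfamiliar with the file being discussed, a trivial package-info.java with the Apache license header might look like the sketch below. The package name is an assumption (the thread does not name the new package), and the Javadoc sentence is only illustrative.

```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * Utilities used by the Hitchhiker erasure coders (illustrative wording).
 */
// Assumed package name; adjust to wherever the new classes actually live.
package org.apache.hadoop.io.erasurecode.coder.util;
```

The file carries no code of its own; it exists so that the package has a Javadoc entry and the checkstyle counter does not go up.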
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108177#comment-15108177 ] Zhe Zhang commented on HADOOP-11828: I finished reviewing the patch and it LGTM overall. Given that this is a new coder and doesn't modify existing code, I'm +1 on committing the latest patch pending a fix on the license header and checkstyle issues, and the following minor issue: {code} private HHUtil() { // No called } {code} For a util class with all static methods, we don't need a constructor. Possible follow-on work and questions: # The {{getPiggyBacksFromInput}} method is fairly complex and deserves a unit test. An ASCII illustration would also be very helpful, similar to Figure 4 in the Hitchhiker [paper | http://eecs.berkeley.edu/~rashmikv/papers/Hitchhiker_SIGCOMM14.pdf]. # Could {{SUB_PACKET_SIZE}} be other than 2? If so, should we still keep it as a variable? # Maybe I'm missing something, but how do we guarantee the length of inputs passed to {{performCoding}} is always {{numDataUnits * subPacketSize}}? > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. 
It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
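On the last review question (how the length of the inputs is guaranteed), one possible answer is an explicit precondition check before coding starts. The sketch below is hypothetical and self-contained: the class and method names are made up for illustration, it does not use the Hadoop coder API, and the value 10 for the number of data units is an assumption (only the sub-packet size of 2 comes from the thread).

```java
/** Hypothetical illustration; not part of the patch or the Hadoop API. */
final class InputLengthCheckDemo {

  /**
   * Fails fast unless exactly numDataUnits * subPacketSize inputs were given.
   */
  static void checkInputLength(Object[] inputs, int numDataUnits,
      int subPacketSize) {
    int expected = numDataUnits * subPacketSize;
    if (inputs == null || inputs.length != expected) {
      String actual = (inputs == null) ? "null" : String.valueOf(inputs.length);
      throw new IllegalArgumentException(
          "Expected " + expected + " inputs but got " + actual);
    }
  }

  public static void main(String[] args) {
    // Assumed layout: 10 data units, each split into 2 sub-packets.
    checkInputLength(new Object[20], 10, 2);
    System.out.println("input length accepted");
  }
}
```

Whether such a check belongs in the coding step or in its caller is a design choice the thread leaves open.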
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102998#comment-15102998 ] jack liuquan commented on HADOOP-11828: --- Hi Kai, the latest v7 patch only fixes the check failures in the report. If you think the code is OK, I will proceed with the two other modes of the HH algorithm. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103030#comment-15103030 ] Hadoop QA commented on HADOOP-11828:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 29s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 19s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 19s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} Patch generated 1 new checkstyle issues in hadoop-common-project/hadoop-common (total was 4, now 5). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 2s {color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 0s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s {color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 53s {color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12782687/HADOOP-11828-hitchhikerXOR-V7.patch |
| JIRA Issue | HADOOP-11828 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 5b83ad083f35 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2a30386 |
| Default
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101540#comment-15101540 ] Kai Zheng commented on HADOOP-11828: Thanks Jack for the update! It would be good to have a summary of the updates. [~zhz] or [~walter.k.su], maybe you could do the final review and get it in if everything is OK? I guess there are two other modes that Jack could proceed with based on this work. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.zip, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090569#comment-15090569 ] Hadoop QA commented on HADOOP-11828:
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 39s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 15s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 15s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s {color} | {color:red} Patch generated 8 new checkstyle issues in hadoop-common-project/hadoop-common (total was 0, now 8). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 21s {color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 1, now 2). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 49s {color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 13, now 14). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 10s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s {color} | {color:red} Patch generated 8 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 23s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.metrics2.impl.TestGangliaMetrics |
| JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090556#comment-15090556 ] jack liuquan commented on HADOOP-11828: --- Hi all, I have uploaded a new patch. This patch fixes the code according to the review comments. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HADOOP-11828-hitchhikerXOR-V6.patch, HDFS-7715-hhxor-decoder.patch, > HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088969#comment-15088969 ] jack liuquan commented on HADOOP-11828: --- Hi Kai, The report URLs of the last build are no longer available. How can I get the report again? > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089005#comment-15089005 ] jack liuquan commented on HADOOP-11828: --- OK, Thanks, Kai. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089001#comment-15089001 ] Kai Zheng commented on HADOOP-11828: I guess it's flushed out. So you may cancel and then submit your patch again to trigger the build again? Or just update the patch and wait for another round. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085301#comment-15085301 ] jack liuquan commented on HADOOP-11828: --- Hi Kai, I have seen the update in HADOOP-12685. I will upload a new patch this week. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082654#comment-15082654 ] Kai Zheng commented on HADOOP-11828: Thanks [~lirui] for taking on the fix for the issue mentioned by Jack in HADOOP-12685. Jack, if you need the fix, you might check the patch there. By the way, do you have any update, Jack? Note there is some progress in HDFS-9603 to apply {{ErasureCoder}} on the HDFS side. With this done, we can try it in HDFS to see the effect. Thanks. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075787#comment-15075787 ] Rui Li commented on HADOOP-11828: - Just filed HADOOP-12685 for the buffer position inconsistency issue. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073356#comment-15073356 ] jack liuquan commented on HADOOP-11828: --- Hi Kai, Thank you for your reply. I will keep the HH layer code as it is. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072494#comment-15072494 ] Kai Zheng commented on HADOOP-11828: Hi Jack, {{ALLOW_CHANGE_INPUTS}} means the coder is allowed to change the data in the input buffers, and a buffer's position may still change after it is consumed. I think it would be good to make the position-change behavior consistent between the two buffer types, and I will take care of that in a separate issue. Anyway, you'd better duplicate those input buffers in the {{HH}} layer for robustness. OK? > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
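Kai's suggestion to duplicate the input buffers can be illustrated with plain java.nio. The sketch below is not code from the patch: the class, the method names, and the toy consume() routine (a stand-in for a coder that advances the positions of its inputs) are all made up for illustration. ByteBuffer.duplicate() shares the underlying content but gives the copy its own position and limit, so no data is copied.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo class; not part of the Hadoop patch. */
final class DuplicateInputsDemo {

  /** Stand-in for a coder that reads its inputs and advances their positions. */
  static int consume(ByteBuffer[] inputs) {
    int xor = 0;
    for (ByteBuffer in : inputs) {
      while (in.hasRemaining()) {
        xor ^= in.get();
      }
    }
    return xor;
  }

  /** Returns views that share content but have independent positions. */
  static ByteBuffer[] duplicateAll(ByteBuffer[] inputs) {
    ByteBuffer[] copies = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      copies[i] = inputs[i].duplicate();
    }
    return copies;
  }

  public static void main(String[] args) {
    ByteBuffer[] inputs = {
        ByteBuffer.wrap(new byte[] {1, 2, 3}),  // non-direct (heap) buffer
        ByteBuffer.allocateDirect(3)            // direct buffer
    };
    consume(duplicateAll(inputs));
    // The originals keep their positions whichever buffer type is used.
    for (ByteBuffer in : inputs) {
      System.out.println(in.position() + " / " + in.remaining());
    }
  }
}
```

Because only the duplicates are consumed, the caller sees the same position on heap and direct buffers, regardless of how the underlying coder treats each type.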
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071796#comment-15071796 ] jack liuquan commented on HADOOP-11828: --- Hi Kai, * In raw erasure coder level, you can set the {{ALLOW_CHANGE_INPUTS}} coder option to ensure the input buffers are not changed during encoding/decoding. Thus in HH coder level, you don't need to clone the input buffers thus avoids data copy. When I tested, I found that after running encode() of RS, a non-direct input buffer's position moves to the end, but a direct input buffer's position does not move. Is that OK? If I don't clone the input buffers at the HH coder level, I will have to handle non-direct input buffers and move their positions back to the beginning after calling encode() of RS. > Implement the Hitchhiker erasure coding algorithm > - > > Key: HADOOP-11828 > URL: https://issues.apache.org/jira/browse/HADOOP-11828 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: jack liuquan > Attachments: 7715-hitchhikerXOR-v2-testcode.patch, > 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, > HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, > HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch > > > [Hitchhiker | > http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is > a new erasure coding algorithm developed as a research project at UC > Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% > during data reconstruction while retaining the same storage capacity and > failure tolerance capability as RS codes. This JIRA aims to introduce > Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. > The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
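The alternative Jack describes, leaving the buffers uncloned and moving their positions back after the RS encode call, could be sketched as follows. Everything here is hypothetical: the helper names are invented, and consumeAll() merely imitates an encode call that leaves its inputs positioned at their limits.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper; not part of the Hadoop patch. */
final class RestorePositionsDemo {

  /** Records the current position of every buffer. */
  static int[] savePositions(ByteBuffer[] buffers) {
    int[] positions = new int[buffers.length];
    for (int i = 0; i < buffers.length; i++) {
      positions[i] = buffers[i].position();
    }
    return positions;
  }

  /** Moves every buffer back to its previously recorded position. */
  static void restorePositions(ByteBuffer[] buffers, int[] positions) {
    for (int i = 0; i < buffers.length; i++) {
      buffers[i].position(positions[i]);
    }
  }

  /** Stand-in for an encode call that leaves its inputs fully consumed. */
  static void consumeAll(ByteBuffer[] buffers) {
    for (ByteBuffer b : buffers) {
      b.position(b.limit());
    }
  }

  public static void main(String[] args) {
    ByteBuffer[] inputs = { ByteBuffer.wrap(new byte[] {1, 2, 3, 4}) };
    inputs[0].position(1);
    int[] saved = savePositions(inputs);
    consumeAll(inputs);
    restorePositions(inputs, saved);
    System.out.println(inputs[0].position());
  }
}
```

Saving the actual positions, rather than rewinding to zero, keeps the sketch correct for inputs that do not start at the beginning of their backing storage.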
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071785#comment-15071785 ] jack liuquan commented on HADOOP-11828: Hi Rashmi, Thanks for your review. Your comments are great! I will update the patch according to your comments.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069031#comment-15069031 ] Rashmi Vinayak commented on HADOOP-11828: Hi [~jack_liuquan], Thanks for the great work! I went through the code very carefully for the algorithm review. Everything looks fine in terms of correctness. A few comments: 1. The name ‘doDecodeMulti’ for the method in HHXORErasureDecodingStep is slightly confusing, since it handles both the case of multiple erasures and the case of a single parity erasure. Perhaps something along the lines of ‘doDecodeMultiAndParity’ would reflect the actions of this method more accurately? 2. It seems that there is no need to pass ‘erasedIndexes’ as input to the methods in the HHXORErasureDecodingStep class, since it is a class variable? (You might have used these additional inputs for clarity; I just thought of bringing this to your attention.) 3. On a minor note, I think it would be helpful for future readers to include a reference to the paper in case they want to understand the algorithm. What do you think? (We can have something along the lines of: “A "Hitchhiker's" Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers”, in ACM SIGCOMM 2014.) Also, just to make the context completely clear, could you please change the description in the comments to “It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction while retaining the same storage capacity and failure tolerance capability of RS codes.” (The last phrase is added to the existing comment.) Thanks, Rashmi
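The three review points above can be illustrated together in a small sketch. This is not the patch's code: {{DecodingStepSketch}} is a hypothetical class, the dispatch condition is inferred from the description of what ‘doDecodeMulti’ handles, and the method bodies only return labels instead of decoding anything. It shows the erased indexes kept as a field rather than passed to each method, the suggested method name, and a Javadoc reference to the paper.

```java
import java.util.Arrays;

/**
 * Sketch of a Hitchhiker-XOR decoding step (illustrative only).
 *
 * <p>Reference: "A 'Hitchhiker's' Guide to Fast and Efficient Data
 * Reconstruction in Erasure-coded Data Centers", ACM SIGCOMM 2014.
 */
class DecodingStepSketch {
  private final int numDataUnits;
  private final int[] erasedIndexes; // held as a field, so methods need no extra parameter

  public DecodingStepSketch(int numDataUnits, int[] erasedIndexes) {
    this.numDataUnits = numDataUnits;
    this.erasedIndexes = erasedIndexes.clone();
  }

  /** Picks the decode path; the condition is an assumption based on the review text. */
  public String performCoding() {
    if (erasedIndexes.length == 1 && erasedIndexes[0] < numDataUnits) {
      return doDecodeSingle();
    }
    return doDecodeMultiAndParity();
  }

  private String doDecodeSingle() {
    return "single data unit " + erasedIndexes[0];
  }

  /** Covers multiple erasures as well as a single erased parity unit. */
  private String doDecodeMultiAndParity() {
    return "multi/parity " + Arrays.toString(erasedIndexes);
  }

  public static void main(String[] args) {
    System.out.println(new DecodingStepSketch(10, new int[] {3}).performCoding());
    System.out.println(new DecodingStepSketch(10, new int[] {11}).performCoding());
  }
}
```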
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067995#comment-15067995 ] jack liuquan commented on HADOOP-11828: Hi Kai, Thank you for your review. I will fix the code according to your comments.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066468#comment-15066468 ] Kai Zheng commented on HADOOP-11828: The code looks pretty nice and much cleaner now. Besides the Jenkins-reported issues, which you need to look at and clear, some comments: * In {{AbstractHHErasureCodingStep}}, {{getSubPacketSize}} may be protected instead of public. * At the raw erasure coder level, you can use the {{ALLOW_CHANGE_INPUTS}} coder option to ensure the input buffers are not changed during encoding/decoding. So at the HH coder level you don't need to clone the input buffers, which avoids a data copy. * In {{TestErasureCoderBase}}, I note that HH-specific logic was added. Please don't do that, because this class is for all codecs/coders. You can instead add a test base class for HH, such as {{TestHHErasureCoderBase}}, that extends {{TestErasureCoderBase}}.
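The first and third review points above are about visibility and class layout, which a small sketch may make concrete. All classes here are hypothetical stand-ins nested in one holder class ({{HHLayoutSketch}}); they are not the Hadoop classes named in the review. The sub-packet size of 2 and the chunk-size alignment rule are assumptions chosen for illustration.

```java
class HHLayoutSketch {

  /** Stand-in for the shared coding-step base class. */
  public abstract static class CodingStepBase {
    private static final int SUB_PACKET_SIZE = 2; // assumed value

    /** Protected: only subclasses need it, so it stays out of the public API. */
    protected int getSubPacketSize() {
      return SUB_PACKET_SIZE;
    }
  }

  /** A concrete step that uses the protected accessor internally. */
  public static class EncodingStep extends CodingStepBase {
    public int subChunkLength(int chunkLength) {
      return chunkLength / getSubPacketSize();
    }
  }

  /** Stand-in for the generic test base: codec-agnostic settings only. */
  public static class CoderTestBase {
    protected int chunkSize = 1023;

    public int getChunkSize() {
      return chunkSize;
    }
  }

  /** HH-specific test base: codec-specific adjustments live here, not in the generic base. */
  public static class HHCoderTestBase extends CoderTestBase {
    private static final int SUB_PACKET_SIZE = 2; // assumed value

    public HHCoderTestBase() {
      // Hypothetical HH-only rule: keep the chunk size a multiple of the sub-packet size.
      chunkSize -= chunkSize % SUB_PACKET_SIZE;
    }
  }

  public static void main(String[] args) {
    System.out.println(new EncodingStep().subChunkLength(1024));
    System.out.println(new HHCoderTestBase().getChunkSize());
  }
}
```

Keeping the generic base untouched means tests for other codecs are unaffected by HH-only setup.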
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066365#comment-15066365 ] jack liuquan commented on HADOOP-11828: Hi all, Have you reviewed the code? Please let me know as soon as you have finished the review. Thanks!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066435#comment-15066435 ] Kai Zheng commented on HADOOP-11828: Hi Jack, thanks for your update. Let me read it and give some comments soon.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051642#comment-15051642 ] Rashmi Vinayak commented on HADOOP-11828: Hi Jack, Great! I will start working on the algo review.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048319#comment-15048319 ] Hadoop QA commented on HADOOP-11828:
| (x) *-1 overall* |
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 1s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 8m 17s | trunk passed |
| +1 | compile | 8m 56s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 9m 21s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 1m 8s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 55s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 8s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 8m 59s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 8m 59s | the patch passed |
| +1 | compile | 9m 20s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 9m 20s | the patch passed |
| -1 | checkstyle | 0m 16s | Patch generated 17 new checkstyle issues in hadoop-common-project/hadoop-common (total was 0, now 17). |
| +1 | mvnsite | 1m 5s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 22 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| -1 | findbugs | 2m 7s | hadoop-common-project/hadoop-common introduced 1 new FindBugs issues. |
| -1 | javadoc | 3m 25s | hadoop-common-project_hadoop-common-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 1, now 2). |
| +1 | javadoc | 0m 58s | the patch passed with JDK v1.8.0_66 |
| -1 | javadoc | 4m 55s | hadoop-common-project_hadoop-common-jdk1.7.0_91 with JDK v1.7.0_91 generated 1 new issues (was 13, now 14). |
| +1 | javadoc | 1m 8s | the patch passed with JDK v1.7.0_91 |
| +1 | unit | 7m 44s | hadoop-common in the patch passed with JDK v1.8.0_66. |
| +1 | unit | 7m 47s | hadoop-common in the patch passed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 15s | Patch generated 7 ASF License warnings. |
| | | 75m 5s | |
|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
| | Unread
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050047#comment-15050047 ] jack liuquan commented on HADOOP-11828: Hi all, can you review the code now, or should I first fix the issues in the report? If you can review it right now, I can address your review comments and the reported issues together.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046992#comment-15046992 ] jack liuquan commented on HADOOP-11828: Hi Kai, I have uploaded a new Hitchhiker-XOR patch. This patch supports general parameters, not only (10, 4).
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047114#comment-15047114 ] Zhe Zhang commented on HADOOP-11828: Thanks [~jack_liuquan]! Per question from [~rashmikv] above, is the patch ready for algorithm review? Looks to me it's ready to be reviewed.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047116#comment-15047116 ] Zhe Zhang commented on HADOOP-11828: Could you also click "Submit Patch"? Thanks.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048012#comment-15048012 ] jack liuquan commented on HADOOP-11828: Hi Zhe, the patch is ready for algorithm review. I can click "Submit Patch", but I don't know how to fill in the version info.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048021#comment-15048021 ] Kai Zheng commented on HADOOP-11828: Jack, you can just submit without filling in any information. The fields can be filled in later.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044627#comment-15044627 ] jack liuquan commented on HADOOP-11828: I agree. I have spent some time reading the new code and docs and will upload a new patch this week. I will put more time into this work so that it moves faster. Thank you, all!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044185#comment-15044185 ] Rashmi Vinayak commented on HADOOP-11828: [~jack_liuquan], [~drankye]: I think we can make significantly more impact if we move faster and get this out soon: More systems will use the code and EC in general if this more efficient code is available when they are deciding whether to use EC or not. Do let me know when it would be appropriate to start with the algo review. Also, as we discussed, once we have (10,4), I will work on implementing for general parameters. [~jack_liuquan]: Feel free to let us know if there are any bottlenecks.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044469#comment-15044469 ] Kai Zheng commented on HADOOP-11828: Thanks [~rashmikv] for the thoughts! As discussed off-line with [~zhz], we could even base this work on trunk; it is not necessarily bound to Phase II. Jack, do you have any plans for this in the near term?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002646#comment-15002646 ] Rashmi Vinayak commented on HADOOP-11828: Thanks, [~zhz]!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994586#comment-14994586 ] Zhe Zhang commented on HADOOP-11828: [~rashmikv] Phase 1 of EC was tracked under HDFS-7285 / HADOOP-11264, which has been finished. Follow-on tasks of phase 1 are being worked on, tracked under HDFS-8031 / HADOOP-11842. Overall, phase 1 of EC is already in Apache Hadoop trunk, and will appear in release version 2.9 or 3.0 (being discussed on hdfs-...@hadoop.apache.org mail list). Work on phase 2 of EC has not been started. We have created some tasks under the umbrella JIRA HDFS-8030.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994571#comment-14994571 ] Rashmi Vinayak commented on HADOOP-11828: [~drankye] [~jack_liuquan] Super excited about this! This might be a naive question: Is there a place where I can find more details about Phase 1 & 2 such as target dates?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986546#comment-14986546 ] Kai Zheng commented on HADOOP-11828: I think all the basic adjustment issues in the raw coder layer were recently resolved, so the code here can be rebased accordingly. It would be good to move on with this even though it may be re-targeted for Phase II. Any thoughts?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986580#comment-14986580 ] jack liuquan commented on HADOOP-11828: OK, I will do it.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986557#comment-14986557 ] jack liuquan commented on HADOOP-11828: OK, I'm glad to hear that. Please tell me how to go on.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986569#comment-14986569 ] Kai Zheng commented on HADOOP-11828: Jack, please check out the latest trunk code and understand the related changes to the raw erasure coders. You may also need to be aware of the ongoing new Java coder in HADOOP-12041. Then rebase the patch here on trunk, also addressing the previous review comments.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726999#comment-14726999 ] jack liuquan commented on HADOOP-11828: Thanks all. Hi Kai, I am always here. ;)
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727791#comment-14727791 ] Rashmi Vinayak commented on HADOOP-11828: [~drankye], [~jack_liuquan]: Awesome!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724908#comment-14724908 ] Kai Zheng commented on HADOOP-11828: Thanks [~rashmikv] for the suggestion. There are some dependent issues in the erasure codec & coder framework to be resolved. I'll get them done with higher priority. When they're done, I will update here; then maybe Jack could rebase this effort?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724449#comment-14724449 ] Rashmi Vinayak commented on HADOOP-11828: Hi [~jack_liuquan] and [~drankye], Thank you both for the amazing effort on this feature implementation. I was wondering what we can do to not lose momentum. Perhaps assigning priorities to the major change suggestions could help? Thanks!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537790#comment-14537790 ] Kai Zheng commented on HADOOP-11828:

Hi Jack, thanks for the clarification. I would really like the first 3 points to be resolved first, as they are clear to us now and resolving them would lay a more solid base for the following implementations of the other two modes. That way we won't have to make big changes after the commit. I understand the process isn't very productive, but that's the pain of open source. I really wish we could get this in sooner, but we need more reviews from more people, so I guess you will have chances to make the code cleaner and more elegant.

bq. HH is specific in preparing input data in decoding
I don't think so. Any erasure code is used to encode and decode arbitrary user data; we don't need to prepare for it specifically.

bq. Current testCoding()in TestErasureCoderBase using left 9 data units + 4 parity units to reconstruct the missing one data unit.
Yes, it is for now. It will be corrected in HADOOP-11847. I think it's fine to customize the {{testCoding}} logic here, but in the future we should consolidate the code into the parent {{testCoding}}.

bq. I have no good idea cause encoding of RS will erasure input data.
I see. I don't have one either; checking the RS code, it's not easy to avoid the erasure. Let's optimize it in the future, once we get everything working right.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533961#comment-14533961 ] Kai Zheng commented on HADOOP-11828: Hi [~rashmikv] or [~jack_liuquan], for the Hitchhiker algorithm, is it possible to use a different chunk/cell size in decoding from the chunk/cell size used in encoding? In my understanding, it's not possible, or at least we shouldn't do it that way. Please help clarify, as you're the experts in this implementation. Thanks.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534164#comment-14534164 ] jack liuquan commented on HADOOP-11828:

bq. As both xor raw coder and rs raw coder are common to erasure coders for RS and HH, please extract common codes resolving the duplicates to abstract class, regarding creating xor and rs raw coder.
bq. We may need abstract class like HHErasureDecodingStep and HHErasureEncodingStep for the three derivations of the HH algorithm. Classes like HHXORErasureDecodingStep can inherit from them.
bq. Please try to reuse codes between the two versions of coding: byte[] version and ByteBuffer version. You may look at the patch in HADOOP-11847 for some idea.
Shall we note these tips and deal with them in the next development iteration? I don't think these optimization tips will be the last ones; if we fix each one as soon as we find it, we may end up doing temporary work, and development efficiency will be low. Maybe we can plan a next development iteration and cover all these tips in that stage. What do you think?

bq. We might not override testCoding and performCodingStep in TestHHErasureCoderBase. Any specific for HH here? If we have to, then there would be problem to use the coder as it's not general to use.
HH is specific in how it prepares input data for decoding. For example, with (k=10, r=4), the current testCoding() in {{TestErasureCoderBase}} uses the remaining 9 data units + 4 parity units to reconstruct the one missing data unit. That is not a good fit for HH, because the advantage of HH is that it reduces the data required for reconstruction. For performCodingStep(), the reason is the use of sub-stripe pairs.

bq. Is it possible to avoid the cloning input data in getPiggyBacksFromInput?
I have no good idea, because RS encoding erases the input data. Could you give me some suggestions?

bq. Some comments might be better to reorganized to make them look better. Some are too long, and some can be longer.
bq. Please note lines should not exceed 80 chars. You could set the width limit in your IDE.
bq. We need Javadocs for the public functions in HHUtil.
bq. I thought we don't need this test as it's the configuration isn't specific to the coder.
These are OK with me; I will do my best to address them as you suggest.
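The point about reading fewer units during reconstruction can be made concrete with a little arithmetic. The sketch below is illustrative only and is not code from the attached patches. It assumes a piggyback layout in which the data units of the first sub-stripe are split into groups of sizes 3, 3 and 4 for (k=10, r=4); that grouping, the class name {{HitchhikerReadCost}} and all method names are assumptions made for the example. Costs are counted in half-units, since each unit holds one byte range from each of the two sub-stripes.

```java
// Illustrative arithmetic only; names and the grouping are assumptions,
// not taken from the HADOOP-11828 patches.
final class HitchhikerReadCost {

    /** Half-units read by plain RS to rebuild one data unit: k full units. */
    static int rsHalfUnits(int k) {
        return 2 * k;
    }

    /**
     * Half-units read under the ASSUMED piggyback layout when the lost unit
     * belongs to a group of the given size: (k - 1) surviving data half-units
     * plus 1 clean parity half-unit of the second sub-stripe, 1 piggybacked
     * parity half-unit, and (groupSize - 1) first-sub-stripe half-units.
     */
    static int piggybackHalfUnits(int k, int groupSize) {
        return (k - 1) + 1 + 1 + (groupSize - 1);
    }

    /** Saving relative to plain RS, in percent. */
    static double savingPercent(int k, int groupSize) {
        int rs = rsHalfUnits(k);
        return 100.0 * (rs - piggybackHalfUnits(k, groupSize)) / rs;
    }

    public static void main(String[] args) {
        int k = 10;
        for (int groupSize : new int[] {3, 3, 4}) { // assumed grouping
            System.out.printf("group of %d: %d vs %d half-units, saving %.0f%%%n",
                groupSize, piggybackHalfUnits(k, groupSize), rsHalfUnits(k),
                savingPercent(k, groupSize));
        }
    }
}
```

Under these assumptions a test that always feeds the decoder 9 data units + 4 parity units cannot show the saving, which is presumably why the HH test wants to control which inputs are supplied.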
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534051#comment-14534051 ] jack liuquan commented on HADOOP-11828: Hi Kai, I'm not sure I caught your meaning exactly. The encoding/decoding of bytes within one chunk (one chunk is a sub-stripe) is a linear operation, but it is not linear between chunks in one block. If reading bytes in blocks is a linear operation, then it's right that the encoding/decoding of units should happen on aligned boundaries. But if we can use an offset to read chunks in parallel, I think we can encode/decode bytes in flow mode, without needing aligned boundaries. Does that answer your question? Thanks!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533995#comment-14533995 ] jack liuquan commented on HADOOP-11828: Hi Kai, Yes, we can't use a different chunk size in decoding and encoding, because Hitchhiker encodes and decodes using sub-stripe pairs. A different chunk size would mix up the sub-stripes.
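A minimal sketch of why a chunk-size mismatch mixes up the sub-stripes. It assumes, purely for illustration, that consecutive chunks of a unit alternate between the two sub-stripes of a pair; the class {{SubStripeLayout}} and its method are hypothetical and do not come from the patch.

```java
// Hypothetical layout helper; the alternating-chunk layout is an assumption.
final class SubStripeLayout {

    /**
     * Returns which sub-stripe of a pair (0 or 1) the byte at the given
     * offset belongs to, assuming chunks alternate between sub-stripes.
     */
    static int subStripeOf(long offset, int chunkSize) {
        if (offset < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and chunkSize > 0");
        }
        return (int) ((offset / chunkSize) % 2);
    }

    public static void main(String[] args) {
        long offset = 5;
        int encoderChunkSize = 4;
        int decoderChunkSize = 8; // deliberately different from the encoder
        // The same byte is attributed to different sub-stripes, so a decoder
        // using this chunk size would pair it with the wrong parity bytes.
        System.out.println("encoder view: sub-stripe " + subStripeOf(offset, encoderChunkSize));
        System.out.println("decoder view: sub-stripe " + subStripeOf(offset, decoderChunkSize));
    }
}
```

With the same chunk size on both sides the two views always agree, which is the constraint confirmed in this comment.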
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534009#comment-14534009 ] Kai Zheng commented on HADOOP-11828: Thanks Jack for the clarification. So because of the sub-stripe arrangement specific to the algorithm, the encoding/decoding of bytes in chunks isn't a linear operation, and the encoding/decoding of units must happen on aligned boundaries (fixed chunk/cell size), right?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534102#comment-14534102 ] jack liuquan commented on HADOOP-11828: yes, I confirm it.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534080#comment-14534080 ] Kai Zheng commented on HADOOP-11828: And the chunkSize passed to initialize the HH encoder should be the same value as the one used to initialize the corresponding HH decoder?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534071#comment-14534071 ] Kai Zheng commented on HADOOP-11828: I think I'm aligned with your understanding. To keep it simple, let's come down to the real implementation: would you confirm that, in your implemented HH encoder and decoder, the chunkSize passed to the {{initialize}} function should be respected by the {{encode}} and {{decode}} input/output buffers?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526652#comment-14526652 ] Kai Zheng commented on HADOOP-11828:

Hi Jack, the updated patch looks good to me overall. Some comments so far:
* Some comments would be better reorganized to make them read better. Some are too long, and some could be longer.
* Please note lines should not exceed 80 chars. You could set the width limit in your IDE.
* As both the XOR raw coder and the RS raw coder are common to the erasure coders for RS and HH, please extract the common code for creating the XOR and RS raw coders into an abstract class to remove the duplication.
* We may need abstract classes like {{HHErasureDecodingStep}} and {{HHErasureEncodingStep}} for the three derivations of the HH algorithm. Classes like {{HHXORErasureDecodingStep}} can inherit from them.
* Please try to reuse code between the two versions of coding: the byte[] version and the ByteBuffer version. You may look at the patch in HADOOP-11847 for some ideas.
* We might not need to override {{testCoding}} and {{performCodingStep}} in {{TestHHErasureCoderBase}}. Is anything specific to HH here? If we have to override them, there would be a problem using the coder, as it would not be general to use.
* We need Javadocs for the public functions in {{HHUtil}}.
* Is it possible to avoid cloning the input data in {{getPiggyBacksFromInput}}?
* I think we don't need this test, as the configuration isn't specific to the coder.
{code}
+  @Test
+  public void testCodingDirectBufferWithConf_10x4() {
+    /**
+     * This tests if the two configuration items work or not.
+     */
+    Configuration conf = new Configuration();
+    conf.set(CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_RAWCODER_KEY,
+        RSRawErasureCoderFactory.class.getCanonicalName());
+    conf.setBoolean(
+        CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_USEXOR_KEY, false);
+    prepare(conf, 10, 4, null);
+    initHitchhiker();
+    testCoding(true);
+  }
{code}
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516526#comment-14516526 ] Kai Zheng commented on HADOOP-11828: Hi Jack, You're still using {{JRSRawEncoder}}, but it was renamed to {{RSRawEncoder}} quite some time ago. How about renaming the new coder from {{HitchhikerXORErasureEncoder}} to {{HHXORErasureEncoder}}, similar to the other coders?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516731#comment-14516731 ]

jack liuquan commented on HADOOP-11828:

Hi Kai,

bq. I mentioned that in my rough thought, would you clarify it in details and provide your thoughts? This would be desired as it's a rather major design change.

The Hitchhiker algorithm builds on top of RS codes and XOR codes, so architecturally it fits better in the ErasureCoder layer. Placing it there also makes it convenient to replace the underlying RS and XOR coders of Hitchhiker to get better performance.

bq. You're still using JRSRawEncoder, but it's renamed to RSRawEncoder quite some time ago.

Oh yes, I see. I will check and fix it.

bq. How about naming the new coder HitchhikerXORErasureEncoder to HHXORErasureEncoder? Similar to other coders.

OK, sounds great.
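To make the layering argument above concrete, here is a minimal Java sketch of a higher-layer coder that composes two pluggable raw coders. It is not the code from any of the attached patches and does not use the Hadoop API: {{RawEncoder}}, {{XorRawEncoder}}, {{PiggybackEncoder}} and the {{groups}} parameter are hypothetical names introduced only for illustration. The piggybacking step loosely follows the Hitchhiker-XOR idea of XORing groups of data units from one substripe onto parities of the other; the real grouping and the RS coder itself are assumed to live elsewhere.

```java
import java.util.Arrays;

/** Hypothetical raw-coder contract: fills parity units from equally sized data units. */
interface RawEncoder {
    void encode(byte[][] data, byte[][] parity);
}

/** Single-parity XOR raw coder: parity[0] becomes the bytewise XOR of all data units. */
final class XorRawEncoder implements RawEncoder {
    @Override
    public void encode(byte[][] data, byte[][] parity) {
        Arrays.fill(parity[0], (byte) 0);
        for (byte[] unit : data) {
            for (int i = 0; i < unit.length; i++) {
                parity[0][i] ^= unit[i];
            }
        }
    }
}

/**
 * Higher-layer coder (illustrative only). It owns no coding math of its own:
 * both raw coders are injected, so either can be swapped for a faster one.
 */
final class PiggybackEncoder {
    private final RawEncoder base;  // stands in for the RS raw coder
    private final RawEncoder xor;   // computes the piggybacks
    private final int[][] groups;   // groups[j]: data indices piggybacked onto parity j + 1

    PiggybackEncoder(RawEncoder base, RawEncoder xor, int[][] groups) {
        this.base = base;
        this.xor = xor;
        this.groups = groups;
    }

    /** Encodes substripes a and b and returns {parityA, parityB}. */
    byte[][][] encode(byte[][] a, byte[][] b, int numParity) {
        int len = a[0].length;
        byte[][] parityA = new byte[numParity][len];
        byte[][] parityB = new byte[numParity][len];
        base.encode(a, parityA);
        base.encode(b, parityB);
        // XOR each group of substripe-a data units onto parity j + 1 of substripe b.
        for (int j = 0; j < groups.length && j + 1 < numParity; j++) {
            byte[][] members = new byte[groups[j].length][];
            for (int k = 0; k < members.length; k++) {
                members[k] = a[groups[j][k]];
            }
            byte[][] piggyback = new byte[1][len];
            xor.encode(members, piggyback);
            for (int i = 0; i < len; i++) {
                parityB[j + 1][i] ^= piggyback[0][i];
            }
        }
        return new byte[][][] {parityA, parityB};
    }
}
```

Because both raw coders are passed in through the constructor, replacing the base coder (for example with a native implementation) would not touch the piggybacking logic, which is the convenience the comment above refers to.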
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510597#comment-14510597 ]

jack liuquan commented on HADOOP-11828:

Hi Kai,

Thank you for your review.

bq. * Please rebase with latest branch. Your codebase is rather old.

Sorry, I'm not sure I follow. I based my code on the latest HDFS-7285 branch. Is that not right?

bq. * Please remove codes for other modes for now, even in tests.

OK, I will remove the code that is not used in Hitchhiker-XOR.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508733#comment-14508733 ]

Kai Zheng commented on HADOOP-11828:

Jack, good work. Thanks!

* Please rebase onto the latest branch. Your codebase is rather old.
* Please remove the code for other modes for now, even in tests.
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505125#comment-14505125 ]

jack liuquan commented on HADOOP-11828:

Hi Kai,

I have uploaded a new patch. I think you are right: Hitchhiker probably fits better in the ErasureCoder layer. So in the new patch, I moved Hitchhiker-XOR to the ErasureCoder layer. Please review the code. Thanks a lot!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505901#comment-14505901 ]

Kai Zheng commented on HADOOP-11828:

Thanks [~jack_liuquan] for the update!

bq. I think you are right, maybe the hitchhiker in ErasureCoder layer is better.

Sounds great to me. I mentioned that only as a rough thought; would you explain it in detail and share your own reasoning? That would be good to have, as it's a rather major design change.

bq. So in the new patch, I move hitchhiker-XOR to the ErasureCoder layer.

It's great to have this version. I will read it soon. Thanks!
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492780#comment-14492780 ]

Zhe Zhang commented on HADOOP-11828:

Thanks Uma for moving this to COMMON; good call. Just to quickly confirm, do we plan to change DataNode in this work?
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493361#comment-14493361 ]

Kai Zheng commented on HADOOP-11828:

bq. do we plan to change DataNode in this work?

Hi [~zhz], it's not clear to us yet. Could we open a separate issue if that turns out to be needed later? This one would focus on the core algorithm, i.e. the codec part.