[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2017-05-20 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16018603#comment-16018603
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~drankye],

Does [HDFS-7337] address the above issue?

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0-alpha1
>
> Attachments: 7715-hitchhikerXOR-v2.patch, 
> 7715-hitchhikerXOR-v2-testcode.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-09-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488649#comment-15488649
 ] 

Kai Zheng commented on HADOOP-11828:


bq. I want to know weather Hitch Hiker is attached to Hadoop or not and if it 
is attached I want to know more about its commands.
Good question. Yes and no. Hitchhiker coder is already in the codebase in 
Hadoop common side, but not attached to HDFS side yet. Currently HDFS uses raw 
erasure coder API doing all the work, but HH coder is implemented in erasure 
coder API. Evolving HDFS towards using erasure coders is still on going, but 
kept as low priority in my side. I'd like to provide some help if anyone would 
push this, though. 

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0-alpha1
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-09-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15488623#comment-15488623
 ] 

Kai Zheng commented on HADOOP-11828:


Hi [~rashmikv], 

Sorry for not seeing your question and the late reply. By compatible or 
inter-operable, I mean the encoded data by a coder can be decoded by another 
coder. The new Java coder is compatible with the native ISA-L coder in this 
sense. In some environment in your clients you may not have Hadoop native setup 
so only pure Java solution will do the work, but your other clients and the 
cluster use Hadoop native so the native ISA-L based coder can work for better 
performance. In other words, it's possible to use more than one coder 
implementations in a cluster and its clients, so the requirement for such 
implementation should be compatible with others, otherwise the data will be 
messy.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0-alpha1
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-09-01 Thread Mehran Mahbuti (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15456547#comment-15456547
 ] 

Mehran Mahbuti commented on HADOOP-11828:
-

I want to know weather Hitch Hiker is attached to Hadoop or not and if it is 
attached I want to know more about its commands.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0-alpha1
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-02-19 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154603#comment-15154603
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Thanks a lot for the clarifications, [~jack_liuquan] and [~drankye]! I was 
concerned by the name containing the term 'Raw' in it as in my understanding, 
'xRawCoders' stand for pure Java ones and the the plain 'xCoders' stand for the 
ISA-L ones.

bq. RSRawDecoder2 is the new Java one (compatible with ISA-L coder)

Could you please clarify what 'compatible with ISA-L coder' means? Does it mean 
that when you use RSRawDecoder2, it uses ISA-L implementation if the hardware 
allows and if not uses the Java one? Or is this selection done through 
Configuration too?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-02-18 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153512#comment-15153512
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi [~rashmikv],
Kai is right, current hitchhiker coders are not couple with specific RS 
implementation. It picks up the concrete implementation from configuration.
you can see {{HHXORErasureDecoder.java}} for more details.

  private RawErasureDecoder checkCreateRSRawDecoder() {
if (rsRawDecoder == null) {
  rsRawDecoder = CodecUtil.createRSRawDecoder(getConf(),
  getNumDataUnits(), getNumParityUnits());
}
return rsRawDecoder;
  }

Thanks,
Jack

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-02-18 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153501#comment-15153501
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Rashmi,

bq. When looking at HHXORErasureDecodingStep, I see that 'RSRawDecoder' is 
being used.
I don't see this but I'm sure HH coders should not couple with specific RS 
implementation. It picks up the concrete implementation from configuration.
bq. Isn't 'RSRawDecoder' the older java implementation borrowed from Facebook's 
HDFS-RAID? I was under the impression that RSRawDecoder is the older version 
and RSRawDecoder2 is the newer version. Is it not so?
You're almost right. Yeah right now RSRawDecoder is the old (from HDFS-RAID) 
and RSRawDecoder2 is the new Java one (compatible with ISA-L coder). Very soon 
the two coders will be renamed, ref. HADOOP-12808.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-02-18 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153339#comment-15153339
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~jack_liuquan], [~drankye],

Is there a way to check if the encoders and decoders are using the ISA-L based 
implementation and not the Java ones? When looking at HHXORErasureDecodingStep, 
I see that 'RSRawDecoder' is being used. Isn't 'RSRawDecoder' the older java 
implementation borrowed from Facebook's HDFS-RAID? I was under the impression 
that RSRawDecoder is the older version and RSRawDecoder2 is the newer version. 
Is it not so?

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-02-09 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140359#comment-15140359
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~jack_liuquan], [~drankye], [~zhz],

I am super excited to see this being resolved! Thank you all for the efforts 
that you put in. I agree with [~zhz] that it would be good to get some 
performance results comparing RS and Hitchhiker based on the new 
implementation. This would guide enterprises who are considering using erasure 
coding, and thus leading to a greater impact from this effort and HDFS-EC in 
general as they will come to know about this more efficient EC option. 

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-21 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110481#comment-15110481
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi [~zhz],
Thanks for your review!
bq.For an util class with all static methods, we don't need a constructor.
I add a private constructor for checkstyle issue, just reference to code of 
{{DumpUtil}} class in {{rawcoder}}
bq.The getPiggyBacksFromInput method is fairly complex and deserves a unit 
test. A ASCII illustration would also be very helpful, similar to Figure 4 in 
the Hitchhiker paper .
Although {{getPiggyBacksFromInput }} is fairly complex, there only one running 
branch in it. I think current unit test cases are good to cover it. I will add 
a ASCII illustration for it.
bq.3.Maybe I'm missing something, but how do we guarantee the length of inputs 
passed to performCoding is always numDataUnits * subPacketSize?
As Rashmi said, subPacketSize is always 2 in Hitchhiker. I think we can 
guarantee it when we preparing block chunks.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111085#comment-15111085
 ] 

Hudson commented on HADOOP-11828:
-

FAILURE: Integrated in Hadoop-trunk-Commit #9153 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9153/])
HADOOP-11828. Implement the Hitchhiker erasure coding algorithm. (zhz: rev 
1bb31fb22e6f8e6df8e9ff4e94adf20308b4c743)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/util/HHUtil.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/AbstractErasureDecoder.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureDecodingStep.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestHHErasureCoderBase.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureDecoder.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/AbstractHHErasureCodingStep.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestErasureCoderBase.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureEncodingStep.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/erasurecode/coder/TestHHXORErasureCoder.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/HHXORErasureEncoder.java


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Fix For: 3.0.0
>
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HADOOP-11828-v8.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-20 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108208#comment-15108208
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~zhz],
Regarding you second question:
bq. 2. Could SUB_PACKET_SIZE be other than 2? If so, should we still keep it as 
a variable?
Sub-packet-size is always 2 in Hitchhiker. So we may perhaps have it as a 
constant?


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107878#comment-15107878
 ] 

jack liuquan commented on HADOOP-11828:
---

Thanks ,kai.
I'm just confused about that where need to add, where not need.
I find not all the package added the package-info.java file.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107861#comment-15107861
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi, [~zhz] and [~walter.k.su], have you reviewed the codes?
let me know if you have finished review.Thanks!
Hi, kai, the last checkstyle issue is below:
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/erasurecode/coder/util/HHUtil.java:0::
 Missing package-info.java file.
How can I ignore this ussue or Do I really need to add a  package-info.java 
file?


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107869#comment-15107869
 ] 

Kai Zheng commented on HADOOP-11828:


bq. Do I really need to add a package-info.java file?
I thought it's easy to have one :). Perhaps you can add it in next update.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108115#comment-15108115
 ] 

Zhe Zhang commented on HADOOP-11828:


Thanks Jack for the great work, and Rashmi / Kai for the reviews! The patch and 
it LGTM overall. I about about to finish and post my review. Meanwhile, could 
you add Apache license header to {{HHXORErasureDecodingStep}} (see this 
[complain | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/8425/artifact/patchprocess/patch-asflicense-problems.txt]),
 and add a trivial {{package-info}} for now (as Kai suggested)? The purpose of 
the class is for Javadoc and I do think we will add Javadoc for the new package 
later.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107951#comment-15107951
 ] 

Kai Zheng commented on HADOOP-11828:


bq. I find not all the package added the package-info.java file.
That's right. You probably don't find it in existing packages. That's fine and 
people won't complain. But if someone would complement and provide it for 
existing packages, it would be welcome, and I guess the check style counter 
would be decremented.
bq. I'm just confused about that where need to add, where not need.
Well, if you introduce a new package, then probably you need to add it, because 
introduce a new check style (counter +1).

IMO, {{package-info}} for important package makes sense, but for an initial 
package, it would be OK to delay on adding it. But  since it's required by the 
check for now, simply, maybe just to add it. :(

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-19 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108177#comment-15108177
 ] 

Zhe Zhang commented on HADOOP-11828:


I finished reviewing the patch and it LGTM overall. Given that this is a new 
coder and doesn't modify existing code, I'm +1 on committing the latest patch 
pending a fix on the license header and checkstyle issues, and the following 
minor issue:
{code}
  private HHUtil() {
// No called
  }
{code}
For an util class with all static methods, we don't need a constructor.

Possible follow-on work and questions:
# The {{getPiggyBacksFromInput}} method is fairly complex and deserves a unit 
test. A ASCII illustration would also be very helpful, similar to Figure 4 in 
the Hitchhiker [paper | 
http://eecs.berkeley.edu/~rashmikv/papers/Hitchhiker_SIGCOMM14.pdf].
# Could {{SUB_PACKET_SIZE}} be other than 2? If so, should we still keep it as 
a variable?
# Maybe I'm missing something, but how do we guarantee the length of inputs 
passed to {{performCoding}} is always {{numDataUnits * subPacketSize}}?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-15 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102998#comment-15102998
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi, kai, 
the latest v7 patch is only updating for repairing the check in the report.
If you think the code is ok, I will proceed two other modes of hh algorithm.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103030#comment-15103030
 ] 

Hadoop QA commented on HADOOP-11828:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 11s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 34s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 34s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 19s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s 
{color} | {color:red} Patch generated 1 new checkstyle issues in 
hadoop-common-project/hadoop-common (total was 4, now 5). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 2s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 0s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s 
{color} | {color:red} Patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 53s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782687/HADOOP-11828-hitchhikerXOR-V7.patch
 |
| JIRA Issue | HADOOP-11828 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 5b83ad083f35 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 2a30386 |
| Default 

[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-15 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101540#comment-15101540
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks Jack for the update! Would be good to have a summary about the updates. 
[~zhz] or [~walter.k.su] maybe you could do the final review and get it in if 
everything is ok? There're two other modes I guess Jack could proceed basing on 
this work.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HADOOP-11828-hitchhikerXOR-V7.zip, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090569#comment-15090569
 ] 

Hadoop QA commented on HADOOP-11828:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 15s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 8 new checkstyle issues in 
hadoop-common-project/hadoop-common (total was 0, now 8). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 21s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new issues (was 1, now 2). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 49s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 1 new issues (was 13, now 14). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 10s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 25s 
{color} | {color:red} Patch generated 8 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 23s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.metrics2.impl.TestGangliaMetrics |
| JDK v1.8.0_66 Timed out junit tests | 
org.apache.hadoop.http.TestHttpServerLifecycle |
\\

[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-09 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090556#comment-15090556
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi all,
I have update a new patch.
This patch fix the codes after review comments.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HADOOP-11828-hitchhikerXOR-V6.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088969#comment-15088969
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi kai,
The report urls of last build have been not available, how can I get the report 
again?


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089005#comment-15089005
 ] 

jack liuquan commented on HADOOP-11828:
---

OK, Thanks, Kai.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089001#comment-15089001
 ] 

Kai Zheng commented on HADOOP-11828:


I guess it's flushed out. So you may cancel and then submit your patch again to 
trigger the building again? Or just update the patch to wait for another round.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-06 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085301#comment-15085301
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
I have seen the update in HADOOP-12685.
I will update a new patch this week.


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2016-01-05 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082654#comment-15082654
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks [~lirui] for the taking to fix the issue mentioned by Jack in 
HADOOP-12685. Jack if you need the fix you might check the patch there.

By the way, do you have any update, Jack? Note there is some progress in 
HDFS-9603 to apply {{ErasureCoder}} in HDFS side. With this done, we can try it 
in HDFS to see the effect. Thanks.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-30 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075787#comment-15075787
 ] 

Rui Li commented on HADOOP-11828:
-

Just filed HADOOP-12685 for the position inconsistent issue.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-28 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073356#comment-15073356
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai, 
Thank you for your reply.I will maintain the same codes of HH layer.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-27 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072494#comment-15072494
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Jack,

{{ALLOW_CHANGE_INPUTS}} means it can change the input buffers data or content, 
and the buffer's position may still change after consumed. I thought it would 
be good to make it consistent regarding the buffer position change behavior 
between the two buffer types, and I would take care of it in some other issues. 
Anyway you'd better duplicate those input buffers for the {{HH}} layer for 
robust. OK?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-25 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071796#comment-15071796
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
* In raw erasure coder level, you can set the {{ALLOW_CHANGE_INPUTS}} coder 
option to ensure the input buffers are not changed during encoding/decoding. 
Thus in HH coder level, you don't need to clone the input buffers thus avoids 
data copy.
When I tested, I found that after running encode() of RS, non-direct buffer 
input's position will move to end. but direct buffer input's position will not 
move. Is that be OK?
If I don't clone the input buffers in HH coder level, I will deal with  
non-direct buffer input and move input postion to the begin after I call 
encode() of RS.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-25 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071785#comment-15071785
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Rashmi,
Thanks for your review.
Your comments are great! I will do after your comments.


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-22 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069031#comment-15069031
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~jack_liuquan],
Thanks for the great work! I went through the code very carefully for the 
algorithm review. Everything looks fine in terms of correctness. 
Few comments:
1. The name ‘doDecodeMulti’ for the method in HHXORErasureDecodingStep is 
slightly confusing since it handles both the case of multiple erasures and as 
well single parity erasure. Perhaps something on the lines of 
‘doDecodeMultiAndParity’ might reflect the actions of this method more 
accurately? 
2. It seems that there is no need to pass ‘erasedIndexes’  as input to the 
methods in HHXORErasureDecodingStep class since it is a class variable? (You 
might have used these additional inputs for clarity; I just thought of bringing 
this to your attention.) 
3. On a minor side, I think it would be helpful for future readers to include a 
reference to the paper in case they want to understand the algorithm. What do 
you think? (We can have something on the lines: “A "Hitchhiker's" Guide to Fast 
and Efficient Data Reconstruction in Erasure-coded Data Centers”, in ACM 
SIGCOMM 2014.). Also, just to make the context completely clear, could you 
please change the description in the comments to “It has been shown to reduce 
network traffic and disk I/O by 25%-45% during data reconstruction while 
retaining the same storage capacity and failure tolerance capability of RS 
codes.”  (last phrase is added to the existing comment).

Thanks,
Rashmi

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction while retaining the same storage capacity and 
> failure tolerance capability as RS codes. This JIRA aims to introduce 
> Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-22 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067995#comment-15067995
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
Thank you for your review.
I will fix the codes after your comments.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066468#comment-15066468
 ] 

Kai Zheng commented on HADOOP-11828:


The code looks pretty nice and much clean now. Besides the Jenkins reported 
issues you need to look at and clear, some comments:
* In {{AbstractHHErasureCodingStep}}, {{getSubPacketSize}} may be protected 
instead of public
* In raw erasure coder level, you can set the {{ALLOW_CHANGE_INPUTS}} coder 
option to ensure the input buffers are not changed during encoding/decoding. 
Thus in HH coder level, you don't need to clone the input buffers thus avoids 
data copy.
* In {{TestErasureCoderBase}}, note HH specific logic was added. Please don't, 
because it's for all codec/coders. you can add a test base class for HH like 
{{TestHHErasureCoderBase}} that extends {{TestErasureCoderBase}} instead.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-21 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066365#comment-15066365
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi all,
Have you reviewed the codes? Please tell me immediately when you have finished 
review.
Thanks! 

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066435#comment-15066435
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Jack, thanks for your update. Let me read it and give some comments soon.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-10 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051642#comment-15051642
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi Jack,
Great! I will start working on the algo review.




> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048319#comment-15048319
 ] 

Hadoop QA commented on HADOOP-11828:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 56s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 21s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 59s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 59s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} Patch generated 17 new checkstyle issues in 
hadoop-common-project/hadoop-common (total was 0, now 17). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 22 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 7s 
{color} | {color:red} hadoop-common-project/hadoop-common introduced 1 new 
FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 25s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new issues (was 1, now 2). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 4m 55s 
{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 1 new issues (was 13, now 14). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 44s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 47s 
{color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s 
{color} | {color:red} Patch generated 7 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 5s {color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-common-project/hadoop-common |
|  |  Unread 

[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-09 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050047#comment-15050047
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi ,all. Can you review codes? or i should fix codes following the report first?
If you can review codes right now, then I can fix codes with your review 
comments and report together.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046992#comment-15046992
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi, Kai, I have updated a new patch of hitchhiker-XOR.
This patch is for general parameters, not only for (10. 4).

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-08 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047114#comment-15047114
 ] 

Zhe Zhang commented on HADOOP-11828:


Thanks [~jack_liuquan]! Per question from [~rashmikv] above, is the patch ready 
for algorithm review? Looks to me it's ready to be reviewed.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-08 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047116#comment-15047116
 ] 

Zhe Zhang commented on HADOOP-11828:


Could you also click "Submit Patch"? Thanks.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048012#comment-15048012
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Zhe, the patch is ready for algorithm review.
I can click "Submit Patch", but I don't know how to fill the version info.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048021#comment-15048021
 ] 

Kai Zheng commented on HADOOP-11828:


Jack you can just submit without filling any information. The fields can be 
filled later.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HADOOP-11828-hitchhikerXOR-V5.patch, 
> HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-07 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044627#comment-15044627
 ] 

jack liuquan commented on HADOOP-11828:
---

I agree. 
I spend some time to read new codes & docs and will update a new patch this 
week.
I will pay more time to cover this work in order to make it moving more faster.
Thank you, all!

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-06 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044185#comment-15044185
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

[~jack_liuquan], [~drankye]: I think we can make significantly more impact if 
we move faster and get this out soon: More systems will use the code and EC in 
general if this more efficient code is available when they are deciding whether 
to use EC or not. 
Do let me know when it would be appropriate to start with the algo review. 
Also, as we discussed, once we have (10,4), I will work on implementing for 
general parameters.
[~jack_liuquan]: Feel free to let us know if there are any bottlenecks.


> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-12-06 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044469#comment-15044469
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks [~rashmikv] for the thoughts! As discussed off-line with [~zhz], we 
could even get this work based on trunk, not essentially bounded with phase II. 
Jack do you have any plan for this recently? 

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-12 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002646#comment-15002646
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Thanks, [~zhz]!

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-06 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994586#comment-14994586
 ] 

Zhe Zhang commented on HADOOP-11828:


[~rashmikv] Phase 1 of EC was tracked under HDFS-7285 / HADOOP-11264, which has 
been finished. Follow-on tasks of phase 1 are being worked on, tracked under 
HDFS-8031 / HADOOP-11842. Overall, phase 1 of EC is already in Apache Hadoop 
trunk, and will appear in release version 2.9 or 3.0 (being discussed on 
hdfs-...@hadoop.apache.org mail list).

Work on phase 2 of EC has not been started. We have created some tasks under 
the umbrella JIRA HDFS-8030.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-06 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994571#comment-14994571
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

[~drankye] [~jack_liuquan] Super excited about this! This might be a naive 
question: Is there a place where I can find more details about Phase 1 & 2 such 
as target dates?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-02 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986546#comment-14986546
 ] 

Kai Zheng commented on HADOOP-11828:


I thought all the basic adjustment issues in raw coder layer were recently 
resolved so the code here can be rebased accordingly. It would be good to move 
on this even though it may be re-targeted for Phase II. Any thought?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-02 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986580#comment-14986580
 ] 

jack liuquan commented on HADOOP-11828:
---

OK, I will do it.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-02 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986557#comment-14986557
 ] 

jack liuquan commented on HADOOP-11828:
---

OK, I'm glad to hear that. Please tell me how to go on.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-11-02 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986569#comment-14986569
 ] 

Kai Zheng commented on HADOOP-11828:


Jack, please checkout the latest trunk codes and understand related changes on 
raw erasure coders. You may also need to aware the on-going new Java coder in 
HADOOP-12041. Then rebase the patch here on trunk, also addressing previous 
review comments.

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-09-02 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726999#comment-14726999
 ] 

jack liuquan commented on HADOOP-11828:
---

Thanks all.
Hi Kai, I am always here. ;)

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-09-02 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727791#comment-14727791
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

[~drankye], [~jack_liuquan]: Awesome! 

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-09-01 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724908#comment-14724908
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks [~rashmikv] for the suggestion. There are some depending issues in the 
erasure codec & coder framework to be resolved. I'll get them done in higher 
priority. When they're done, I will update here then maybe Jack could rebase 
this effort?

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-08-31 Thread Rashmi Vinayak (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724449#comment-14724449
 ] 

Rashmi Vinayak commented on HADOOP-11828:
-

Hi [~jack_liuquan] and [~drankye],

Thank you both for the amazing effort on this feature implementation. I was 
wondering what we can do to not lose momentum. Perhaps assigning priorities to 
the major change suggestions could help?

Thanks!

> Implement the Hitchhiker erasure coding algorithm
> -
>
> Key: HADOOP-11828
> URL: https://issues.apache.org/jira/browse/HADOOP-11828
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: jack liuquan
> Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
> 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
> HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
> HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | 
> http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
> a new erasure coding algorithm developed as a research project at UC 
> Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
> during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
> HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-11 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537790#comment-14537790
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Jack, thanks for your clarifying.

For the first 3 points I really would like them to be resolved first as they're 
clear to us now and it would lay a more solid base for the following 
implementations of the other two modes. Doing so we won't have to change big 
after committed. I understand the process isn't very productive but that's the 
pain of open source. I really wish we could get this in sooner but we have to 
do more reviews from more guys, so I guess you will have chances to get the 
codes more clean and elegant. 
bq. HH is specific in preparing input data in decoding
I don't think so, any erasure code is used to encode and decode arbitrary user 
data, we don't need to prepare for it specifically. 
bq. Current testCoding()in TestErasureCoderBase using left 9 data units + 4 
parity units to reconstruct the missing one data unit. 
Yes it is for now. It will be corrected in HADOOP-11847. I thought it's good to 
customize the {{testCoding}} logic here, but in future we should consolidate 
the codes into the parent {{testCoding}}.
bq. I have no good idea cause encoding of RS will erasure input data. 
I see. I don't have either, checking the RS codes it's not easy to avoid the 
erasure. Let's optimize it in future when we get all the things work right 
first.



 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533961#comment-14533961
 ] 

Kai Zheng commented on HADOOP-11828:


Hi [~rashmikv] or [~jack_liuquan], for HitchHicker algorithm, is it possible to 
use different chunk/cell size in decoding from the chunk/cell size used in 
encoding? In my understanding, it's not or we shouldn't do that way. Please 
help clarify as you're experts in this implementation. Thanks.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534164#comment-14534164
 ] 

jack liuquan commented on HADOOP-11828:
---

bq.As both xor raw coder and rs raw coder are common to erasure coders for RS 
and HH, please extract common codes resolving the duplicates to abstract class, 
regarding creating xor and rs raw coder.
bq.We may need abstract class like HHErasureDecodingStep and 
HHErasureEncodingStep for the three derivations of the HH algorithm. Classes 
like HHXORErasureDecodingStep can inherit from them.
bq.Please try to reuse codes between the two versions of coding: byte[] version 
and ByteBuffer version. You may look at the patch in HADOOP-11847 for some idea.
Shall we mark these tips and deal with them in next iterative development?
I think these optimization tips are not the last tips, if we fit one as soon as 
we find one, maybe we will do some temporary work and the development 
efficiency will be low.
Maybe we can plan a next development iterative stage and cover all these tips 
in next stage.
What you think?

bq.We might not override testCoding and performCodingStep in 
TestHHErasureCoderBase. Any specific for HH here? If we have to, then there 
would be problem to use the coder as it's not general to use.
HH is specific in preparing input data in decoding. e.g. in (k=10, r=4), 
Current testCoding()in {{TestErasureCoderBase}} using left 9 data units + 4 
parity units to reconstruct the missing one data unit. 
But it is not good for HH cause the advantage of HH is to saving requring data 
units when reconstructing. 
For performCodingStep(), the reason is the use of sub-strip pair.

bq.Is it possible to avoid the cloning input data in getPiggyBacksFromInput?
I have no good idea cause encoding of RS will erasure input data. Could you 
give me some good suggestions?

bq.Some comments might be better to reorganized to make them look better. Some 
are too long, and some can be longer.
bq.Please note lines should not exceed 80 chars. You could set the width limit 
in your IDE.
bq.We need Javadocs for the public functions in HHUtil.
bq.I thought we don't need this test as it's the configuration isn't specific 
to the coder.
These are OK to me, I will try my best to treat them with your suggestion.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534051#comment-14534051
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
I can't catch your meanings exactly.
The encoding/decoding of bytes in one chunk(one chunk is a sub-stripe) is 
linear operation, but not linear between chunks in one block.
If we read bytes in blocks is a linear operation, and it's right that the  
encoding/decoding of units should happen in aligned boundaries.
But if we can use an offset to read chunks parallel, I think we can  
encoding/decoding bytes in flow mode, not need in aligned boundaries .
Do I answer your question? Thanks!

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533995#comment-14533995
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
Yes,we can't use different chunk size in decoding and encoding cause we use 
sub-stripe pair to encoding and decoding in Hitchhiker.
Different chunk size may make sub-stripes confusion.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534009#comment-14534009
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks Jack for the clarifying. So because of the sub-strip arrangement 
specific to the algorithm, the encoding/decoding of bytes in chunks isn't any a 
linear operation, and the encoding/decoding of units much happen in aligned 
boundaries (fixed chunk/cell size), right.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534102#comment-14534102
 ] 

jack liuquan commented on HADOOP-11828:
---

yes, I confirm it.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534080#comment-14534080
 ] 

Kai Zheng commented on HADOOP-11828:


And the chunkSize passed to initialize HH encoder should be the same value with 
the one used to initialize the corresponding HH decoder?

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-08 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534071#comment-14534071
 ] 

Kai Zheng commented on HADOOP-11828:


I thought I'm aligned with your understanding. So for simple let's come down to 
the real implementation, would you confirm, in your implemented HH encoder and 
decoder, the chunkSize passed to {{initialize}} function should be respected or 
used by {{encode}} and {{decode}} input/output buffers?

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-05-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526652#comment-14526652
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Jack, the updated patch looks overall good to me. Some comments so far:
* Some comments might be better to reorganized to make them look better. Some 
are too long, and some can be longer.
* Please note lines should not exceed 80 chars. You could set the width limit 
in your IDE.
* As both xor raw coder and rs raw coder are common to erasure coders for RS 
and HH, please extract common codes resolving the duplicates to abstract class, 
regarding creating xor and rs raw coder.
* We may need abstract class like {{HHErasureDecodingStep}} and 
{{HHErasureEncodingStep}} for the three derivations of the HH algorithm. 
Classes like {{HHXORErasureDecodingStep}} can inherit from them.
* Please try to reuse codes between the two versions of coding: byte[] version 
and ByteBuffer version. You may look at the patch in HADOOP-11847 for some idea.
* We might not override {{testCoding}} and {{performCodingStep}} in 
{{TestHHErasureCoderBase}}. Any specific for HH here? If we have to, then there 
would be problem to use the coder as it's not general to use.
* We need Javadocs for the public functions in {{HHUtil}}.
* Is it possible to avoid the cloning input data in {{getPiggyBacksFromInput}}?
* I thought we don't need this test as it's the configuration isn't specific to 
the coder.
{code}
+  @Test
+  public void testCodingDirectBufferWithConf_10x4() {
+/**
+ * This tests if the two configuration items work or not.
+ */
+Configuration conf = new Configuration();
+conf.set(CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_RAWCODER_KEY,
+RSRawErasureCoderFactory.class.getCanonicalName());
+conf.setBoolean(
+CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_USEXOR_KEY, false);
+prepare(conf, 10, 4, null);
+initHitchhiker();
+testCoding(true);
+  }
{code}

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-28 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516526#comment-14516526
 ] 

Kai Zheng commented on HADOOP-11828:


Hi Jack,

You're still using {{JRSRawEncoder}}, but it's renamed to {{RSRawEncoder}} 
quite some time ago.

How about naming the new coder {{HitchhikerXORErasureEncoder}} to 
{{HHXORErasureEnoder}}? Similar to other coders.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-28 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516731#comment-14516731
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
bq. I mentioned that in my rough thought, would you clarify it in details and 
provide your thoughts? This would be desired as it's a rather major design 
change.
Hitchhiker algorithm builds on top of RS codes and XOR codes, it is more 
suitable to put hitchhiker  in ErasureCoder layer for architecture 
consideration. And it is convenient to replace underlying RS codes and XOR 
codes of Hitchhiker to get better performance in ErasureCoder layer.
bq.You're still using JRSRawEncoder, but it's renamed to RSRawEncoder quite 
some time ago.
Oh yes, I see. I will check and modify it.
bq.How about naming the new coder HitchhikerXORErasureEncoder to 
HHXORErasureEnoder? Similar to other coders.
OK, sounds great.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-24 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510597#comment-14510597
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi Kai,
Thank you for your review.
bq.* Please rebase with latest branch. Your codebase is rather old.
sorry, I'm not sure. I add my codes based on the latest HDFS-7285 branch. Is 
that not right?
bq.* Please remove codes for other modes for now, even in tests.
OK, I will remove the codes not used in Hitchhiker-XOR.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-23 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508733#comment-14508733
 ] 

Kai Zheng commented on HADOOP-11828:


Jack, good work. Thanks!
* Please rebase with latest branch. Your codebase is rather old.
* Please remove codes for other modes for now, even in tests.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-21 Thread jack liuquan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505125#comment-14505125
 ] 

jack liuquan commented on HADOOP-11828:
---

Hi kai,
I have uploaded a new patch.
I think you are right, maybe the  hitchhiker in ErasureCoder layer is better. 
So in the new patch, I move hitchhiker-XOR to the ErasureCoder layer.
Please review the codes, Thanks a lot! 

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-21 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505901#comment-14505901
 ] 

Kai Zheng commented on HADOOP-11828:


Thanks [~jack_liuquan] for the update!
bq.I think you are right, maybe the hitchhiker in ErasureCoder layer is better. 
Sounds great to me. I mentioned that in my rough thought, would you clarify it 
in details and provide your thoughts? This would be desired as it's a rather 
major design change.
bq.So in the new patch, I move hitchhiker-XOR to the ErasureCoder layer.
It's great to have this version. I will read it recently. Thanks!

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, 
 HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-13 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492780#comment-14492780
 ] 

Zhe Zhang commented on HADOOP-11828:


Thanks Uma for moving this to COMMON; good call. 

Just to quickly confirm, do we plan to change DataNode in this work?

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

2015-04-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493361#comment-14493361
 ] 

Kai Zheng commented on HADOOP-11828:


bq.do we plan to change DataNode in this work?
Hi [~zhz], it's not clear to us yet. Is it possible to have a separate issue if 
determined later? This would focus on the core algorithm or codec part.

 Implement the Hitchhiker erasure coding algorithm
 -

 Key: HADOOP-11828
 URL: https://issues.apache.org/jira/browse/HADOOP-11828
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: jack liuquan
 Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 
 7715-hitchhikerXOR-v2.patch, HDFS-7715-hhxor-decoder.patch, 
 HDFS-7715-hhxor-encoder.patch


 [Hitchhiker | 
 http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is 
 a new erasure coding algorithm developed as a research project at UC 
 Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% 
 during data reconstruction. This JIRA aims to introduce Hitchhiker to the 
 HDFS-EC framework, as one of the pluggable codec algorithms.
 The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)