
Re: FYI - forking TFile off Hadoop into Zebra

2009-11-13 Thread Alan Gates



On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote:


> On Wed, Nov 11, 2009 at 18:26, Chao Wang wrote:
>
>> Last, we would like to point out that this is a short term solution
>> for Zebra and we plan to:
>> 1) port all changes to Zebra TFile back into Hadoop TFile.
>> 2) in the long run have a single unified solution for this.
>
> Just for clarity, in the long run, as Zebra stabilizes and Pig adopts
> hadoop-0.22, Zebra will get rid of this fork?

I think the promise is they'll get rid of the fork at some point, not
necessarily at 0.22 though.

Alan.

> Ashutosh

Re: FYI - forking TFile off Hadoop into Zebra

2009-11-11 Thread Ashutosh Chauhan

On Wed, Nov 11, 2009 at 18:26, Chao Wang wrote:

> Last, we would like to point out that this is a short term solution for
> Zebra and we plan to:
> 1) port all changes to Zebra TFile back into Hadoop TFile.
> 2) in the long run have a single unified solution for this.

Just for clarity, in the long run, as Zebra stabilizes and Pig adopts
hadoop-0.22, Zebra will get rid of this fork?

Ashutosh

FYI - forking TFile off Hadoop into Zebra

2009-11-11 Thread Chao Wang

Hi all,

In JIRA PIG-1077, we on the Zebra team plan to use Hadoop TFile's
support for splitting by record sequence number to provide record
(row)-based input splits in Zebra.

Along the way we also plan to resolve a dependency issue: Zebra's
record-based split needs Hadoop TFile split support to work. Because of
this dependency, Zebra has to maintain its own copy of the Hadoop jar in
svn in order to build. Furthermore, Zebra currently sits inside Pig in
svn, and Pig maintains its own copy of the Hadoop jar in its lib
directory, which makes things even messier. Finally, Zebra is new,
changing rapidly, and needs new revisions quickly, while Hadoop and Pig
are more mature, move more slowly, and thus can't make new releases for
Zebra all the time.

After carefully thinking through all this, we plan to fork the TFile
part off Hadoop and port it into Zebra's own code base. This will
greatly simplify Zebra's build process and also enable it to make
quick revisions.

Last, we would like to point out that this is a short term solution for
Zebra and we plan to: 
1) port all changes to Zebra TFile back into Hadoop TFile. 
2) in the long run have a single unified solution for this.

For more information, please see
https://issues.apache.org/jira/browse/PIG-1077

We welcome your feedback on this.

Regards,

Chao
