Re: FYI - forking TFile off Hadoop into Zebra
On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote:

> On Wed, Nov 11, 2009 at 18:26, Chao Wang wrote:
>
>> Last, we would like to point out that this is a short term solution for
>> Zebra and we plan to:
>> 1) port all changes to Zebra TFile back into Hadoop TFile.
>> 2) in the long run have a single unified solution for this.
>
> Just for clarity, in long run as Zebra stabilizes and Pig adopts hadoop-0.22, Zebra will get rid of this fork?

I think the promise is they'll get rid of the fork at some point, not necessarily at 0.22 though.

Alan.

> Ashutosh
Re: FYI - forking TFile off Hadoop into Zebra
On Wed, Nov 11, 2009 at 18:26, Chao Wang wrote:

> Last, we would like to point out that this is a short term solution for
> Zebra and we plan to:
> 1) port all changes to Zebra TFile back into Hadoop TFile.
> 2) in the long run have a single unified solution for this.

Just for clarity, in long run as Zebra stabilizes and Pig adopts hadoop-0.22, Zebra will get rid of this fork?

Ashutosh
FYI - forking TFile off Hadoop into Zebra
Hi all,

In JIRA PIG-1077, we on the Zebra team plan to use Hadoop TFile's support for splitting by record sequence number to provide record (row)-based input splits in Zebra.

Along the way, we also plan to resolve a dependency issue: Zebra's record-based split needs Hadoop TFile split support to work. Because of this dependency, Zebra has to maintain its own copy of the Hadoop jar in svn in order to build. Furthermore, Zebra currently sits inside Pig in svn, and Pig itself maintains its own copy of the Hadoop jar in its lib directory, which makes things even messier. Finally, Zebra is new, is changing rapidly, and needs new revisions quickly, while Hadoop and Pig are more mature and move more slowly, so they can't make new releases for Zebra all the time.

After carefully thinking through all this, we plan to fork the TFile part off Hadoop and port it into Zebra's own code base. This will greatly simplify Zebra's build process and also enable quick revisions.

Last, we would like to point out that this is a short term solution for Zebra and we plan to:
1) port all changes to Zebra TFile back into Hadoop TFile.
2) in the long run have a single unified solution for this.

For more information, please see https://issues.apache.org/jira/browse/PIG-1077

We welcome your feedback on this.

Regards,
Chao