Since most of the bugfixes to Pig happen in trunk, I (and several folks that
I know) tend to use pig trunk most often. It would be nice if I picked up
Zebra enhancements along the way, as well.
Since zebra.jar is not included in pig.jar (I hope not), I can still use
a stable zebra jar (binary) with the latest pig compiled from trunk.
Also, a build failure in zebra need not impact a pig release, since the other
contrib, i.e. Piggybank, is also "build-optional".
I think that creating a branch results in too many changes on that branch
before a mainline merge happens. Each of the feature additions you mention
would be very highly desirable even in the absence of others.
Just my 2 non-binding cents.
On 8/17/09 10:28 PM, "Raghu Angadi" <rang...@yahoo-inc.com> wrote:
> The reason for a branch is purely based on the fair number of improvements
> we are planning for Zebra and our desire to have a stable Zebra
> implementation for users to use along with PIG on Hadoop-0.20.
> New features planned (jiras will be filed soon) :
> * Column security (different permissions for different columns)
> * Ability to drop columns
> * Ability to address "column groups" by name
> * Support for sorted tables, map side joins,
> * ...
> Many of these changes involve changes to table metadata, schema syntax,
> and on-disk format of the metadata (all of these will be backward
> compatible).
> If Zebra was a project of its own, one would have made a 0.1.0 branch
> and worked on new features in the trunk. The new proposed branch is for
> achieving the same by keeping PIG and stable Zebra together. PIG branch
> 0.4.0 will be made when it is appropriate for PIG. Generally, a contrib
> project should not influence that decision.
> Is there an alternative to creating a branch? Would you prefer we commit
> new features to a line that is being used by users?
> Milind A Bhandarkar wrote:
>> IANAC, but my (non-binding) vote is also -1. I think all the improvements
>> and feature additions to zebra should be available through pig trunk. The
>> codebase is not big enough to justify creating a branch. If the reason is
>> Pig's dependence on a checked in hadoop jar, the shims proposal by Dmitry
>> should be taken up asap, so that those who want to use zebra can use pig
>> trunk with hadoop 0.20.
>> - milind
>> On 8/17/09 5:14 PM, "Yiping Han" <y...@yahoo-inc.com> wrote:
>>> On 8/18/09 7:11 AM, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
>>>> -----Original Message-----
>>>> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
>>>> Sent: Monday, August 17, 2009 4:06 PM
>>>> To: email@example.com
>>>> Subject: Proposal to create a branch for contrib project Zebra
>>>> Thanks to the PIG team, the first version of contrib project Zebra
>>>> (PIG-833) is committed to PIG trunk.
>>>> In short, Zebra is a table storage layer built for use in PIG and other
>>>> Hadoop applications.
>>>> While we are stabilizing the current version V1 in the trunk, we plan to add
>>>> more new features to it. We would like to create an svn branch for the
>>>> new features. We will be responsible for managing zebra in PIG trunk and
>>>> in the new branch. We will merge the branch when it is ready. We expect
>>>> the changes to affect only 'contrib/zebra' directory.
>>>> As a regular contributor to Hadoop, I will be the initial committer for
>>>> Zebra. As more patches are contributed by other Zebra developers, there
>>>> might be more committers added through the normal Hadoop/Apache procedure.
>>>> I would like to create a branch called 'zebra-v2' with approval from PIG
>>> Yiping Han