The reason for a branch is simply the fair number of improvements we are planning for Zebra, together with our desire to give users a stable Zebra implementation to use along with PIG on Hadoop-0.20.

New features planned (jiras will be filed soon) :
   * Column security (different permissions for different columns)
   * Ability to drop columns
   * Ability to address "column groups" by name
   * Support for sorted tables, map-side joins
   * ...

Many of these features involve changes to table metadata, schema syntax, and the on-disk format of the metadata (all of these will be backward compatible).

If Zebra were a project of its own, we would have made a 0.1.0 branch and worked on new features in the trunk. The proposed branch achieves the same thing while keeping PIG and a stable Zebra together. PIG branch 0.4.0 will be made when it is appropriate for PIG. Generally, a contrib project should not influence that decision.

Is there an alternative to creating a branch? Would you prefer that we commit new features to a line that users are already running?


IANAC, but my (non-binding) vote is also -1. I think all the improvements
and feature additions to zebra should be available through pig trunk. The
codebase is not big enough to justify creating a branch. If the reason is
Pig's dependence on a checked-in hadoop jar, the shims proposal by Dmitry
should be taken up asap, so that those who want to use zebra can use pig
trunk with hadoop 0.20.

Thanks to the PIG team, the first version of the contrib project Zebra
(PIG-833) has been committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other
Hadoop applications.

While we are stabilizing the current version (V1) in the trunk, we plan
to add more new features to it. We would like to create an svn branch
for the new features. We will be responsible for managing zebra in PIG
trunk and in the new branch. We will merge the branch when it is ready.
We expect the changes to affect only the 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for
Zebra. As more patches are contributed by other Zebra developers, more
committers might be added through the normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from PIG


