Re: Proposal to create a branch for contrib project Zebra
Raghu,

Since most of the bugfixes to Pig happen in trunk, I (and several folks that I know) tend to use Pig trunk most often. It would be nice if I picked up Zebra enhancements along the way as well. Since zebra.jar is not included in pig.jar (I hope not), I can still use a stable zebra jar (binary) with the latest Pig compiled from trunk.

Also, a build failure in Zebra need not impact a Pig release, since the other contrib, i.e. Piggybank, is also build-optional.

I think that creating a branch results in too many changes on that branch before a mainline merge happens. Each of the feature additions you mention would be very highly desirable even in the absence of the others.

Just my 2 non-binding cents.

- milind

On 8/17/09 10:28 PM, Raghu Angadi rang...@yahoo-inc.com wrote:
> The reason for a branch is purely based on the fair number of improvements we are planning for Zebra and our desire to have a stable Zebra implementation for users to use along with PIG on Hadoop-0.20.
>
> New features planned (jiras will be filed soon):
> * Column security (different permissions for different columns)
> * Ability to drop columns
> * Ability to address column groups by name
> * Support for sorted tables, map side joins
> * ...
>
> Many of these changes involve changes to table metadata, schema syntax, and the on-disk format of the metadata (all of these will be backward compatible).
>
> If Zebra was a project of its own, one would have made a 0.1.0 branch and worked on new features in the trunk. The new proposed branch is for achieving the same by keeping PIG and stable Zebra together.
>
> PIG branch 0.4.0 will be made when it is appropriate for PIG. Generally, a contrib project should not influence that decision.
>
> Is there an alternative to creating a branch? Would you prefer we commit new features to a line that is being used by users?
>
> Raghu.
>
> Milind A Bhandarkar wrote:
> > IANAC, but my (non-binding) vote is also -1.
> >
> > I think all the improvements and feature additions to zebra should be available through pig trunk. The codebase is not big enough to justify creating a branch. If the reason is Pig's dependence on a checked-in hadoop jar, the shims proposal by Dmitry should be taken up asap, so that those who want to use zebra can use pig trunk with hadoop 0.20.
> >
> > - milind
> >
> > On 8/17/09 5:14 PM, Yiping Han y...@yahoo-inc.com wrote:
> > > +1
> > > [...]
Re: Proposal to create a branch for contrib project Zebra
Milind A Bhandarkar wrote:
> Since zebra.jar is not included in pig.jar (I hope not), I can still use stable zebra jar (binary) with latest pig compiled in trunk.

The problem is that although the current version is expected to be stable, it will still require some bug fixes. We essentially need to maintain another branch (official or a private git) to provide a version 0.1 jar with critical bug fixes.

In that sense, would it be better if we created a zebra-v1 branch and committed the new features to trunk? Maybe for regular users we can create pig.jar and zebra.jar from different lines.

Raghu.

> Also, build failure in zebra need not impact pig release, since the other contrib, i.e. Piggybank is also build-optional.
>
> I think that creating a branch results in too many changes on that branch before a mainline merge happens. Each of the feature additions you mention would be very highly desirable even in the absence of others.
>
> Just my 2 non-binding cents.
>
> - milind
RE: Proposal to create a branch for contrib project Zebra
I would recommend that zebra wait for Pig 0.4.0 (a couple of weeks?). A branch will be created for the 0.4.0 release and zebra will automatically benefit.

Santhosh

-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Tuesday, August 18, 2009 9:49 AM
To: pig-dev@hadoop.apache.org
Subject: Re: Proposal to create a branch for contrib project Zebra

[...]
Re: Proposal to create a branch for contrib project Zebra
Right. I just noticed the mails on Pig 0.4.0. I joined the pig-dev list just yesterday. Waiting for 0.4.0 might be good enough if it is just a couple of weeks. I will keep a watch on it.

I think we will wait for a few days and attach any new feature patches to jiras. Those patches can certainly wait there. For interdependencies between the patches, we might maintain a private git.

Raghu.

Santhosh Srinivasan wrote:
> I would recommend that zebra wait for Pig 0.4.0 (a couple of weeks?). A branch will be created for the 0.4.0 release and zebra will automatically benefit.
>
> Santhosh
> [...]
Re: Proposal to create a branch for contrib project Zebra
I think we are creating unnecessary bureaucratic hurdles here by preventing a contrib project from having a branch. I don't see why zebra has to use the pig release branch, as the new pig release does not include it.

The decisions are supposed to help keep things open, but this seems to be forcing Raghu to keep things in a private git.

-Thejas

On 8/18/09 10:56 AM, Raghu Angadi rang...@yahoo-inc.com wrote:
> Right. I just noticed the mails on Pig.0.4.0. I joined pig-dev list just yesterday. waiting for 0.4.0 might be good enough if it is just a couple of weeks. will keep a watch on it.
>
> I think we will wait for a few days and attach any new feature patches to jiras. Those patches can certainly wait there. For interdependencies of the patches, we might maintain a private git.
>
> Raghu.
>
> Santhosh Srinivasan wrote:
> > I would recommend that zebra wait for Pig 0.4.0 (a couple of weeks?). A branch will be created for the 0.4.0 release and zebra will automatically benefit.
> >
> > Santhosh
Proposal to create a branch for contrib project Zebra
Thanks to the PIG team, the first version of contrib project Zebra (PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications.

While we are stabilizing the current version (V1) in the trunk, we plan to add more new features to it. We would like to create an svn branch for the new features. We will be responsible for managing zebra in PIG trunk and in the new branch. We will merge the branch when it is ready. We expect the changes to affect only the 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for Zebra. As more patches are contributed by other Zebra developers, there might be more committers added through the normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG team.

Thanks,
Raghu.
RE: Proposal to create a branch for contrib project Zebra
+1

-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra

[...]
RE: Proposal to create a branch for contrib project Zebra
My vote is -1

-----Original Message-----
From: Santhosh Srinivasan
Sent: Monday, August 17, 2009 4:38 PM
To: 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

[...]
RE: Proposal to create a branch for contrib project Zebra
Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.

Santhosh

-----Original Message-----
From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra

[...]
RE: Proposal to create a branch for contrib project Zebra
Raghu is a PMC member and as such already has committer rights to all subprojects, so we are not breaking any new ground here. The reasoning is the same as for creating the branches for the multiquery work that we did in Pig.

Olga

-----Original Message-----
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
Sent: Monday, August 17, 2009 4:39 PM
To: Santhosh Srinivasan; pig-dev@hadoop.apache.org
Subject: RE: Proposal to create a branch for contrib project Zebra

My vote is -1

[...]
Re: Proposal to create a branch for contrib project Zebra
+1

On 8/18/09 7:11 AM, Olga Natkovich ol...@yahoo-inc.com wrote:
> +1
> [...]

--
Yiping Han
F-3140
(408)349-4403
y...@yahoo-inc.com
Re: Proposal to create a branch for contrib project Zebra
IANAC, but my (non-binding) vote is also -1.

I think all the improvements and feature additions to zebra should be available through pig trunk. The codebase is not big enough to justify creating a branch. If the reason is Pig's dependence on a checked-in hadoop jar, the shims proposal by Dmitry should be taken up asap, so that those who want to use zebra can use pig trunk with hadoop 0.20.

- milind

On 8/17/09 5:14 PM, Yiping Han y...@yahoo-inc.com wrote:
> +1
> [...]

--
Milind Bhandarkar
Y!IM: GridSolutions
Tel: 408-349-2136
(mili...@yahoo-inc.com)
Re: Proposal to create a branch for contrib project Zebra
On Aug 17, 2009, at 4:38 PM, Santhosh Srinivasan wrote:
> Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.

There has been sufficient precedent for 'contrib committers' in Hadoop (e.g. Chukwa vis-a-vis the former 'Hadoop Core' sub-project), and it is normal within the Apache world to have committers with specific 'roles', e.g. specific Contrib modules, QA, Release/Build etc. (http://hadoop.apache.org/common/credits.html - in fact, Giridharan Kesavan is an unlisted 'release' committer for Apache Hadoop)

I believe it's a desired, nay stated, goal for Zebra to graduate as a Hadoop sub-project eventually, based on which it was voted in as a contrib module by Apache Pig.

Given these, I don't see any cause for concern here.

Arun

> Santhosh
> [...]
Re: Proposal to create a branch for contrib project Zebra
> That leaves us with contrib committers. Can you point to earlier email threads that cover the topic of giving committer access to contrib projects? Specifically, what does it mean to award someone committer privileges to a contrib project, what are the access privileges that come with such rights, what are the dos/don'ts, etc.

Chukwa was a contrib module prior to its current avatar as a full-fledged sub-project. Its 'contrib committers' Ari Rabkin and Eric Yang became its first committers:
http://markmail.org/message/75qvvcigi3qumifp

Unfortunately the email threads for voting in contrib committers are private to the Hadoop PMC, so you'll just have to take my word for it. *smile*

I did dig up some other examples for you:
http://www.gossamer-threads.com/lists/lucene/java-dev/81122
http://www.nabble.com/ANNOUNCE:-Welcome--as-Contrib-Committer-td21506295.html

Contrib committers have privileges to commit only to their 'module': pig/trunk/contrib/zebra in this case.

> Thirdly, are there instances of contrib committers creating branches?

Branches are a development tool... I don't see the problem with creating/using them.

Arun
Re: Proposal to create a branch for contrib project Zebra
Hi Santhosh,

There are two separate things: (a) voting a contributor as a committer (b) committing to a contrib project.

(b): My experience with Hadoop is that contrib by definition is very loosely coupled with core. By convention, we as committers to core (hdfs, mapred, etc.) did not have to monitor changes to contrib as thoroughly as we would monitor core changes. It is the responsibility of contrib developers to make sure they are not breaking builds etc. Contrib changes get reviewed by people interested in the project.

(a): Voting takes place when a contributor is being blessed as a committer. It involves some legal stuff as well. Although a committer has permissions to commit to any part of a project, it is expected that they don't misuse them. E.g. if I have a patch for core Map/Reduce, I would certainly wait for a regular MR contributor to review it and possibly commit it. It does not matter how many patches I might have contributed to, say, HDFS.

Reason for (a) is simple scalability. We cannot monitor everything. If you or another PIG developer volunteers to commit zebra patches, we are more than happy to let you do it. Please let us know. Or at any stage, if you feel we may be violating normal conventions (like breaking builds or committing some PIG changes), please raise the issue. We have not seen serious problems in this regard with any other project; I think we should get the benefit of the doubt.

I have not addressed the reason for a new branch here. I will pitch for it in another mail.

Raghu.

Santhosh Srinivasan wrote:
> Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.
>
> Santhosh
> [...]
Re: Proposal to create a branch for contrib project Zebra
Raghu Angadi wrote:
> Hi Santhosh,
>
> There are two separate things: (a) voting a contributor as a committer (b) committing to a contrib project.
> [...]
> Reason for (a) is simple scalability. We cannot monitor everything. If

I meant to say "Reason for (b)" (why contrib commits are treated a bit differently). Our motivation is not to bypass any oversight; it is just so that we don't burden PIG committers too much. We are happy if a PIG committer volunteers to oversee and commit.

Raghu.

> you or another PIG developer volunteers to commit zebra patches, we are more than happy to let you do it. Please let us know.
> [...]