Over time, Pig is increasing its coupling to Hadoop (for good reasons), rather than decreasing it. If and when Pig becomes a viable entity without Hadoop around, it might make sense as a TLP. As is, I think becoming a TLP will only introduce unnecessary administrative and bureaucratic headaches. So my vote is also -1.
-Dmitriy

On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> So far I haven't seen any feedback on this. Apache has asked the Hadoop
> PMC to submit input in April on whether some subprojects should be promoted
> to TLPs. We, the Pig community, need to give feedback to the Hadoop PMC on
> how we feel about this. Please make your voice heard.
>
> So now I'll heed my own call and give my thoughts on it.
>
> The biggest advantage I see to being a TLP is a direct connection to
> Apache. Right now all of the Pig team's interaction with Apache is through
> the Hadoop PMC. Being directly connected to Apache would benefit Pig team
> members who would have a better view into Apache. It would also raise our
> profile in Apache and thus make other projects more aware of us.
>
> However, I am concerned about losing Pig's explicit connection to Hadoop.
> This concern has a couple of dimensions. One, Hadoop and MapReduce are the
> current flavor of the month in computing. Given that Pig shares a name with
> the common farm animal, it's hard to be sure based on search statistics.
> But Google Trends shows that "hadoop" is searched on much more frequently
> than "hadoop pig" or "apache pig" (see
> http://www.google.com/trends?q=hadoop%2Chadoop+pig). I am guessing that
> most Pig users come from Hadoop users who discover Pig via Hadoop's website.
> Losing that subproject tab on Hadoop's front page may radically lower the
> number of users coming to Pig to check out our project. I would argue that
> this benefits Hadoop as well, since high level languages like Pig Latin have
> the potential to greatly extend the user base and usability of Hadoop.
>
> Two, being explicitly connected to Hadoop keeps our two communities aware
> of each other's needs. There are features proposed for MR that would greatly
> help Pig. By staying in the Hadoop community Pig is better positioned to
> advocate for and help implement and test those features.
> The response to
> this will be that Pig developers can still subscribe to Hadoop mailing
> lists, submit patches, etc. That is, they can still be part of the Hadoop
> community. Which reinforces my point that it makes more sense to leave Pig
> in the Hadoop community since Pig developers will need to be part of that
> community anyway.
>
> Finally, philosophically it makes sense to me that projects that are
> tightly connected belong together. It strikes me as strange to have Pig as
> a TLP completely dependent on another TLP. Hadoop was originally a
> subproject of Lucene. It moved out to be a TLP when it became obvious that
> Hadoop had become independent of and useful apart from Lucene. Pig is not
> in that position relative to Hadoop.
>
> So, I'm -1 on Pig moving out. But this is a soft -1. I'm open to being
> persuaded that I'm wrong or my concerns can be addressed while still having
> Pig as a TLP.
>
> Alan.
>
>
> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>
>> You have probably heard by now that there is a discussion going on in the
>> Hadoop PMC as to whether a number of the subprojects (HBase, Avro,
>> ZooKeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
>> become top level Apache projects (TLP). This discussion has picked up
>> recently since the Apache board has clearly communicated to the Hadoop PMC
>> that it is concerned that Hadoop is acting as an umbrella project with many
>> disjoint subprojects underneath it. They are concerned that this gives
>> Apache little insight into the health and happenings of the subproject
>> communities which in turn means Apache cannot properly mentor those
>> communities.
>>
>> The purpose of this email is to start a discussion within the Pig
>> community about this topic. Let me cover first what becoming a TLP would mean
>> for Pig, and then I'll go into what options I think we as a community have.
>>
>> Becoming a TLP would mean that Pig would itself have a PMC that would
>> report directly to the Apache board. Who would be on the PMC would be
>> something we as a community would need to decide. Common options would be
>> to say all active committers are on the PMC, or all active committers who
>> have been a committer for at least a year. We would also need to elect a
>> chair of the PMC. This lucky person would have no additional power, but
>> would have the additional responsibility of writing quarterly reports on
>> Pig's status for Apache board meetings, as well as coordinating with Apache
>> to get accounts for new committers, etc. For more information see
>> http://www.apache.org/foundation/how-it-works.html#roles
>>
>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>> community. We would continue to be invited to Hadoop Summits, HUGs, etc.
>> Since all Pig developers and users are by definition Hadoop users, we would
>> continue to be a strong presence in the Hadoop community.
>>
>> I see three ways that we as a community can respond to this:
>>
>> 1) Say yes, we want to be a TLP now.
>> 2) Say yes, we want to be a TLP, but not yet. We feel we need more time
>> to mature. If we choose this option we need to be able to clearly
>> articulate how much time we need and what we hope to see change in that
>> time.
>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh the
>> drawbacks of being a disjoint subproject. If we choose this, we need to be
>> able to say exactly what those benefits are and why we feel they will be
>> compromised by leaving the Hadoop project.
>>
>> There may be other options that I haven't thought of. Please feel free to
>> suggest any you think of.
>>
>> Questions? Thoughts? Let the discussion begin.
>>
>> Alan.