I guess this is more of a roadmap suggestion than part of the TLP discussion, but I think the PMC/committers could create a dedicated position for maintaining the website and docs: somebody who yells and screams until the docs are in sync with the implementation before each release.
Becoming a TLP is an elevation of status in addition to an internal reorganization, so I think it might create the PR needed to attract the talent to fill that job...

On Mon, Apr 5, 2010 at 11:23 AM, Alan Gates <ga...@yahoo-inc.com> wrote:

> I agree that Pig's code documentation is in sad shape. I think our user
> documentation for each release is good, of limited. I hope that our
> documents on wiki (such as PigJournal) help people understand our roadmap.
> Please let us know if you disagree so we can find ways to improve it.
>
> That said, it isn't clear to me how Pig being a TLP will solve that. The
> current committers or some subset thereof (see original message) would
> become the PMC. Other than having expanded powers to vote on releases and
> who becomes new committers, the role of these new PMC members would not
> change much. They won't have anymore time to address documentation and
> communication issues. We need to find a way to address those no matter what
> governance framework or community Pig is in.
>
> Alan.
>
>
> On Apr 5, 2010, at 9:02 AM, hc busy wrote:
>
>> This is awesome!!! As much as I hate PJM's for wasting time at all the
>> places that I've worked at, I think formalizing the management group(PMC)
>> to openly and clearly determine feature roadmap and dev schedule is the
>> best thing pig can have.
>>
>> I once commented to my co-worker (also heavy pig user) that pig's
>> organization (with all due respect to all you hardworking people) is like
>> a pigsty! documentations all over the place, javadocs from three versions
>> ago, much of the documentation doesn't match actual features... links to
>> the download page is broken.
>>
>> If you look at cascading's website... it's so much cleaner. (Of course...
>> we still use pig because it works well)
>>
>> I think as TLP, pig will receive better marketing and better support in a
>> way that will propel it both in popularity and in the amount of support it
>> receives.
>>
>> As a user, that change will be good for me.
>>
>>
>> On Sun, Apr 4, 2010 at 11:10 PM, Ashutosh Chauhan
>> <ashutosh.chau...@gmail.com> wrote:
>>
>>> I concur with Santhosh here. I think main question we need to answer
>>> here is how close our ties are with Hadoop currently and how it will
>>> be in future ? When Pig was originally designed the intent was to keep
>>> it backend neutral, so much so that there was a reference backend
>>> implementation (also known as local engine) which had nothing to do
>>> with Hadoop. But things have changed since then. Hadoop's local mode
>>> is adopted in favor of Pig's own local mode. We have moved from being
>>> backend agnostic to hadoop favoring. And while this was happening, it
>>> seems we tried to keep Pig Latin language independent of hadoop
>>> backend while Pig runtime started to make use of hadoop concepts.
>>>
>>> Apart from design decisions, this move also has a practical impact on
>>> our codebase. Since we adopted Hadoop more closely, we got rid of an
>>> extra layer of abstraction and instead started using similar
>>> abstractions already existing in Hadoop. This has a positive impact
>>> that it simplified the codebase and provides tighter integration with
>>> Hadoop.
>>> So, if we are continuing in a direction where Hadoop is our only
>>> backend (or atleast a favored one), close ties to Hadoop are useful
>>> because of the reasons Alan and Dmitriy pointed out. if not, then I
>>> think moving out to TLP makes sense. Since, there is no efforts which
>>> I am aware of, is trying to plug in a different backend for Pig, I
>>> think maintaining close ties with Hadoop is useful for Pig. In future
>>> when there is a different distributed computing platform comes up
>>> which we want to use as backend, we can revisit our decision. So, as
>>> for things stand today I am -1 to move out of Hadoop.
>>>
>>> And I would also like to reiterate my point that though Pig runtime
>>> may continue to get closer to Hadoop, we shall keep Pig Latin
>>> completely backend agnostic.
>>>
>>> Ashutosh
>>>
>>> On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <s...@yahoo-inc.com>
>>> wrote:
>>>
>>>> I see this as a multi-part question. Looking back at some of the
>>>> significant roadmap/existential questions asked in the last 12 months, I
>>>> see the following:
>>>>
>>>> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
>>>> an email about this approximately 9 months ago)
>>>> 2. What is the approach to support backward compatibility in Pig (Alan
>>>> had sent an email about this 3 months ago)
>>>> 3. Should Pig be a TLP (the current email thread).
>>>>
>>>> Here is my take on answering the aforementioned questions.
>>>>
>>>> The initial philosophy of Pig was to be backend agnostic. It was
>>>> designed as a data flow language. Whenever a new language is designed,
>>>> the syntax and semantics of the language have to be laid out. The syntax
>>>> is usually captured in the form of a BNF grammar. The semantics are
>>>> defined by the language creators. Backward compatibility is then a
>>>> question of holding true to the syntax and semantics. With Pig, in
>>>> addition to the language, the Java APIs were exposed to customers to
>>>> implement UDFs (load/store/filter/grouping/row transformation etc),
>>>> provision looping since the language does not support looping constructs
>>>> and also support a programmatic mode of access. Backward compatibility
>>>> in this context is to support API versioning.
>>>>
>>>> Do we still intend to position as a data flow language that is backend
>>>> agnostic? If the answer is yes, then there is a strong case for making
>>>> Pig a TLP.
>>>>
>>>> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
>>>> Hadoop sub-project was to ride the Hadoop popularity wave.
>>>> As a consequence, we chose to be heavily influenced by the Hadoop
>>>> roadmap.
>>>>
>>>> Like a good lawyer, I also have rebuttals to Alan's questions :)
>>>>
>>>> 1. Search engine popularity - We can discuss this with the Hadoop team
>>>> and still retain links to TLP's that are coupled (loosely or tightly).
>>>> 2. Explicit connection to Hadoop - I see this as logical connection v/s
>>>> physical connection. Today, we are physically connected as a
>>>> sub-project. Becoming a TLP, will not increase/decrease our influence on
>>>> the Hadoop community (think Logical, Physical and MR Layers :)
>>>> 3. Philosophy - I have already talked about this. The tight coupling is
>>>> by choice. If Pig continues to be a data flow language with clear syntax
>>>> and semantics then someone can implement Pig on top of a different
>>>> backend. Do we intend to take this approach?
>>>>
>>>> I just wanted to offer a different opinion to this thread. I strongly
>>>> believe that we should think about the original philosophy. Will we have
>>>> a Pig standards committee that will decide on the changes to the
>>>> language (think C/C++) if there are multiple backend implementations?
>>>>
>>>> I will reserve my vote based on the outcome of the philosophy and
>>>> backward compatibility discussions. If we decide that Pig will be
>>>> treated and maintained like a true language with clear syntax and
>>>> semantics then we have a strong case to make it into a TLP. If not, we
>>>> should retain our existing ties to Hadoop and make Pig into a data flow
>>>> language for Hadoop.
>>>>
>>>> Santhosh
>>>>
>>>> -----Original Message-----
>>>> From: Thejas Nair [mailto:te...@yahoo-inc.com]
>>>> Sent: Friday, April 02, 2010 4:08 PM
>>>> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
>>>> Subject: Re: Begin a discussion about Pig as a top level project
>>>>
>>>> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
>>>> heavily influenced by its roadmap.
>>>> I think it makes sense to continue as a sub-project of hadoop.
>>>>
>>>> -Thejas
>>>>
>>>>
>>>>
>>>> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>>>>
>>>>> Over time, Pig is increasing its coupling to Hadoop (for good
>>>>> reasons), rather than decreasing it. If and when Pig becomes a viable
>>>>> entity without hadoop around, it might make sense as a TLP. As is, I
>>>>> think becoming a TLP will only introduce unnecessary administrative
>>>>> and bureaucratic headaches.
>>>>> So my vote is also -1.
>>>>>
>>>>> -Dmitriy
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <ga...@yahoo-inc.com>
>>>>> wrote:
>>>>>
>>>>>> So far I haven't seen any feedback on this. Apache has asked the
>>>>>> Hadoop PMC to submit input in April on whether some subprojects
>>>>>> should be promoted to TLPs. We, the Pig community, need to give
>>>>>> feedback to the Hadoop PMC on how we feel about this. Please make
>>>>>> your voice heard.
>>>>>>
>>>>>> So now I'll head my own call and give my thoughts on it.
>>>>>>
>>>>>> The biggest advantage I see to being a TLP is a direct connection to
>>>>>> Apache. Right now all of the Pig team's interaction with Apache is
>>>>>> through the Hadoop PMC. Being directly connected to Apache would
>>>>>> benefit Pig team members who would have a better view into Apache.
>>>>>> It would also raise our profile in Apache and thus make other
>>>>>> projects more aware of us.
>>>>>>
>>>>>> However, I am concerned about loosing Pig's explicit connection to
>>>>>> Hadoop.
>>>>>> This concern has a couple of dimensions. One, Hadoop and MapReduce
>>>>>> are the current flavor of the month in computing. Given that Pig
>>>>>> shares a name with the common farm animal, it's hard to be sure based
>>>>>> on search statistics.
>>>>>> But Google trends shows that "hadoop" is searched on much more
>>>>>> frequently than "hadoop pig" or "apache pig" (see
>>>>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig). I am guessing
>>>>>> that most Pig users come from Hadoop users who discover Pig via
>>>>>> Hadoop's website.
>>>>>> Loosing that subproject tab on Hadoop's front page may radically
>>>>>> lower the number of users coming to Pig to check out our project. I
>>>>>> would argue that this benefits Hadoop as well, since high level
>>>>>> languages like Pig Latin have the potential to greatly extend the
>>>>>> user base and usability of Hadoop.
>>>>>>
>>>>>> Two, being explicitly connected to Hadoop keeps our two communities
>>>>>> aware of each others needs. There are features proposed for MR that
>>>>>> would greatly help Pig. By staying in the Hadoop community Pig is
>>>>>> better positioned to advocate for and help implement and test those
>>>>>> features. The response to this will be that Pig developers can still
>>>>>> subscribe to Hadoop mailing lists, submit patches, etc. That is,
>>>>>> they can still be part of the Hadoop community. Which reinforces my
>>>>>> point that it makes more sense to leave Pig in the Hadoop community
>>>>>> since Pig developers will need to be part of that community anyway.
>>>>>>
>>>>>> Finally, philosophically it makes sense to me that projects that are
>>>>>> tightly connected belong together. It strikes me as strange to have
>>>>>> Pig as a TLP completely dependent on another TLP. Hadoop was
>>>>>> originally a subproject of Lucene. It moved out to be a TLP when it
>>>>>> became obvious that Hadoop had become independent of and useful apart
>>>>>> from Lucene. Pig is not in that position relative to Hadoop.
>>>>>>
>>>>>> So, I'm -1 on Pig moving out. But this is a soft -1. I'm open to
>>>>>> being persuaded that I'm wrong or my concerns can be addressed while
>>>>>> still having Pig as a TLP.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>>
>>>>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>>>>
>>>>>>> You have probably heard by now that there is a discussion going on
>>>>>>> in the Hadoop PMC as to whether a number of the subprojects (Hbase,
>>>>>>> Avro, Zookeeper, Hive, and Pig) should move out from under the Hadoop
>>>>>>> umbrella and become top level Apache projects (TLP). This
>>>>>>> discussion has picked up recently since the Apache board has clearly
>>>>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>>>>> acting as an umbrella project with many disjoint subprojects
>>>>>>> underneath it. They are concerned that this gives Apache little
>>>>>>> insight into the health and happenings of the subproject communities
>>>>>>> which in turn means Apache cannot properly mentor those communities.
>>>>>>>
>>>>>>> The purpose of this email is to start a discussion within the Pig
>>>>>>> community about this topic. Let me cover first what becoming TLP
>>>>>>> would mean for Pig, and then I'll go into what options I think we as
>>>>>>> a community have.
>>>>>>>
>>>>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>>>>> would report directly to the Apache board. Who would be on the PMC
>>>>>>> would be something we as a community would need to decide. Common
>>>>>>> options would be to say all active committers are on the PMC, or all
>>>>>>> active committers who have been a committer for at least a year. We
>>>>>>> would also need to elect a chair of the PMC. This lucky person
>>>>>>> would have no additional power, but would have the additional
>>>>>>> responsibility of writing quarterly reports on Pig's status for
>>>>>>> Apache board meetings, as well as coordinating with Apache to get
>>>>>>> accounts for new committers, etc.
>>>>>>> For more information see
>>>>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>>>>
>>>>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>>>>>> community. We would continue to be invited to Hadoop Summits, HUGs,
>>>>>>> etc.
>>>>>>> Since all Pig developers and users are by definition Hadoop users,
>>>>>>> we would continue to be a strong presence in the Hadoop community.
>>>>>>>
>>>>>>> I see three ways that we as a community can respond to this:
>>>>>>>
>>>>>>> 1) Say yes, we want to be a TLP now.
>>>>>>> 2) Say yes, we want to be a TLP, but not yet. We feel we need more
>>>>>>> time to mature. If we choose this option we need to be able to
>>>>>>> clearly articulate how much time we need and what we hope to see
>>>>>>> change in that time.
>>>>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>>>>> the drawbacks of being a disjoint subproject. If we choose this, we
>>>>>>> need to be able to say exactly what those benefits are and why we
>>>>>>> feel they will be compromised by leaving the Hadoop project.
>>>>>>>
>>>>>>> There may other options that I haven't thought of. Please feel free
>>>>>>> to suggest any you think of.
>>>>>>>
>>>>>>> Questions? Thoughts? Let the discussion begin.
>>>>>>>
>>>>>>> Alan.