Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Ashutosh Chauhan
 Sent: Friday, April 02, 2010 4:08 PM
 To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
 Subject: Re: Begin a discussion about Pig as a top level project

 I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
 heavily influenced by its roadmap. I think it makes sense to continue as
 a sub-project of hadoop.

 -Thejas



 On 3/31/10 4:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Over time, Pig is increasing its coupling to Hadoop (for good
 reasons), rather than decreasing it. If and when Pig becomes a viable
 entity without hadoop around, it might make sense as a TLP. As is, I
 think becoming a TLP will only introduce unnecessary administrative
 and bureaucratic headaches.
 So my vote is also -1.

 -Dmitriy



 On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com
 wrote:

 So far I haven't seen any feedback on this.  Apache has asked the
 Hadoop PMC to submit input in April on whether some subprojects
 should be promoted to TLPs.  We, the Pig community, need to give
 feedback to the Hadoop PMC on how we feel about this.  Please make
 your voice heard.

 So now I'll heed my own call and give my thoughts on it.

 The biggest advantage I see to being a TLP is a direct connection to
 Apache.  Right now all of the Pig team's interaction with Apache is
 through the Hadoop PMC.  Being directly connected to Apache would
 benefit Pig team members who would have a better view into Apache.
 It would also raise our profile in Apache and thus make other
 projects more aware of us.

 However, I am concerned about losing Pig's explicit connection to
 Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
 MapReduce are the current flavor of the month in computing.  Given
 that Pig shares a name with the common farm animal, it's hard to be
 sure based on search statistics.  But Google Trends shows that hadoop
 is searched on much more frequently than hadoop pig or apache pig (see
 http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
 that most Pig users come from Hadoop users who discover Pig via
 Hadoop's website.  Losing that subproject tab on Hadoop's front page
 may radically lower the number of users coming to Pig to check out our
 project.  I would argue that this benefits Hadoop as well, since
 high-level languages like Pig Latin have the potential to greatly
 extend the user base and usability of Hadoop.

 Two, being explicitly connected to Hadoop keeps our two communities
 aware of each other's needs.  There are features proposed for MR that
 would greatly help Pig.  By staying in the Hadoop community Pig is
 better positioned to advocate for and help implement and test those
 features.  The response to this will be that Pig developers can still
 subscribe to Hadoop mailing lists, submit patches, etc.  That is,
 they can still be part of the Hadoop community.  That reinforces my
 point that it makes more sense to leave Pig in the Hadoop community
 since Pig developers will need to be part of that community anyway.

 Finally, philosophically it makes sense to me that projects that are
 tightly connected belong together.  It strikes me as strange to have
 Pig as a TLP completely dependent on another TLP.  Hadoop was
 originally a subproject of Lucene.  It moved out to be a TLP when it
 became obvious that Hadoop had become independent of and useful apart
 from Lucene.  Pig is not in that position relative to Hadoop.

 So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
 being persuaded that I'm wrong or my concerns can be addressed while
 still having Pig as a TLP.

 Alan.


 On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:

 You have probably heard by now that there is a discussion going on
 in the Hadoop PMC as to whether a number of the subprojects (HBase,
 Avro, ZooKeeper, Hive, and Pig) should move out from under the Hadoop
 umbrella and become top level Apache projects (TLP).  This
 discussion has picked up recently since the Apache board has clearly
 communicated to the Hadoop PMC that it is concerned that Hadoop is
 acting as an umbrella project with many disjoint subprojects
 underneath it.  They are concerned that this gives Apache little
 insight into the health and happenings of the subproject communities,
 which in turn means Apache cannot properly mentor those communities.

 The purpose of this email is to start a discussion within the Pig
 community about this topic.  Let me cover first what becoming a TLP
 would mean for Pig, and then I'll go into what options I think we as
 a community have.

 Becoming a TLP would mean that Pig would itself have a PMC that
 would report directly to the Apache board.  Who would be on the PMC
 would be something we as a community would need to decide.  Common
 options would be to say all active committers are on the PMC, or all
 active committers who have been a committer for at least a year.  We
 would also need to elect a chair of the PMC.  This lucky person
 would have no additional power

RE: Begin a discussion about Pig as a top level project

2010-04-05 Thread Pradeep Kamath
I agree with Ashutosh and Santhosh. Based on the current direction of the
project, I think we are more closely tied to Hadoop now (with Pig 0.7, our
load/store interfaces are very closely tied to Hadoop). Hence, for now, my
vote would be -1 on becoming a TLP. If that direction/philosophy changes and
Pig becomes really backend agnostic, I think we should revisit this question.

Pradeep

-Original Message-
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com] 
Sent: Sunday, April 04, 2010 11:11 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Begin a discussion about Pig as a top level project

I concur with Santhosh here. I think the main question we need to answer
here is how close our ties with Hadoop are currently and how close they
will be in the future. When Pig was originally designed, the intent was to
keep it backend neutral, so much so that there was a reference backend
implementation (also known as the local engine) which had nothing to do
with Hadoop. But things have changed since then. Hadoop's local mode
has been adopted in favor of Pig's own local mode. We have moved from being
backend agnostic to favoring Hadoop. And while this was happening, it
seems we tried to keep the Pig Latin language independent of the Hadoop
backend while the Pig runtime started to make use of Hadoop concepts.

Apart from design decisions, this move also has a practical impact on
our codebase. Since we adopted Hadoop more closely, we got rid of an
extra layer of abstraction and instead started using similar
abstractions already existing in Hadoop. This has had the positive effect
of simplifying the codebase and providing tighter integration with
Hadoop.

So, if we are continuing in a direction where Hadoop is our only
backend (or at least a favored one), close ties to Hadoop are useful
for the reasons Alan and Dmitriy pointed out. If not, then I
think moving out to a TLP makes sense. Since there is no effort that
I am aware of to plug in a different backend for Pig, I
think maintaining close ties with Hadoop is useful for Pig. In the future,
when a different distributed computing platform comes up
that we want to use as a backend, we can revisit our decision. So, as
things stand today, I am -1 on moving out of Hadoop.

And I would also like to reiterate my point that though the Pig runtime
may continue to get closer to Hadoop, we should keep Pig Latin
completely backend agnostic.

Ashutosh

On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan s...@yahoo-inc.com wrote:
 I see this as a multi-part question. Looking back at some of the
 significant roadmap/existential questions asked in the last 12 months, I
 see the following:

 1. With the introduction of SQL, what is the philosophy of Pig (I sent
 an email about this approximately 9 months ago)
 2. What is the approach to support backward compatibility in Pig (Alan
 had sent an email about this 3 months ago)
 3. Should Pig be a TLP (the current email thread).

 Here is my take on answering the aforementioned questions.

 The initial philosophy of Pig was to be backend agnostic. It was
 designed as a data flow language. Whenever a new language is designed,
 the syntax and semantics of the language have to be laid out. The syntax
 is usually captured in the form of a BNF grammar. The semantics are
 defined by the language creators. Backward compatibility is then a
 question of holding true to the syntax and semantics. With Pig, in
 addition to the language, the Java APIs were exposed to customers to
 implement UDFs (load/store/filter/grouping/row transformation etc),
 provision looping since the language does not support looping constructs
 and also support a programmatic mode of access. Backward compatibility
 in this context is to support API versioning.

 Do we still intend to position Pig as a data flow language that is backend
 agnostic? If the answer is yes, then there is a strong case for making
 Pig a TLP.

 Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
 Hadoop sub-project was to ride the Hadoop popularity wave. As a
 consequence, we chose to be heavily influenced by the Hadoop roadmap.

 Like a good lawyer, I also have rebuttals to Alan's questions :)

 1. Search engine popularity - We can discuss this with the Hadoop team
 and still retain links to TLPs that are coupled (loosely or tightly).
 2. Explicit connection to Hadoop - I see this as logical connection vs.
 physical connection. Today, we are physically connected as a
 sub-project. Becoming a TLP will not increase/decrease our influence on
 the Hadoop community (think Logical, Physical and MR Layers :)
 3. Philosophy - I have already talked about this. The tight coupling is
 by choice. If Pig continues to be a data flow language with clear syntax
 and semantics then someone can implement Pig on top of a different
 backend. Do we intend to take this approach?

 I just wanted to offer a different opinion to this thread. I strongly
 believe that we should think about the original


Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Alan Gates
Prognostication is a difficult business.  Of course I'd love it if  
someday there is an ISO Pig Latin committee (with meetings in cool  
exotic places) deciding the official standard for Pig Latin.  But that  
seems like saying in your start up's business plan, "When we reach
Google's size, then we'll do x."  If there ever is an ISO Pig Latin
standard it will be years off.


As others have noted, staying tight to Hadoop now has many advantages,  
both in technical and adoption terms.  Hence my advocacy of keeping  
Pig Latin Hadoop agnostic while tightly integrating the backend.   
Which is to say that in my view, Pig is Hadoop specific now, but there  
may come a day when that is no longer true.   Whether Pig will ever  
move past just running on Hadoop to running in other parallel systems  
won't be known for years to come.  Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance.


Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:


I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig (I sent
an email about this approximately 9 months ago)
2. What is the approach to support backward compatibility in Pig (Alan
had sent an email about this 3 months ago)
3. Should Pig be a TLP (the current email thread).

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation etc),
provision looping since the language does not support looping constructs
and also support a programmatic mode of access. Backward compatibility
in this context is to support API versioning.

Do we still intend to position Pig as a data flow language that is backend
agnostic? If the answer is yes, then there is a strong case for making
Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLPs that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as logical connection vs.
physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP will not increase/decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote based on the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain our existing ties to Hadoop and make Pig into a data flow
language for Hadoop.

Santhosh


Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Dmitriy Ryaboy

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread hc busy

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Daniel Dai
I agree with the stance that we remain in Hadoop until we see more
compelling reasons to leave, such as Pig going beyond Hadoop. Currently I
cannot fully weigh the advantages and disadvantages of becoming a TLP. But
provided this is a point of no return, I don't want to move unless we have a
strong motivation. We can always choose to become a TLP later, when we feel
more convinced of that.


Daniel

--
From: Santhosh Srinivasan s...@yahoo-inc.com
Sent: Monday, April 05, 2010 12:22 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Begin a discussion about Pig as a top level project


Given that, do you think it makes
sense to say that Pig stays a subproject for now, but if it someday
grows beyond Hadoop only it becomes a TLP?  I could agree to that
stance.

Bingo!

Santhosh

-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com]
Sent: Monday, April 05, 2010 11:37 AM
To: pig-dev@hadoop.apache.org
Subject: Re: Begin a discussion about Pig as a top level project

Prognostication is a difficult business.  Of course I'd love it if
someday there is an ISO Pig Latin committee (with meetings in cool
exotic places) deciding the official standard for Pig Latin.  But that
seems like saying in your start up's business plan, When we reach
Google's size, then we'll do x.  If there ever is an ISO Pig Latin
standard it will be years off.

As others have noted, staying tight to Hadoop now has many advantages,
both in technical and adoption terms.  Hence my advocacy of keeping
Pig Latin Hadoop agnostic while tightly integrating the backend.
Which is to say that in my view, Pig is Hadoop specific now, but there
may come a day when that is no longer true.   Whether Pig will ever
move past just running on Hadoop to running in other parallel systems
won't be known for years to come.  Given that, do you think it makes
sense to say that Pig stays a subproject for now, but if it someday
grows beyond Hadoop only it becomes a TLP?  I could agree to that
stance.

Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:


I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig (I sent
an email about this approximately 9 months ago)
2. What is the approach to support backward compatibility in Pig (Alan
had sent an email about this 3 months ago)
3. Should Pig be a TLP (the current email thread).

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation etc.),
provision looping since the language does not support looping constructs
and also support a programmatic mode of access. Backward compatibility
in this context is to support API versioning.

Do we still intend to position Pig as a data flow language that is
backend agnostic? If the answer is yes, then there is a strong case for
making Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLPs that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as logical connection v/s
physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP will not increase/decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote based on the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain

RE: Begin a discussion about Pig as a top level project

2010-04-03 Thread Santhosh Srinivasan
I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig (I sent
an email about this approximately 9 months ago)
2. What is the approach to support backward compatibility in Pig (Alan
had sent an email about this 3 months ago)
3. Should Pig be a TLP (the current email thread).

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation etc),
provision looping since the language does not support looping constructs
and also support a programmatic mode of access. Backward compatibility
in this context is to support API versioning.

Do we still intend to position Pig as a data flow language that is backend
agnostic? If the answer is yes, then there is a strong case for making
Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLPs that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as logical connection v/s
physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP will not increase/decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote based on the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain our existing ties to Hadoop and make Pig into a data flow
language for Hadoop.

Santhosh

-----Original Message-----
From: Thejas Nair [mailto:te...@yahoo-inc.com] 
Sent: Friday, April 02, 2010 4:08 PM
To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
Subject: Re: Begin a discussion about Pig as a top level project

I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as
a sub-project of hadoop.

-Thejas



On 3/31/10 4:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Over time, Pig is increasing its coupling to Hadoop (for good 
 reasons), rather than decreasing it. If and when Pig becomes a viable 
 entity without hadoop around, it might make sense as a TLP. As is, I 
 think becoming a TLP will only introduce unnecessary administrative
 and bureaucratic headaches.
 So my vote is also -1.
 
 -Dmitriy
 
 
 
 On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com wrote:
 
 So far I haven't seen any feedback on this.  Apache has asked the
 Hadoop PMC to submit input in April on whether some subprojects
 should be promoted to TLPs.  We, the Pig community, need to give
 feedback to the Hadoop PMC on how we feel about this.  Please make
 your voice heard.
 
 So now I'll heed my own call and give my thoughts on it.
 
 The biggest advantage I see to being a TLP is a direct connection to
 Apache.  Right now all of the Pig team's interaction with Apache is
 through the Hadoop PMC.  Being directly connected to Apache would
 benefit Pig team members who would have a better view into Apache.
 It would also raise our profile in Apache and thus make other
 projects more aware of us.
 
 However, I am concerned about losing Pig's explicit connection to
 Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
 MapReduce are the current flavor of the month in computing.  Given
 that Pig shares a name with the common farm animal, it's hard to be
 sure based on search statistics.  But Google trends shows that
 "hadoop" is searched on much more frequently than "hadoop pig"

Re: Begin a discussion about Pig as a top level project

2010-04-02 Thread Thejas Nair
I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as a
sub-project of hadoop.

-Thejas



On 3/31/10 4:04 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Over time, Pig is increasing its coupling to Hadoop (for good reasons),
 rather than decreasing it. If and when Pig becomes a viable entity without
 hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
 will only introduce unnecessary administrative and bureaucratic headaches.
 So my vote is also -1.
 
 -Dmitriy
 
 
 
 On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com wrote:
 
 So far I haven't seen any feedback on this.  Apache has asked the Hadoop
 PMC to submit input in April on whether some subprojects should be promoted
 to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
 how we feel about this.  Please make your voice heard.
 
 So now I'll heed my own call and give my thoughts on it.
 
 The biggest advantage I see to being a TLP is a direct connection to
 Apache.  Right now all of the Pig team's interaction with Apache is through
 the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
 members who would have a better view into Apache.  It would also raise our
 profile in Apache and thus make other projects more aware of us.
 
 However, I am concerned about losing Pig's explicit connection to Hadoop.
 This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
 current flavor of the month in computing.  Given that Pig shares a name with
 the common farm animal, it's hard to be sure based on search statistics.
 But Google trends shows that "hadoop" is searched on much more frequently
 than "hadoop pig" or "apache pig" (see
 http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
 most Pig users come from Hadoop users who discover Pig via Hadoop's website.
 Losing that subproject tab on Hadoop's front page may radically lower the
 number of users coming to Pig to check out our project.  I would argue that
 this benefits Hadoop as well, since high level languages like Pig Latin have
 the potential to greatly extend the user base and usability of Hadoop.
 
 Two, being explicitly connected to Hadoop keeps our two communities aware
 of each other's needs.  There are features proposed for MR that would greatly
 help Pig.  By staying in the Hadoop community Pig is better positioned to
 advocate for and help implement and test those features.  The response to
 this will be that Pig developers can still subscribe to Hadoop mailing
 lists, submit patches, etc.  That is, they can still be part of the Hadoop
 community.  Which reinforces my point that it makes more sense to leave Pig
 in the Hadoop community since Pig developers will need to be part of that
 community anyway.
 
 Finally, philosophically it makes sense to me that projects that are
 tightly connected belong together.  It strikes me as strange to have Pig as
 a TLP completely dependent on another TLP.  Hadoop was originally a
 subproject of Lucene.  It moved out to be a TLP when it became obvious that
 Hadoop had become independent of and useful apart from Lucene.  Pig is not
 in that position relative to Hadoop.
 
 So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
 persuaded that I'm wrong or my concerns can be addressed while still having
 Pig as a TLP.
 
 Alan.
 
 
 On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
 
  You have probably heard by now that there is a discussion going on in the
 Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
 Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
 become top level Apache projects (TLP).  This discussion has picked up
 recently since the Apache board has clearly communicated to the Hadoop PMC
 that it is concerned that Hadoop is acting as an umbrella project with many
 disjoint subprojects underneath it.  They are concerned that this gives
 Apache little insight into the health and happenings of the subproject
 communities which in turn means Apache cannot properly mentor those
 communities.
 
 The purpose of this email is to start a discussion within the Pig
 community about this topic.  Let me cover first what becoming TLP would mean
 for Pig, and then I'll go into what options I think we as a community have.
 
 Becoming a TLP would mean that Pig would itself have a PMC that would
 report directly to the Apache board.  Who would be on the PMC would be
 something we as a community would need to decide.  Common options would be
 to say all active committers are on the PMC, or all active committers who
 have been a committer for at least a year.  We would also need to elect a
 chair of the PMC.  This lucky person would have no additional power, but
 would have the additional responsibility of writing quarterly reports on
 Pig's status for Apache board meetings, as well as coordinating with 

Re: Begin a discussion about Pig as a top level project

2010-03-31 Thread Alan Gates
So far I haven't seen any feedback on this.  Apache has asked the  
Hadoop PMC to submit input in April on whether some subprojects should  
be promoted to TLPs.  We, the Pig community, need to give feedback to  
the Hadoop PMC on how we feel about this.  Please make your voice heard.


So now I'll heed my own call and give my thoughts on it.

The biggest advantage I see to being a TLP is a direct connection to  
Apache.  Right now all of the Pig team's interaction with Apache is  
through the Hadoop PMC.  Being directly connected to Apache would  
benefit Pig team members who would have a better view into Apache.  It  
would also raise our profile in Apache and thus make other projects  
more aware of us.


However, I am concerned about losing Pig's explicit connection to
Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
MapReduce are the current flavor of the month in computing.  Given
that Pig shares a name with the common farm animal, it's hard to be
sure based on search statistics.  But Google trends shows that
"hadoop" is searched on much more frequently than "hadoop pig" or
"apache pig" (see http://www.google.com/trends?q=hadoop%2Chadoop+pig).
I am guessing that most Pig users come from Hadoop users who
discover Pig via Hadoop's website.  Losing that subproject tab on
Hadoop's front page may radically lower the number of users coming to  
Pig to check out our project.  I would argue that this benefits Hadoop  
as well, since high level languages like Pig Latin have the potential  
to greatly extend the user base and usability of Hadoop.


Two, being explicitly connected to Hadoop keeps our two communities
aware of each other's needs.  There are features proposed for MR that
would greatly help Pig.  By staying in the Hadoop community Pig is  
better positioned to advocate for and help implement and test those  
features.  The response to this will be that Pig developers can still  
subscribe to Hadoop mailing lists, submit patches, etc.  That is, they  
can still be part of the Hadoop community.  Which reinforces my point  
that it makes more sense to leave Pig in the Hadoop community since  
Pig developers will need to be part of that community anyway.


Finally, philosophically it makes sense to me that projects that are  
tightly connected belong together.  It strikes me as strange to have  
Pig as a TLP completely dependent on another TLP.  Hadoop was  
originally a subproject of Lucene.  It moved out to be a TLP when it  
became obvious that Hadoop had become independent of and useful apart  
from Lucene.  Pig is not in that position relative to Hadoop.


So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to  
being persuaded that I'm wrong or my concerns can be addressed while  
still having Pig as a TLP.


Alan.

On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:

You have probably heard by now that there is a discussion going on  
in the Hadoop PMC as to whether a number of the subprojects (Hbase,  
Avro, Zookeeper, Hive, and Pig) should move out from under the  
Hadoop umbrella and become top level Apache projects (TLP).  This  
discussion has picked up recently since the Apache board has clearly  
communicated to the Hadoop PMC that it is concerned that Hadoop is  
acting as an umbrella project with many disjoint subprojects  
underneath it.  They are concerned that this gives Apache little  
insight into the health and happenings of the subproject communities  
which in turn means Apache cannot properly mentor those communities.


The purpose of this email is to start a discussion within the Pig  
community about this topic.  Let me cover first what becoming TLP  
would mean for Pig, and then I'll go into what options I think we as  
a community have.


Becoming a TLP would mean that Pig would itself have a PMC that  
would report directly to the Apache board.  Who would be on the PMC  
would be something we as a community would need to decide.  Common  
options would be to say all active committers are on the PMC, or all  
active committers who have been a committer for at least a year.  We  
would also need to elect a chair of the PMC.  This lucky person  
would have no additional power, but would have the additional  
responsibility of writing quarterly reports on Pig's status for  
Apache board meetings, as well as coordinating with Apache to get  
accounts for new committers, etc.  For more information see
http://www.apache.org/foundation/how-it-works.html#roles


Becoming a TLP would not mean that we are ostracized from the Hadoop  
community.  We would continue to be invited to Hadoop Summits, HUGs,  
etc.  Since all Pig developers and users are by definition Hadoop  
users, we would continue to be a strong presence in the Hadoop  
community.


I see three ways that we as a community can respond to this:

1) Say yes, we want to be a TLP now.
2) Say yes, we want to be a TLP, but not yet.  We feel we need more  
time to mature.  If we 

Re: Begin a discussion about Pig as a top level project

2010-03-31 Thread Dmitriy Ryaboy
Over time, Pig is increasing its coupling to Hadoop (for good reasons),
rather than decreasing it. If and when Pig becomes a viable entity without
hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
will only introduce unnecessary administrative and bureaucratic headaches.
So my vote is also -1.

-Dmitriy



On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com wrote:

 So far I haven't seen any feedback on this.  Apache has asked the Hadoop
 PMC to submit input in April on whether some subprojects should be promoted
 to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
 how we feel about this.  Please make your voice heard.

 So now I'll heed my own call and give my thoughts on it.

 The biggest advantage I see to being a TLP is a direct connection to
 Apache.  Right now all of the Pig team's interaction with Apache is through
 the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
 members who would have a better view into Apache.  It would also raise our
 profile in Apache and thus make other projects more aware of us.

 However, I am concerned about losing Pig's explicit connection to Hadoop.
 This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
 current flavor of the month in computing.  Given that Pig shares a name with
 the common farm animal, it's hard to be sure based on search statistics.
 But Google trends shows that "hadoop" is searched on much more frequently
 than "hadoop pig" or "apache pig" (see
 http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
 most Pig users come from Hadoop users who discover Pig via Hadoop's website.
 Losing that subproject tab on Hadoop's front page may radically lower the
 number of users coming to Pig to check out our project.  I would argue that
 this benefits Hadoop as well, since high level languages like Pig Latin have
 the potential to greatly extend the user base and usability of Hadoop.

 Two, being explicitly connected to Hadoop keeps our two communities aware
 of each other's needs.  There are features proposed for MR that would greatly
 help Pig.  By staying in the Hadoop community Pig is better positioned to
 advocate for and help implement and test those features.  The response to
 this will be that Pig developers can still subscribe to Hadoop mailing
 lists, submit patches, etc.  That is, they can still be part of the Hadoop
 community.  Which reinforces my point that it makes more sense to leave Pig
 in the Hadoop community since Pig developers will need to be part of that
 community anyway.

 Finally, philosophically it makes sense to me that projects that are
 tightly connected belong together.  It strikes me as strange to have Pig as
 a TLP completely dependent on another TLP.  Hadoop was originally a
 subproject of Lucene.  It moved out to be a TLP when it became obvious that
 Hadoop had become independent of and useful apart from Lucene.  Pig is not
 in that position relative to Hadoop.

 So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
 persuaded that I'm wrong or my concerns can be addressed while still having
 Pig as a TLP.

 Alan.


 On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:

  You have probably heard by now that there is a discussion going on in the
 Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
 Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
 become top level Apache projects (TLP).  This discussion has picked up
 recently since the Apache board has clearly communicated to the Hadoop PMC
 that it is concerned that Hadoop is acting as an umbrella project with many
 disjoint subprojects underneath it.  They are concerned that this gives
 Apache little insight into the health and happenings of the subproject
 communities which in turn means Apache cannot properly mentor those
 communities.

 The purpose of this email is to start a discussion within the Pig
 community about this topic.  Let me cover first what becoming TLP would mean
 for Pig, and then I'll go into what options I think we as a community have.

 Becoming a TLP would mean that Pig would itself have a PMC that would
 report directly to the Apache board.  Who would be on the PMC would be
 something we as a community would need to decide.  Common options would be
 to say all active committers are on the PMC, or all active committers who
 have been a committer for at least a year.  We would also need to elect a
 chair of the PMC.  This lucky person would have no additional power, but
 would have the additional responsibility of writing quarterly reports on
 Pig's status for Apache board meetings, as well as coordinating with Apache
 to get accounts for new  committers, etc.  For more information see
 http://www.apache.org/foundation/how-it-works.html#roles

 Becoming a TLP would not mean that we are ostracized from the Hadoop
 community.  We would continue to be invited to Hadoop