Re: [DISCUSS] [VOTE] HCatalog to Graduate and become part of Apache Hive

2013-02-20 Thread Ross Gardler
I'm brought to this thread by the board report but my response here is as
an IPMC member. My comment on the board report is quite different, it is
"I've read the thread on general@ and feel that the IPMC should make a
clear recommendation to the board in this and similar cases. The IPMC
discussion seems to be healthy and productive."

So, as an IPMC member I have a few open questions [inline]...


On 11 February 2013 18:20, Alan Gates ga...@hortonworks.com wrote:



Also, it has been agreed that each HCatalog committer will be provided with
 a mentor from the Hive community to help him/her learn the rest of Hive,
 with the goal of becoming a committer on Hive within six months.  The
 submodule state is transitionary, not an end point.


Why was this mentoring not done as part of the incubation process since
building the right community structure for graduation (along with IP
clearance) is the main role of the incubation process? Was Hive the
sponsoring project for this proposal? If not why not?

I ask these questions because HCatalog is making a very strong case that any
other option for graduation is not appropriate. At the same time we are
being told by the Hive PMC that the mentoring of the committers is
incomplete since they have insufficient merit within Hive to be trusted to
be full members of that project.

It also concerns me that in this same month the IPMC board report says "The
main concern of the incubator continues to be the quality and reliability
of supervision... The supply of mentoring seems, still, to exceed demand."

Why is it that the Hive PMC feels it is able to provide mentoring within
their own PMC through the creation of what some people see as
an umbrella project, but not here in the IPMC?

Finally, why can't I find the HCatalog proposal in my mail client, markmail
or the wiki (not had coffee yet, feel free to call me [insert adjective])?

Ross


Re: Wiki privs

2013-02-20 Thread Nick Burch

On Tue, 19 Feb 2013, Arun C Murthy wrote:

Help, please?


I've added you to this list. Two things though, firstly usernames with 
spaces in aren't that usual, so you should check it works. Secondly, an 
account with the username ArunMurthy already had karma, so is it 
possible you previously created a different account?


Nick


On Feb 18, 2013, at 3:05 PM, Arun C Murthy wrote:

Hi Folks,

Can someone pls grant me privs so that I can put up a new Incubator proposal on 
the wiki (http://wiki.apache.org/incubator/TezProposal) ? My wiki username is 
'Arun C Murthy'.

thanks,
Arun








-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Wiki privs

2013-02-20 Thread Daniel Shahaf
Nick Burch wrote on Wed, Feb 20, 2013 at 11:55:54 +:
 On Tue, 19 Feb 2013, Arun C Murthy wrote:
 Help, please?

 I've added you to this list. Two things though, firstly usernames with  
 spaces in aren't that usual, so you should check it works. Secondly, an  

[[Arun Murthy]] would work (i.e., make it a link).




Re: [DISCUSS] [VOTE] HCatalog to Graduate and become part of Apache Hive

2013-02-20 Thread Benson Margulies
On Wed, Feb 20, 2013 at 5:31 AM, Ross Gardler
rgard...@opendirective.com wrote:
 I'm brought to this thread by the board report but my response here is as
 an IPMC member. My comment on the board report is quite different, it is
 "I've read the thread on general@ and feel that the IPMC should make a
 clear recommendation to the board in this and similar cases. The IPMC
 discussion seems to be healthy and productive."

 So, as an IPMC member I have a few open questions [inline]...


 On 11 February 2013 18:20, Alan Gates ga...@hortonworks.com wrote:



 Also, it has been agreed that each HCatalog committer will be provided with
 a mentor from the Hive community to help him/her learn the rest of Hive,
 with the goal of becoming a committer on Hive within six months.  The
 submodule state is transitionary, not an end point.


 Why was this mentoring not done as part of the incubation process since
 building the right community structure for graduation (along with IP
 clearance) is the main role of the incubation process? Was Hive the
 sponsoring project for this proposal? If not why not?

Ross, my suspicion (and I haven't done the digging here on vacation)
is that HCatalog started incubation with the intention of becoming a
TLP, so their original sponsor was the incubator itself. The idea of
merging into Hive came up late in the process. So the Hive people had
no warning or reason to be part of the supervision.

Thus, your email seems to me to pose this question: Should the IPMC
seal of approval be good enough for an existing TLP to grant committer
status? I don't see why that should be. Projects have their own
culture and conventions, and if Hive was not participating in the
incubation process, why should those conventions be part of the
HCatalog incubation?

This brings me to my other obsessional point here. If the plan, from
the start, had been to import code to Hive, there would have been no
need for the IPMC to do anything except IP clearance. Hive could have
'incubated' via the usual mechanism of accepting patches. The notion
of 'incubate sponsored by project X' makes sense to me if the eventual
trajectory is some sort of autonomous subproject, and not otherwise.




 I ask these questions because HCatalog is making a very strong case that any
 other option for graduation is not appropriate. At the same time we are
 being told by the Hive PMC that the mentoring of the committers is
 incomplete since they have insufficient merit within Hive to be trusted to
 be full members of that project.

 It also concerns me that in this same month the IPMC board report says "The
 main concern of the incubator continues to be the quality and reliability
 of supervision... The supply of mentoring seems, still, to exceed demand."

 Why is it that the Hive PMC feels it is able to provide mentoring within
 their own PMC through the creation of what some people see as
 an umbrella project, but not here in the IPMC?

 Finally, why can't I find the HCatalog proposal in my mail client, markmail
 or the wiki (not had coffee yet, feel free to call me [insert adjective])

 Ross




Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Hitesh Shah
+1 ( non-binding ) 

-- Hitesh

On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:

 Hi Folks,
 
 Thanks for participating in the discussion. I'd like to call a VOTE for 
 acceptance of Apache Tez into the Incubator. I'll let the vote run till into 
 this weekend (Sun 2/24 6pm PST).
 
 [ ]  +1 Accept Apache Tez into the Incubator
 [ ]  +0 Don't care.
 [ ]  -1 Don't accept Apache Tez into the Incubator because...
 
 Full proposal is pasted at the bottom of this email, and the corresponding 
 wiki is http://wiki.apache.org/incubator/TezProposal. 
 
 Only VOTEs from Incubator PMC members are binding, but all are welcome to 
 express their thoughts.
 
 Here's my +1 (binding).
 
 thanks,
 Arun
 
 PS: From the initial discussion, the only changes are that I've added one new 
 mentor and 2 new committers. All the new additions come from the non-major 
 employer while we continue to strive to further diversify during the 
 incubation. Thanks.
 
 
 
 = Tez =
 
 == Abstract ==
 Tez is an effort to develop a generic application framework which can be used
 to process arbitrarily complex data-processing tasks and also a re-usable set
 of data-processing primitives which can be used by other projects.
 
 == Proposal ==
 Tez is a proposal to develop a generic application which can be used to
 process complex data-processing task DAGs and runs natively on Apache Hadoop
 YARN. YARN is a generic resource-management system on which currently
 applications like MapReduce already exist. MapReduce is a specific, and
 constrained, DAG - which is not optimal for several frameworks like Apache
 Hive and Apache Pig. Furthermore, we propose to develop a re-usable set of
 libraries of data-processing primitives such as sorting, merging,
 data-shuffling, intermediate data management etc. which are necessary for Tez
 which we envision can be used directly by other projects.
 
 == Background ==
 Apache Hadoop MapReduce has emerged as the assembly-language on which other
 frameworks like Apache Pig and Apache Hive have been built. However, it has
 been well accepted that MapReduce produces very constrained task DAGs for each
 job which results in Apache Pig and Apache Hive requiring multiple MapReduce
 jobs for several queries. By providing a more expressive DAG of tasks for a
 job, Tez attempts to provide significantly enhanced data-processing
 capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
 
 == Rationale ==
 There is an important gap that Tez fulfills in the Apache Hadoop ecosystem of
 allowing for more expressive task DAGs for data-processing applications such
 as Apache Pig, Apache Hive, Cascading etc.
 
 With emergence of Apache Hadoop YARN, there is a strong need for a
 common DAG application which can then be shared by Apache Pig, Apache Hive,
 Cascading etc.
 
 == Initial Goals ==
 The initial goals for this project are to specify the detailed requirements
 and architecture, and then develop the initial implementation including the
 DAG ApplicationMaster to run natively inside Apache Hadoop YARN. 
 
 == Current Status ==
 Significant work has been completed to identify the initial requirements and
 define the overall system architecture. There is a patch available in the
 internal Hortonworks git repository which can act as the initial seed. 
 
 === Meritocracy ===
 We plan to invest in supporting a meritocracy. We will discuss the
 requirements in an open forum. Several companies have already expressed
 interest in this project, and we intend to invite additional developers to
 participate. We will encourage and monitor community participation so that
 privileges can be extended to those that contribute.
 
 === Community ===
 The need for a generic DAG application for data processing in the open source
 is tremendous, so there is a potential for a very large community. We believe
 that Tez's extensible architecture will further encourage community
 participation. Also, related Apache projects (eg, Pig, Hive) have very large
 and active communities, and we expect that over time Tez will also attract a
 large community.
 
 === Core Developers ===
 The developers on the initial committers list include people very experienced
 in the Apache Hadoop ecosystem:
 
 * Alan Gates gates at apache dot org
 * Arun C Murthy acmurthy at apache dot org
 * Ashutosh Chauhan hashutosh at apache dot org
 * Bikas Saha bikas at apache dot org
 * Chris Douglas cdouglas at apache dot org
 * Daryn Sharp daryn at apache dot org
 * Devaraj Das ddas at apache dot org
 * Gopal Vijayaraghavan gopal at hortonworks dot com
 * Gunther Hagleitner ghagleitner at hortonworks dot com
 * Hitesh Shah hitesh at apache dot org
 * Jason Lowe jlowe at apache dot org
 * Jean Xu jeanxu at facebook dot com
 * Jitendra Pandey jitendra at apache dot org
 * Julien Le Dem julien at apache dot org
 * Kevin Wilfong kevinwilfong at apache dot org
 * Mike Liddell mike dot lidell at microsoft dot com
 * Namit Jain namit at apache 

Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Alejandro Abdelnur
+1 (non-binding), glad to see that finally the idea of having a DAG AM is
getting traction.

Arun, would you please clarify how Tez is (conceptually) different from the
Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?



On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah hit...@hortonworks.com wrote:

 +1 ( non-binding )

 -- Hitesh


Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Jakob Homan
+1 (binding) -jakob


On Wed, Feb 20, 2013 at 8:26 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 +1 (non-binding), glad to see that finally the idea of having a DAG AM is
 getting traction.

 Arun, would you please clarify how Tez is (conceptually) different from the
 Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?



 On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah hit...@hortonworks.com
 wrote:

  +1 ( non-binding )
 
  -- Hitesh
 

Please help us granting rights to a new S4 committer

2013-02-20 Thread Matthieu Morel
Hi,

Daniel Gomez Ferro (id = dferro) was recently elected as a new committer for 
the S4 incubator project, but he does not have all permissions yet.

According to http://people.apache.org/committer-index.html , Daniel is a member 
of the incubator group, but not yet of s4: we'd need an IPMC member to 
grant him these rights. Our mentors could not do that (rights, availability), 
so we're turning to the broader community.

Can some IPMC member help us?

Thanks!

Matthieu



Re: [DISCUSS] [VOTE] HCatalog to Graduate and become part of Apache Hive

2013-02-20 Thread Alan Gates
The project was named Howl when it was proposed, so the proposal is at 
http://wiki.apache.org/incubator/HowlProposal

Alan.





Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Andrew Purtell
 Arun, would you please clarify how Tez is (conceptually) different from
the Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?

I would like to understand this as well. They seem largely identical,
but the Tez proposal has a set of initial committers disjunctive from those
who performed the work on MAPREDUCE-4495/OOZIE-1178 and volunteered for the
so-called YAPP proposal.


On Wed, Feb 20, 2013 at 8:26 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 +1 (non-binding), glad to see that finally the idea of having a DAG AM is
 getting traction.

 Arun, would you please clarify how Tez is (conceptually) different from the
 Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?



 On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah hit...@hortonworks.com
 wrote:

  +1 ( non-binding )
 
  -- Hitesh
 

Re: Please help us granting rights to a new S4 committer

2013-02-20 Thread Daniel Shahaf
Sending        asf-authorization-template
Transmitting file data .
Committed revision 851328.




Re: [VOTE] Release of EasyAnt 0.9-incubating

2013-02-20 Thread Antoine Levy Lambert
+1 
Antoine
On Feb 19, 2013, at 2:23 AM, ant elder wrote:

 I've already voted +1 for this on the easyant list but +1 again to bring
 the vote up again. Still need one more vote please anyone...
 
   ...ant
 
 On Thu, Feb 14, 2013 at 9:14 PM, Nicolas Lalevée nicolas.lale...@hibnet.org
 wrote:
 
 This is a call for a vote of the release of EasyAnt 0.9-incubating. We are
 releasing the plugins, the buildtypes, the skeletons and the core (the main
 distribution).
 
 The first vote happened on the dev mailing list there:
 
 http://mail-archives.apache.org/mod_mbox/incubator-easyant-dev/201302.mbox/%3C21AB5F29-A4AF-4886-B640-DC0E32D294A0%40hibnet.org%3E
 We've got 4 +1 from the PPMC, 2 +1 from the mentors.
 
 The svn tags are there:
 
 http://svn.apache.org/repos/asf/incubator/easyant/plugins/tags/0.9-incubating/
 
 http://svn.apache.org/repos/asf/incubator/easyant/buildtypes/tags/0.9-incubating/
 
 http://svn.apache.org/repos/asf/incubator/easyant/skeletons/tags/0.9-incubating/
 http://svn.apache.org/repos/asf/incubator/easyant/core/tags/0.9-incubating/
 
 The released artifacts are there:
 http://people.apache.org/~hibou/easyant-0.9-incubating/
 
 The KEYS file is there:
 http://svn.apache.org/repos/asf/incubator/easyant/KEYS
 
 Please, cast your votes:
 [ ] +1, I accept the release
 [ ] +0, OK, but….
 [ ] -1, I disapprove, because….
 
 Nicolas





Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Arun C Murthy
On Feb 20, 2013, at 1:38 PM, Andrew Purtell wrote:

 Arun, would you please clarify how Tez is (conceptually) different from
 the Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?
 
 I would like to understand this as well. They seem largely identical,
 but the Tez proposal has a set of initial committers disjoint from those
 who performed the work on MAPREDUCE-4495/OOZIE-1178 and volunteered for the
 so-called YAPP proposal.
 

Sorry, I thought I answered this when I talked about the scope of Tez being
similar to Hyracks or Stratosphere when I responded to Sebastian
(http://s.apache.org/x4u).

In any case, Tez is an attempt to build a _single job_ which can run a DAG of
tasks (à la Hyracks/Stratosphere), whereas yapp was about having an application
to manage a DAG of independent jobs to construct a more complex workflow for
Oozie. The two differ significantly in scope and goals.

Hope that helps.

thanks,
Arun

PS: I'm happy to help bring yapp in as another podling, but I haven't heard
back from Alejandro and the other original authors of what we talked about as
yapp. If you could comment on the jira, I'll take it forward. I'm still
interested in working on yapp independently of Tez; this is one of the
strengths of YARN. Thanks!

 
 On Wed, Feb 20, 2013 at 8:26 AM, Alejandro Abdelnur t...@cloudera.com wrote:
 
 +1 (non-binding), glad to see that finally the idea of having a DAG AM is
 getting traction.
 
 Arun, would you please clarify how Tez is (conceptually) different from the
 Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?
 
 
 
 On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah hit...@hortonworks.com
 wrote:
 
 +1 ( non-binding )
 
 -- Hitesh
 
 On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:
 
 Hi Folks,
 
 Thanks for participating in the discussion. I'd like to call a VOTE for
 acceptance of Apache Tez into the Incubator. I'll let the vote run
 until this weekend (Sun 2/24 6pm PST).
 
 [ ]  +1 Accept Apache Tez into the Incubator
 [ ]  +0 Don't care.
 [ ]  -1 Don't accept Apache Tez into the Incubator because...
 
 Full proposal is pasted at the bottom of this email, and the
 corresponding wiki is http://wiki.apache.org/incubator/TezProposal.
 
 Only VOTEs from Incubator PMC members are binding, but all are welcome
 to express their thoughts.
 
 Here's my +1 (binding).
 
 thanks,
 Arun
 
 PS: From the initial discussion, the only changes are that I've added
 one new mentor and 2 new committers. All the new additions come from the
 non-major employer while we continue to strive to further diversify
 during the incubation. Thanks.
 
 
 
 = Tez =
 
 == Abstract ==
 Tez is an effort to develop a generic application framework which can be used
 to process arbitrarily complex data-processing tasks and also a re-usable set
 of data-processing primitives which can be used by other projects.
 
 == Proposal ==
 Tez is a proposal to develop a generic application which can be used to
 process complex data-processing task DAGs and runs natively on Apache Hadoop
 YARN. YARN is a generic resource-management system on which currently
 applications like MapReduce already exist. MapReduce is a specific, and
 constrained, DAG - which is not optimal for several frameworks like Apache Hive
 and Apache Pig. Furthermore, we propose to develop a re-usable set of
 libraries of data-processing primitives such as sorting, merging,
 data-shuffling, intermediate data management etc. which are necessary for Tez
 which we envision can be used directly by other projects.
 
 == Background ==
 Apache Hadoop MapReduce has emerged as the assembly-language on which other
 frameworks like Apache Pig and Apache Hive have been built. However, it has
 been well accepted that MapReduce produces very constrained task DAGs for each
 job which results in Apache Pig and Apache Hive requiring multiple MapReduce
 jobs for several queries. By providing a more expressive DAG of tasks for a
 job, Tez attempts to provide significantly enhanced data-processing
 capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
 
 == Rationale ==
 There is an important gap that Tez fulfills in the Apache Hadoop ecosystem of
 allowing for more expressive task DAGs for data-processing applications such
 as Apache Pig, Apache Hive, Cascading etc.
 
 With emergence of Apache Hadoop YARN, there is a strong need for a
 common DAG application which can then be shared by Apache Pig, Apache Hive,
 Cascading etc.
 
 == Initial Goals ==
 The initial goals for this project are to specify the detailed requirements
 and architecture, and then develop the initial implementation including the
 DAG ApplicationMaster to run natively inside Apache Hadoop YARN.
 
 == Current Status ==
 Significant work has been completed to identify the initial requirements and
 define the overall system architecture. There is a patch available in the
 internal Hortonworks git repository which can act as the initial seed.
 
 === Meritocracy ===
 

Re: Wiki privs

2013-02-20 Thread Mattmann, Chris A (388J)
I took care of granting this karma, after Gav provided it to me via an IRC
chat.

Cheers,
Chris

On 2/19/13 7:04 PM, Arun C Murthy a...@hortonworks.com wrote:

Help, please?

I got one of my other mentors to put up the wiki, but would be nice to
get write access as well.

thanks!
Arun

On Feb 18, 2013, at 3:05 PM, Arun C Murthy wrote:

 Hi Folks,
 
 Can someone pls grant me privs so that I can put up a new Incubator
 proposal on the wiki (http://wiki.apache.org/incubator/TezProposal)? My
 wiki username is 'Arun C Murthy'.
 
 thanks,
 Arun
 








Re: [VOTE] Accept Tez into Incubator

2013-02-20 Thread Mattmann, Chris A (388J)
+1 (binding)

Thanks!

Cheers,
Chris

On 2/19/13 8:26 PM, Arun C Murthy a...@hortonworks.com wrote:

Hi Folks,

Thanks for participating in the discussion. I'd like to call a VOTE for
acceptance of Apache Tez into the Incubator. I'll let the vote run
until this weekend (Sun 2/24 6pm PST).

[ ]  +1 Accept Apache Tez into the Incubator
[ ]  +0 Don't care.
[ ]  -1 Don't accept Apache Tez into the Incubator because...

Full proposal is pasted at the bottom of this email, and the
corresponding wiki is http://wiki.apache.org/incubator/TezProposal.

Only VOTEs from Incubator PMC members are binding, but all are welcome to
express their thoughts.

Here's my +1 (binding).

thanks,
Arun

PS: From the initial discussion, the only changes are that I've added one
new mentor and 2 new committers. All the new additions come from the
non-major employer while we continue to strive to further diversify
during the incubation. Thanks.



= Tez =

== Abstract ==
Tez is an effort to develop a generic application framework which can be used
to process arbitrarily complex data-processing tasks and also a re-usable set
of data-processing primitives which can be used by other projects.

== Proposal ==
Tez is a proposal to develop a generic application which can be used to
process complex data-processing task DAGs and runs natively on Apache Hadoop
YARN. YARN is a generic resource-management system on which currently
applications like MapReduce already exist. MapReduce is a specific, and
constrained, DAG - which is not optimal for several frameworks like Apache Hive
and Apache Pig. Furthermore, we propose to develop a re-usable set of
libraries of data-processing primitives such as sorting, merging,
data-shuffling, intermediate data management etc. which are necessary for Tez
which we envision can be used directly by other projects.

== Background ==
Apache Hadoop MapReduce has emerged as the assembly-language on which other
frameworks like Apache Pig and Apache Hive have been built. However, it has
been well accepted that MapReduce produces very constrained task DAGs for each
job which results in Apache Pig and Apache Hive requiring multiple MapReduce
jobs for several queries. By providing a more expressive DAG of tasks for a
job, Tez attempts to provide significantly enhanced data-processing
capabilities for projects like Apache Pig, Apache Hive, Cascading etc.

== Rationale ==
There is an important gap that Tez fulfills in the Apache Hadoop ecosystem of
allowing for more expressive task DAGs for data-processing applications such
as Apache Pig, Apache Hive, Cascading etc.

With emergence of Apache Hadoop YARN, there is a strong need for a
common DAG application which can then be shared by Apache Pig, Apache Hive,
Cascading etc.

== Initial Goals ==
The initial goals for this project are to specify the detailed requirements
and architecture, and then develop the initial implementation including the
DAG ApplicationMaster to run natively inside Apache Hadoop YARN.

== Current Status ==
Significant work has been completed to identify the initial requirements and
define the overall system architecture. There is a patch available in the
internal Hortonworks git repository which can act as the initial seed.

=== Meritocracy ===
We plan to invest in supporting a meritocracy. We will discuss the requirements
in an open forum. Several companies have already expressed interest in this
project, and we intend to invite additional developers to participate.
We will encourage and monitor community participation so that privileges can be
extended to those that contribute.

=== Community ===
The need for a generic DAG application for data processing in the open source is
tremendous, so there is a potential for a very large community. We believe
that Tez's extensible architecture will further encourage community
participation. Also, related Apache projects (e.g., Pig, Hive) have very large
and active communities, and we expect that over time Tez will also attract a
large community.

=== Core Developers ===
The developers on the initial committers list include people very experienced
in the Apache Hadoop ecosystem:

 * Alan Gates gates at apache dot org
 * Arun C Murthy acmurthy at apache dot org
 * Ashutosh Chauhan hashutosh at apache dot org
 * Bikas Saha bikas at apache dot org
 * Chris Douglas cdouglas at apache dot org
 * Daryn Sharp daryn at apache dot org
 * Devaraj Das ddas at apache dot org
 * Gopal Vijayaraghavan gopal at hortonworks dot com
 * Gunther Hagleitner ghagleitner at hortonworks dot com
 * Hitesh Shah hitesh at apache dot org
 * Jason Lowe jlowe at apache dot org
 * Jean Xu jeanxu at facebook dot com
 * Jitendra Pandey jitendra at apache dot org
 * Julien Le Dem julien at apache dot org
 * Kevin Wilfong kevinwilfong at apache dot org
 * Mike Liddell mike dot lidell at microsoft dot com
 * Namit Jain namit at apache dot org
 * Nathan Roberts nroberts at yahoo dash inc dot com
 * Owen O'Malley omalley at apache dot