[RESULT][VOTE] Accept DataFu into the Incubator

2014-01-17 Thread Jake Farrell
The Incubator status page did not pick up this vote closing due to the
format of the '[Result]' tag. Resending with updated subject and will
go clean up the script to have a better matching pattern to avoid this
in the future.

-Jake
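
A tag matcher that tolerates this kind of variation might look like the sketch below. It is only an illustration: the actual status-page script and its pattern are not shown in this thread, so the names RESULT_TAG and is_vote_result are hypothetical, and it assumes the script identifies result mails by the leading subject tags.

```python
import re

# Hypothetical pattern (not the real Incubator script): accept the result tag
# in any letter case, with an optional [VOTE] tag before or after it,
# anchored at the start of the subject line.
RESULT_TAG = re.compile(
    r"^\s*(?:\[vote\]\s*)?\[result\](?:\s*\[vote\])?",
    re.IGNORECASE,
)


def is_vote_result(subject: str) -> bool:
    """Return True if the subject line looks like a vote-result announcement."""
    return RESULT_TAG.search(subject) is not None
```

With this pattern both subject lines used in this thread are recognized as results, while a plain [VOTE] subject is not.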





13:00 came and went, vote’s closed.


With at least 10 binding +1s and no -1s, the vote passes.  I’ll get
started on the bootstrapping.


Thanks everybody,

Jakob


From: Suresh Marru
Sent: Saturday, January 4, 2014 1:41 AM
To: general@incubator.apache.org

+1 (binding).

Suresh
On Dec 31, 2013, at 3:39 PM, Jakob Homan jgho...@gmail.com wrote:

 Incubator-

 Following the discussion earlier, I'm calling a vote to accept DataFu as a
 new Incubator project.

 The proposal draft is available at:
 https://wiki.apache.org/incubator/DataFuProposal, and is also included
 below.

 Vote is open for at least 96h and closes at the earliest on 4 Jan 13:00
 PDT.  I'm letting the vote run an extra day as we're in the holiday season.

 [ ] +1 accept DataFu in the Incubator
 [ ] +/-0
 [ ] -1 because...

 Here's my binding +1.
 -Jakob

 ---
 Abstract

 DataFu makes it easier to solve data problems using Hadoop and higher level
 languages based on it.

 Proposal

 DataFu provides a collection of Hadoop MapReduce jobs and functions in
 higher level languages based on it to perform data analysis. It provides
 functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
 stream sessionization, and set and bag operations. DataFu also provides
 Hadoop jobs for incremental data processing in MapReduce.

 Background

 DataFu began two years ago as a set of UDFs developed internally at LinkedIn,
 coming from our desire to solve common problems with reusable components.
 Recognizing that the community could benefit from such a library, we added
 documentation, an extensive suite of unit tests, and open sourced the code.
 Since then there have been steady contributions to DataFu as we encountered
 common problems not yet solved by it. Others outside LinkedIn have
 contributed as well. More recently we recognized the challenges with
 efficient incremental processing of data in Hadoop and have contributed a
 set of Hadoop MapReduce jobs as a solution.

 DataFu began as a project at LinkedIn, but it has shown itself to be useful
 to other organizations and developers as well as they have faced similar
 problems. We would like to share DataFu with the ASF and begin developing a
 community of developers and users within Apache.

 Rationale

 There is a strong need for well-tested libraries that help developers solve
 common data problems in Hadoop and higher-level languages such as Pig,
 Hive, Crunch, Scalding, etc.

 Current Status

 Meritocracy

 Our intent with this incubator proposal is to start building a diverse
 developer community around DataFu following the Apache meritocracy model.
 Since DataFu was initially open sourced in 2011, it has received
 contributions from both within and outside LinkedIn. We plan to continue
 support for new contributors and work with those who contribute
 significantly to the project to make them committers.

 Community

 DataFu has been building a community of developers for two years. It began
 with contributors from LinkedIn and has received contributions from
 developers at Cloudera since very early on. It has been included
 in Cloudera’s Hadoop Distribution and Apache Bigtop. We hope to extend our
 contributor base significantly and invite all those who are interested in
 solving large-scale data processing problems to participate.

 Core Developers

 DataFu has a strong base of developers at LinkedIn. Matthew Hayes initiated
 the project in 2011, and aside from continued contributions to DataFu has
 also contributed the sub-project Hourglass for incremental MapReduce
 processing. Separate from DataFu he has also open sourced the White
 Elephant project. Sam Shah contributed a significant portion of the
 original code and continues to contribute to the project. William Vaughan
 has been contributing regularly to DataFu for the past two years. Evion Kim
 has been contributing to DataFu for the past year. Xiangrui Meng recently
 contributed implementations of scalable sampling algorithms based on
 research from a paper he published. Chris Lloyd has provided some important
 bug fixes and unit tests. Mitul Tiwari has also contributed to DataFu.
 Mathieu Bastian has been developing MapReduce jobs that we hope to include
 in DataFu. In addition he also leads the open source Gephi project.

 Alignment

 The ASF is the natural choice to host the DataFu project as its goal of
 encouraging community-driven open-source projects fits with our vision for
 DataFu. Additionally, other projects DataFu integrates with, such as Apache
 Pig and Apache Hadoop, and in the future Apache Hive and Apache Crunch, are
 hosted by the ASF and we will benefit and provide benefit by close
 proximity to them.

 Known Risks

 

[Result][VOTE] Accept DataFu into the Incubator

2014-01-04 Thread jghoman
13:00 came and went, vote’s closed.


With at least 10 binding +1s and no -1s, the vote passes.  I’ll get started on 
the bootstrapping.


Thanks everybody, 

Jakob


From: Suresh Marru
Sent: Saturday, January 4, 2014 1:41 AM
To: general@incubator.apache.org

+1 (binding).

Suresh
On Dec 31, 2013, at 3:39 PM, Jakob Homan jgho...@gmail.com wrote:

 Incubator-
 
 Following the discussion earlier, I'm calling a vote to accept DataFu as a
 new Incubator project.
 
 The proposal draft is available at:
 https://wiki.apache.org/incubator/DataFuProposal, and is also included
 below.
 
 Vote is open for at least 96h and closes at the earliest on 4 Jan 13:00
 PDT.  I'm letting the vote run an extra day as we're in the holiday season.
 
 [ ] +1 accept DataFu in the Incubator
 [ ] +/-0
 [ ] -1 because...
 
 Here's my binding +1.
 -Jakob
 
 ---
 Abstract
 
 DataFu makes it easier to solve data problems using Hadoop and higher level
 languages based on it.
 
 Proposal
 
 DataFu provides a collection of Hadoop MapReduce jobs and functions in
 higher level languages based on it to perform data analysis. It provides
 functions for common statistics tasks (e.g. quantiles, sampling), PageRank,
 stream sessionization, and set and bag operations. DataFu also provides
 Hadoop jobs for incremental data processing in MapReduce.
 
 Background
 
 DataFu began two years ago as a set of UDFs developed internally at LinkedIn,
 coming from our desire to solve common problems with reusable components.
 Recognizing that the community could benefit from such a library, we added
 documentation, an extensive suite of unit tests, and open sourced the code.
 Since then there have been steady contributions to DataFu as we encountered
 common problems not yet solved by it. Others outside LinkedIn have
 contributed as well. More recently we recognized the challenges with
 efficient incremental processing of data in Hadoop and have contributed a
 set of Hadoop MapReduce jobs as a solution.
 
 DataFu began as a project at LinkedIn, but it has shown itself to be useful
 to other organizations and developers as well as they have faced similar
 problems. We would like to share DataFu with the ASF and begin developing a
 community of developers and users within Apache.
 
 Rationale
 
 There is a strong need for well-tested libraries that help developers solve
 common data problems in Hadoop and higher-level languages such as Pig,
 Hive, Crunch, Scalding, etc.
 
 Current Status
 
 Meritocracy
 
 Our intent with this incubator proposal is to start building a diverse
 developer community around DataFu following the Apache meritocracy model.
 Since DataFu was initially open sourced in 2011, it has received
 contributions from both within and outside LinkedIn. We plan to continue
 support for new contributors and work with those who contribute
 significantly to the project to make them committers.
 
 Community
 
 DataFu has been building a community of developers for two years. It began
 with contributors from LinkedIn and has received contributions from
 developers at Cloudera since very early on. It has been included
 in Cloudera’s Hadoop Distribution and Apache Bigtop. We hope to extend our
 contributor base significantly and invite all those who are interested in
 solving large-scale data processing problems to participate.
 
 Core Developers
 
 DataFu has a strong base of developers at LinkedIn. Matthew Hayes initiated
 the project in 2011, and aside from continued contributions to DataFu has
 also contributed the sub-project Hourglass for incremental MapReduce
 processing. Separate from DataFu he has also open sourced the White
 Elephant project. Sam Shah contributed a significant portion of the
 original code and continues to contribute to the project. William Vaughan
 has been contributing regularly to DataFu for the past two years. Evion Kim
 has been contributing to DataFu for the past year. Xiangrui Meng recently
 contributed implementations of scalable sampling algorithms based on
 research from a paper he published. Chris Lloyd has provided some important
 bug fixes and unit tests. Mitul Tiwari has also contributed to DataFu.
 Mathieu Bastian has been developing MapReduce jobs that we hope to include
 in DataFu. In addition he also leads the open source Gephi project.
 
 Alignment
 
 The ASF is the natural choice to host the DataFu project as its goal of
 encouraging community-driven open-source projects fits with our vision for
 DataFu. Additionally, other projects DataFu integrates with, such as Apache
 Pig and Apache Hadoop, and in the future Apache Hive and Apache Crunch, are
 hosted by the ASF and we will benefit and provide benefit by close
 proximity to them.
 
 Known Risks
 
 Orphaned Products
 
 The core developers have been contributing to DataFu for the past two
 years. There is very little risk of DataFu being abandoned given its
 widespread use within LinkedIn.