Re: [RESULT] [VOTE] Apache Spark for the Incubator

2013-06-28 Thread Mattmann, Chris A (398J)
Hi Karthik,

Yes it is. You can join by sending blank emails to:

dev-subscr...@spark.incubator.apache.org
commits-subscr...@spark.incubator.apache.org

Cheers!

Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: karthik tunga karthik.tu...@gmail.com
Reply-To: general@incubator.apache.org general@incubator.apache.org
Date: Tuesday, June 25, 2013 11:22 PM
To: general@incubator.apache.org general@incubator.apache.org
Subject: Re: [RESULT] [VOTE] Apache Spark for the Incubator

Hi,

Is the mailing list set up?

Cheers,
Karthik



Re: [RESULT] [VOTE] Apache Spark for the Incubator

2013-06-26 Thread karthik tunga
Hi,

Is the mailing list set up?

Cheers,
Karthik



Re: [RESULT] [VOTE] Apache Spark for the Incubator

2013-06-20 Thread Matei Zaharia
Thanks Chris! We'll get started on all the required steps.

Matei


[RESULT] [VOTE] Apache Spark for the Incubator

2013-06-19 Thread Mattmann, Chris A (398J)
Hi Folks,

This VOTE has passed with the following tallies:

+1
Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*
Scott Deboy
Ted Dunning*
Hitesh Shah
Paul Ramirez*
Ralph Goers*
Alan Cabrera*
Thilina Gunarathne
Marcel Offermans*
Alex Karasulu*
Chris Douglas*
Andrew Hart*
Deepal Jayasinghe
Ashish
Joe Brockmeier*
Mohammad Nour El-Din*
Arun C Murthy*
Tim Williams*
Arvind Prabhakar*
Matt Franklin*
Matei Zaharia
Andy Konwinski

+0.9


Marvin Humphrey

* -indicates IPMC


I'll go ahead and get the JIRA tickets filed for email/issue tracking/Git,
and then work with the community to get them movin' on over. Thanks for
VOTE'ing!

Cheers,
Chris


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: Mattmann, jpluser chris.a.mattm...@jpl.nasa.gov
Reply-To: general@incubator.apache.org general@incubator.apache.org
Date: Friday, June 7, 2013 10:34 PM
To: general@incubator.apache.org general@incubator.apache.org
Subject: [VOTE] Apache Spark for the Incubator

Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the
Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them
to VOTE again unless they want to change their VOTE:

+1

Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..

Proposal text is below.

=== Abstract ===
Spark is an open source system for large-scale data analysis on clusters.

=== Proposal ===
Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports
low-latency execution in several forms. These include interactive
exploration of very large datasets, near real-time stream processing, and
ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
with HDFS, HBase, Cassandra and several other storage layers, and
exposes APIs in Scala, Java and Python.

=== Background ===
Spark started as a U.C. Berkeley research project, designed to efficiently
run machine learning algorithms on large datasets. Over time, it has
evolved into a general computing engine as outlined above. Spark's
developer community has also grown to include additional institutions,
such as universities, research labs, and corporations. Funding has been
provided by various institutions including the U.S. National Science
Foundation, DARPA, and a number of industry sponsors. See:
https://amplab.cs.berkeley.edu/sponsors/ for full details.

=== Rationale ===
As the number of contributors to Spark has grown, we have sought a
long-term home for the project, and we believe the Apache foundation would
be a great fit. Spark is a natural fit for the Apache foundation: Spark
already interoperates with several existing Apache projects (HDFS, HBase,
Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
with the Apache process and subscribes to the Apache mission - the
team includes multiple Apache committers already. Finally, joining Apache
will help coordinate the development effort of the growing number of
organizations which contribute to Spark.

== Initial Goals ==
The initial goals will most likely be to move the existing codebase to
Apache and integrate with the Apache development process. Furthermore, we
plan for incremental development, and releases along with the Apache
guidelines.

=== Current Status ===
== Meritocracy ==
The Spark project already operates on meritocratic principles. Today,
Spark has several developers and has accepted multiple major patches from
outside of U.C. Berkeley. While this process has remained mostly informal
(we do not have an official committer list), an implicit organization
exists in which individuals who contribute major components act as
maintainers for those modules. If accepted, the Spark project would
include several of these participants as committers from the onset. We
will work to identify all committers and PPMC members for the project and
to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong
user and developer community around Spark. That community includes dozens
of contributors from several institutions, a meetup group with several
hundred members

Re: [VOTE] Apache Spark for the Incubator

2013-06-14 Thread Andy Konwinski
+1 (non-binding)

Andy



Re: [VOTE] Apache Spark for the Incubator

2013-06-12 Thread Matei Zaharia
+1 (non-binding)

Matei

On Jun 8, 2013, at 12:25 AM, Hitesh Shah hit...@hortonworks.com wrote:

 +1 (non-binding)
 
 -- Hitesh
 
 On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:
 
 Hi Folks,
 
 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.
 
 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:
 
 +1
 
 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*
 
 * -indicates IPMC
 
 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..
 
 Proposal text is below.
 
 === Abstract ===
 Spark is an open source system for large-scale data analysis on clusters.
 
 === Proposal ===
 Spark is an open source system for fast and flexible large-scale data
 analysis. Spark provides a general purpose runtime that supports
 low-latency execution in several forms. These include interactive
 exploration of very large datasets, near real-time stream processing, and
 ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
 with HDFS, HBase, Cassandra and several other storage layers, and
 exposes APIs in Scala, Java and Python.
 
 === Background ===
 Spark started as a U.C. Berkeley research project, designed to efficiently
 run machine learning algorithms on large datasets. Over time, it has
 evolved into a general computing engine as outlined above. Spark's
 developer community has also grown to include additional institutions,
 such as universities, research labs, and corporations. Funding has been
 provided by various institutions including the U.S. National Science
 Foundation, DARPA, and a number of industry sponsors. See:
 https://amplab.cs.berkeley.edu/sponsors/ for full details.
 
 === Rationale ===
 As the number of contributors to Spark has grown, we have sought a
 long-term home for the project, and we believe the Apache foundation would
 be a great fit. Spark is a natural fit for the Apache foundation: Spark
 already interoperates with several existing Apache projects (HDFS, HBase,
 Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
 with the Apache process and subscribes to the Apache mission - the
 team includes multiple Apache committers already. Finally, joining Apache
 will help coordinate the development effort of the growing number of
 organizations which contribute to Spark.
 
 == Initial Goals ==
 The initial goals will most likely be to move the existing codebase to
 Apache and integrate with the Apache development process. Furthermore, we
 plan for incremental development, and releases along with the Apache
 guidelines.
 
 === Current Status ===
 == Meritocracy ==
 The Spark project already operates on meritocratic principles. Today,
 Spark has several developers and has accepted multiple major patches from
 outside of U.C. Berkeley. While this process has remained mostly informal
 (we do not have an official committer list), an implicit organization
 exists in which individuals who contribute major components act as
 maintainers for those modules. If accepted, the Spark project would
 include several of these participants as committers from the onset. We
 will work to identify all committers and PPMC members for the project and
 to operate under the ASF meritocratic principles.
 
 === Community ===
 Acceptance into the Apache foundation would bolster the already strong
 user and developer community around Spark. That community includes dozens
 of contributors from several institutions, a meetup group with several
 hundred members, and an active mailing list composed of hundreds of users.
 
 === Core Developers ===
 The core developers of our project are listed in our contributors and
 initial PPMC below. Though many exist at UC Berkeley, there is a
 representative cross sampling of other organizations including Quantifind,
 Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
 
 
 === Alignment ===
 Our proposed effort aligns with several ongoing BIGDATA and U.S. National
 priority funding interests including the NSF and its Expeditions program,
 and the DARPA XDATA project. Our industry partners and collaborators are
 well aligned with our code base.
 
 There are also a number of related Apache projects and dependencies that
 will be mentioned in the Relationships with Other Apache Products section.
 
 == Known Risks ==
 
 === Orphaned Products ===
 Given the current level of investment in Spark - the risk of the project
 being abandoned is minimal. There are several constituents who are highly
 incentivized to continue development. The U.C. Berkeley AMPLab relies on
 Spark as a platform for a large number of long-term research projects.
 Several companies have built verticalized products which are tightly
 dependent on Spark. Other companies have devoted 

Re: [VOTE] Apache Spark for the Incubator

2013-06-11 Thread Matt Franklin
+1 (binding)


On Sat, Jun 8, 2013 at 1:34 AM, Mattmann, Chris A (398J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 Hi Folks,

 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.

 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:

 +1

 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*

 * -indicates IPMC

 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..

 Proposal text is below.

 === Abstract ===
 Spark is an open source system for large-scale data analysis on clusters.

 === Proposal ===
 Spark is an open source system for fast and flexible large-scale data
 analysis. Spark provides a general purpose runtime that supports
 low-latency execution in several forms. These include interactive
 exploration of very large datasets, near real-time stream processing, and
 ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
 with HDFS, HBase, Cassandra and several other storage layers, and
 exposes APIs in Scala, Java and Python.

 === Background ===
 Spark started as a U.C. Berkeley research project, designed to efficiently
 run machine learning algorithms on large datasets. Over time, it has
 evolved into a general computing engine as outlined above. Spark's
 developer community has also grown to include additional institutions,
 such as universities, research labs, and corporations. Funding has been
 provided by various institutions including the U.S. National Science
 Foundation, DARPA, and a number of industry sponsors. See:
 https://amplab.cs.berkeley.edu/sponsors/ for full details.

 === Rationale ===
 As the number of contributors to Spark has grown, we have sought a
 long-term home for the project, and we believe the Apache foundation would
 be a great fit. Spark is a natural fit for the Apache foundation: Spark
 already interoperates with several existing Apache projects (HDFS, HBase,
 Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
 with the Apache process and subscribes to the Apache mission - the
 team includes multiple Apache committers already. Finally, joining Apache
 will help coordinate the development effort of the growing number of
 organizations which contribute to Spark.

 == Initial Goals ==
 The initial goals will most likely be to move the existing codebase to
 Apache and integrate with the Apache development process. Furthermore, we
 plan for incremental development and releases in line with the Apache
 guidelines.

 === Current Status ===
 == Meritocracy ==
 The Spark project already operates on meritocratic principles. Today,
 Spark has several developers and has accepted multiple major patches from
 outside of U.C. Berkeley. While this process has remained mostly informal
 (we do not have an official committer list), an implicit organization
 exists in which individuals who contribute major components act as
 maintainers for those modules. If accepted, the Spark project would
 include several of these participants as committers from the outset. We
 will work to identify all committers and PPMC members for the project and
 to operate under the ASF meritocratic principles.

 === Community ===
 Acceptance into the Apache foundation would bolster the already strong
 user and developer community around Spark. That community includes dozens
 of contributors from several institutions, a meetup group with several
 hundred members, and an active mailing list composed of hundreds of users.
 == Core Developers ==
 The core developers of our project are listed in our contributors and
 initial PPMC below. Though many are at U.C. Berkeley, there is a
 representative cross-section of other organizations, including Quantifind,
 Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.


 === Alignment ===
 Our proposed effort aligns with several ongoing BIGDATA and U.S. National
 priority funding interests including the NSF and its Expeditions program,
 and the DARPA XDATA project. Our industry partners and collaborators are
 well aligned with our code base.

 There are also a number of related Apache projects and dependencies that
 will be mentioned in the Relationships with Other Apache products section.

 == Known Risks ==

 === Orphaned Products ===
 Given the current level of investment in Spark, the risk of the project
 being abandoned is minimal. There are several constituents who are highly
 incentivized to continue development. The U.C. Berkeley AMPLab relies on
 Spark as a platform for a large number of long-term research projects.
 Several companies have built verticalized products which are tightly
 dependent on Spark. Other companies have made significant internal
 infrastructure investments in Spark.

 === Inexperience with Open Source ===
 

Re: [VOTE] Apache Spark for the Incubator

2013-06-10 Thread Joe Brockmeier
On Sat, Jun 8, 2013, at 12:34 AM, Mattmann, Chris A (398J) wrote:
 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..

+1 (binding)

Best,

jzb
-- 
Joe Brockmeier
j...@zonker.net
Twitter: @jzb
http://www.dissociatedpress.net/

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Apache Spark for the Incubator

2013-06-10 Thread Tim Williams
+1

--tim

On Sat, Jun 8, 2013 at 1:34 AM, Mattmann, Chris A (398J)
chris.a.mattm...@jpl.nasa.gov wrote:
 Hi Folks,

 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.

 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:

 +1

 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*

 * -indicates IPMC

 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..


Re: [VOTE] Apache Spark for the Incubator

2013-06-09 Thread Marcel Offermans
+1 (binding)

Greetings, Marcel





Re: [VOTE] Apache Spark for the Incubator

2013-06-09 Thread Andrew Hart

+1


Andrew

On 06/07/2013 10:34 PM, Mattmann, Chris A (398J) wrote:

Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the
Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them
to VOTE again unless they want to change their VOTE:

+1

Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..

=== Inexperience with Open Source ===
Spark has existed as a healthy open source project for several years.
During that time, Matei and others have curated an open-source 

Re: [VOTE] Apache Spark for the Incubator

2013-06-09 Thread Deepal jayasinghe
+1,

Deepal
 +1


 Andrew

 On 06/07/2013 10:34 PM, Mattmann, Chris A (398J) wrote:
 Hi Folks,

 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.

 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:

 +1

 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*

 * -indicates IPMC

 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..


Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Ted Dunning
+1


On Sat, Jun 8, 2013 at 7:34 AM, Mattmann, Chris A (398J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 Hi Folks,

 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.

 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:

 +1

 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*

 * -indicates IPMC

 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..


Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Scott Deboy
+1

On 6/7/13, Ted Dunning ted.dunn...@gmail.com wrote:
 +1


 On Sat, Jun 8, 2013 at 7:34 AM, Mattmann, Chris A (398J) 
 chris.a.mattm...@jpl.nasa.gov wrote:

 Hi Folks,

 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.

 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:

 +1

 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*

 * -indicates IPMC

 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..


Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Hitesh Shah
+1 (non-binding)

-- Hitesh

On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:

 Hi Folks,
 
 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.
 
 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:
 
 +1
 
 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*
 
 * -indicates IPMC
 
 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..
 
 Proposal text is below.
 
 === Abstract ===
 Spark is an open source system for large-scale data analysis on clusters.
 
 === Proposal ===
 Spark is an open source system for fast and flexible large-scale data
 analysis. Spark provides a general purpose runtime that supports
 low-latency execution in several forms. These include interactive
 exploration of very large datasets, near real-time stream processing, and
 ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
 with HDFS, HBase, Cassandra and several other storage storage layers, and
 exposes APIs in Scala, Java and Python.
 Background
 Spark started as U.C. Berkeley research project, designed to efficiently
 run machine learning algorithms on large datasets. Over time, it has
 evolved into a general computing engine as outlined above. Spark¹s
 developer community has also grown to include additional institutions,
 such as universities, research labs, and corporations. Funding has been
 provided by various institutions including the U.S. National Science
 Foundation, DARPA, and a number of industry sponsors. See:
 https://amplab.cs.berkeley.edu/sponsors/ for full details.
 
 === Rationale ===
 As the number of contributors to Spark has grown, we have sought for a
 long-term home for the project, and we believe the Apache foundation would
 be a great fit. Spark is a natural fit for the Apache foundation: Spark
 already interoperates with several existing Apache projects (HDFS, HBase,
 Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
 with the Apache process and and subscribes to the Apache mission - the
 team includes multiple Apache committers already. Finally, joining Apache
 will help coordinate the development effort of the growing number of
 organizations which contribute to Spark.
 
 == Initial Goals ==
 The initial goals will most likely be to move the existing codebase to
 Apache and integrate with the Apache development process. Furthermore, we
 plan for incremental development, and releases along with the Apache
 guidelines.
 
 === Current Status ===
 == Meritocracy ==
 The Spark project already operates on meritocratic principles. Today,
 Spark has several developers and has accepted multiple major patches from
 outside of U.C. Berkeley. While this process has remained mostly informal
 (we do not have an official committer list), an implicit organization
 exists in which individuals who contribute major components act as
 maintainers for those modules. If accepted, the Spark project would
 include several of these participants as committers from the onset. We
 will work to identify all committers and PPMC members for the project and
 to operate under the ASF meritocratic principles.
 
 === Community ===
 Acceptance into the Apache foundation would bolster the already strong
 user and developer community around Spark. That community includes dozens
 of contributors from several institutions, a meetup group with several
 hundred members, and an active mailing list composed of hundreds of users.
 Core Developers
 The core developers of our project are listed in our contributors and
 initial PPMC below. Though many are at U.C. Berkeley, there is a
 representative cross-section of other organizations including Quantifind,
 Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
 
 
 === Alignment ===
 Our proposed effort aligns with several ongoing BIGDATA and U.S. National
 priority funding interests including the NSF and its Expeditions program,
 and the DARPA XDATA project. Our industry partners and collaborators are
 well aligned with our code base.
 
 There are also a number of related Apache projects and dependencies, that
 will be mentioned in the Relationships with Other Apache products section.
 
 == Known Risks ==
 
 === Orphaned Products ===
 Given the current level of investment in Spark, the risk of the project
 being abandoned is minimal. There are several constituents who are highly
 incentivized to continue development. The U.C. Berkeley AMPLab relies on
 Spark as a platform for a large number of long-term research projects.
 Several companies have built verticalized products which are tightly
 dependent on Spark. Other companies have made significant internal
 infrastructure investments in Spark.
 
 === Inexperience with Open Source ===
 

Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Ramirez, Paul M (398J)
+1

On 6/7/13 10:34 PM, Mattmann, Chris A (398J)
chris.a.mattm...@jpl.nasa.gov wrote:

Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the
Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them
to VOTE again unless they want to change their VOTE:

+1

Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..

Proposal text is below.

=== Abstract ===
Spark is an open source system for large-scale data analysis on clusters.

=== Proposal ===
Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports
low-latency execution in several forms. These include interactive
exploration of very large datasets, near real-time stream processing, and
ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
with HDFS, HBase, Cassandra and several other storage layers, and
exposes APIs in Scala, Java and Python.
Background
Spark started as a U.C. Berkeley research project, designed to efficiently
run machine learning algorithms on large datasets. Over time, it has
evolved into a general computing engine as outlined above. Spark's
developer community has also grown to include additional institutions,
such as universities, research labs, and corporations. Funding has been
provided by various institutions including the U.S. National Science
Foundation, DARPA, and a number of industry sponsors. See:
https://amplab.cs.berkeley.edu/sponsors/ for full details.

=== Rationale ===
As the number of contributors to Spark has grown, we have sought a
long-term home for the project, and we believe the Apache foundation would
be a great fit. Spark is a natural fit for the Apache foundation: Spark
already interoperates with several existing Apache projects (HDFS, HBase,
Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
with the Apache process and subscribes to the Apache mission - the
team includes multiple Apache committers already. Finally, joining Apache
will help coordinate the development effort of the growing number of
organizations which contribute to Spark.

== Initial Goals ==
The initial goals will most likely be to move the existing codebase to
Apache and integrate with the Apache development process. Furthermore, we
plan for incremental development and releases in line with the Apache
guidelines.

=== Current Status ===
== Meritocracy ==
The Spark project already operates on meritocratic principles. Today,
Spark has several developers and has accepted multiple major patches from
outside of U.C. Berkeley. While this process has remained mostly informal
(we do not have an official committer list), an implicit organization
exists in which individuals who contribute major components act as
maintainers for those modules. If accepted, the Spark project would
include several of these participants as committers from the onset. We
will work to identify all committers and PPMC members for the project and
to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong
user and developer community around Spark. That community includes dozens
of contributors from several institutions, a meetup group with several
hundred members, and an active mailing list composed of hundreds of users.
Core Developers
The core developers of our project are listed in our contributors and
initial PPMC below. Though many are at U.C. Berkeley, there is a
representative cross-section of other organizations including Quantifind,
Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.


=== Alignment ===
Our proposed effort aligns with several ongoing BIGDATA and U.S. National
priority funding interests including the NSF and its Expeditions program,
and the DARPA XDATA project. Our industry partners and collaborators are
well aligned with our code base.

There are also a number of related Apache projects and dependencies, that
will be mentioned in the Relationships with Other Apache products section.

== Known Risks ==

=== Orphaned Products ===
Given the current level of investment in Spark, the risk of the project
being abandoned is minimal. There are several constituents who are highly
incentivized to continue development. The U.C. Berkeley AMPLab relies on
Spark as a platform for a large number of long-term research projects.
Several companies have built verticalized products which are tightly
dependent on Spark. Other companies have made significant internal
infrastructure investments in Spark.

=== Inexperience with Open Source ===
Spark has existed as a healthy open source project for several years.
During that time, Matei and others have curated 

Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Ralph Goers
+1 (binding)

Ralph

On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote:

 Hi Folks,
 
 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.
 
 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:
 
 +1
 
 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*
 
 * -indicates IPMC
 
 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..
 
 Proposal text is below.
 
 === Abstract ===
 Spark is an open source system for large-scale data analysis on clusters.
 
 === Proposal ===
 Spark is an open source system for fast and flexible large-scale data
 analysis. Spark provides a general purpose runtime that supports
 low-latency execution in several forms. These include interactive
 exploration of very large datasets, near real-time stream processing, and
 ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
 with HDFS, HBase, Cassandra and several other storage layers, and
 exposes APIs in Scala, Java and Python.
 Background
 Spark started as a U.C. Berkeley research project, designed to efficiently
 run machine learning algorithms on large datasets. Over time, it has
 evolved into a general computing engine as outlined above. Spark's
 developer community has also grown to include additional institutions,
 such as universities, research labs, and corporations. Funding has been
 provided by various institutions including the U.S. National Science
 Foundation, DARPA, and a number of industry sponsors. See:
 https://amplab.cs.berkeley.edu/sponsors/ for full details.
 
 === Rationale ===
 As the number of contributors to Spark has grown, we have sought a
 long-term home for the project, and we believe the Apache foundation would
 be a great fit. Spark is a natural fit for the Apache foundation: Spark
 already interoperates with several existing Apache projects (HDFS, HBase,
 Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
 with the Apache process and subscribes to the Apache mission - the
 team includes multiple Apache committers already. Finally, joining Apache
 will help coordinate the development effort of the growing number of
 organizations which contribute to Spark.
 
 == Initial Goals ==
 The initial goals will most likely be to move the existing codebase to
 Apache and integrate with the Apache development process. Furthermore, we
 plan for incremental development and releases in line with the Apache
 guidelines.
 
 === Current Status ===
 == Meritocracy ==
 The Spark project already operates on meritocratic principles. Today,
 Spark has several developers and has accepted multiple major patches from
 outside of U.C. Berkeley. While this process has remained mostly informal
 (we do not have an official committer list), an implicit organization
 exists in which individuals who contribute major components act as
 maintainers for those modules. If accepted, the Spark project would
 include several of these participants as committers from the onset. We
 will work to identify all committers and PPMC members for the project and
 to operate under the ASF meritocratic principles.
 
 === Community ===
 Acceptance into the Apache foundation would bolster the already strong
 user and developer community around Spark. That community includes dozens
 of contributors from several institutions, a meetup group with several
 hundred members, and an active mailing list composed of hundreds of users.
 Core Developers
 The core developers of our project are listed in our contributors and
 initial PPMC below. Though many are at U.C. Berkeley, there is a
 representative cross-section of other organizations including Quantifind,
 Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
 
 
 === Alignment ===
 Our proposed effort aligns with several ongoing BIGDATA and U.S. National
 priority funding interests including the NSF and its Expeditions program,
 and the DARPA XDATA project. Our industry partners and collaborators are
 well aligned with our code base.
 
 There are also a number of related Apache projects and dependencies, that
 will be mentioned in the Relationships with Other Apache products section.
 
 == Known Risks ==
 
 === Orphaned Products ===
 Given the current level of investment in Spark, the risk of the project
 being abandoned is minimal. There are several constituents who are highly
 incentivized to continue development. The U.C. Berkeley AMPLab relies on
 Spark as a platform for a large number of long-term research projects.
 Several companies have built verticalized products which are tightly
 dependent on Spark. Other companies have made significant internal
 infrastructure investments in Spark.
 
 === Inexperience with Open Source ===
 Spark has 

Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Alan Cabrera
+1 binding


Regards,
Alan

On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) 
chris.a.mattm...@jpl.nasa.gov wrote:

 Hi Folks,
 
 OK discussion has died down, time to VOTE to accept Spark into the
 Apache Incubator. I'll let the VOTE run for at least a week.
 
 So far I've heard +1s from the following folks, so no need for them
 to VOTE again unless they want to change their VOTE:
 
 +1
 
 Chris Mattmann*
 Konstantin Boudnik
 Henry Saputra*
 Reynold Xin
 Pei Chen
 Roman Shaposhnik*
 Suresh Marru*
 
 * -indicates IPMC
 
 [ ] +1 Accept Spark into the Apache Incubator.
 [ ] +0 Don't care.
 [ ] -1 Don't accept Spark into the Apache Incubator because..
 
 Proposal text is below.
 
 === Abstract ===
 Spark is an open source system for large-scale data analysis on clusters.
 
 === Proposal ===
 Spark is an open source system for fast and flexible large-scale data
 analysis. Spark provides a general purpose runtime that supports
 low-latency execution in several forms. These include interactive
 exploration of very large datasets, near real-time stream processing, and
 ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
 with HDFS, HBase, Cassandra and several other storage layers, and
 exposes APIs in Scala, Java and Python.
 Background
 Spark started as a U.C. Berkeley research project, designed to efficiently
 run machine learning algorithms on large datasets. Over time, it has
 evolved into a general computing engine as outlined above. Spark's
 developer community has also grown to include additional institutions,
 such as universities, research labs, and corporations. Funding has been
 provided by various institutions including the U.S. National Science
 Foundation, DARPA, and a number of industry sponsors. See:
 https://amplab.cs.berkeley.edu/sponsors/ for full details.
 
 === Rationale ===
 As the number of contributors to Spark has grown, we have sought a
 long-term home for the project, and we believe the Apache foundation would
 be a great fit. Spark is a natural fit for the Apache foundation: Spark
 already interoperates with several existing Apache projects (HDFS, HBase,
 Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
 with the Apache process and subscribes to the Apache mission - the
 team includes multiple Apache committers already. Finally, joining Apache
 will help coordinate the development effort of the growing number of
 organizations which contribute to Spark.
 
 == Initial Goals ==
 The initial goals will most likely be to move the existing codebase to
 Apache and integrate with the Apache development process. Furthermore, we
 plan for incremental development and releases in line with the Apache
 guidelines.
 
 === Current Status ===
 == Meritocracy ==
 The Spark project already operates on meritocratic principles. Today,
 Spark has several developers and has accepted multiple major patches from
 outside of U.C. Berkeley. While this process has remained mostly informal
 (we do not have an official committer list), an implicit organization
 exists in which individuals who contribute major components act as
 maintainers for those modules. If accepted, the Spark project would
 include several of these participants as committers from the onset. We
 will work to identify all committers and PPMC members for the project and
 to operate under the ASF meritocratic principles.
 
 === Community ===
 Acceptance into the Apache foundation would bolster the already strong
 user and developer community around Spark. That community includes dozens
 of contributors from several institutions, a meetup group with several
 hundred members, and an active mailing list composed of hundreds of users.
 Core Developers
 The core developers of our project are listed in our contributors and
 initial PPMC below. Though many are at U.C. Berkeley, there is a
 representative cross-section of other organizations including Quantifind,
 Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
 
 
 === Alignment ===
 Our proposed effort aligns with several ongoing BIGDATA and U.S. National
 priority funding interests including the NSF and its Expeditions program,
 and the DARPA XDATA project. Our industry partners and collaborators are
 well aligned with our code base.
 
 There are also a number of related Apache projects and dependencies, that
 will be mentioned in the Relationships with Other Apache products section.
 
 == Known Risks ==
 
 === Orphaned Products ===
 Given the current level of investment in Spark, the risk of the project
 being abandoned is minimal. There are several constituents who are highly
 incentivized to continue development. The U.C. Berkeley AMPLab relies on
 Spark as a platform for a large number of long-term research projects.
 Several companies have built verticalized products which are tightly
 dependent on Spark. Other companies have made significant internal
 infrastructure investments in Spark.
 
 === 

Re: [VOTE] Apache Spark for the Incubator

2013-06-08 Thread Thilina Gunarathne
+1 (non binding)...

This is great news!

thanks,
Thilina



On Sat, Jun 8, 2013 at 10:50 PM, Alan Cabrera l...@toolazydogs.com wrote:

 +1 binding


 Regards,
 Alan

 On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) 
 chris.a.mattm...@jpl.nasa.gov wrote:

  Hi Folks,
 
  OK discussion has died down, time to VOTE to accept Spark into the
  Apache Incubator. I'll let the VOTE run for at least a week.
 
  So far I've heard +1s from the following folks, so no need for them
  to VOTE again unless they want to change their VOTE:
 
  +1
 
  Chris Mattmann*
  Konstantin Boudnik
  Henry Saputra*
  Reynold Xin
  Pei Chen
  Roman Shaposhnik*
  Suresh Marru*
 
  * -indicates IPMC
 
  [ ] +1 Accept Spark into the Apache Incubator.
  [ ] +0 Don't care.
  [ ] -1 Don't accept Spark into the Apache Incubator because..
 
  Proposal text is below.
 
  === Abstract ===
  Spark is an open source system for large-scale data analysis on clusters.
 
  === Proposal ===
  Spark is an open source system for fast and flexible large-scale data
  analysis. Spark provides a general purpose runtime that supports
  low-latency execution in several forms. These include interactive
  exploration of very large datasets, near real-time stream processing, and
  ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
  with HDFS, HBase, Cassandra and several other storage layers, and
  exposes APIs in Scala, Java and Python.
  Background
  Spark started as a U.C. Berkeley research project, designed to efficiently
  run machine learning algorithms on large datasets. Over time, it has
  evolved into a general computing engine as outlined above. Spark's
  developer community has also grown to include additional institutions,
  such as universities, research labs, and corporations. Funding has been
  provided by various institutions including the U.S. National Science
  Foundation, DARPA, and a number of industry sponsors. See:
  https://amplab.cs.berkeley.edu/sponsors/ for full details.
 
  === Rationale ===
  As the number of contributors to Spark has grown, we have sought a
  long-term home for the project, and we believe the Apache foundation would
  be a great fit. Spark is a natural fit for the Apache foundation: Spark
  already interoperates with several existing Apache projects (HDFS, HBase,
  Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
  with the Apache process and subscribes to the Apache mission - the
  team includes multiple Apache committers already. Finally, joining Apache
  will help coordinate the development effort of the growing number of
  organizations which contribute to Spark.
 
  == Initial Goals ==
  The initial goals will most likely be to move the existing codebase to
  Apache and integrate with the Apache development process. Furthermore, we
  plan for incremental development and releases in line with the Apache
  guidelines.
 
  === Current Status ===
  == Meritocracy ==
  The Spark project already operates on meritocratic principles. Today,
  Spark has several developers and has accepted multiple major patches from
  outside of U.C. Berkeley. While this process has remained mostly informal
  (we do not have an official committer list), an implicit organization
  exists in which individuals who contribute major components act as
  maintainers for those modules. If accepted, the Spark project would
  include several of these participants as committers from the onset. We
  will work to identify all committers and PPMC members for the project and
  to operate under the ASF meritocratic principles.
 
  === Community ===
  Acceptance into the Apache foundation would bolster the already strong
  user and developer community around Spark. That community includes dozens
  of contributors from several institutions, a meetup group with several
  hundred members, and an active mailing list composed of hundreds of users.
  Core Developers
  The core developers of our project are listed in our contributors and
  initial PPMC below. Though many are at U.C. Berkeley, there is a
  representative cross-section of other organizations including Quantifind,
  Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
 
 
  === Alignment ===
  Our proposed effort aligns with several ongoing BIGDATA and U.S. National
  priority funding interests including the NSF and its Expeditions program,
  and the DARPA XDATA project. Our industry partners and collaborators are
  well aligned with our code base.
 
  There are also a number of related Apache projects and dependencies, that
  will be mentioned in the Relationships with Other Apache products section.
 
  == Known Risks ==
 
  === Orphaned Products ===
  Given the current level of investment in Spark, the risk of the project
  being abandoned is minimal. There are several constituents who are highly
  incentivized to continue development. The U.C. Berkeley AMPLab relies on
  Spark as a platform for a large 

[VOTE] Apache Spark for the Incubator

2013-06-07 Thread Mattmann, Chris A (398J)
Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the
Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them
to VOTE again unless they want to change their VOTE:

+1

Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..

Proposal text is below.

=== Abstract ===
Spark is an open source system for large-scale data analysis on clusters.

=== Proposal ===
Spark is an open source system for fast and flexible large-scale data
analysis. Spark provides a general purpose runtime that supports
low-latency execution in several forms. These include interactive
exploration of very large datasets, near real-time stream processing, and
ad-hoc SQL analytics (through higher layer extensions). Spark interfaces
with HDFS, HBase, Cassandra and several other storage layers, and
exposes APIs in Scala, Java and Python.
Background
Spark started as a U.C. Berkeley research project, designed to efficiently
run machine learning algorithms on large datasets. Over time, it has
evolved into a general computing engine as outlined above. Spark's
developer community has also grown to include additional institutions,
such as universities, research labs, and corporations. Funding has been
provided by various institutions including the U.S. National Science
Foundation, DARPA, and a number of industry sponsors. See:
https://amplab.cs.berkeley.edu/sponsors/ for full details.

=== Rationale ===
As the number of contributors to Spark has grown, we have sought a
long-term home for the project, and we believe the Apache foundation would
be a great fit. Spark is a natural fit for the Apache foundation: Spark
already interoperates with several existing Apache projects (HDFS, HBase,
Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar
with the Apache process and subscribes to the Apache mission - the
team includes multiple Apache committers already. Finally, joining Apache
will help coordinate the development effort of the growing number of
organizations which contribute to Spark.

== Initial Goals ==
The initial goals will most likely be to move the existing codebase to
Apache and integrate with the Apache development process. Furthermore, we
plan for incremental development and releases in line with the Apache
guidelines.

=== Current Status ===
== Meritocracy ==
The Spark project already operates on meritocratic principles. Today,
Spark has several developers and has accepted multiple major patches from
outside of U.C. Berkeley. While this process has remained mostly informal
(we do not have an official committer list), an implicit organization
exists in which individuals who contribute major components act as
maintainers for those modules. If accepted, the Spark project would
include several of these participants as committers from the onset. We
will work to identify all committers and PPMC members for the project and
to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong
user and developer community around Spark. That community includes dozens
of contributors from several institutions, a meetup group with several
hundred members, and an active mailing list composed of hundreds of users.
Core Developers
The core developers of our project are listed in our contributors and
initial PPMC below. Though many are at U.C. Berkeley, there is a
representative cross-section of other organizations including Quantifind,
Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.


=== Alignment ===
Our proposed effort aligns with several ongoing BIGDATA and U.S. National
priority funding interests including the NSF and its Expeditions program,
and the DARPA XDATA project. Our industry partners and collaborators are
well aligned with our code base.

There are also a number of related Apache projects and dependencies, that
will be mentioned in the Relationships with Other Apache products section.

== Known Risks ==

=== Orphaned Products ===
Given the current level of investment in Spark, the risk of the project
being abandoned is minimal. There are several constituents who are highly
incentivized to continue development. The U.C. Berkeley AMPLab relies on
Spark as a platform for a large number of long-term research projects.
Several companies have built verticalized products which are tightly
dependent on Spark. Other companies have made significant internal
infrastructure investments in Spark.

=== Inexperience with Open Source ===
Spark has existed as a healthy open source project for several years.
During that time, Matei and others have curated an open-source community
successfully, attracting developers from a diverse group of