Re: [RESULT] [VOTE] Apache Spark for the Incubator
Hi Karthik,

Yes it is. You can join by sending blank emails to:

dev-subscr...@spark.incubator.apache.org
commits-subscr...@spark.incubator.apache.org

Cheers!
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: karthik tunga karthik.tu...@gmail.com
Reply-To: general@incubator.apache.org general@incubator.apache.org
Date: Tuesday, June 25, 2013 11:22 PM
To: general@incubator.apache.org general@incubator.apache.org
Subject: Re: [RESULT] [VOTE] Apache Spark for the Incubator

Hi,

Is the mailing list setup ?

Cheers,
Karthik
Re: [RESULT] [VOTE] Apache Spark for the Incubator
Hi,

Is the mailing list setup ?

Cheers,
Karthik

On 20 June 2013 02:38, Matei Zaharia ma...@eecs.berkeley.edu wrote:

Thanks Chris! We'll get started on all the required steps.

Matei
Re: [RESULT] [VOTE] Apache Spark for the Incubator
Thanks Chris! We'll get started on all the required steps.

Matei

On Jun 20, 2013, at 4:35 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote:

Hi Folks,

This VOTE has passed with the following tallies:

+1
Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*
Scott Deboy
Ted Dunning*
Hitesh Shah
Paul Ramirez*
Ralph Goers*
Alan Cabrera*
Thilina Gunarathne
Marcel Offermans*
Alex Karasulu*
Chris Douglas*
Andrew Hart*
Deepal jayasinghe
Ashish
Joe Brockmeier*
Mohammad Nour El-Din*
Arun C Murthy*
Tim Williams*
Arvind Prabhakar*
Matt Franklin*
Matei Zaharia
Andy Konwinski

+0.9
Marvin Humphrey

* -indicates IPMC

I'll go ahead and get the JIRA tickets filed for email/issue tracking/Git, and then work with the community to get them moving on' over.

Thanks for VOTE'ing!

Cheers,
Chris
[RESULT] [VOTE] Apache Spark for the Incubator
Hi Folks,

This VOTE has passed with the following tallies:

+1
Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*
Scott Deboy
Ted Dunning*
Hitesh Shah
Paul Ramirez*
Ralph Goers*
Alan Cabrera*
Thilina Gunarathne
Marcel Offermans*
Alex Karasulu*
Chris Douglas*
Andrew Hart*
Deepal jayasinghe
Ashish
Joe Brockmeier*
Mohammad Nour El-Din*
Arun C Murthy*
Tim Williams*
Arvind Prabhakar*
Matt Franklin*
Matei Zaharia
Andy Konwinski

+0.9
Marvin Humphrey

* -indicates IPMC

I'll go ahead and get the JIRA tickets filed for email/issue tracking/Git, and then work with the community to get them moving on' over.

Thanks for VOTE'ing!

Cheers,
Chris

-Original Message-
From: Mattmann, jpluser chris.a.mattm...@jpl.nasa.gov
Reply-To: general@incubator.apache.org general@incubator.apache.org
Date: Friday, June 7, 2013 10:34 PM
To: general@incubator.apache.org general@incubator.apache.org
Subject: [VOTE] Apache Spark for the Incubator

Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them to VOTE again unless they want to change their VOTE:

+1
Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..
Re: [VOTE] Apache Spark for the Incubator
+1 (non-binding)

Andy

On Sat, Jun 8, 2013 at 12:36 AM, Matei Zaharia ma...@eecs.berkeley.edu wrote:

+1 (non-binding)

Matei
Re: [VOTE] Apache Spark for the Incubator
+1 (non-binding)

Matei

On Jun 8, 2013, at 12:25 AM, Hitesh Shah hit...@hortonworks.com wrote:

+1 (non-binding)

-- Hitesh
Re: [VOTE] Apache Spark for the Incubator
+1 (binding) On Sat, Jun 8, 2013 at 1:34 AM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator. I'll let the VOTE run for at least a week. So far I've heard +1s from the following folks, so no need for them to VOTE again unless they want to change their VOTE: +1 Chris Mattmann* Konstantin Boudnik Henry Saputra* Reynold Xin Pei Chen Roman Shaposhnik* Suresh Marru* * -indicates IPMC [ ] +1 Accept Spark into the Apache Incubator. [ ] +0 Don't care. [ ] -1 Don't accept Spark into the Apache Incubator because.. Proposal text is below. === Abstract === Spark is an open source system for large-scale data analysis on clusters. === Proposal === Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. These include interactive exploration of very large datasets, near real-time stream processing, and ad-hoc SQL analytics (through higher layer extensions). Spark interfaces with HDFS, HBase, Cassandra and several other storage storage layers, and exposes APIs in Scala, Java and Python. Background Spark started as U.C. Berkeley research project, designed to efficiently run machine learning algorithms on large datasets. Over time, it has evolved into a general computing engine as outlined above. Spark¹s developer community has also grown to include additional institutions, such as universities, research labs, and corporations. Funding has been provided by various institutions including the U.S. National Science Foundation, DARPA, and a number of industry sponsors. See: https://amplab.cs.berkeley.edu/sponsors/ for full details. === Rationale === As the number of contributors to Spark has grown, we have sought for a long-term home for the project, and we believe the Apache foundation would be a great fit. 
Spark is a natural fit for the Apache foundation: Spark already interoperates with several existing Apache projects (HDFS, HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar with the Apache process and and subscribes to the Apache mission - the team includes multiple Apache committers already. Finally, joining Apache will help coordinate the development effort of the growing number of organizations which contribute to Spark. == Initial Goals == The initial goals will most likely be to move the existing codebase to Apache and integrate with the Apache development process. Furthermore, we plan for incremental development, and releases along with the Apache guidelines. === Current Status === == Meritocracy == The Spark project already operates on meritocratic principles. Today, Spark has several developers and has accepted multiple major patches from outside of U.C. Berkeley. While this process has remained mostly informal (we do not have an official committer list), an implicit organization exists in which individuals who contribute major components act as maintainers for those modules. If accepted, the Spark project would include several of these participants as committers from the onset. We will work to identify all committers and PPMC members for the project and to operate under the ASF meritocratic principles. === Community === Acceptance into the Apache foundation would bolster the already strong user and developer community around Spark. That community includes dozens of contributors from several institutions, a meetup group with several hundred members, and an active mailing list composed of hundreds of users. Core Developers The core developers of our project are listed in our contributors and initial PPMC below. Though many exist at UC Berkeley, there is a representative cross sampling of other organizations including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends. 
=== Alignment === Our proposed effort aligns with several ongoing BIGDATA and U.S. National priority funding interests including the NSF and its Expeditions program, and the DARPA XDATA project. Our industry partners and collaborators are well aligned with our code base. There are also a number of related Apache projects and dependencies that will be mentioned in the Relationships with Other Apache products section. == Known Risks == === Orphaned Products === Given the current level of investment in Spark, the risk of the project being abandoned is minimal. There are several constituents who are highly incentivized to continue development. The U.C. Berkeley AMPLab relies on Spark as a platform for a large number of long-term research projects. Several companies have built verticalized products which are tightly dependent on Spark. Other companies have devoted significant internal infrastructure investment to Spark. === Inexperience with Open Source ===
Re: [VOTE] Apache Spark for the Incubator
On Sat, Jun 8, 2013, at 12:34 AM, Mattmann, Chris A (398J) wrote: [ ] +1 Accept Spark into the Apache Incubator. [ ] +0 Don't care. [ ] -1 Don't accept Spark into the Apache Incubator because.. +1 (binding) Best, jzb -- Joe Brockmeier j...@zonker.net Twitter: @jzb http://www.dissociatedpress.net/ - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Apache Spark for the Incubator
+1 --tim
Re: [VOTE] Apache Spark for the Incubator
+1 (binding) Greetings, Marcel
Re: [VOTE] Apache Spark for the Incubator
+1 Andrew
Re: [VOTE] Apache Spark for the Incubator
+1, Deepal
Re: [VOTE] Apache Spark for the Incubator
+1
Re: [VOTE] Apache Spark for the Incubator
+1 On 6/7/13, Ted Dunning ted.dunn...@gmail.com wrote: +1
Re: [VOTE] Apache Spark for the Incubator
+1 (non-binding) -- Hitesh
Re: [VOTE] Apache Spark for the Incubator
+1 On 6/7/13 10:34 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator.
Re: [VOTE] Apache Spark for the Incubator
+1 (binding) Ralph On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) wrote: Hi Folks, OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator.
Re: [VOTE] Apache Spark for the Incubator
+1 binding Regards, Alan On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator.
Re: [VOTE] Apache Spark for the Incubator
+1 (non binding)... This is great news!. thanks, Thilina On Sat, Jun 8, 2013 at 10:50 PM, Alan Cabrera l...@toolazydogs.com wrote: +1 binding Regards, Alan On Jun 7, 2013, at 10:34 PM, Mattmann, Chris A (398J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Folks, OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator.
[VOTE] Apache Spark for the Incubator
Hi Folks,

OK discussion has died down, time to VOTE to accept Spark into the Apache Incubator. I'll let the VOTE run for at least a week.

So far I've heard +1s from the following folks, so no need for them to VOTE again unless they want to change their VOTE:

+1
Chris Mattmann*
Konstantin Boudnik
Henry Saputra*
Reynold Xin
Pei Chen
Roman Shaposhnik*
Suresh Marru*

* -indicates IPMC

[ ] +1 Accept Spark into the Apache Incubator.
[ ] +0 Don't care.
[ ] -1 Don't accept Spark into the Apache Incubator because..

Proposal text is below.

=== Abstract ===
Spark is an open source system for large-scale data analysis on clusters.

=== Proposal ===
Spark is an open source system for fast and flexible large-scale data analysis. Spark provides a general purpose runtime that supports low-latency execution in several forms. These include interactive exploration of very large datasets, near real-time stream processing, and ad-hoc SQL analytics (through higher layer extensions). Spark interfaces with HDFS, HBase, Cassandra and several other storage layers, and exposes APIs in Scala, Java and Python.

Background
Spark started as a U.C. Berkeley research project, designed to efficiently run machine learning algorithms on large datasets. Over time, it has evolved into a general computing engine as outlined above. Spark's developer community has also grown to include additional institutions, such as universities, research labs, and corporations. Funding has been provided by various institutions including the U.S. National Science Foundation, DARPA, and a number of industry sponsors. See: https://amplab.cs.berkeley.edu/sponsors/ for full details.

=== Rationale ===
As the number of contributors to Spark has grown, we have sought a long-term home for the project, and we believe the Apache foundation would be a great fit.
Spark is a natural fit for the Apache foundation: Spark already interoperates with several existing Apache projects (HDFS, HBase, Hive, Cassandra, Avro and Flume to name a few). The Spark team is familiar with the Apache process and subscribes to the Apache mission - the team includes multiple Apache committers already. Finally, joining Apache will help coordinate the development effort of the growing number of organizations that contribute to Spark.

== Initial Goals ==
The initial goals will most likely be to move the existing codebase to Apache and integrate with the Apache development process. Furthermore, we plan for incremental development and releases in line with the Apache guidelines.

=== Current Status ===

== Meritocracy ==
The Spark project already operates on meritocratic principles. Today, Spark has several developers and has accepted multiple major patches from outside of U.C. Berkeley. While this process has remained mostly informal (we do not have an official committer list), an implicit organization exists in which individuals who contribute major components act as maintainers for those modules. If accepted, the Spark project would include several of these participants as committers from the outset. We will work to identify all committers and PPMC members for the project and to operate under the ASF meritocratic principles.

=== Community ===
Acceptance into the Apache foundation would bolster the already strong user and developer community around Spark. That community includes dozens of contributors from several institutions, a meetup group with several hundred members, and an active mailing list composed of hundreds of users.

Core Developers
The core developers of our project are listed in our contributors and initial PPMC below. Though many are at U.C. Berkeley, there is a representative cross-section of other organizations including Quantifind, Microsoft, Yahoo!, ClearStory Data, Bizo, Intel, Tagged and Webtrends.
=== Alignment ===
Our proposed effort aligns with several ongoing BIGDATA and U.S. national priority funding interests, including the NSF and its Expeditions program and the DARPA XDATA project. Our industry partners and collaborators are well aligned with our code base. There are also a number of related Apache projects and dependencies, which will be mentioned in the Relationships with Other Apache Products section.

== Known Risks ==

=== Orphaned Products ===
Given the current level of investment in Spark, the risk of the project being abandoned is minimal. Several constituents are highly incentivized to continue development. The U.C. Berkeley AMPLab relies on Spark as a platform for a large number of long-term research projects. Several companies have built verticalized products that depend tightly on Spark. Other companies have made significant internal infrastructure investments in Spark.

=== Inexperience with Open Source ===
Spark has existed as a healthy open source project for several years. During that time, Matei and others have curated an open-source community successfully, attracting developers from a diverse group of