Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On Thu, Dec 18, 2014 at 09:29PM, Roman Shaposhnik wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors,
Re: [VOTE] Accept Zeppelin into the Apache Incubator
On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... +1 (binding) Thanks, Roman. - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
[RESULT] [VOTE] Accept Zeppelin into the Apache Incubator
With 8 binding +1s, 4 non-binding +1 and no +-0 or -1s the vote has passed. Thanks to everyone who voted! Here's a vote tally: Binding +1: Ted Dunning Henry Saputra Hyunsik Choi Hadrian Zbarcea janI Arvind Prabhakar Ate Douma Roman Shaposhnik Non-binding +1: Sharad Agarwal Jaideep Dhok Fabian Hueske Naresh Agarwal I will now proceed with the next steps of establishing a poddling according to the IPMC guidelines. Thanks, Roman. On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On 2014-12-19 06:29, Roman Shaposhnik wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors, although as the number of contribution grows we want to build a diverse developer and user community that is governed
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (non-binding) 2014-12-19 7:24 GMT+01:00 Jaideep Dhok jaideep.d...@inmobi.com: +1 (non-binding) Thanks, Jaideep On Fri, Dec 19, 2014 at 11:50 AM, Hyunsik Choi hyun...@apache.org wrote: +1 (binding) On Friday, December 19, 2014, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On 12/19/2014 12:29 AM, Roman Shaposhnik wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors, although as the number of contribution grows we want to build a diverse developer and user community that is
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On 19 December 2014 at 14:09, Hadrian Zbarcea hzbar...@gmail.com wrote: +1 (binding) On 12/19/2014 12:29 AM, Roman Shaposhnik wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (non-binding) Thanks Naresh On Fri, Dec 19, 2014 at 4:32 PM, Fabian Hueske fhue...@apache.org wrote: +1 (non-binding) 2014-12-19 7:24 GMT+01:00 Jaideep Dhok jaideep.d...@inmobi.com: +1 (non-binding) Thanks, Jaideep On Fri, Dec 19, 2014 at 11:50 AM, Hyunsik Choi hyun...@apache.org wrote: +1 (binding) On Friday, December 19, 2014, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) Regards, Arvind Prabhakar On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has
[VOTE] Accept Zeppelin into the Apache Incubator
Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors, although as the number of contribution grows we want to build a diverse developer and user community that is governed by the Apache way. Users and new contributors will be treated
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors,
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (non-binding) On Fri, Dec 19, 2014 at 10:59 AM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors,
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 On Thu, Dec 18, 2014 at 9:29 PM, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors, although as the
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (binding) On Friday, December 19, 2014, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the Apache git repository. === Meritocracy === Zeppelin is an open source project that already leverages meritocracy principles. It was started by a handfull of people and now it has multiple contributors,
Re: [VOTE] Accept Zeppelin into the Apache Incubator
+1 (non-binding) Thanks, Jaideep On Fri, Dec 19, 2014 at 11:50 AM, Hyunsik Choi hyun...@apache.org wrote: +1 (binding) On Friday, December 19, 2014, Roman Shaposhnik r...@apache.org wrote: Following the discussion earlier: http://s.apache.org/kTp I would like to call a VOTE for accepting Zeppelin as a new Incubator project. The proposal is available at: https://wiki.apache.org/incubator/ZeppelinProposal and is also attached to the end of this email. Vote is open until at least Sunday, 21th December 2014, 23:59:00 PST [ ] +1 Accept Zeppelin into the Incubator [ ] ±0 Indifferent to the acceptance of Zeppelin [ ] -1 Do not accept Zeppelin because ... Thanks, Roman. == Abstract == Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc. == Proposal == Zeppelin is a modern web-based tool for the data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook style interpreter that enable collaborative analysis sessions sharing between users. Zeppelin is independent of the execution framework itself. Current version runs on top of Apache Spark but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date i.e Apache Flink, Crunch as well as SQL-like backends such as Hive, Tajo, MRQL. We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”. == Background == Large scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc and may include inter-operation of multiple different tools and technologies. With the widespread of the open source general-purpose data processing systems like Spark there is a lack of open source, modern user-friendly tools that combine strengths of interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities. Zeppelin initially started as a GUI tool for diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It was open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over the large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser == Rationale == There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems. As a number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at Apache foundation would be a great fit for the project ensuring that processes and procedures are in place to keep project and community “healthy” and free of any commercial, political or legal faults. == Initial Goals == The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issues tracker and a Jenkins CI, as mentioned in “Required Resources” section of current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines. To increase adoption the major goal for the project would be to provide integration with as much projects from Apache data ecosystem as possible, including new interpreters for Apache Hive, Apache Drill and adding Zeppelin distribution to Apache Bigtop. On the community building side the main goal is to attract a diverse set of contributors by promoting Zeppelin to wide variety of engineers, starting a Zeppelin user groups around the globe and by engaging with other existing Apache projects communities online. == Current Status == Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in Affiliation section. Current implementation status is pre-release with public API not being finalized yet. Current main and default backend processing engine is Apache Spark with consistent support of SparkSQL. Zeppelin is distributed as a binary package which includes an embedded webserver, application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet but it is something we are looking to provide as part of Apache Bigtop integration. Project codebase is currently hosted at github.com, which will form the basis of the