project.xml
Hi, I updated corinthia.xml (committed in svn from my local copy) yesterday, but the web page still looks unchanged. Did I miss a step, or do I have to do something different? Thanks in advance for any advice. rgds jan i -- Sent from My iPad, sorry for any misspellings.
Re: [VOTE] accept SAMOA into incubator
+1 (non-binding)
Thanks
Naresh

On Fri, Dec 12, 2014 at 7:22 AM, John D. Ament john.d.am...@gmail.com wrote:
+1 binding

On Thu Dec 11 2014 at 5:10:50 PM Konstantin Boudnik c...@apache.org wrote:
+1 (binding). A small comment: we don't do users@ lists for podlings, do we? If so, samoa-users@googlegroups -- us...@samoa.incubator.apache.org will need to be converged into dev@.

Not all podlings use a users@, but they can if they like. Usually if it's coming from an established community there will be one.

Cos

On Thu, Dec 11, 2014 at 10:02AM, Daniel Dai wrote:
Following the discussion earlier, I'm calling a vote to accept SAMOA as a new Incubator project.

[ ] +1 Accept SAMOA into the Incubator
[ ] +0 Indifferent to the acceptance of SAMOA
[ ] -1 Do not accept SAMOA because ...

The vote will be open for at least 72h and closes at the earliest on Dec 14 19:00 GMT.

https://wiki.apache.org/incubator/SAMOAProposal

Thanks,
Daniel

= SAMOA =

== Abstract ==
SAMOA is an open-source platform for mining big data streams.

== Proposal ==
SAMOA provides a collection of distributed streaming algorithms for the most common data mining and machine learning tasks such as classification, clustering, and regression, as well as programming abstractions to develop new algorithms that run on top of distributed stream processing engines (DSPEs). It features a pluggable architecture that allows it to run on several DSPEs such as Apache Storm, Apache S4, and Apache Samza.

== Background ==
Hadoop and its ecosystem have changed the way data are processed by allowing algorithms to be pushed to unprecedented scale. As an example, Mahout allows running data mining and machine learning algorithms on very large datasets. However, Hadoop and Mahout are not suited to handling streaming data. Simply put, the goal of SAMOA is to provide a streaming counterpart to Mahout.

== Rationale ==
SAMOA aims to fill the current gap in tools for mining large-scale streams. Many organizations can benefit from a scalable stream mining platform such as SAMOA. SAMOA is a natural fit for the Apache Software Foundation. It is licensed under the ASL v2.0. It already interoperates with several existing Apache projects such as Storm, S4, and Samza. Furthermore, it is complementary to existing Apache projects such as Mahout. The initial committers are familiar with the Apache process and subscribe to the Apache mission. Indeed, the team includes multiple Apache committers. Finally, joining Apache will help coordinate the development effort of the growing number of organizations which contribute to SAMOA.

== Initial Goals ==
* Move the existing codebase to Apache
* Integrate with the Apache development process
* Incremental development and releases per Apache guidelines

== Current Status ==
SAMOA started as a research project at Yahoo Labs in 2013 and was open-sourced in October the same year. It has been under development on Yahoo's public GitHub repository since being open-sourced. It has undergone two releases (0.1, 0.2).

=== Meritocracy ===
The SAMOA project already operates on meritocratic principles. Today, SAMOA has several developers and has accepted multiple patches from outside of Yahoo Labs. However, our intent with this incubator proposal is to start building a more diverse developer community around SAMOA that follows the Apache meritocracy model. We will identify all committers and PPMC members for the project operating under the ASF meritocratic principles. We plan to continue support for new contributors and work with those who contribute significantly to the project to make them committers.

=== Community ===
SAMOA is currently being used internally at Yahoo. Acceptance into the Apache foundation would bolster the existing user and developer community around SAMOA. That community includes contributors from several institutions, active mostly on GitHub's pages. SAMOA has been starred more than 300 times and forked more than 50 times on GitHub as of November 2014.

=== Core Developers ===
The core developers are a diverse group, many of whom are already very experienced with open source. There are two existing Apache committers, along with people from various companies and universities.

=== Alignment ===
The ASF is the natural choice to host SAMOA. First, its goal of encouraging community-driven open-source projects fits with our vision for SAMOA. Additionally, many other projects that SAMOA is based on, such as Apache Storm, S4, Samza, and HDFS, are hosted by the ASF. Close proximity of SAMOA to these projects within the ASF will provide mutual benefit.

== Known Risks ==

=== Orphaned Products ===
Given the current
Re: [VOTE] accept corinthia into incubator
On 06/12/2014 jan i wrote: can help us in many ways, you are e.g. working on FOSDEM, we (you and I) could make a great presentation (I do the work) and use it to present Corinthia and at the same time show that AOO is very much alive. If you agree then I will fly to FOSDEM and also help with AOO I am sure both projects would benefit from it, and Apache as a whole. I don't see your presentations listed yet on the FOSDEM page, but sure, please do submit today the 1-2 presentations I know you are working on, and then we can plan something like the above, thank you! Andrea - To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] accept corinthia into incubator
On Sunday, December 14, 2014, Andrea Pescetti pesce...@apache.org wrote: On 06/12/2014 jan i wrote: can help us in many ways, you are e.g. working on FOSDEM, we (you and I) could make a great presentation (I do the work) and use it to present Corinthia and at the same time show that AOO is very much alive. If you agree then I will fly to FOSDEM and also help with AOO I am sure both projects would benefit from it, and Apache as a whole. I don't see your presentations listed yet on the FOSDEM page, but sure, please do submit today the 1-2 presentations I know you are working on, and then we can plan something like the above, thank you! Done, please verify as you have reviewer status. rgds jan i ps. I hesitated for reasons that you and I know, and ended up not submitting a pure AOO talk (with you and me at the same place it could have been a good opportunity) Andrea
Re: [VOTE] Graduate Samza from the Incubator
+1 (non-binding)

-Jinho
Best regards

2014-12-13 8:54 GMT+09:00 Jakob Homan jgho...@gmail.com:
Restarting vote having fixed resolution detail, dastardly AWOL paragraph breaks and removed nod to increased diversity in introduction.

The Samza podling community has voted to graduate from the Incubator. The vote passed with 17 +1s and no -1s or +/-0s.

Binding +1s x 10 : Jakob, Chinmay, Yan, Chris Riccomini, Sriram, Zhijie, Martin, Roman, Garry, Chris Douglas
Non-binding +1s x 7: Claudio, TJ, Robert, Roger, Danny, Jon, Yi

Links to votes and discussions:
http://s.apache.org/samzaGradResult
http://s.apache.org/samzaGradDiscuss

Samza has been incubating for a bit more than a year. In that time the community has:
* Completed two Incubator-approved releases
* Opened nearly 500 JIRAs
* Added five new committers/PMC members.

This thread is to vote on the graduation resolution Samza has approved. It will run for at least 96 hours (to Tuesday, 12/22 4pm PST, the extra day to accommodate the weekend and holiday schedule).

[ ] +1 Graduate Apache Samza from the Incubator.
[ ] +0 Don't care.
[ ] -1 Don't graduate Apache Samza from the Incubator because ...

Here's my binding vote: +1.
-Jakob

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software, for distribution at no charge to the public, related to low-latency, distributed processing of streaming data.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Samza Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Samza Project be and hereby is responsible for the creation and maintenance of software related to low-latency, distributed processing of streaming data; and be it further

RESOLVED, that the office of Vice President, Apache Samza be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Samza Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Samza Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Samza Project:
* Chinmay Soman cpsoman at apache dot org
* Chris Riccomini criccomini at apache dot org
* Garry Turkington garryturk at apache dot org
* Jakob Homan jghoman at apache dot org
* Jay Kreps jkreps at apache dot org
* Martin Kleppman martinkl at apache dot org
* Sriram Subramanian sriramsub at apache dot org
* Yan Fang yanfang at apache dot org
* Zhijie Shen zjshen at apache dot org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Chris Riccomini be appointed to the office of Vice President, Apache Samza, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the Apache Samza Project be and hereby is tasked with the migration and rationalization of the Apache Incubator Samza podling; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Incubator Samza podling encumbered upon the Apache Incubator Project are hereafter discharged.
Re: [VOTE] Graduate Samza from the Incubator
On Fri, Dec 12, 2014 at 6:54 PM, Jakob Homan jgho...@gmail.com wrote: [ ] +1 Graduate Apache Samza from the Incubator. [ ] +0 Don't care. [ ] -1 Don't graduate Apache Samza from the Incubator because ... Here's my binding vote: +1. And here is mine.. +1 (binding)
Re: [VOTE] Graduate Samza from the Incubator
+1 (non-binding) -Taylor On Dec 12, 2014, at 6:54 PM, Jakob Homan jgho...@gmail.com wrote: [ ] +1 Graduate Apache Samza from the Incubator. [ ] +0 Don't care. [ ] -1 Don't graduate Apache Samza from the Incubator because ...
Re: [VOTE] accept SAMOA into incubator
With 8 binding +1s (Henry Saputra, Enis Söztutar, Roman Shaposhnik, Konstantin Boudnik, Jakob Homan, Andrew Purtell, John D. Ament and Daniel Dai) and 2 non-binding +1s (P. Taylor Goetz, Naresh Agarwal), the vote passes. Thanks everyone for taking time to vote. I will proceed with next steps. Thanks, Daniel On Sun, Dec 14, 2014 at 3:10 AM, Naresh Agarwal naresh.agar...@inmobi.com wrote: +1 (non-binding) Thanks Naresh
Re: [VOTE] Graduate Samza from the Incubator
+1 binding Regards, Alan On Dec 12, 2014, at 3:54 PM, Jakob Homan jgho...@gmail.com wrote: [ ] +1 Graduate Apache Samza from the Incubator. [ ] +0 Don't care. [ ] -1 Don't graduate Apache Samza from the Incubator because ...
Re: [VOTE] accept SAMOA into incubator
+1 binding Regards, Alan On Dec 11, 2014, at 10:02 AM, Daniel Dai dai...@gmail.com wrote: Following the discussion earlier, I'm calling a vote to accept SAMOA as a new Incubator project. [ ] +1 Accept SAMOA into the Incubator [ ] +0 Indifferent to the acceptance of SAMOA [ ] -1 Do not accept SAMOA because ...
Re: [DISCUSS] [PROPOSAL] Zeppelin for Apache Incubator
+1 for the proposal. The proposal looks great to me.

Could I be nominated as a mentor of Zeppelin? Even though I'm not an IPMC member now, I asked to join the IPMC a few days ago.

-hyunsik

On Sun, Dec 14, 2014 at 12:27 PM, Ted Dunning ted.dunn...@gmail.com wrote:
+1

On Sat, Dec 13, 2014 at 5:18 PM, Roman Shaposhnik r...@apache.org wrote:
Hi,

I would like to propose Zeppelin as an Apache Incubator project:
https://wiki.apache.org/incubator/ZeppelinProposal

Please let me know what you think and feel free to volunteer as additional mentors for the project.

The easiest way to see what this project looks like in action would be this demo:
https://www.youtube.com/watch?v=_PQbVH_aO5E

Thanks,
Roman.

== Abstract ==
Zeppelin is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Apache Flink, etc.

== Proposal ==
Zeppelin is a modern web-based tool for data scientists to collaborate over large-scale data exploration and visualization projects. It is a notebook-style interpreter that enables collaborative analysis sessions to be shared between users. Zeppelin is independent of the execution framework itself. The current version runs on top of Apache Spark, but it has pluggable interpreter APIs to support other data processing systems. More execution frameworks could be added at a later date, e.g. Apache Flink and Crunch, as well as SQL-like backends such as Hive, Tajo, and MRQL.

We have a strong preference for the project to be called Zeppelin. In case that may not be feasible, alternative names could be: “Mir”, “Yuga” or “Sora”.

== Background ==
A large-scale data analysis workflow includes multiple steps like data acquisition, pre-processing, visualization, etc. and may include inter-operation of multiple different tools and technologies. With the widespread use of open source general-purpose data processing systems like Spark, there is a lack of open source, modern, user-friendly tools that combine the strengths of an interpreted language for data analysis with new in-browser visualization libraries and collaborative capabilities.

Zeppelin initially started as a GUI tool for a diverse set of SQL-over-Hadoop systems like Hive, Presto, Shark, etc. It has been open source since its inception in Sep 2013. Later, it became clear that there was a need for a greater web-based tool for data scientists to collaborate on data exploration over large-scale projects, not limited to SQL. So Zeppelin integrated full support of Apache Spark while adding a collaborative environment with the ability to run and share interpreter sessions in-browser.

== Rationale ==
There are no open source alternatives for a collaborative notebook-based interpreter with support of multiple distributed data processing systems.

As the number of companies adopting and contributing back to Zeppelin is growing, we think that having a long-term home at the Apache foundation would be a great fit for the project, ensuring that processes and procedures are in place to keep the project and community “healthy” and free of any commercial, political or legal faults.

== Initial Goals ==
The initial goals will be to move the existing codebase to Apache and integrate with the Apache development process. This includes moving all infrastructure that we currently maintain, such as: a website, a mailing list, an issue tracker and a Jenkins CI, as mentioned in the “Required Resources” section of the current proposal. Once this is accomplished, we plan for incremental development and releases that follow the Apache guidelines.

To increase adoption, the major goal for the project would be to provide integration with as many projects from the Apache data ecosystem as possible, including new interpreters for Apache Hive and Apache Drill, and adding a Zeppelin distribution to Apache Bigtop.

On the community building side, the main goal is to attract a diverse set of contributors by promoting Zeppelin to a wide variety of engineers, starting Zeppelin user groups around the globe and engaging with other existing Apache project communities online.

== Current Status ==
Currently, Zeppelin has 4 released versions and is used in production at a number of companies across the globe mentioned in the Affiliation section. The current implementation status is pre-release, with the public API not yet finalized. The current main and default backend processing engine is Apache Spark with consistent support of SparkSQL.

Zeppelin is distributed as a binary package which includes an embedded webserver, the application itself, a set of libraries and startup/shutdown scripts. No platform-specific installation packages are provided yet, but it is something we are looking to provide as part of Apache Bigtop integration.

The project codebase is currently hosted at github.com, which will form the basis of the Apache git repository.

=== Meritocracy ===
Zeppelin is an open source project
Re: [DISCUSS] [PROPOSAL] Zeppelin for Apache Incubator
Would love to see Hyunsik become a mentor of this project. He was the main driving force in helping Tajo graduate successfully to a top-level project. On Sunday, December 14, 2014, Hyunsik Choi hyun...@apache.org wrote: +1 for the proposal. The proposal looks great to me. Could I be nominated as a mentor of Zeppelin? Even though I'm not an IPMC member now, I asked to join the IPMC a few days ago. -hyunsik
Re: [DISCUSS] [PROPOSAL] Zeppelin for Apache Incubator
On Sun, Dec 14, 2014 at 10:12 PM, Hyunsik Choi hyun...@apache.org wrote: Could I be nominated as a mentor of Zeppelin? Even though I'm not an IPMC member now, I asked to join the IPMC a few days ago. I think I saw the NOTICE that you are joining fly by a day or so ago. You should consider yourself as good as a member of the IPMC at this point. Also, I think that your experience with Tajo would be very useful to Zeppelin. In particular, since Tajo's intent is similar to that of many of the Hadoop eco-system components while its pedigree is very different, I think that your viewpoint may be importantly different from the viewpoints of those (like me) who may be too closely attached to the standard Hadoop toolset.