Re: [DISCUSS] MADlib Incubation Proposal
On Mon, Sep 7, 2015 at 5:06 AM, Atri Sharma wrote:
> Now that HAWQ vote is closed can we also have a consensus on this proposal
> please?

I think that would make sense. I'll start a voting thread soon.

Thanks,
Roman.
Re: [DISCUSS] MADlib Incubation Proposal
Now that HAWQ vote is closed can we also have a consensus on this proposal please?

On 3 Sep 2015 02:07, "Roman Shaposhnik" wrote:
> Hi!
>
> on the heels of the HAWQ proposal, I'd like
> to follow with a discussion of accepting MADlib's
> community into the ASF Incubator:
> https://wiki.apache.org/incubator/MADlibProposal
[DISCUSS] MADlib Incubation Proposal
Hi!

on the heels of the HAWQ proposal, I'd like to follow with a discussion of accepting MADlib's community into the ASF Incubator:
https://wiki.apache.org/incubator/MADlibProposal

There was an extensive discussion within the existing open source community and the overall consensus is extremely supportive of this proposal:
http://madlib.net/pipermail/user/2015-August/
http://madlib.net/pipermail/devel/2015-August/

We've done quite a bit of outreach in order to identify all the folks who may be interested in joining the initial list of committers. The current proposal reflects that. Additionally, we hope that the ASF DISCUSS thread will help us in reaching out even further.

Finally, while 3 experienced mentors currently mentioned on the proposal seems like a reasonable number, we would love if other folks from IPMC could volunteer to help us on this journey.

Thanks,
Roman.

== Abstract ==
MADlib is an open-source library (licensed under 2-clause BSD license) for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data. The MADlib mission is to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open source development.

MADlib occupies a unique niche in the realm of data science and machine learning libraries since its SQL APIs can allow it to work on a wide range of data stores and SQL engines.

== Proposal ==
The current open source community behind MADlib feels that aligning itself with HAWQ's community, governance model, infrastructure and roadmap will allow the project to accelerate adoption and community growth. Given HAWQ's trajectory of entering Apache Software Foundation family as an Incubating project, we feel that the best course of action for MADlib is to follow a similar route.

MADlib and HAWQ are complementary technologies in that MADlib in-database analytical functions can run within the HAWQ execution engine. (MADlib also runs on Greenplum Database and PostgreSQL today.) It is expected that contributors to MADlib will be cognizant of the HAWQ ASF project and may contribute to it as well. In short, collaboration between the two communities will make both projects more vibrant and advance the respective technologies in potentially novel directions.

Contributors may also look at the HAWQ project as a starting point for ports to other parallel database engines. This proposal highly encourages this type of work as it would help to further realize the original cross-platform goal of MADlib as envisioned by its originators.

Thus, the goal of this proposal is to bring the existing MADlib open source community into ASF, change the project's governance model to the "Apache Way" and transition the project's codebase and infrastructure into ASF INFRA. The community has agreed to transfer the brand name "MADlib" to Apache Software Foundation as well.

Pivotal Inc. on behalf of the MADlib open source community is submitting this proposal to transition source code and associated artifacts (documentation, web site content, wiki, etc.) to the Apache Software Foundation Incubator under the Apache License, Version 2.0 and is asking Incubator PMC to establish a MADlib incubating project.

Currently MADlib uses a few category X licensed software tools during its build (mostly for generating documentation):
 * doxypy 0.4.2 (GPL)
 * doxygen 1.8.4 (GPL)
 * TikZ-UML
 * bison 2.4 (GPL, with an exception for generated output)
We feel that this usage is compatible with an overall project licensed under the ALv2 and don't anticipate any changes.
Our usage of LGPL library cern_root-5.34 is expected to go away since the 2 cern modules used are being entirely re-written in MADlib.

Finally, MADlib inclusion of MPL licensed library (eigen 3.2.2) into its binary artifact seems to be consistent with ASF recommendation for managing "weak copyleft" dependencies.

== Background ==
MADlib grew out of discussions between database engine developers, data scientists, IT architects and academics interested in new approaches to scalable, sophisticated in-database analytics. These discussions were written up in a paper in VLDB 2009 that coined the term “MAD Skills” for data analysis (http://dl.acm.org/citation.cfm?id=1687576). The MADlib software project began the following year as a collaboration between researchers at UC Berkeley and engineers and data scientists at Pivotal (former EMC/Greenplum).

The initial MADlib codebase came from EMC/Greenplum, UC Berkeley, the University of Wisconsin, and the University of Florida. The project was publicly documented in a paper at VLDB 2012 (http://vldb.org/pvldb/vol5/p1700_joehellerstein_vldb2012.pdf).

Today MADlib has contributors from around the world including both individuals and institutions. For example, recent contributions have come from Pivotal, Stanford
Re: [DISCUSS] MADlib Incubation Proposal
I am very happy to see this proposal.

I think combination of HAWQ and MADlib makes it possible to have heavy production level analytics on top of Hadoop (which is fantastic!).

That said, given MADlib's flexibility, I feel it would be a great addition to Apache big data stack in general and I am eagerly looking forward to integration efforts with various existing Apache big data members.

Regards,
Atri

On 3 Sep 2015 02:07, "Roman Shaposhnik" wrote:
> Hi!
>
> on the heels of the HAWQ proposal, I'd like
> to follow with a discussion of accepting MADlib's
> community into the ASF Incubator:
> https://wiki.apache.org/incubator/MADlibProposal