Re: [DISCUSS] HAWQ Incubation Proposal
If everything is fine, should we call for a vote on the proposal?

On Sun, Aug 30, 2015 at 3:06 PM, Bertrand Delacretaz wrote:
> On Sat, Aug 29, 2015 at 7:54 PM, Justin Erenkrantz wrote:
> > On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik wrote:
> >> ...With Justin volunteering at this point we've got 6 very active, very
> >> experienced mentors. I really don't think the # of committers should be
> >> a problem.
> >
> > I agree with Roman...
>
> Ok, I'll trust you guys on this then!
>
> -Bertrand
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
Re: [DISCUSS] HAWQ Incubation Proposal
On Sat, Aug 29, 2015 at 7:54 PM, Justin Erenkrantz jus...@erenkrantz.com wrote:
> On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> > ...With Justin volunteering at this point we've got 6 very active, very experienced mentors. I really don't think the # of committers should be a problem.
>
> I agree with Roman...

Ok, I'll trust you guys on this then!

-Bertrand
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> > I would much prefer a smaller list of initial committers who have been identified as having experience or a solid potential to be ASF committers, and let others be elected based on merit as the project progresses.
>
> I would agree that this could be a problem if the project didn't have enough active mentors to help a large # of folks master the Apache Way. With Justin volunteering at this point we've got 6 very active, very experienced mentors. I really don't think the # of committers should be a problem.

I agree with Roman. I think that it would be better to have the list of initial committers be closer to reality (in the eyes of the proposed project) than artificially limit it. During the incubation process, the community can work through the process of expanding (or contracting if needed - hopefully not!) the community.

Cheers. -- justin
Re: [DISCUSS] HAWQ Incubation Proposal
On Thu, Aug 27, 2015 at 3:20 PM, Justin Erenkrantz jus...@erenkrantz.com wrote:
> On Thu, Aug 20, 2015 at 11:14 PM, Roman Shaposhnik r...@apache.org wrote:
> > Hi! I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at: https://wiki.apache.org/incubator/HAWQProposal and is also attached to the end of this email.
>
> If HAWQ desires more mentors, I'd be willing to be included as well.

Justin, thanks a million for volunteering! I've included you on the proposal.

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
On Mon, Aug 24, 2015 at 12:47 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> Hi,
>
> On Sat, Aug 22, 2015 at 12:35 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> > On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> > > ... There's some GPL/LGPL stuff in there, IMO the proposal should include a plan for coping with those.
> >
> > Can you help me understand which bits of those dependencies do you see as problematic?...
>
> They are not necessarily problematic but the podling needs to be aware of the GPL/LGPL mentions at http://apache.org/legal/resolved.html and evaluate those dependencies accordingly. I didn't see a mention of that in the proposal.

Good point. I called it out explicitly now.

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
On Thu, Aug 20, 2015 at 11:44 PM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> > ...most of the core developers are currently NOT affiliated with the ASF and would require new ICLAs before committing to the project
> > ...
> > == Affiliations ==
> > ...
> > * Pivotal: everyone else on this proposal...
>
> So IIUC that's about 50 committers from the same company and most of them don't have experience with open source, or at least not at the ASF.

Well, like the proposal says -- most don't but at least ~10 do (those are the same guys working on Geode).

> Doesn't that drastically lower the chances of the project creating a diverse community? I would much prefer a smaller list of initial committers who have been identified as having experience or a solid potential to be ASF committers, and let others be elected based on merit as the project progresses.

I would agree that this could be a problem if the project didn't have enough active mentors to help a large # of folks master the Apache Way. With Justin volunteering at this point we've got 6 very active, very experienced mentors. I really don't think the # of committers should be a problem.

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
On Thu, Aug 20, 2015 at 11:14 PM, Roman Shaposhnik r...@apache.org wrote:
> Hi! I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at: https://wiki.apache.org/incubator/HAWQProposal and is also attached to the end of this email.

If HAWQ desires more mentors, I'd be willing to be included as well.

Cheers. -- justin
Re: [DISCUSS] HAWQ Incubation Proposal
Hi,

On Sat, Aug 22, 2015 at 12:35 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> > ... There's some GPL/LGPL stuff in there, IMO the proposal should include a plan for coping with those.
>
> Can you help me understand which bits of those dependencies do you see as problematic?...

They are not necessarily problematic but the podling needs to be aware of the GPL/LGPL mentions at http://apache.org/legal/resolved.html and evaluate those dependencies accordingly. I didn't see a mention of that in the proposal.

-Bertrand
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 3:00 AM, Ted Dunning ted.dunn...@gmail.com wrote:
> Is Madlib even viable as an independent project?

Yes, it is extremely viable. There have been a number of prototypes porting MADlib to other SQL-on-Hadoop projects (Impala is the one I know of), and also quite a bit of activity on the PostgreSQL side, like a recent GSoC project where a student added clustering algorithms to MADlib as part of PostgreSQL's GSoC, not even MADlib's own. Finally, it is still hugely popular with GreenplumDB users. Now, of course GreenplumDB hasn't been open sourced yet, but we're actively working on making it happen soon. Hortonworks folks working on Hive were pretty enthusiastic about a possible MADlib integration story as well.

> Should it be part of the overall Hawq project?

It's funny that you ask, because the MADlib community is giving me feedback that I've managed to downplay MADlib's cross-platform nature: http://madlib.net/pipermail/user/2015-August/000212.html

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> > ... == External Dependencies ==...
>
> There's some GPL/LGPL stuff in there, IMO the proposal should include a plan for coping with those.

Can you help me understand which bits of those dependencies do you see as problematic? Let's walk through them one-by-one:
  * gperf -- this is simply an external utility that HAWQ executes for non-essential parts of its functionality. This should be ok.
  * libgsasl and libuuid-2.26 -- those are LGPL runtime dependencies with no source code bleeding into the HAWQ implementation. This should be no different from any other ASF project implemented in C/C++ that dynamically links against a myriad of LGPL libraries.

Am I missing something here?

Thanks,
Roman.

P.S. Here's, for example, what Apache Subversion links against:

$ ldd /usr/bin/svn
    linux-vdso.so.1 => (0x7fffe6785000)
    libsvn_client-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_client-1.so.1 (0x7f58036cd000)
    libsvn_wc-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_wc-1.so.1 (0x7f5803424000)
    libsvn_ra-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra-1.so.1 (0x7f5803215000)
    libsvn_delta-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so.1 (0x7f5803002000)
    libsvn_diff-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_diff-1.so.1 (0x7f5802dee000)
    libsvn_subr-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so.1 (0x7f5802b77000)
    libapr-1.so.0 => /usr/lib/x86_64-linux-gnu/libapr-1.so.0 (0x7f5802946000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f5802728000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f5802361000)
    libaprutil-1.so.0 => /usr/lib/x86_64-linux-gnu/libaprutil-1.so.0 (0x7f580213a000)
    libsvn_ra_local-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.so.1 (0x7f5801f31000)
    libsvn_ra_svn-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.so.1 (0x7f5801d12000)
    libsvn_ra_serf-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.so.1 (0x7f5801ae3000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f58018ca000)
    libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x7f580169f000)
    libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x7f58013e6000)
    libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x7f58011e)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f5800fdc000)
    /lib64/ld-linux-x86-64.so.2 (0x7f580393d000)
    libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x7f5800da3000)
    libsvn_repos-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_repos-1.so.1 (0x7f5800b6f000)
    libsvn_fs-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs-1.so.1 (0x7f5800965000)
    libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x7f580074a000)
    libserf-1.so.1 => /usr/lib/x86_64-linux-gnu/libserf-1.so.1 (0x7f580053)
    libsvn_fs_fs-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.so.1 (0x7f58002fc000)
    libsvn_fs_base-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_base-1.so.1 (0x7f58000ce000)
    libsvn_fs_util-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.so.1 (0x7f57ffecb000)
    libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x7f57ffc6d000)
    libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x7f57ff892000)
    libdb-5.3.so => /usr/lib/x86_64-linux-gnu/libdb-5.3.so (0x7f57ff4f)
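
For anyone who wants to run the same kind of check on a HAWQ binary, here is a minimal sketch (not part of the proposal) that parses ldd-style output and flags the libraries a podling may want to look at against http://apache.org/legal/resolved.html. The REVIEW_LIST contents, the function names and the SAMPLE text are assumptions made up for illustration; the sketch makes no claim about the actual license of any library.

```python
import re

# Hypothetical review list: library base names someone wants to double-check.
# The names come from the discussion above; their licenses are NOT asserted here.
REVIEW_LIST = {"libuuid", "libgsasl"}

# Matches "name => /path (addr)". A bare "=" is also accepted because some
# mail archives strip the ">" character from the arrow.
LINE_RE = re.compile(r"^\s*(?P<name>\S+)\s+=>?\s+(?P<path>/\S+)")


def parse_ldd(output):
    """Return {soname: resolved path} for every line that resolves to a file."""
    libs = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            libs[match.group("name")] = match.group("path")
    return libs


def needs_review(libs, review_list=REVIEW_LIST):
    """Return the sorted sonames whose base name (text before '.so') is listed."""
    return sorted(name for name in libs if name.split(".so")[0] in review_list)


# Shortened sample in the same shape as the listing above.
SAMPLE = """\
    linux-vdso.so.1 => (0x7fffe6785000)
    libapr-1.so.0 => /usr/lib/x86_64-linux-gnu/libapr-1.so.0 (0x7f5802946000)
    libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x7f58011e)
    /lib64/ld-linux-x86-64.so.2 (0x7f580393d000)
"""

if __name__ == "__main__":
    print(needs_review(parse_ldd(SAMPLE)))
```

Lines without a resolved path (the vdso entry and the dynamic loader) are skipped on purpose, since there is no file on disk to evaluate for them.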
Re: [DISCUSS] HAWQ Incubation Proposal
Hi Ted!

This is one of those blind spot things I can't even explain. I was so sure Drill was implemented in C/C++ that I never bothered to check. I'm now curious as to who may have implanted that false memory (did you guys talk about when it was first getting designed?).

But anyway, more to the point: great feedback. That section really doesn't make much sense if Drill is 100% Java. The wiki has been updated.

Thanks,
Roman.

On Thu, Aug 20, 2015 at 9:02 PM, Ted Dunning ted.dunn...@gmail.com wrote:
> Drill is implemented entirely in Java. This isn't core to the proposal, but it would be better corrected.
>
> On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com wrote:
> > Hi Roman,
> >
> > Great news! BTW, it might be an invalid URL for the proposal. Should be https://wiki.apache.org/incubator/HAWQProposal ?
> >
> > Thanks,
> > Youngwoo
> >
> > On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:
> > > Hi! I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at: https://wiki.apache.org/incubator/ApexProposal and is also attached to the end of this email.
> > >
> > > Please note, that this proposal is very complementary to the desire of HAWQ's sister project (MADlib) to join ASF Incubator: http://madlib.net/pipermail/user/2015-August/ http://madlib.net/pipermail/devel/2015-August/
> > >
> > > I've volunteered to help MADlib community and we're currently working on a separate proposal to be submitted later next week. If you're interested in monitoring progress of that please see updates to: https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal and later: https://wiki.apache.org/incubator/MADlibProposal
> > >
> > > Thanks in advance for your time and help.
> > >
> > > Thanks,
> > > Roman.
> > >
> > > == Abstract ==
> > > HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 3:15 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> This is one of those blind spot things I can't even explain. I was so sure Drill was implemented in C/C++ that I never bothered to check. I'm now curious as to who may have implanted that false memory (did you guys talk about when it was first getting designed?).

It may be that the fact that our file system (MapR FS) is in C led to the presumption that Drill gets its performance from being in C.
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 2:54 AM, Christian Tzolov christian.tzo...@gmail.com wrote:
> Is PXF (HAWQ extension framework) going to be managed together with the HAWQ ASF project or as a child one?

For now it is part of the same codebase managed by the same community. If it gets picked up by other communities we may start asking the question of a subproject or perhaps a standalone TLP. Sort of along the lines of what happened to ORC. But that's way out there in the future.

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> ...most of the core developers are currently NOT affiliated with the ASF and would require new ICLAs before committing to the project
> ...
> == Affiliations ==
> ...
> * Pivotal: everyone else on this proposal...

So IIUC that's about 50 committers from the same company and most of them don't have experience with open source, or at least not at the ASF.

Doesn't that drastically lower the chances of the project creating a diverse community? I would much prefer a smaller list of initial committers who have been identified as having experience or a solid potential to be ASF committers, and let others be elected based on merit as the project progresses.

-Bertrand
Re: [DISCUSS] HAWQ Incubation Proposal
On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> ... == External Dependencies ==...

There's some GPL/LGPL stuff in there, IMO the proposal should include a plan for coping with those.

-Bertrand
Re: [DISCUSS] HAWQ Incubation Proposal
Is PXF (HAWQ extension framework) going to be managed together with the HAWQ ASF project or as a child one?

On Fri, Aug 21, 2015 at 8:46 AM, Bertrand Delacretaz bdelacre...@apache.org wrote:
> On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> > ... == External Dependencies ==...
>
> There's some GPL/LGPL stuff in there, IMO the proposal should include a plan for coping with those.
>
> -Bertrand
Re: [DISCUSS] HAWQ Incubation Proposal
Is Madlib even viable as an independent project? Should it be part of the overall Hawq project? On Thu, Aug 20, 2015 at 8:14 PM, Roman Shaposhnik r...@apache.org wrote: Hi! I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at: https://wiki.apache.org/incubator/ApexProposal and is also attached to the end of this email. Please note, that this proposal is very complementary to the desire of HAWQ's sister project (MADlib) to join ASF Incubator: http://madlib.net/pipermail/user/2015-August/ http://madlib.net/pipermail/devel/2015-August/ I've volunteered to help MADlib community and we're currently working on a separate proposal to be submitted later next week. If you're interested in monitoring progress of that please see updates to: https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal and later: https://wiki.apache.org/incubator/MADlibProposal Thanks in advance for your time and help. Thanks, Roman. == Abstract == HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ. HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP extensions) and supports open database connectivity (ODBC) and Java database connectivity (JDBC), as well. Most business intelligence, data analysis and data visualization tools work with HAWQ out of the box without the need for specialized drivers. 
[DISCUSS] HAWQ Incubation Proposal
Hi!

I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at:
    https://wiki.apache.org/incubator/ApexProposal
and is also attached to the end of this email.

Please note, that this proposal is very complementary to the desire of HAWQ's sister project (MADlib) to join ASF Incubator:
    http://madlib.net/pipermail/user/2015-August/
    http://madlib.net/pipermail/devel/2015-August/

I've volunteered to help MADlib community and we're currently working on a separate proposal to be submitted later next week. If you're interested in monitoring progress of that please see updates to:
    https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
and later:
    https://wiki.apache.org/incubator/MADlibProposal

Thanks in advance for your time and help.

Thanks,
Roman.

== Abstract ==

HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ. HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP extensions) and supports open database connectivity (ODBC) and Java database connectivity (JDBC), as well. Most business intelligence, data analysis and data visualization tools work with HAWQ out of the box without the need for specialized drivers.

A unique aspect of HAWQ is its integration of statistical and machine learning capabilities that can be natively invoked from SQL or (in the context of PL/Python, PL/Java or PL/R) in massively parallel modes and applied to large data sets across a Hadoop cluster. These capabilities are provided through MADlib – an existing open source, parallel machine-learning library. Given the close ties between the two development communities, the MADlib community has expressed interest in joining HAWQ on its journey into the ASF Incubator and will be submitting a separate, concurrent proposal.

HAWQ will provide more robust and higher performing options for Hadoop environments that demand best-in-class data analytics for business critical purposes. HAWQ is implemented in C and C++.

== Proposal ==

The goal of this proposal is to bring the core of Pivotal Software, Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software Foundation (ASF) in order to build a vibrant, diverse and self-governed open source community around the technology.

Pivotal has agreed to transfer the brand name HAWQ to Apache Software Foundation and will stop using HAWQ to refer to this software if the project gets accepted into the ASF Incubator under the name of Apache HAWQ (incubating). Pivotal will continue to market and sell an analytic engine product that includes Apache HAWQ (incubating).

While HAWQ is our primary choice for a name of the project, in anticipation of any potential issues with PODLINGNAMESEARCH we have come up with two alternative names: (1) Hornet; or (2) Grove.

Pivotal is submitting this proposal to donate the HAWQ source code and associated artifacts (documentation, web site content, wiki, etc.) to the Apache Software Foundation Incubator under the Apache License, Version 2.0 and is asking Incubator PMC to establish an open source community.

== Background ==

While the ecosystem of open source SQL-on-Hadoop solutions is fairly developed by now, HAWQ has several unique features that will set it apart from existing ASF and non-ASF projects. HAWQ made its debut in 2013 as a closed source product leveraging a decade's worth of product development effort invested in Greenplum DatabaseⓇ. Since then HAWQ has rapidly gained a solid customer base and became available on non-Pivotal distributions of Hadoop.

In 2015 HAWQ still leverages the rock solid foundation of Greenplum Database, while at the same time embracing elasticity and resource management native to Hadoop applications. This allows HAWQ to provide superior SQL on Hadoop performance, scalability and coverage while also providing massively-parallel machine learning capabilities and support for native Hadoop file formats. In addition, HAWQ's advanced features include support for complex joins, rich and compliant SQL dialect and industry-differentiating data federation capabilities. Dynamic pipelining and pluggable query optimizer architecture enable HAWQ to perform queries on Hadoop with the speed and scalability required for enterprise data warehouse (EDW) workloads.

HAWQ provides strong support for low-latency analytic SQL queries, coupled with massively parallel machine learning capabilities. This enables discovery-based analysis of large data sets and rapid, iterative development of data analytics applications that apply deep machine learning –
Re: [DISCUSS] HAWQ Incubation Proposal
Hi Roman,

Great news! BTW, it might be an invalid URL for the proposal. Should be https://wiki.apache.org/incubator/HAWQProposal ?

Thanks,
Youngwoo

On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:
> Hi! I would like to start a discussion on accepting HAWQ into ASF Incubator. The proposal is available at: https://wiki.apache.org/incubator/ApexProposal and is also attached to the end of this email.
>
> Please note, that this proposal is very complementary to the desire of HAWQ's sister project (MADlib) to join ASF Incubator: http://madlib.net/pipermail/user/2015-August/ http://madlib.net/pipermail/devel/2015-August/
>
> I've volunteered to help MADlib community and we're currently working on a separate proposal to be submitted later next week. If you're interested in monitoring progress of that please see updates to: https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal and later: https://wiki.apache.org/incubator/MADlibProposal
>
> Thanks in advance for your time and help.
>
> Thanks,
> Roman.
>
> == Abstract ==
> HAWQ is an advanced enterprise SQL on Hadoop analytic engine built around a robust and high-performance massively-parallel processing (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ. HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP extensions) and supports open database connectivity (ODBC) and Java database connectivity (JDBC), as well. Most business intelligence, data analysis and data visualization tools work with HAWQ out of the box without the need for specialized drivers.
A unique aspect of HAWQ is its integration of statistical and machine learning capabilities that can be natively invoked from SQL or (in the context of PL/Python, PL/Java or PL/R) in massively parallel modes and applied to large data sets across a Hadoop cluster. These capabilities are provided through MADlib – an existing open source, parallel machine-learning library. Given the close ties between the two development communities, the MADlib community has expressed interest in joining HAWQ on its journey into the ASF Incubator and will be submitting a separate, concurrent proposal. HAWQ will provide more robust and higher performing options for Hadoop environments that demand best-in-class data analytics for business critical purposes. HAWQ is implemented in C and C++. == Proposal == The goal of this proposal is to bring the core of Pivotal Software, Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software Foundation (ASF) in order to build a vibrant, diverse and self-governed open source community around the technology. Pivotal has agreed to transfer the brand name HAWQ to Apache Software Foundation and will stop using HAWQ to refer to this software if the project gets accepted into the ASF Incubator under the name of Apache HAWQ (incubating). Pivotal will continue to market and sell an analytic engine product that includes Apache HAWQ (incubating). While HAWQ is our primary choice for a name of the project, in anticipation of any potential issues with PODLINGNAMESEARCH we have come up with two alternative names: (1) Hornet; or (2) Grove. Pivotal is submitting this proposal to donate the HAWQ source code and associated artifacts (documentation, web site content, wiki, etc.) to the Apache Software Foundation Incubator under the Apache License, Version 2.0 and is asking Incubator PMC to establish an open source community. 
== Background ==

While the ecosystem of open source SQL-on-Hadoop solutions is fairly developed by now, HAWQ has several unique features that will set it apart from existing ASF and non-ASF projects. HAWQ made its debut in 2013 as a closed source product leveraging a decade's worth of product development effort invested in Greenplum DatabaseⓇ. Since then HAWQ has rapidly gained a solid customer base and become available on non-Pivotal distributions of Hadoop.

In 2015 HAWQ still leverages the rock-solid foundation of Greenplum Database, while at the same time embracing elasticity and resource management native to Hadoop applications. This allows HAWQ to provide superior SQL on Hadoop performance, scalability and coverage while also providing massively-parallel machine learning capabilities and support for native Hadoop file formats. In addition, HAWQ's advanced features include support for complex joins, a rich and compliant SQL dialect and industry-differentiating data federation capabilities. Dynamic pipelining and a pluggable query optimizer architecture enable HAWQ to perform queries on Hadoop with the speed and scalability required for enterprise data warehouse (EDW)
Re: [DISCUSS] HAWQ Incubation Proposal
On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com wrote:
> Hi Roman, Great news! BTW, it might be an invalid URL for the proposal. Should be https://wiki.apache.org/incubator/HAWQProposal ?

Too many copy-paste buffers strike again :-( Thanks for spotting it so quickly. Yes, it is: https://wiki.apache.org/incubator/HAWQProposal

Thanks,
Roman.
Re: [DISCUSS] HAWQ Incubation Proposal
Drill is implemented entirely in Java. This isn't core to the proposal, but it would be better corrected.