Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-31 Thread Atri Sharma
If everything is fine, should we call for a vote on the proposal?

On Sun, Aug 30, 2015 at 3:06 PM, Bertrand Delacretaz  wrote:

> On Sat, Aug 29, 2015 at 7:54 PM, Justin Erenkrantz
>  wrote:
> > On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik 
> wrote:
> >> ...With Justin volunteering at this point we've got 6 very active, very
> >> experienced mentors. I really don't think the # of committers should be
> >> a problem.
> >
> > I agree with Roman...
>
> Ok, I'll trust you guys on this then!
>
> -Bertrand
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-30 Thread Bertrand Delacretaz
On Sat, Aug 29, 2015 at 7:54 PM, Justin Erenkrantz
jus...@erenkrantz.com wrote:
> On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik ro...@shaposhnik.org
> wrote:
>> ...With Justin volunteering at this point we've got 6 very active, very
>> experienced mentors. I really don't think the # of committers should be
>> a problem.
>
> I agree with Roman...

Ok, I'll trust you guys on this then!

-Bertrand




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-29 Thread Justin Erenkrantz
On Fri, Aug 28, 2015 at 7:45 PM, Roman Shaposhnik ro...@shaposhnik.org wrote:
>> I would much prefer a smaller list of initial committers who have been
>> identified as having experience or a solid potential to be ASF
>> committers, and let others be elected based on merit as the project
>> progresses.
>
> I would agree that this could be a problem if the project didn't have
> enough active mentors to help a large # of folks master the Apache Way.
> With Justin volunteering at this point we've got 6 very active, very
> experienced mentors. I really don't think the # of committers should be
> a problem.

I agree with Roman.

I think that it would be better to have the list of initial committers
be closer to reality (in the eyes of the proposed project) than
artificially limit it.

During the incubation process, the community can work through the
process of expanding (or contracting if needed - hopefully not!) the
community.

Cheers.  -- justin




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-28 Thread Roman Shaposhnik
On Thu, Aug 27, 2015 at 3:20 PM, Justin Erenkrantz
jus...@erenkrantz.com wrote:
> On Thu, Aug 20, 2015 at 11:14 PM, Roman Shaposhnik r...@apache.org wrote:
>> Hi!
>>
>> I would like to start a discussion on accepting HAWQ
>> into ASF Incubator. The proposal is available at:
>> https://wiki.apache.org/incubator/HAWQProposal
>> and is also attached to the end of this email.
>
> If HAWQ desires more mentors, I'd be willing to be included as well.

Justin, thanks a million for volunteering! I've included you on the proposal.

Thanks,
Roman.




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-28 Thread Roman Shaposhnik
On Mon, Aug 24, 2015 at 12:47 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
> Hi,
>
> On Sat, Aug 22, 2015 at 12:35 AM, Roman Shaposhnik ro...@shaposhnik.org
> wrote:
>> On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz
>> bdelacre...@apache.org wrote:
>>> ... There's some GPL/LGPL stuff in there, IMO the proposal should include
>>> a plan for coping with those.
>>
>> Can you help me understand which bits of those dependencies do you
>> see as problematic?...
>
> They are not necessarily problematic but the podling needs to be aware
> of the GPL/LGPL mentions at http://apache.org/legal/resolved.html and
> evaluate those dependencies accordingly. I didn't see a mention of
> that in the proposal.

Good point. I called it out explicitly now.

Thanks,
Roman.




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-28 Thread Roman Shaposhnik
On Thu, Aug 20, 2015 at 11:44 PM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
> On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
>> ...most of the core developers are currently NOT affiliated
>> with the ASF and would require new ICLAs before committing to the
>> project
> ...
>> == Affiliations ==
> ...
>>   * Pivotal: everyone else on this proposal...
>
> So IIUC that's about 50 committers from the same company and most of
> them don't have experience with open source, or at least not at the
> ASF.

Well, like the proposal says -- most don't but at least ~10 do (those
are the same guys working on Geode).

> Doesn't that drastically lower the chances of the project creating a
> diverse community?
>
> I would much prefer a smaller list of initial committers who have been
> identified as having experience or a solid potential to be ASF
> committers, and let others be elected based on merit as the project
> progresses.

I would agree that this could be a problem if the project didn't have
enough active mentors to help a large # of folks master the Apache Way.
With Justin volunteering at this point we've got 6 very active, very
experienced mentors. I really don't think the # of committers should be
a problem.

Thanks,
Roman.




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-27 Thread Justin Erenkrantz
On Thu, Aug 20, 2015 at 11:14 PM, Roman Shaposhnik r...@apache.org wrote:
 Hi!

 I would like to start a discussion on accepting HAWQ
 into ASF Incubator. The proposal is available at:
 https://wiki.apache.org/incubator/HAWQProposal
 and is also attached to the end of this email.

If HAWQ desires more mentors, I'd be willing to be included as well.

Cheers.  -- justin




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-24 Thread Bertrand Delacretaz
Hi,

On Sat, Aug 22, 2015 at 12:35 AM, Roman Shaposhnik ro...@shaposhnik.org wrote:
> On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz
> bdelacre...@apache.org wrote:
>> ... There's some GPL/LGPL stuff in there, IMO the proposal should include
>> a plan for coping with those.
>
> Can you help me understand which bits of those dependencies do you
> see as problematic?...

They are not necessarily problematic but the podling needs to be aware
of the GPL/LGPL mentions at http://apache.org/legal/resolved.html and
evaluate those dependencies accordingly. I didn't see a mention of
that in the proposal.

-Bertrand




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Roman Shaposhnik
On Fri, Aug 21, 2015 at 3:00 AM, Ted Dunning ted.dunn...@gmail.com wrote:
> Is Madlib even viable as an independent project?

Yes, it is extremely viable. There have been a number of prototypes porting
MADlib to other SQL-on-Hadoop projects (Impala is the one I know of)
and also quite a bit of activity on the PostgreSQL side, like a recent GSoC
project where a student added a clustering algorithm to MADlib as
part of PostgreSQL's GSoC, not even MADlib's own. Finally, it is still hugely
popular with GreenplumDB users. Now, of course, GreenplumDB hasn't
been open sourced yet, but we're actively working on making it happen
soon.

Hortonworks folks working on Hive were pretty enthusiastic about a possible
MADlib integration story as well.

> Should it be part of the overall Hawq project?

It's funny that you ask, because the MADlib community is giving me feedback
that I've managed to downplay MADlib's cross-platform nature:
http://madlib.net/pipermail/user/2015-August/000212.html

Thanks,
Roman.




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Roman Shaposhnik
On Fri, Aug 21, 2015 at 1:46 AM, Bertrand Delacretaz
bdelacre...@apache.org wrote:
> On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> ...
>> == External Dependencies ==...
>
> There's some GPL/LGPL stuff in there, IMO the proposal should include
> a plan for coping with those.

Can you help me understand which bits of those dependencies you
see as problematic? Let's walk through them one by one:
* gperf -- this is simply an external utility that HAWQ executes for
   non-essential parts of its functionality. This should be ok.

* libgsasl and libuuid-2.26 -- those are LGPL runtime dependencies with
   no source code bleeding into the HAWQ implementation. This should be no
   different from any other ASF project implemented in C/C++ dynamically
   linking against a myriad of LGPL libraries.

Am I missing something here?

Thanks,
Roman.

P.S. Here's, for example, what Apache Subversion links against:

$ ldd /usr/bin/svn

linux-vdso.so.1 =>  (0x7fffe6785000)
libsvn_client-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_client-1.so.1 (0x7f58036cd000)
libsvn_wc-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_wc-1.so.1 (0x7f5803424000)
libsvn_ra-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra-1.so.1 (0x7f5803215000)
libsvn_delta-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_delta-1.so.1 (0x7f5803002000)
libsvn_diff-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_diff-1.so.1 (0x7f5802dee000)
libsvn_subr-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_subr-1.so.1 (0x7f5802b77000)
libapr-1.so.0 => /usr/lib/x86_64-linux-gnu/libapr-1.so.0 (0x7f5802946000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f5802728000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f5802361000)
libaprutil-1.so.0 => /usr/lib/x86_64-linux-gnu/libaprutil-1.so.0 (0x7f580213a000)
libsvn_ra_local-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_local-1.so.1 (0x7f5801f31000)
libsvn_ra_svn-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_svn-1.so.1 (0x7f5801d12000)
libsvn_ra_serf-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_ra_serf-1.so.1 (0x7f5801ae3000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f58018ca000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x7f580169f000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x7f58013e6000)
libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x7f58011e)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f5800fdc000)
/lib64/ld-linux-x86-64.so.2 (0x7f580393d000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x7f5800da3000)
libsvn_repos-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_repos-1.so.1 (0x7f5800b6f000)
libsvn_fs-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs-1.so.1 (0x7f5800965000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x7f580074a000)
libserf-1.so.1 => /usr/lib/x86_64-linux-gnu/libserf-1.so.1 (0x7f580053)
libsvn_fs_fs-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_fs-1.so.1 (0x7f58002fc000)
libsvn_fs_base-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_base-1.so.1 (0x7f58000ce000)
libsvn_fs_util-1.so.1 => /usr/lib/x86_64-linux-gnu/libsvn_fs_util-1.so.1 (0x7f57ffecb000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x7f57ffc6d000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x7f57ff892000)
libdb-5.3.so => /usr/lib/x86_64-linux-gnu/libdb-5.3.so (0x7f57ff4f)
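
For anyone who wants to repeat this kind of check on another binary, here is a rough Python sketch that parses ldd-style output and lists the shared libraries whose names match a watch-list. The function names and the watch-list are illustrative assumptions, not something from the proposal; the watch-list holds only the two libraries named as LGPL in this thread. A name match says nothing about a library's actual license, so treat the result as a list of things to look up.

```python
import re

# Hypothetical watch-list: only the two libraries this thread names as LGPL.
# A name match is a hint for manual review, not a license determination.
LGPL_WATCHLIST = ("libgsasl", "libuuid")

# Matches lines such as "libfoo.so.1 => /path/libfoo.so.1 (0x7f...)".
# A bare "=" is also accepted, since some mail archives strip the ">".
# Lines without an arrow (e.g. the dynamic loader itself) are skipped.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+=>?\s*(?P<path>/\S+)?\s*\((?P<addr>0x[0-9a-fA-F]+)\)"
)


def parse_ldd(text):
    """Return (library name, resolved path or None) pairs from ldd output."""
    libs = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            libs.append((match.group("name"), match.group("path")))
    return libs


def flag_watchlist(libs, watchlist=LGPL_WATCHLIST):
    """Return the names of libraries that start with a watch-list prefix."""
    return [name for name, _path in libs if name.startswith(watchlist)]
```

Feeding it the captured output of `ldd /usr/bin/svn` and calling `flag_watchlist(parse_ldd(output))` would give the subset of linked libraries worth checking against http://apache.org/legal/resolved.html.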




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Roman Shaposhnik
Hi Ted!

This is one of those blind spot things I can't even explain. I was so sure
Drill was implemented in C/C++ that I never bothered to check. I'm now
curious as to who may have implanted that false memory (did you guys
talk about it when it was first getting designed?).

But anyway, more to the point: great feedback. That section really doesn't
make much sense if Drill is 100% Java. The wiki has been updated.

Thanks,
Roman.

On Thu, Aug 20, 2015 at 9:02 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 Drill is implemented entirely in Java.

 This isn't core to the proposal, but it would be better corrected.



 On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com
 wrote:

 Hi Roman,

 Great news!

 BTW, it might be an invalid URL for the proposal. Should be
 https://wiki.apache.org/incubator/HAWQProposal ?

 Thanks,
 Youngwoo

 On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:

  Hi!
 
  I would like to start a discussion on accepting HAWQ
  into ASF Incubator. The proposal is available at:
  https://wiki.apache.org/incubator/ApexProposal
  and is also attached to the end of this email.
 
  Please note, that this proposal is very complementary
  to the desire of HAWQ's sister project (MADlib) to
  join ASF Incubator:
  http://madlib.net/pipermail/user/2015-August/
  http://madlib.net/pipermail/devel/2015-August/
  I've volunteered to help MADlib community and we're
  currently working on a separate proposal to be submitted
  later next week. If you're interested in monitoring progress
  of that please see updates to:
   https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
  and later:
   https://wiki.apache.org/incubator/MADlibProposal
 
  Thanks in advance for your time and help.
 
  Thanks,
  Roman.
 

Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Ted Dunning
On Fri, Aug 21, 2015 at 3:15 PM, Roman Shaposhnik ro...@shaposhnik.org
wrote:

> this is one of those blind spot things I can't even explain. I was so sure
> Drill was implemented in C/C++ that I never bothered to check. I'm now
> curious as to who may have implanted that false memory (did you guys
> talk about when it was first getting designed?).


It may be that the fact that our file system (MapR FS) is in C may have led
to the presumption that Drill gets performance from being in C.


Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Roman Shaposhnik
On Fri, Aug 21, 2015 at 2:54 AM, Christian Tzolov
christian.tzo...@gmail.com wrote:
> Is PXF (HAWQ extension framework) going to be managed together with the
> HAWQ ASF project or as a child one?

For now it is part of the same codebase managed by the same community.
If it gets picked up by other communities we may start asking the question
of a subproject or perhaps a standalone TLP. Sort of along the lines of
what happened to ORC.

But that's way out there in the future.

Thanks,
Roman.




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Bertrand Delacretaz
On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
> ...most of the core developers are currently NOT affiliated
> with the ASF and would require new ICLAs before committing to the
> project
...
> == Affiliations ==
...
>   * Pivotal: everyone else on this proposal...

So IIUC that's about 50 committers from the same company and most of
them don't have experience with open source, or at least not at the
ASF.

Doesn't that drastically lower the chances of the project creating a
diverse community?

I would much prefer a smaller list of initial committers who have been
identified as having experience or a solid potential to be ASF
committers, and let others be elected based on merit as the project
progresses.

-Bertrand




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Bertrand Delacretaz
On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
...
> == External Dependencies ==...

There's  some GPL/LGPL stuff in there, IMO the proposal should include
a plan for coping with those.

-Bertrand




Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Christian Tzolov
Is PXF (HAWQ extension framework) going to be managed together with the
HAWQ ASF project or as a child one?

On Fri, Aug 21, 2015 at 8:46 AM, Bertrand Delacretaz bdelacre...@apache.org
 wrote:

 On Fri, Aug 21, 2015 at 5:14 AM, Roman Shaposhnik r...@apache.org wrote:
 ...
  == External Dependencies ==...

 There's  some GPL/LGPL stuff in there, IMO the proposal should include
 a plan for coping with those.

 -Bertrand





Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-21 Thread Ted Dunning
Is Madlib even viable as an independent project?

Should it be part of the overall Hawq project?



On Thu, Aug 20, 2015 at 8:14 PM, Roman Shaposhnik r...@apache.org wrote:

 Hi!

 I would like to start a discussion on accepting HAWQ
 into ASF Incubator. The proposal is available at:
 https://wiki.apache.org/incubator/ApexProposal
 and is also attached to the end of this email.

 Please note, that this proposal is very complementary
 to the desire of HAWQ's sister project (MADlib) to
 join ASF Incubator:
 http://madlib.net/pipermail/user/2015-August/
 http://madlib.net/pipermail/devel/2015-August/
 I've volunteered to help MADlib community and we're
 currently working on a separate proposal to be submitted
 later next week. If you're interested in monitoring progress
 of that please see updates to:
  https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
 and later:
  https://wiki.apache.org/incubator/MADlibProposal

 Thanks in advance for your time and help.

 Thanks,
 Roman.


[DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Roman Shaposhnik
Hi!

I would like to start a discussion on accepting HAWQ
into ASF Incubator. The proposal is available at:
https://wiki.apache.org/incubator/ApexProposal
and is also attached to the end of this email.

Please note, that this proposal is very complementary
to the desire of HAWQ's sister project (MADlib) to
join ASF Incubator:
http://madlib.net/pipermail/user/2015-August/
http://madlib.net/pipermail/devel/2015-August/
I've volunteered to help MADlib community and we're
currently working on a separate proposal to be submitted
later next week. If you're interested in monitoring progress
of that please see updates to:
 https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
and later:
 https://wiki.apache.org/incubator/MADlibProposal

Thanks in advance for your time and help.

Thanks,
Roman.

== Abstract ==

HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
around a robust and high-performance massively-parallel processing
(MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.

HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
extensions) and supports open database connectivity (ODBC) and Java
database connectivity (JDBC), as well. Most business intelligence,
data analysis and data visualization tools work with HAWQ out of the
box without the need for specialized drivers.

A unique aspect of HAWQ is its integration of statistical and machine
learning capabilities that can be natively invoked from SQL or (in the
context of PL/Python, PL/Java or PL/R) in massively parallel modes and
applied to large data sets across a Hadoop cluster. These capabilities
are provided through MADlib – an existing open source, parallel
machine-learning library. Given the close ties between the two
development communities, the MADlib community has expressed interest
in joining HAWQ on its journey into the ASF Incubator and will be
submitting a separate, concurrent proposal.

HAWQ will provide more robust and higher performing options for Hadoop
environments that demand best-in-class data analytics for business
critical purposes. HAWQ is implemented in C and C++.

== Proposal ==
The goal of this proposal is to bring the core of Pivotal Software,
Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
Foundation (ASF) in order to build a vibrant, diverse and
self-governed open source community around the technology. Pivotal has
agreed to transfer the brand name HAWQ to Apache Software Foundation
and will stop using HAWQ to refer to this software if the project gets
accepted into the ASF Incubator under the name of Apache HAWQ
(incubating). Pivotal will continue to market and sell an analytic
engine product that includes Apache HAWQ (incubating). While HAWQ is
our primary choice for a name of the project, in anticipation of any
potential issues with PODLINGNAMESEARCH we have come up with two
alternative names: (1) Hornet; or (2) Grove.

Pivotal is submitting this proposal to donate the HAWQ source code and
associated artifacts (documentation, web site content, wiki, etc.) to
the Apache Software Foundation Incubator under the Apache License,
Version 2.0 and is asking Incubator PMC to establish an open source
community.

== Background ==
While the ecosystem of open source SQL-on-Hadoop solutions is fairly
developed by now, HAWQ has several unique features that will set it
apart from existing ASF and non-ASF projects. HAWQ made its debut in
2013 as a closed source product leveraging a decade's worth of product
development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
has rapidly gained a solid customer base and became available on
non-Pivotal distributions of Hadoop.
In 2015 HAWQ still leverages the rock solid foundation of Greenplum
Database, while at the same time embracing elasticity and resource
management native to Hadoop applications. This allows HAWQ to provide
superior SQL on Hadoop performance, scalability and coverage while
also providing massively-parallel machine learning capabilities and
support for native Hadoop file formats. In addition, HAWQ's advanced
features include support for complex joins, rich and compliant SQL
dialect and industry-differentiating data federation capabilities.
Dynamic pipelining and pluggable query optimizer architecture enable
HAWQ to perform queries on Hadoop with the speed and scalability
required for enterprise data warehouse (EDW) workloads. HAWQ provides
strong support for low-latency analytic SQL queries, coupled with
massively parallel machine learning capabilities. This enables
discovery-based analysis of large data sets and rapid, iterative
development of data analytics applications that apply deep machine
learning – 

Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Youngwoo Kim
Hi Roman,

Great news!

BTW, it might be an invalid URL for the proposal. Should be
https://wiki.apache.org/incubator/HAWQProposal ?

Thanks,
Youngwoo

On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:

 Hi!

 I would like to start a discussion on accepting HAWQ
 into ASF Incubator. The proposal is available at:
 https://wiki.apache.org/incubator/ApexProposal
 and is also attached to the end of this email.

 Please note, that this proposal is very complementary
 to the desire of HAWQ's sister project (MADlib) to
 join ASF Incubator:
 http://madlib.net/pipermail/user/2015-August/
 http://madlib.net/pipermail/devel/2015-August/
 I've volunteered to help MADlib community and we're
 currently working on a separate proposal to be submitted
 later next week. If you're interested in monitoring progress
 of that please see updates to:
  https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
 and later:
  https://wiki.apache.org/incubator/MADlibProposal

 Thanks in advance for your time and help.

 Thanks,
 Roman.

 == Abstract ==

 HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
 around a robust and high-performance massively-parallel processing
 (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.

 HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
 with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
 Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
 managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
 compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
 extensions) and supports open database connectivity (ODBC) and Java
 database connectivity (JDBC), as well. Most business intelligence,
 data analysis and data visualization tools work with HAWQ out of the
 box without the need for specialized drivers.

 A unique aspect of HAWQ is its integration of statistical and machine
 learning capabilities that can be natively invoked from SQL or (in the
 context of PL/Python, PL/Java or PL/R) in massively parallel modes and
 applied to large data sets across a Hadoop cluster. These capabilities
 are provided through MADlib – an existing open source, parallel
 machine-learning library. Given the close ties between the two
 development communities, the MADlib community has expressed interest
 in joining HAWQ on its journey into the ASF Incubator and will be
 submitting a separate, concurrent proposal.

 HAWQ will provide more robust and higher performing options for Hadoop
 environments that demand best-in-class data analytics for business
 critical purposes. HAWQ is implemented in C and C++.

 == Proposal ==
 The goal of this proposal is to bring the core of Pivotal Software,
 Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
 Foundation (ASF) in order to build a vibrant, diverse and
 self-governed open source community around the technology. Pivotal has
 agreed to transfer the brand name HAWQ to the Apache Software Foundation
 and will stop using HAWQ to refer to this software if the project gets
 accepted into the ASF Incubator under the name of Apache HAWQ
 (incubating). Pivotal will continue to market and sell an analytic
 engine product that includes Apache HAWQ (incubating). While HAWQ is
 our primary choice for the name of the project, in anticipation of any
 potential issues with PODLINGNAMESEARCH we have come up with two
 alternative names: (1) Hornet; or (2) Grove.

 Pivotal is submitting this proposal to donate the HAWQ source code and
 associated artifacts (documentation, web site content, wiki, etc.) to
 the Apache Software Foundation Incubator under the Apache License,
 Version 2.0 and is asking the Incubator PMC to establish an open source
 community.

 == Background ==
 While the ecosystem of open source SQL-on-Hadoop solutions is fairly
 developed by now, HAWQ has several unique features that will set it
 apart from existing ASF and non-ASF projects. HAWQ made its debut in
 2013 as a closed source product leveraging a decade's worth of product
 development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
 has rapidly gained a solid customer base and become available on
 non-Pivotal distributions of Hadoop.
 In 2015 HAWQ still leverages the rock solid foundation of Greenplum
 Database, while at the same time embracing elasticity and resource
 management native to Hadoop applications. This allows HAWQ to provide
 superior SQL on Hadoop performance, scalability and coverage while
 also providing massively-parallel machine learning capabilities and
 support for native Hadoop file formats. In addition, HAWQ's advanced
 features include support for complex joins, a rich and compliant SQL
 dialect and industry-differentiating data federation capabilities.
 Dynamic pipelining and a pluggable query optimizer architecture enable
 HAWQ to perform queries on Hadoop with the speed and scalability
 required for enterprise data warehouse (EDW) 

Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Roman Shaposhnik
On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com wrote:
 Hi Roman,

 Great news!

 BTW, that might be an invalid URL for the proposal. Should it be
 https://wiki.apache.org/incubator/HAWQProposal ?

Too many copy-paste buffers strike again :-( Thanks for spotting it
so quickly. Yes, it is:
https://wiki.apache.org/incubator/HAWQProposal

Thanks,
Roman.



Re: [DISCUSS] HAWQ Incubation Proposal

2015-08-20 Thread Ted Dunning
Drill is implemented entirely in Java.

This isn't core to the proposal, but it would be better corrected.



On Thu, Aug 20, 2015 at 8:33 PM, 김영우 (Youngwoo Kim) warwit...@gmail.com
wrote:

 Hi Roman,

 Great news!

 BTW, that might be an invalid URL for the proposal. Should it be
 https://wiki.apache.org/incubator/HAWQProposal ?

 Thanks,
 Youngwoo

 On Fri, Aug 21, 2015 at 12:14 PM, Roman Shaposhnik r...@apache.org wrote:

  Hi!
 
  I would like to start a discussion on accepting HAWQ
  into the ASF Incubator. The proposal is available at:
  https://wiki.apache.org/incubator/ApexProposal
  and is also attached to the end of this email.
 
  Please note that this proposal is complementary
  to the desire of HAWQ's sister project (MADlib) to
  join the ASF Incubator:
  http://madlib.net/pipermail/user/2015-August/
  http://madlib.net/pipermail/devel/2015-August/
  I've volunteered to help the MADlib community and we're
  currently working on a separate proposal to be submitted
  later next week. If you're interested in monitoring its
  progress, please see updates to:
   https://github.com/madlib/madlib/wiki/MADlib-ASF-Incubator-Proposal
  and later:
   https://wiki.apache.org/incubator/MADlibProposal
 
  Thanks in advance for your time and help.
 
  Thanks,
  Roman.
 
  == Abstract ==
 
  HAWQ is an advanced enterprise SQL on Hadoop analytic engine built
  around a robust and high-performance massively-parallel processing
  (MPP) SQL framework evolved from Pivotal Greenplum DatabaseⓇ.
 
  HAWQ runs natively on Apache HadoopⓇ clusters by tightly integrating
  with HDFS and YARN. HAWQ supports multiple Hadoop file formats such as
  Apache Parquet, native HDFS, and Apache Avro. HAWQ is configured and
  managed as a Hadoop service in Apache Ambari. HAWQ is 100% ANSI SQL
  compliant (supporting ANSI SQL-92, SQL-99, and SQL-2003, plus OLAP
  extensions) and supports open database connectivity (ODBC) and Java
  database connectivity (JDBC), as well. Most business intelligence,
  data analysis and data visualization tools work with HAWQ out of the
  box without the need for specialized drivers.
 
  A unique aspect of HAWQ is its integration of statistical and machine
  learning capabilities that can be natively invoked from SQL or (in the
  context of PL/Python, PL/Java or PL/R) in massively parallel modes and
  applied to large data sets across a Hadoop cluster. These capabilities
  are provided through MADlib – an existing open source, parallel
  machine-learning library. Given the close ties between the two
  development communities, the MADlib community has expressed interest
  in joining HAWQ on its journey into the ASF Incubator and will be
  submitting a separate, concurrent proposal.
 
  HAWQ will provide more robust and higher performing options for Hadoop
  environments that demand best-in-class data analytics for business
  critical purposes. HAWQ is implemented in C and C++.
 
  == Proposal ==
  The goal of this proposal is to bring the core of Pivotal Software,
  Inc.’s (Pivotal) Pivotal HAWQⓇ codebase into the Apache Software
  Foundation (ASF) in order to build a vibrant, diverse and
  self-governed open source community around the technology. Pivotal has
  agreed to transfer the brand name HAWQ to the Apache Software Foundation
  and will stop using HAWQ to refer to this software if the project gets
  accepted into the ASF Incubator under the name of Apache HAWQ
  (incubating). Pivotal will continue to market and sell an analytic
  engine product that includes Apache HAWQ (incubating). While HAWQ is
  our primary choice for the name of the project, in anticipation of any
  potential issues with PODLINGNAMESEARCH we have come up with two
  alternative names: (1) Hornet; or (2) Grove.
 
  Pivotal is submitting this proposal to donate the HAWQ source code and
  associated artifacts (documentation, web site content, wiki, etc.) to
  the Apache Software Foundation Incubator under the Apache License,
  Version 2.0 and is asking the Incubator PMC to establish an open source
  community.
 
  == Background ==
  While the ecosystem of open source SQL-on-Hadoop solutions is fairly
  developed by now, HAWQ has several unique features that will set it
  apart from existing ASF and non-ASF projects. HAWQ made its debut in
  2013 as a closed source product leveraging a decade's worth of product
  development effort invested in Greenplum DatabaseⓇ. Since then HAWQ
  has rapidly gained a solid customer base and become available on
  non-Pivotal distributions of Hadoop.
  In 2015 HAWQ still leverages the rock solid foundation of Greenplum
  Database, while at the same time embracing elasticity and resource
  management native to Hadoop applications. This allows HAWQ to provide
  superior SQL on Hadoop performance, scalability and coverage while
  also providing massively-parallel machine learning capabilities and
  support for native Hadoop file formats. In addition, HAWQ's advanced
  features include support for