Re: Incubator board report

2018-06-08 Thread Justin Mclean
Hi,

> Could you elaborate on that a bit more?  Note that since the incubator
> report is public, discussing issues that may have come up on the private
> list could be a bit awkward.

Which is why I didn’t elaborate on it here. That info would go into a private 
section.

> There are 6 podlings that have not reported.  Some of them are repeat
> offenders.  If you don't get to it before then, in my morning I'll drop a
> note to them reminding them about their board report responsibilities.

If you could that would be a great help.

Thanks,
Justin
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: Incubator board report

2018-06-08 Thread John D. Ament
On Fri, Jun 8, 2018 at 11:23 PM Justin Mclean wrote:

> Hi,
>
> I’m a little behind on sorting out my first incubator board report and I’m
> just about to hop on several planes and trains (30+ hours travel) to get to
> a conference.
>
> I can see we have a bit to report this month, with a number of releases
> and a number of new projects coming into the incubator and a couple of
> issues with existing projects that the board needs to know about
> (but are being dealt with).
>

Could you elaborate on that a bit more?  Note that since the incubator
report is public, discussing issues that may have come up on the private
list could be a bit awkward.


>
> I will be able to get to it on Sunday / Monday so it will still be done
> before the board meeting. Before then, if anyone thinks there is something
> that needs to be mentioned in the report, just add it to this thread.
>

There are 6 podlings that have not reported.  Some of them are repeat
offenders.  If you don't get to it before then, in my morning I'll drop a
note to them reminding them about their board report responsibilities.

There are others that are still missing sign-off.  That's OK because mentors
have until Tuesday next week to do that.




>
> Thanks,
> Justin
>
>


Re: Looking for Champion

2018-06-08 Thread Li,De(BDG)
Thank you, Willem, and a warm welcome.

On 2018/6/8 at 11:03 PM, "Willem Jiang" wrote:

>Hi,
>
>I'm willing to be the Mentor.
>Please count me in.
>
>
>
>Willem Jiang
>
>Twitter: willemjiang
>Weibo: 姜宁willem
>
>On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher  wrote:
>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>>

Re: Looking for Champion

2018-06-08 Thread Li,De(BDG)
Hi Dave,

Thank you very much for your help, and a warm welcome as Palo’s Champion
and Mentor.
Regarding licenses, this is what we know so far:

--
1. aes/* mysql-5.6  GPL v2.1
2. util/mysql_dtoa.cpp Percona Server for MySQL GPL
3. http/mongoose.h mongoose MIT License
--


We will resolve these ASAP.



On 2018/6/8 at 8:59 PM, "Dave Fisher" wrote:

>Hi -
>
>I’m willing to Champion and Mentor. I have a couple of comments inline.
>I’ll look at dependency licenses later today. It’s early for me.
>
>

Incubator board report

2018-06-08 Thread Justin Mclean
Hi,

I’m a little behind on sorting out my first incubator board report and I’m just 
about to hop on several planes and trains (30+ hours travel) to get to a 
conference.

I can see we have a bit to report this month, with a number of releases and a 
number of new projects coming into the incubator and a couple of issues with 
existing projects that the board needs to know about (but are being 
dealt with).

I will be able to get to it on Sunday / Monday so it will still be done before 
the board meeting. Before then, if anyone thinks there is something that needs 
to be mentioned in the report, just add it to this thread.

Thanks,
Justin



Re: Looking for Champion

2018-06-08 Thread Tan,Zhongyi
Thanks, Willem.

We really appreciate it.

> On June 8, 2018, at 23:03, Willem Jiang wrote:
> 
> Hi,
> 
> I'm willing to be the Mentor.
> Please count me in.
> 
> 
> 
> Willem Jiang
> 
> Twitter: willemjiang
> Weibo: 姜宁willem
> 
>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher  wrote:
>> 
>> Hi -
>> 
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>> 
>> 

Re: Looking for Champion

2018-06-08 Thread Tan,Zhongyi
Great, Dave, we will add you as Champion.

Thanks

> On June 8, 2018, at 20:59, Dave Fisher wrote:
> 
> Hi -
> 
> I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll 
> look at dependency licenses later today. It’s early for me.
> 
> 
>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG)  wrote:
>> 
>> Hi all,
>> 
>> I am Reed, a developer working with the team on Palo (an MPP-based 
>> interactive SQL data warehouse).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for a possible Champion if anyone would like to 
>> volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed
>> 
>> ===
>> The draft of the proposal is below:
>> 
>> #Apache Palo
>> 
>> ##Abstract
>> 
>> Palo is an MPP-based interactive SQL data warehouse for reporting and 
>> analysis.
>> 
>> ##Proposal
>> 
>> We propose to contribute the Palo codebase and associated artifacts (e.g. 
>> documentation, web-site content etc.) to the Apache Software Foundation with 
>> the intent of forming a productive, meritocratic and open community around 
>> Palo’s continued development, according to the ‘Apache Way’.
>> 
>> Baidu owns several trademarks regarding Palo, and proposes to transfer 
>> ownership of those trademarks in full to the ASF.
>> 
>> ###Overview of Palo
>> 
>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend 
>> (BE).
>> 
>> **Frontend daemon** consists of a query coordinator and a catalog manager. 
>> The query coordinator is responsible for receiving users’ SQL queries, 
>> compiling queries, and managing query execution. The catalog manager is 
>> responsible for managing metadata such as databases, tables, partitions, 
>> and replicas. Several frontend daemons can be deployed to provide fault 
>> tolerance and load balancing.
>> 
>> **Backend daemon** stores the data and executes the query fragments. Many 
>> backend daemons can also be deployed to provide scalability and 
>> fault tolerance.
>> 
>> A typical Palo cluster consists of several frontend daemons and 
>> dozens to hundreds of backend daemons.
>> 
>> Users can use MySQL client tools to connect to any frontend daemon and 
>> submit SQL queries. The Frontend receives the query and compiles it into 
>> query plans executable by the Backend, then sends the query plan fragments 
>> to the Backend. The Backend builds a query execution DAG, and data is 
>> fetched and pipelined into the DAG. The final result is sent to the client 
>> via the Frontend. The distribution of query fragment execution aims mainly 
>> to minimize data movement and maximize scan locality.
>> 
>> ##Background
>> 
>> At Baidu, prior to Palo, different tools were deployed to meet diverse 
>> requirements in many ways. When a use case required the simultaneous 
>> availability of capabilities that could not all be provided by a single 
>> tool, users were forced to build hybrid architectures that stitched 
>> multiple tools together. We believe that they shouldn’t need to accept such 
>> inherent complexity. A storage system built to provide great performance 
>> across a broad range of workloads is a more elegant solution to the 
>> problems that hybrid architectures aim to solve. Palo is that solution.
>> 
>> Palo is designed to be a simple, single, tightly coupled system that does 
>> not depend on other systems. Palo provides high-concurrency, low-latency 
>> point queries as well as high-throughput ad-hoc analytical queries. Palo 
>> supports bulk batch data loading as well as near real-time mini-batch data 
>> loading. Palo also provides high availability, reliability, fault 
>> tolerance, and scalability.
>> 
>> ##Rationale
>> 
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> 
>> Mesa is a highly scalable analytic data storage system that stores critical 
>> measurement data related to Google's Internet advertising business. Mesa is 
>> designed to satisfy a complex and challenging set of users’ and systems’ 
>> requirements, including near real-time data ingestion and query ability, as 
>> well as high availability, reliability, fault tolerance, and scalability for 
>> large data and query volumes.
>> 
>> Impala is a modern, open-source MPP SQL engine architected from the ground 
>> up for the Hadoop data processing environment. At present, by virtue of its 
>> superior performance and rich functionality, Impala is comparable to 
>> many commercial MPP database query engines. Mesa can satisfy many of our 
>> storage requirements; however, Mesa itself does not provide a SQL query 
>> engine. Impala is a very good MPP SQL query engine, but it lacks a 
>> perfect distributed storage engine. So in the end we chose the combination 
>> of these two technologies.
>> 
>> Learning from Mesa’s data model, we developed a distributed storage engine. 
>> Unlike Mesa, this storage 

Re: Looking for Champion

2018-06-08 Thread Jim Apple
>
> Generally Apache has no rules against multiple projects fulfilling similar
> goals or use cases, even when those projects might compete. However I think
> it would be relatively unusual to incubate a project that appears to be
> derived from a fork of an existing project, at least without first
> considering whether the additional feature set could be contributed back to
> the existing community.
>

And this is something I'm really excited about. If only the storage system
part of Palo were contributed to the ASF, and simultaneously the Palo
community and the Impala community worked together to integrate the query
engine work of Palo into Impala, then this could provide a lot of benefit
to users, I think. My hope is that it would eliminate the toil the Palo
community is engaged in by rebasing Impala changes (as Tim noticed).
Impala, meanwhile, might benefit from some changes Palo has made, like SIMD
filtering.

This could be a lot of work, but the current system seems to already
include quite a lot of inefficiency from the duplication.


Re: Looking for Champion

2018-06-08 Thread Ted Dunning
Open LDAP is a form of copy-left. It requires source code distribution of
binary packaged versions.



On Fri, Jun 8, 2018 at 7:10 PM Dave Fisher  wrote:

> Yuck. That’s a mess. That is one very large diff.
>
> I see a few files related to AES that were GPL and were converted to Apache,
> which is not allowed.
> Copyrights were changed too, which is also incorrect.
>
> Changes to this file, be/src/http/mongoose.h, violate the license and
> copyright of Sergey Lyubka.
>
> GitHub makes you expand each diff after a while.
>
> There are dependency licenses that might be issues too.
>
> These licenses have not been evaluated by LEGAL.
> * OpenLDAP (OpenLDAP Software License)
>   http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
> * rapidjson (Tencent)
>   Unknown
> * cyrus-sasl (CMU License, AKA MIT-CMU)
>   https://spdx.org/licenses/MIT-CMU.html
>
> Lots of work in evaluating licenses.
>
> On Jun 8, 2018, at 9:46 AM, Ted Dunning  wrote:
>
> Ouch.
>
> The copyright in question was attached to code from the source code for
> mySQL. There is no way that code can be in an Apache project.
>
> Given the cut and paste history, it seems like it will require a very
> detailed audit of code history or web searches to find where the original
> code came from. The my_aes.c and .h files, for instance, have no hint in
> their history that they came from GPL'ed code.
>
>
> Yeah. Lots of oversight.
>
> If we accept this proposal we need a Mentor who has time to help with this
> mess.
>
> I don’t know that I have the time to lead that effort. Anyone?
>
> Regards,
> Dave
>
>
> On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon  wrote:
>
> ...
>
> +1. Also briefly browsing the code I found suspicious commits like this
> one:
>
>
> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
> license copyright Baidu.
>
> So if this project does enter incubation I think we should be extra careful
> to audit the origins of all of the source code.
>
>
>
>


Re: Looking for Champion

2018-06-08 Thread Dave Fisher
Yuck. That’s a mess. That is one very large diff.

I see a few files related to AES that were GPL and were converted to Apache, 
which is not allowed.
Copyrights were changed too, which is also incorrect.

Changes to this file, be/src/http/mongoose.h, violate the license and 
copyright of Sergey Lyubka.

GitHub makes you expand each diff after a while.

There are dependency licenses that might be issues too.

These licenses have not been evaluated by LEGAL.
* OpenLDAP (OpenLDAP Software License)
  http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
* rapidjson (Tencent)
  Unknown
* cyrus-sasl (CMU License, AKA MIT-CMU)
  https://spdx.org/licenses/MIT-CMU.html

Lots of work in evaluating licenses.

> On Jun 8, 2018, at 9:46 AM, Ted Dunning  wrote:
> 
> Ouch.
> 
> The copyright in question was attached to code from the source code for
> mySQL. There is no way that code can be in an Apache project.
> 
> Given the cut and paste history, it seems like it will require a very
> detailed audit of code history or web searches to find where the original
> code came from. The my_aes.c and .h files, for instance, have no hint in
> their history that they came from GPL'ed code.

Yeah. Lots of oversight.

If we accept this proposal we need a Mentor who has time to help with this mess.

I don’t know that I have the time to lead that effort. Anyone?

Regards,
Dave

> 
> On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon  wrote:
> 
>> ...
>> 
>> +1. Also briefly browsing the code I found suspicious commits like this
>> one:
>> 
>> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>> 
>> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
>> license copyright Baidu.
>> 
>> So if this project does enter incubation I think we should be extra careful
>> to audit the origins of all of the source code.
>> 
>> 





Re: Looking for Champion

2018-06-08 Thread Ted Dunning
Ouch.

The copyright in question was attached to code from the source code for
mySQL. There is no way that code can be in an Apache project.

Given the cut and paste history, it seems like it will require a very
detailed audit of code history or web searches to find where the original
code came from. The my_aes.c and .h files, for instance, have no hint in
their history that they came from GPL'ed code.
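
A first pass at such an audit could be sketched as below. This is only an illustration, not an agreed procedure: the function name `find_suspect_headers`, the `GPL_MARKERS` pattern list, and the 40-line header window are all assumptions made for the example. It flags source files whose leading lines still mention the GPL or an Oracle/MySQL copyright. Because the notices in question were rewritten, a scan like this would have to be run against each historical revision of a file (for example, after checking out older commits) and could never replace a manual comparison against the upstream sources.

```python
import re
from itertools import islice
from pathlib import Path

# Hypothetical marker patterns; a real audit would need a much longer list.
GPL_MARKERS = re.compile(
    r"GNU General Public License|\bGPL\b|Copyright.*\b(Oracle|MySQL AB)\b",
    re.IGNORECASE,
)


def find_suspect_headers(root, suffixes=(".c", ".h", ".cpp"), head_lines=40):
    """Return sorted relative paths whose first `head_lines` lines match a marker."""
    suspects = []
    for path in Path(root).rglob("*"):
        # Only look at regular files with a source-code suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            # License notices normally sit at the top, so read only the header.
            head = "".join(islice(fh, head_lines))
        if GPL_MARKERS.search(head):
            suspects.append(path.relative_to(root).as_posix())
    return sorted(suspects)
```

Calling `find_suspect_headers("be/src")` on a checkout would return a list of paths to review by hand; an empty list says nothing about files whose headers were already replaced.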

On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon  wrote:

> ...
>
> +1. Also briefly browsing the code I found suspicious commits like this
> one:
>
> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
> license copyright Baidu.
>
> So if this project does enter incubation I think we should be extra careful
> to audit the origins of all of the source code.
>
>


Re: Looking for Champion

2018-06-08 Thread Todd Lipcon
On Fri, Jun 8, 2018 at 9:18 AM, Tim Armstrong wrote:

> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > integrated them together.
>
> Palo didn't integrate with Impala, it forked Impala's codebase and embedded
> it in its own repository. I don't remember any attempts from the Palo team
> to engage with the Impala community or attempt to work with us to
> contribute any improvements.
>
> It looks like Palo is still pulling in new code from Impala.  E.g. this
> commit includes a bunch of code I wrote as part of IMPALA-3200:
> https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf
>
> The code isn't owned by any individual, I contributed it to Apache and it's
> free for anyone to do what they want to do with it, but pulling in
> improvements from other projects without any attempt to attribute it or
> contribute improvements back seems contrary to the Apache way.
>

+1. Also briefly browsing the code I found suspicious commits like this one:
https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e

... in which a GPL license copyright by Oracle was "fixed" to be an Apache
license copyright Baidu.

So if this project does enter incubation I think we should be extra careful
to audit the origins of all of the source code.

-Todd


> On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon  wrote:
>
> > On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG)  wrote:
> >
> > > Hi, Jim
> > >
> > > Thank you for your response.
> > > Actually, we started Palo several years ago, and at that time we
> > > developed the storage engine based on Mesa technology.
> > > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > > integrated them together.
> >
> > From what I can tell of the Palo source, it's not so much an integration
> > as a copied-and-modified codebase, right? i.e Palo does not use Impala as a
> > dependency, but rather shares a lot of code from the Impala project that
> > has since diverged.
> >
> >
> > >
> > > With this integration, the goal of Palo is to implement a single,
> > > full-featured, mysql protocol compatible data warehousing.
> > >
> >
> > That sounds pretty similar to the goals of the Impala project. Impala
> > isn't MySQL-compatible at the moment but that seems more like a particular
> > feature that could be added rather than a distinct identity of the
> > project. Otherwise, Impala's goal is to be a full featured data warehouse
> > engine as well.
> >
> > Generally Apache has no rules against multiple projects fulfilling
> > similar goals or use cases, even when those projects might compete. However
> > I think it would be relatively unusual to incubate a project that appears
> > to be derived from a fork of an existing project, at least without first
> > considering whether the additional feature set could be contributed back
> > to the existing community.
> >
> > -Todd
> >
> >
> > > > On 2018/6/8 at 1:55 PM, "Jim Apple"  wrote:
> > >
> > > > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> > > > >from the Palo community about integration between Impala and Palo.
> > > > >
> > > > >For instance, are there any apparent design goals of Impala that the Palo
> > > > >community thinks are fundamentally incompatible with Palo?
> > > >
> > > >Thanks,
> > > >Jim
> > > >
> > > >On 2018/06/08 04:45:32, "Li,De(BDG)"  wrote:
> > > >> Hi all,
> > > >>
> > > >> I am Reed, as a developer worked with the team for Palo (a MPP-based
> > > >>interactive SQL data warehousing).
> > > >> https://github.com/baidu/palo/wiki/Palo-Overview
> > > >>
> > > >> We propose to contribute Palo as an Apache Incubator project, and
> > > >> we are still looking for possible Champion if anyone would like to
> > > >>volunteer. Thanks a lot.
> > > >>
> > > >> Best Regards,
> > > >> Reed
> > > >>
> > > >> ===
> > > >> The draft of the proposal as below:
> > > >>
> > > >> #Apache Palo
> > > >>
> > > >> ##Abstract
> > > >>
> > > > >> Palo is a MPP-based interactive SQL data warehousing for reporting and
> > > > >>analysis.
> > > >>
> > > >> ##Proposal
> > > >>
> > > >> We propose to contribute the Palo codebase and associated artifacts
> > > >>(e.g. documentation, web-site content etc.) to the Apache Software
> > > >>Foundation with the intent of forming a productive, meritocratic and
> > > >>open community around Palo’s continued development, according to the
> > > >>‘Apache Way’.
> > > >>
> > > > >> Baidu owns several trademarks regarding Palo, and proposes to transfer
> > > > >>ownership of those trademarks in full to the ASF.
> > > >>
> > > >> ###Overview of Palo
> > > >>
> > > >> Palo’s implementation consists of two daemons: Frontend (FE) and
> > > >>Backend (BE).
> > > >>
> > > > >> **Frontend daemon** consists of query coordinator and catalog manager.
> > > >>Query coordinator is responsible for receiving users’ sql queries,
> > > >>compiling queries and managing queries 

Re: Looking for Champion

2018-06-08 Thread Tim Armstrong
> Meanwhile we found Impala is a very good MPP SQL query engine, so we
> integrated them together.

Palo didn't integrate with Impala, it forked Impala's codebase and embedded
it in its own repository. I don't remember any attempts from the Palo team
to engage with the Impala community or attempt to work with us to
contribute any improvements.

It looks like Palo is still pulling in new code from Impala.  E.g. this
commit includes a bunch of code I wrote as part of IMPALA-3200:
https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf

The code isn't owned by any individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way.

Anyway, maybe incubation is an opportunity for us to work together, but I'd
hope that if Palo does go into incubation that it will rethink some of the
practices it's been following.

On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon  wrote:

> On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG)  wrote:
>
> > Hi, Jim
> >
> > Thank you for your response.
> > Actually, we start Palo in several years ago, and that time we developed
> > the storage engine based on Mesa technology.
> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > integrated them together.
> >
>
> From what I can tell of the Palo source, it's not so much an integration as
> a copied-and-modified codebase, right? i.e Palo does not use Impala as a
> dependency, but rather shares a lot of code from the Impala project that
> has since diverged.
>
>
> >
> > With this integration, the goal of Palo is to implement a single,
> > full-featured, mysql protocol compatible data warehousing.
> >
>
> That sounds pretty similar to the goals of the Impala project. Impala isn't
> MySQL-compatible at the moment but that seems more like a particular
> feature that could be added rather than a distinct identity of the project.
> Otherwise, Impala's goal is to be a full featured data warehouse engine as
> well.
>
> Generally Apache has no rules against multiple projects fulfilling similar
> goals or use cases, even when those projects might compete. However I think
> it would be relatively unusual to incubate a project that appears to be
> derived from a fork of an existing project, at least without first
> considering whether the additional feature set could be contributed back to
> the existing community.
>
> -Todd
>
>
> > On 2018/6/8 at 1:55 PM, "Jim Apple"  wrote:
> >
> > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> > >from the Palo community about integration between Impala and Palo.
> > >
> > >For instance, are there any apparent design goals of Impala that the Palo
> > >community thinks are fundamentally incompatible with Palo?
> > >
> > >Thanks,
> > >Jim
> > >
> > >On 2018/06/08 04:45:32, "Li,De(BDG)"  wrote:
> > >> Hi all,
> > >>
> > >> I am Reed, as a developer worked with the team for Palo (a MPP-based
> > >>interactive SQL data warehousing).
> > >> https://github.com/baidu/palo/wiki/Palo-Overview
> > >>
> > >> We propose to contribute Palo as an Apache Incubator project, and
> > >> we are still looking for possible Champion if anyone would like to
> > >>volunteer. Thanks a lot.
> > >>
> > >> Best Regards,
> > >> Reed
> > >>
> > >> ===
> > >> The draft of the proposal as below:
> > >>
> > >> #Apache Palo
> > >>
> > >> ##Abstract
> > >>
> > >> Palo is a MPP-based interactive SQL data warehousing for reporting and
> > >>analysis.
> > >>
> > >> ##Proposal
> > >>
> > >> We propose to contribute the Palo codebase and associated artifacts
> > >>(e.g. documentation, web-site content etc.) to the Apache Software
> > >>Foundation with the intent of forming a productive, meritocratic and
> > >>open community around Palo’s continued development, according to the
> > >>‘Apache Way’.
> > >>
> > >> Baidu owns several trademarks regarding Palo, and proposes to transfer
> > >>ownership of those trademarks in full to the ASF.
> > >>
> > >> ###Overview of Palo
> > >>
> > >> Palo’s implementation consists of two daemons: Frontend (FE) and
> > >>Backend (BE).
> > >>
> > >> **Frontend daemon** consists of query coordinator and catalog manager.
> > >>Query coordinator is responsible for receiving users’ sql queries,
> > >>compiling queries and managing queries execution. Catalog manager is
> > >>responsible for managing metadata such as databases, tables, partitions,
> > >>replicas and etc. Several frontend daemons could be deployed to
> > >>guarantee fault-tolerance, and load balancing.
> > >>
> > >> **Backend daemon** stores the data and executes the query fragments.
> > >>Many backend daemons could also be deployed to provide scalability and
> > >>fault-tolerance.
> > >>
> > >> A typical Palo cluster generally composes of several frontend daemons
> > 

Re: Looking for Champion

2018-06-08 Thread Todd Lipcon
On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG)  wrote:

> Hi, Jim
>
> Thank you for your response.
> Actually, we start Palo in several years ago, and that time we developed
> the storage engine based on Mesa technology.
> Meanwhile we found Impala is a very good MPP SQL query engine, so we
> integrated them together.
>

From what I can tell of the Palo source, it's not so much an integration as
a copied-and-modified codebase, right? i.e Palo does not use Impala as a
dependency, but rather shares a lot of code from the Impala project that
has since diverged.


>
> With this integration, the goal of Palo is to implement a single,
> full-featured, mysql protocol compatible data warehousing.
>

That sounds pretty similar to the goals of the Impala project. Impala isn't
MySQL-compatible at the moment but that seems more like a particular
feature that could be added rather than a distinct identity of the project.
Otherwise, Impala's goal is to be a full featured data warehouse engine as
well.

Generally Apache has no rules against multiple projects fulfilling similar
goals or use cases, even when those projects might compete. However I think
it would be relatively unusual to incubate a project that appears to be
derived from a fork of an existing project, at least without first
considering whether the additional feature set could be contributed back to
the existing community.

-Todd


> On 2018/6/8 at 1:55 PM, "Jim Apple"  wrote:
>
> >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> >from the Palo community about integration between Impala and Palo.
> >
> >For instance, are there any apparent design goals of Impala that the Palo
> >community thinks are fundamentally incompatible with Palo?
> >
> >Thanks,
> >Jim
> >
> >On 2018/06/08 04:45:32, "Li,De(BDG)"  wrote:
> >> Hi all,
> >>
> >> I am Reed, as a developer worked with the team for Palo (a MPP-based
> >>interactive SQL data warehousing).
> >> https://github.com/baidu/palo/wiki/Palo-Overview
> >>
> >> We propose to contribute Palo as an Apache Incubator project, and
> >> we are still looking for possible Champion if anyone would like to
> >>volunteer. Thanks a lot.
> >>
> >> Best Regards,
> >> Reed
> >>
> >> ===
> >> The draft of the proposal as below:
> >>
> >> #Apache Palo
> >>
> >> ##Abstract
> >>
> >> Palo is a MPP-based interactive SQL data warehousing for reporting and
> >>analysis.
> >>
> >> ##Proposal
> >>
> >> We propose to contribute the Palo codebase and associated artifacts
> >>(e.g. documentation, web-site content etc.) to the Apache Software
> >>Foundation with the intent of forming a productive, meritocratic and
> >>open community around Palo’s continued development, according to the
> >>‘Apache Way’.
> >>
> >> Baidu owns several trademarks regarding Palo, and proposes to transfer
> >>ownership of those trademarks in full to the ASF.
> >>
> >> ###Overview of Palo
> >>
> >> Palo’s implementation consists of two daemons: Frontend (FE) and
> >>Backend (BE).
> >>
> >> **Frontend daemon** consists of query coordinator and catalog manager.
> >>Query coordinator is responsible for receiving users’ sql queries,
> >>compiling queries and managing queries execution. Catalog manager is
> >>responsible for managing metadata such as databases, tables, partitions,
> >>replicas and etc. Several frontend daemons could be deployed to
> >>guarantee fault-tolerance, and load balancing.
> >>
> >> **Backend daemon** stores the data and executes the query fragments.
> >>Many backend daemons could also be deployed to provide scalability and
> >>fault-tolerance.
> >>
> >> A typical Palo cluster generally composes of several frontend daemons
> >>and dozens to hundreds of backend daemons.
> >>
> >> Users can use MySQL client tools to connect any frontend daemon to
> >>submit SQL query. Frontend receives the query and compiles it into query
> >>plans executable by the Backend. Then Frontend sends the query plan
> >>fragments to Backend. Backend will build a query execution DAG. Data is
> >>fetched and pipelined into the DAG. The final result response is sent to
> >>client via Frontend. The distribution of query fragment execution takes
> >>minimizing data movement and maximizing scan locality as the main goal.
> >>
> >> ##Background
> >>
> >> At Baidu, Prior to Palo, different tools were deployed to solve diverse
> >>requirements in many ways. And when a use case requires the simultaneous
> >>availability of capabilities that cannot all be provided by a single
> >>tool, users were forced to build hybrid architectures that stitch
> >>multiple tools together, but we believe that they shouldn’t need to
> >>accept such inherent complexity. A storage system built to provide great
> >>performance across a broad range of workloads provides a more elegant
> >>solution to the problems that hybrid architectures aim to solve. Palo is
> >>the solution.
> >>
> >> Palo is designed to be a simple and single tightly coupled system, not
> >>depending on 

Re: [VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Matt Sicker
+1

On 8 June 2018 at 09:45, Daniel Gruno  wrote:

> +1 (binding) once more :)
>
> With regards,
> Daniel.
>
>
> On 06/08/2018 09:43 AM, Chris Thistlethwaite wrote:
>
>> Hi All (again),
>>
>> I'd like to start a vote on accepting Warble into the Apache Incubator.
>>
>> https://lists.apache.org/thread.html/1d62a2948d047cea38e6f01f92d5f138f83acd2c9d86349023fb28e4@%3Cgeneral.incubator.apache.org%3E
>>
>> The ASF voting rules are described:
>>
>> https://www.apache.org/foundation/voting.html
>>
>> A vote for accepting a new Apache Incubator podling is a majority vote
>> for which only Incubator PMC member votes are binding.
>>
>> This vote will run for at least 72 hours. Please VOTE as follows
>> [ ] +1 Accept Warble into the Apache Incubator
>> [ ] +0 Abstain.
>> [ ] -1 Do not accept Warble into the Incubator
>>
>> The proposal is listed below, but you can also access it on the wiki:
>> https://wiki.apache.org/incubator/WarbleProposal
>>
>>
>> Thank you,
>> Chris T.
>>
>>
>>
>> = Apache Warble Proposal =
>>
>> == Abstract ==
>>
>> Apache Warble is a distributed endpoint monitoring solution where
>> the agent is hosted on your own hardware. The aim of Warble is to
>> produce a more balanced and less binary view of services and
>> systems, lowering the rates of false positives while also providing
>> greater insight into possible peering issues and proactive trend
>> analysis.
>>
>> ==Proposal ==
>>
>> The goal of Warble will be to bring internal control of
>> distributed monitoring back to the end user. Warble can be used as
>> an independent service running on your own infrastructure
>> monitoring other services in your infrastructure.
>>
>> == Background and Rationale ==
>>
>> The beginning of this project was prompted by the service
>> pingmybox.com (PMB) going end of life. This brought up
>> conversation about FOSS services that can monitor internal and
>> external services. PMB offered a unique code base to build this
>> service upon a known infrastructure.
>>
>> ===Initial Goals ===
>>
>> Bring PMB code into the ASF, refactor the client/server into
>> a more reusable structure. Further reuse of code gives us a
>> great starting point to build on.
>>
>> ==Current Status ==
>>
>> The software exists as a proprietary service. We wish to convert
>> this to a FLOSS solution.
>>
>> ==Meritocracy ==
>>
>> The initial PMC list covers new folks coming into the ASF.
>>
>> ==Community ==
>>
>> There exists a large user-base of software like Warble, as well
>> as existing users of the old proprietary service. It is our hope
>> that we can convert a great deal of these to contributors and
>> testers for the new open source product.
>>
>> ==Core Developers ==
>>
>> The initial set of developers are a lot of newcomers:
>>
>> * Daniel Gruno
>> * Chris Thistlethwaite
>> * Haig Didizian
>> * Andrew Karetas
>> * Chandler Claxton
>> * Luke Stevens
>> * Mike Andescavage
>> * Chris Lambertus
>>
>> ==Known Risks ==
>>
>> There are many existing services that provide external
>> monitoring. They are well established and have large user bases.
>>
>> ===Orphaned Products ===
>>
>> The initial PMC has great interest in open source projects, though
>> no formal projects have been run.
>>
>> ===Inexperience with Open Source ===
>>
>> Most of the initial PPMC members are new to the ASF and some are
>> new to open source projects. However, all are very interested in
>> giving back to the community and projects.  Having said that, there
>> are several people involved with extensive experience in the
>> Apache Way and our procedures and processes.
>>
>> ===Homogenous Developers ===
>>
>> The initial set of developers are employed by a variety of
>> companies, located across the world, and used to working on a
>> variety of distributed projects.
>>
>> ===Reliance on Salaried Developers ===
>>
>> We do not expect the interest of the proposed initial PMC to be
>> directly tied to current employment, but will actively seek to
>> grow our volunteer base regardless.
>>
>> ===Relationships with Other Apache Products ===
>>
>> Not much to say here. Many ASF projects make use of the proprietary
>> offering, we wish to open source it and have people engage in the
>> development of the project. There are, at present, indirect
>> relationships in that some dependencies are built on Apache
>> software, but these are generally by proxy and do not merit
>> considering Warble as a sub-project of an existing TLP.
>>
>> ==Initial Source ==
>>
>> The initial task of the PMC will be assessing

Re: Looking for Champion

2018-06-08 Thread Willem Jiang
Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher  wrote:

> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.
>
>
> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG)  wrote:
> >
> > Hi all,
> >
> > I am Reed, as a developer worked with the team for Palo (a MPP-based
> interactive SQL data warehousing).
> > https://github.com/baidu/palo/wiki/Palo-Overview
> >
> > We propose to contribute Palo as an Apache Incubator project, and
> > we are still looking for possible Champion if anyone would like to
> volunteer. Thanks a lot.
> >
> > Best Regards,
> > Reed
> >
> > ===
> > The draft of the proposal as below:
> >
> > #Apache Palo
> >
> > ##Abstract
> >
> > Palo is a MPP-based interactive SQL data warehousing for reporting and
> analysis.
> >
> > ##Proposal
> >
> > We propose to contribute the Palo codebase and associated artifacts
> (e.g. documentation, web-site content etc.) to the Apache Software
> Foundation with the intent of forming a productive, meritocratic and open
> community around Palo’s continued development, according to the ‘Apache
> Way’.
> >
> > Baidu owns several trademarks regarding Palo, and proposes to transfer
> ownership of those trademarks in full to the ASF.
> >
> > ###Overview of Palo
> >
> > Palo’s implementation consists of two daemons: Frontend (FE) and Backend
> (BE).
> >
> > **Frontend daemon** consists of query coordinator and catalog manager.
> Query coordinator is responsible for receiving users’ sql queries,
> compiling queries and managing queries execution. Catalog manager is
> responsible for managing metadata such as databases, tables, partitions,
> replicas and etc. Several frontend daemons could be deployed to guarantee
> fault-tolerance, and load balancing.
> >
> > **Backend daemon** stores the data and executes the query fragments.
> Many backend daemons could also be deployed to provide scalability and
> fault-tolerance.
> >
> > A typical Palo cluster generally composes of several frontend daemons
> and dozens to hundreds of backend daemons.
> >
> > Users can use MySQL client tools to connect any frontend daemon to
> submit SQL query. Frontend receives the query and compiles it into query
> plans executable by the Backend. Then Frontend sends the query plan
> fragments to Backend. Backend will build a query execution DAG. Data is
> fetched and pipelined into the DAG. The final result response is sent to
> client via Frontend. The distribution of query fragment execution takes
> minimizing data movement and maximizing scan locality as the main goal.
> >
> > ##Background
> >
> > At Baidu, Prior to Palo, different tools were deployed to solve diverse
> requirements in many ways. And when a use case requires the simultaneous
> availability of capabilities that cannot all be provided by a single tool,
> users were forced to build hybrid architectures that stitch multiple tools
> together, but we believe that they shouldn’t need to accept such inherent
> complexity. A storage system built to provide great performance across a
> broad range of workloads provides a more elegant solution to the problems
> that hybrid architectures aim to solve. Palo is the solution.
> >
> > Palo is designed to be a simple and single tightly coupled system, not
> depending on other systems. Palo provides high concurrent low latency point
> query performance, but also provides high throughput queries of ad-hoc
> analysis. Palo provides bulk-batch data loading, but also provides near
> real-time mini-batch data loading. Palo also provides high availability,
> reliability, fault tolerance, and scalability.
> >
> > ##Rationale
> >
> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >
> > Mesa is a highly scalable analytic data storage system that stores
> critical measurement data related to Google's Internet advertising
> business. Mesa is designed to satisfy complex and challenging set of users’
> and systems’ requirements, including near real-time data ingestion and
> query ability, as well as high availability, reliability, fault tolerance,
> and scalability for large data and query volumes.
> >
> > Impala is a modern, open-source MPP SQL engine architected from the
> ground up for the Hadoop data processing environment. At present, by virtue
> of its superior performance and rich functionality, Impala has been
> comparable to many commercial MPP database query engine. Mesa can satisfy
> the needs of many of our storage requirements, however Mesa itself does not
> provide a SQL query engine; Impala is a very good MPP SQL query engine, but
> the lack of a perfect distributed storage engine. So in the end we chose
> the combination of these two technologies.
> >
> > Learning from Mesa’s data model, we developed a distributed storage
> engine. Unlike Mesa, this 

Re: [VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Daniel Gruno

+1 (binding) once more :)

With regards,
Daniel.

On 06/08/2018 09:43 AM, Chris Thistlethwaite wrote:

Hi All (again),

I'd like to start a vote on accepting Warble into the Apache Incubator.

https://lists.apache.org/thread.html/1d62a2948d047cea38e6f01f92d5f138f83acd2c9d86349023fb28e4@%3Cgeneral.incubator.apache.org%3E

The ASF voting rules are described:

https://www.apache.org/foundation/voting.html

A vote for accepting a new Apache Incubator podling is a majority vote
for which only Incubator PMC member votes are binding.

This vote will run for at least 72 hours. Please VOTE as follows
[ ] +1 Accept Warble into the Apache Incubator
[ ] +0 Abstain.
[ ] -1 Do not accept Warble into the Incubator

The proposal is listed below, but you can also access it on the wiki:
https://wiki.apache.org/incubator/WarbleProposal


Thank you,
Chris T.



= Apache Warble Proposal =

== Abstract ==

Apache Warble is a distributed endpoint monitoring solution where
the agent is hosted on your own hardware. The aim of Warble is to
produce a more balanced and less binary view of services and
systems, lowering the rates of false positives while also providing
greater insight into possible peering issues and proactive trend
analysis.

==Proposal ==

The goal of Warble will be to bring internal control of
distributed monitoring back to the end user. Warble can be used as
an independent service running on your own infrastructure
monitoring other services in your infrastructure.

== Background and Rationale ==

The beginning of this project was prompted by the service
pingmybox.com (PMB) going end of life. This brought up
conversation about FOSS services that can monitor internal and
external services. PMB offered a unique code base to build this
service upon a known infrastructure.

===Initial Goals ===

Bring PMB code into the ASF, refactor the client/server into
a more reusable structure. Further reuse of code gives us a
great starting point to build on.

==Current Status ==

The software exists as a proprietary service. We wish to convert
this to a FLOSS solution.

==Meritocracy ==

The initial PMC list covers new folks coming into the ASF.

==Community ==

There exists a large user-base of software like Warble, as well
as existing users of the old proprietary service. It is our hope
that we can convert a great deal of these to contributors and
testers for the new open source product.

==Core Developers ==

The initial set of developers are a lot of newcomers:

* Daniel Gruno
* Chris Thistlethwaite
* Haig Didizian
* Andrew Karetas
* Chandler Claxton
* Luke Stevens
* Mike Andescavage
* Chris Lambertus

==Known Risks ==

There are many existing services that provide external
monitoring. They are well established and have large user bases.

===Orphaned Products ===

The initial PMC has great interest in open source projects, though
no formal projects have been run.

===Inexperience with Open Source ===

Most of the initial PPMC members are new to the ASF and some are
new to open source projects. However, all are very interested in
giving back to the community and projects.  Having said that, there
are several people involved with extensive experience in the
Apache Way and our procedures and processes.

===Homogenous Developers ===

The initial set of developers are employed by a variety of
companies, located across the world, and used to working on a
variety of distributed projects.

===Reliance on Salaried Developers ===

We do not expect the interest of the proposed initial PMC to be
directly tied to current employment, but will actively seek to
grow our volunteer base regardless.

===Relationships with Other Apache Products ===

Not much to say here. Many ASF projects make use of the proprietary
offering, we wish to open source it and have people engage in the
development of the project. There are, at present, indirect
relationships in that some dependencies are built on Apache
software, but these are generally by proxy and do not merit
considering Warble as a sub-project of an existing TLP.

==Initial Source ==

The initial task of the PMC will be assessing what we wish the
project to contain. The proprietary vendor is willing to donate the
software, but considerable rewriting and relicensing will have to
take place. This will likely happen in stages, with the scrapers
and UI being ported first, and a backend auth system being partly
ported/donated, and

[VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Chris Thistlethwaite
Hi All (again),

I'd like to start a vote on accepting Warble into the Apache Incubator.

https://lists.apache.org/thread.html/1d62a2948d047cea38e6f01f92d5f138f83acd2c9d86349023fb28e4@%3Cgeneral.incubator.apache.org%3E

The ASF voting rules are described:

https://www.apache.org/foundation/voting.html

A vote for accepting a new Apache Incubator podling is a majority vote
for which only Incubator PMC member votes are binding.

This vote will run for at least 72 hours. Please VOTE as follows
[ ] +1 Accept Warble into the Apache Incubator
[ ] +0 Abstain.
[ ] -1 Do not accept Warble into the Incubator

The proposal is listed below, but you can also access it on the wiki:
https://wiki.apache.org/incubator/WarbleProposal


Thank you,
Chris T.



= Apache Warble Proposal =

== Abstract ==

Apache Warble is a distributed endpoint monitoring solution where
the agent is hosted on your own hardware. The aim of Warble is to
produce a more balanced and less binary view of services and
systems, lowering the rates of false positives while also providing
greater insight into possible peering issues and proactive trend
analysis.
 
==Proposal ==
 
The goal of Warble will be to bring internal control of
distributed monitoring back to the end user. Warble can be used as
an independent service running on your own infrastructure
monitoring other services in your infrastructure. 
 
== Background and Rationale ==
 
The beginning of this project was prompted by the service
pingmybox.com (PMB) going end of life. This brought up
conversation about FOSS services that can monitor internal and
external services. PMB offered a unique code base to build this
service upon a known infrastructure.
 
===Initial Goals ===

Bring PMB code into the ASF, refactor the client/server into 
a more reusable structure. Further reuse of code gives us a
great starting point to build on.
 
==Current Status ==
 
The software exists as a proprietary service. We wish to convert
this to a FLOSS solution.
 
==Meritocracy ==
 
The initial PMC list covers new folks coming into the ASF. 

==Community ==
 
There exists a large user-base of software like Warble, as well 
as existing users of the old proprietary service. It is our hope
that we can convert a great deal of these to contributors and
testers for the new open source product.
 
==Core Developers ==
 
The initial set of developers are a lot of newcomers:

* Daniel Gruno 
* Chris Thistlethwaite 
* Haig Didizian 
* Andrew Karetas 
* Chandler Claxton 
* Luke Stevens 
* Mike Andescavage 
* Chris Lambertus 
 
==Known Risks ==

There are many existing services that provide external 
monitoring. They are well established and have large user bases.
 
===Orphaned Products ===
 
The initial PMC has great interest in open source projects, though
no formal projects have been run.
 
 
===Inexperience with Open Source ===
 
Most of the initial PPMC members are new to the ASF and some are
new to open source projects. However, all are very interested in
giving back to the community and projects.  Having said that, there
are several people involved with extensive experience in the
Apache Way and our procedures and processes.
 
===Homogenous Developers ===
 
The initial set of developers are employed by a variety of
companies, located across the world, and used to working on a
variety of distributed projects.
 
===Reliance on Salaried Developers ===
 
We do not expect the interest of the proposed initial PMC to be
directly tied to current employment, but will actively seek to
grow our volunteer base regardless.
 
===Relationships with Other Apache Products ===
 
Not much to say here. Many ASF projects make use of the proprietary
offering, we wish to open source it and have people engage in the
development of the project. There are, at present, indirect
relationships in that some dependencies are built on Apache
software, but these are generally by proxy and do not merit
considering Warble as a sub-project of an existing TLP.

 
==Initial Source ==
 
The initial task of the PMC will be assessing what we wish the
project to contain. The proprietary vendor is willing to donate the
software, but considerable rewriting and relicensing will have to
take place. This will likely happen in stages, with the scrapers
and UI being ported first, and a backend auth system being partly
ported/donated, and partly developed from scratch at the ASF.
 
===Source and Intellectual Property Submission Plan ===
 
All the existing code in question (from the PMB suite) is owned by
Quenda IvS, and 

[CANCEL] [VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Chris Thistlethwaite
My apologies, my email client decided to send that as HTML. I'm canceling this 
and re-sending.

-Chris T.



From: Chris Thistlethwaite 
Sent: Friday, June 8, 2018 10:06 AM
To: general@incubator.apache.org
Subject: [VOTE] Accept Warble into the Apache Incubator

Hi All,
I'd like to start a vote on accepting Warble into the Apache Incubator.

Re: [VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Daniel Gruno

+1 (binding), despite the terrible formatting :P

With regards,
Daniel.

On 06/08/2018 09:06 AM, Chris Thistlethwaite wrote:

Hi All,
I'd like to start a vote on accepting Warble into the Apache Incubator.

[VOTE] Accept Warble into the Apache Incubator

2018-06-08 Thread Chris Thistlethwaite
Hi All,

I'd like to start a vote on accepting Warble into the Apache Incubator.
https://lists.apache.org/thread.html/1d62a2948d047cea38e6f01f92d5f138f83acd2c9d86349023fb28e4@%3Cgeneral.incubator.apache.org%3E

The ASF voting rules are described:
https://www.apache.org/foundation/voting.html

A vote for accepting a new Apache Incubator podling is a majority
vote for which only Incubator PMC member votes are binding.

This vote will run for at least 72 hours. Please VOTE as follows

[ ] +1 Accept Warble into the Apache Incubator
[ ] +0 Abstain.
[ ] -1 Do not accept Warble into the Incubator

The proposal is listed below, but you can also access it on the wiki:
https://wiki.apache.org/incubator/WarbleProposal

Thank you,
Chris T.
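
The rule stated in the vote call (only Incubator PMC votes are binding, and the outcome is decided by majority) can be illustrated with a small tally. This is only a sketch: the `Vote` type, the voter names, and the `min_binding_plus` threshold of three are assumptions made for the example and are not specified in the vote call; the voting rules page linked in the call remains the authority.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str      # hypothetical voter name, for illustration only
    value: int      # +1, 0 or -1
    binding: bool   # True only for Incubator PMC members


def tally(votes):
    """Return (binding +1 count, binding -1 count); other votes are advisory."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus, minus


def passes(votes, min_binding_plus=3):
    """Majority check over binding votes.

    min_binding_plus is an assumed threshold, not taken from the vote call.
    """
    plus, minus = tally(votes)
    return plus >= min_binding_plus and plus > minus


if __name__ == "__main__":
    votes = [
        Vote("alice", +1, True),
        Vote("bob", +1, True),
        Vote("carol", +1, True),
        Vote("dave", -1, False),  # non-binding, so it does not affect the result
    ]
    print(tally(votes), passes(votes))
```

Keeping `binding` as a field on each vote, rather than filtering the list beforehand, leaves the non-binding votes visible in the record while still excluding them from the count.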


= Apache Warble Proposal =

== Abstract ==

Apache Warble is a distributed endpoint monitoring solution where
the agent is hosted on your own hardware. The aim of Warble is to
produce a more balanced and less binary view of services and
systems, lowering the rates of false positives while also providing
greater insight into possible peering issues and proactive trend
analysis.

==Proposal ==

The goal of Warble will be to bring internal control of distributed
monitoring back to the end user. Warble can be used as an
independent service running on your own infrastructure monitoring
other services in your infrastructure.

== Background and Rationale ==

The beginning of this project was prompted by the service
pingmybox.com (PMB) going end of life. This brought up conversation
about FOSS services that can monitor internal and external
services. PMB offered a unique code base to build this service upon
a known infrastructure.

===Initial Goals ===

Bring PMB code into the ASF, refactor the client/server into a
more reusable structure. Further reuse of code gives us a great
starting point to build on.

==Current Status ==

The software exists as a proprietary service. We wish to convert
this to a FLOSS solution.

==Meritocracy ==

The initial PMC list covers new folks coming into the ASF.

==Community ==

There exists a large user-base of software like Warble, as well as
existing users of the old proprietary service. It is our hope that
we can convert a great deal of these to contributors and testers
for the new open source product.

==Core Developers ==

The initial set of developers are a lot of newcomers:

 * Daniel Gruno
 * Chris Thistlethwaite
 * Haig Didizian
 * Andrew Karetas
 * Chandler Claxton
 * Luke Stevens
 * Mike Andescavage
 * Chris Lambertus

==Known Risks ==

There are many existing services that provide external monitoring.
They are well established and have large user bases.

===Orphaned Products ===

The initial PMC has great interest in open source projects, though
no formal projects have been run.

===Inexperience with Open Source ===

Most of the initial PPMC members are new to the ASF and some are
new to open source projects. However, all are very interested in
giving back to the community and projects. Having said that, there
are several people involved with extensive experience in the
Apache Way and our procedures and processes.

===Homogenous Developers ===

The initial set of developers are employed by a variety of
companies, located across the world, and used to working on a
variety of distributed projects.

===Reliance on Salaried Developers ===

We do not expect the interest of the proposed initial PMC to be
directly tied to current employment, but will actively seek to
grow our volunteer base regardless.

===Relationships with Other Apache Products ===

Not much to say here. Many ASF projects make use of the proprietary
offering; we wish to open source it and have people engage in the
development of the project. There are, at present, indirect
relationships in that some dependencies are built on Apache
software, but these are generally by proxy and do not merit
considering Warble as a sub-project of an existing TLP.

==Initial Source ==

The initial task of the PMC will be assessing what we wish the
project to contain. The proprietary vendor is willing to donate the
software, but considerable rewriting and relicensing will have to
take place. This will likely happen in stages, with the scrapers
and UI being ported first, and a backend auth system being partly
ported/donated, and partly developed from scratch at the ASF.

===Source and Intellectual Property Submission Plan ===

All the existing code in question (from the PMB suite) is owned by
Quenda IvS, and will be donated to the ASF.

===External Dependencies ===

The current code base depends on incompatible licenses
for

Re: Looking for Champion

2018-06-08 Thread Dave Fisher
Hi -

I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll 
look at dependency licenses later today. It’s early for me.


> On Jun 7, 2018, at 9:45 PM, Li,De(BDG)  wrote:
> 
> Hi all,
> 
> I am Reed, a developer working with the team on Palo (an MPP-based 
> interactive SQL data warehouse).
> https://github.com/baidu/palo/wiki/Palo-Overview
> 
> We propose to contribute Palo as an Apache Incubator project, and
> we are still looking for a possible Champion if anyone would like to volunteer. 
> Thanks a lot.
> 
> Best Regards,
> Reed
> 
> ===
> The draft of the proposal is below:
> 
> #Apache Palo
> 
> ##Abstract
> 
> Palo is an MPP-based interactive SQL data warehouse for reporting and 
> analysis.
> 
> ##Proposal
> 
> We propose to contribute the Palo codebase and associated artifacts (e.g. 
> documentation, web-site content etc.) to the Apache Software Foundation with 
> the intent of forming a productive, meritocratic and open community around 
> Palo’s continued development, according to the ‘Apache Way’.
> 
> Baidu owns several trademarks regarding Palo, and proposes to transfer 
> ownership of those trademarks in full to the ASF.
> 
> ###Overview of Palo
> 
> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> 
> **Frontend daemon** consists of a query coordinator and a catalog manager. 
> The query coordinator is responsible for receiving users’ SQL queries, 
> compiling queries and managing query execution. The catalog manager is 
> responsible for managing metadata such as databases, tables, partitions and 
> replicas. Several frontend daemons can be deployed to guarantee fault 
> tolerance and load balancing.
> 
> **Backend daemon** stores the data and executes the query fragments. Many 
> backend daemons can also be deployed to provide scalability and 
> fault tolerance.
> 
> A typical Palo cluster generally consists of several frontend daemons and 
> dozens to hundreds of backend daemons.
> 
> Users can use MySQL client tools to connect to any frontend daemon and submit 
> SQL queries. The Frontend receives the query and compiles it into query plans 
> executable by the Backend. The Frontend then sends the query plan fragments 
> to the Backend, which builds a query execution DAG. Data is fetched and 
> pipelined into the DAG. The final result is sent to the client via the 
> Frontend. The distribution of query fragment execution aims to minimize data 
> movement and maximize scan locality.
> 
> ##Background
> 
> At Baidu, prior to Palo, different tools were deployed to meet diverse 
> requirements. When a use case required the simultaneous availability of 
> capabilities that could not all be provided by a single tool, users were 
> forced to build hybrid architectures that stitched multiple tools together. 
> We believe that they shouldn’t need to accept such inherent complexity. A 
> storage system built to provide great performance across a broad range of 
> workloads provides a more elegant solution to the problems that hybrid 
> architectures aim to solve. Palo is that solution.
> 
> Palo is designed to be a simple, single, tightly coupled system that does 
> not depend on other systems. Palo provides highly concurrent, low-latency 
> point query performance as well as high-throughput ad-hoc analytical 
> queries. It provides bulk-batch data loading as well as near real-time 
> mini-batch data loading. Palo also provides high availability, 
> reliability, fault tolerance, and scalability.
> 
> ##Rationale
> 
> Palo mainly integrates the technology of Google Mesa and Apache Impala.
> 
> Mesa is a highly scalable analytic data storage system that stores critical 
> measurement data related to Google's Internet advertising business. Mesa is 
> designed to satisfy a complex and challenging set of user and system 
> requirements, including near real-time data ingestion and query ability, as 
> well as high availability, reliability, fault tolerance, and scalability for 
> large data and query volumes.
> 
> Impala is a modern, open-source MPP SQL engine architected from the ground up 
> for the Hadoop data processing environment. By virtue of its superior 
> performance and rich functionality, Impala is now comparable to many 
> commercial MPP database query engines. Mesa can satisfy many of our storage 
> requirements, but Mesa itself does not provide a SQL query engine; Impala is 
> a very good MPP SQL query engine, but it lacks a perfect distributed storage 
> engine. So in the end we chose to combine these two technologies.
> 
> Learning from Mesa’s data model, we developed a distributed storage engine. 
> Unlike Mesa, this storage engine does not rely on any distributed file 
> system. We then deeply integrated this storage engine with the Impala query 
> engine. Query compiling, query execution coordination and catalog management 
> of 

Re: Looking for Champion

2018-06-08 Thread Tan,Zhongyi
Hi guys,

Palo is a good project.

Is there anyone who would volunteer to be its Champion and
help us go through the process of becoming an Apache project?

Thanks

>
>On 2018/06/08 04:45:32, "Li,De(BDG)"  wrote:
>> Hi all,
>> 
>> I am Reed, a developer working with the team on Palo (an MPP-based
>>interactive SQL data warehouse).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for a possible Champion if anyone would like to
>>volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed

Re: Looking for Champion

2018-06-08 Thread Li,De(BDG)
Hi, Jim

Thank you for your response.
Actually, we started Palo several years ago, and at that time we developed
the storage engine based on Mesa technology.
Meanwhile, we found that Impala is a very good MPP SQL query engine, so we
integrated the two.

With this integration, the goal of Palo is to implement a single,
full-featured, MySQL-protocol-compatible data warehouse.


Best regards,
Reed

On 2018/6/8 at 1:55 PM, "Jim Apple"  wrote:

>Hello! As a contributor to Impala, I’d be interested in hearing thoughts
>from the Palo community about integration between Impala and Palo.
>
>For instance, are there any apparent design goals of Impala that the Palo
>community thinks are fundamentally incompatible with Palo?
>
>Thanks,
>Jim
>
>On 2018/06/08 04:45:32, "Li,De(BDG)"  wrote:
>> Hi all,
>> 
>> I am Reed, a developer working with the team on Palo (an MPP-based
>>interactive SQL data warehouse).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for a possible Champion if anyone would like to
>>volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed

Re: [VOTE] Pulsar Release 1.22.1-incubating Candidate 2

2018-06-08 Thread Yang Bo
Hi Matteo,

Having two different license headers in one source file is a bit strange;
users would be confused about which license the file actually uses.
The 3-Clause BSD license allows us to modify and redistribute in source
form, but I'm not sure whether it's OK to re-license it under the ASL.
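
To find other files in the same situation, a scan along the following lines may help. It is a rough sketch, not a substitute for a proper license audit: the two marker strings are assumed to be the wording that opens the standard ASF source header and the BSD license text, and the function names are made up for this example.

```python
# Assumed wording: the phrase that opens the standard ASF source header.
ASF_MARKER = "Licensed to the Apache Software Foundation"
# Assumed wording: the phrase that opens the BSD redistribution clause.
BSD_MARKER = "Redistribution and use in source and binary forms"


def license_markers(text):
    """Return the names of the license markers that appear in the text."""
    found = []
    if ASF_MARKER in text:
        found.append("ASF")
    if BSD_MARKER in text:
        found.append("BSD")
    return found


def has_mixed_headers(text):
    """True when a file carries more than one of the known license markers."""
    return len(license_markers(text)) > 1


if __name__ == "__main__":
    import sys

    # Usage: python scan_headers.py File1.java File2.java ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            if has_mixed_headers(fh.read()):
                print("mixed license headers:", path)
```

A plain substring match is deliberately simple; it only flags candidates for a human to review and says nothing about whether the combination is acceptable.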


On Fri, Jun 8, 2018 at 2:17 PM, Matteo Merli  wrote:

> Hi Yang,
>
> These files are not found directly in Protobuf; their content is
> derived from Protobuf code, with several additions. They are
> special adapters that let Protobuf interact directly with Netty ByteBuf
> with zero-copy access. Based on an earlier discussion, the resolution was
> to have both headers as well as a comment that says:
>
> /**
>  * This file is derived from Google ProcolBuffer CodedInputStream class
>  */
>
> Matteo



-- 
Best Regards,
Yang.


Re: [VOTE] Pulsar Release 1.22.1-incubating Candidate 2

2018-06-08 Thread Matteo Merli
Hi Yang,

These files are not found directly in Protobuf; their content is
derived from Protobuf code, with several additions. They are
special adapters that let Protobuf interact directly with Netty ByteBuf
with zero-copy access. Based on an earlier discussion, the resolution was
to have both headers as well as a comment that says:

/**
 * This file is derived from Google ProcolBuffer CodedInputStream class
 */

Matteo

On Thu, Jun 7, 2018 at 7:22 PM Yang Bo  wrote:

> Hi,
>
> I checked the source release and found a minor issue:
>
> pulsar-common/src/main/java/org/apache/pulsar/common/util/protobuf/ByteBufCoded{Input,Output}Stream.java
> Those files are from Google and thus should not have the ASF header in the
> source file.
>
>
> On Fri, Jun 8, 2018 at 9:51 AM, Sijie Guo  wrote:
>
> > +1 (repeated my +1 from dev@ votes)
> >
> > On Thu, Jun 7, 2018 at 11:29 AM Jai Asher  wrote:
> >
> > > This is the second release candidate for Apache Pulsar, patch release
> > > version 1.22.1-incubating.
> > >
> > > It fixes the following issues:
> > > https://github.com/apache/incubator-pulsar/milestone/15?closed=1
> > >
> > > *** Please download, test and vote on this release. This vote will stay
> > > open for at least 72 hours ***
> > >
> > > Note that we are voting upon the source (tag), binaries are provided for
> > > convenience.
> > >
> > > Source and binary files:
> > >
> > > https://dist.apache.org/repos/dist/dev/incubator/pulsar/pulsar-1.22.1-incubating-candidate-2/
> > >
> > > SHA-1 checksums:
> > >
> > > f2d29aa5e046c5bdefd8f466bce8e9ead80a2e09
> > > apache-pulsar-1.22.1-incubating-src.tar.gz
> > > f2704cae22b7fb3c1b72daab6ebe7d484fdfec6b
> > > apache-pulsar-1.22.1-incubating-bin.tar.gz
> > >
> > > Maven staging repo:
> > >
> > > https://repository.apache.org/content/repositories/orgapachepulsar-1018/
> > >
> > > The tag to be voted upon:
> > > v1.22.1-incubating-candidate-2 (c9a369936af3b3ecc663b86ae959a3fbfa627aca)
> > >
> > > https://github.com/apache/incubator-pulsar/releases/tag/v1.22.1-incubating-candidate-2
> > >
> > > Pulsar's KEYS file containing PGP keys we use to sign the release:
> > > https://dist.apache.org/repos/dist/release/incubator/pulsar/KEYS
> > >
> > > Please download the source package, and follow the README to build and
> > > run the Pulsar standalone service.
> > >
> >
>
>
>
> --
> Best Regards,
> Yang.
>
-- 
Matteo Merli