[RESULT] [VOTE] Accept Kylin into the Apache Incubator

2014-11-25 Thread Luke Han
The vote has passed with 7 binding +1s, 1 non-binding +1, and no 0 or -1 votes.

Binding +1s :
John D. Ament
Henry Saputra
Andrew Purtell
Ted Dunning
Bertrand Delacretaz
Sergio Fernández
Alan D. Cabrera


Non-binding +1s:
Nick Dimiduk

Thanks everyone for voting. We will proceed with the next steps as per the
IPMC guidelines.

Thanks
Luke


2014-11-21 6:31 GMT+08:00 Luke Han luke...@gmail.com:

 Following the discussion earlier in the thread:

 http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e

 I would like to call a VOTE for accepting Kylin as a new incubator project.

 The proposal is available at:
 https://wiki.apache.org/incubator/KylinProposal

 and the text of the proposal is also posted below.

 Vote is open until 24th November 2014, 23:59:00 UTC

 [ ] +1 accept Kylin in the Incubator
 [ ] ±0
 [ ] -1 because...


 Thanks
 Luke


 Kylin Proposal
 ==

 # Abstract

 Kylin is a distributed and scalable OLAP engine built on Hadoop to
 support extremely large datasets.

 # Proposal

 Kylin is an open source Distributed Analytics Engine that provides
 multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
 accelerate analytics on Hadoop by allowing the use of SQL-compatible
 tools. Kylin provides a SQL interface and multi-dimensional analysis
 (MOLAP) on Hadoop to support extremely large datasets and integrate
 tightly with the Hadoop ecosystem.

 ## Overview of Kylin

 The Kylin platform has two parts, data processing and interactive querying.
 First, Kylin reads data from the source, Hive, and runs a set of tasks,
 including MapReduce jobs and shell scripts, to pre-calculate results for a
 specified data model, then saves the resulting OLAP cube into storage
 such as HBase. Once these OLAP cubes are ready, a user can submit a
 request from any SQL-based tool or third-party application to Kylin’s
 REST server. The server calls the Query Engine to determine if the
 target dataset already exists. If so, the engine directly accesses the
 target data in the form of a predefined cube, and returns the result
 with sub-second latency. Otherwise, the engine is designed to route
 non-matching queries to whichever SQL on Hadoop tool is already
 available on a Hadoop cluster, such as Hive.

 The Kylin platform includes:

 - Metadata Manager: Kylin is a metadata-driven application. The Kylin
 Metadata Manager is the key component that manages all metadata stored
 in Kylin including all cube metadata. All other components rely on the
 Metadata Manager.

 - Job Engine: This engine is designed to handle all of the offline
 jobs, including shell script, Java API, and MapReduce jobs. The Job
 Engine manages and coordinates all of the jobs in Kylin to make sure
 each job executes and that failures are handled.

 - Storage Engine: This engine manages the underlying storage –
 specifically, the cuboids, which are stored as key-value pairs. The
 Storage Engine uses HBase – the best solution from the Hadoop
 ecosystem for leveraging an existing K-V system. Kylin can also be
 extended to support other K-V systems, such as Redis.

 - Query Engine: Once the cube is ready, the Query Engine can receive
 and parse user queries. It then interacts with other components to
 return the results to the user.

 - REST Server: The REST Server is an entry point for applications to
 develop against Kylin. Applications can submit queries, get results,
 trigger cube build jobs, get metadata, get user privileges, and so on.

 - ODBC Driver: To support third-party tools and applications – such as
 Tableau – we have built and open-sourced an ODBC Driver. The goal is
 to make it easy for users to onboard.

 # Background

 The challenge we face at eBay is that our data volume is becoming
 bigger and bigger while our user base is becoming more diverse. For
 example, our business users and analysts consistently ask for minimal
 latency when visualizing data on Tableau and Excel. So, we worked
 closely with our internal analyst community and outlined the product
 requirements for Kylin:

 - Sub-second query latency on billions of rows
 - ANSI SQL availability for those using SQL-compatible tools
 - Full OLAP capability to offer advanced functionality
 - Support for high cardinality and very large dimensions
 - High concurrency for thousands of users
 - Distributed and scale-out architecture for analysis in the TB to PB size
 range

 Existing SQL-on-Hadoop solutions commonly need to perform partial or
 full table or file scans to compute the results of queries. The cost
 of these large data scans can make many queries very slow (more than a
 minute). The core idea of MOLAP (multi-dimensional OLAP) is to
 pre-compute data along dimensions of interest and store resulting
 aggregates as a cube. MOLAP is much faster but is inflexible. We
 realized that no existing product met our exact requirements
 externally – especially in the open source Hadoop community. To meet
 our emerging business 

Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-21 Thread Bertrand Delacretaz
On Thu, Nov 20, 2014 at 11:31 PM, Luke Han luke...@gmail.com wrote:
 ...I would like to call a VOTE for accepting Kylin as a new incubator 
 project...

+1, binding
-Bertrand

-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org



Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-21 Thread Sergio Fernández

+1 (binding)


Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-21 Thread Nick Dimiduk
Great stuff, +1


Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-21 Thread Alan D. Cabrera
+1 binding


Regards,
Alan




[VOTE] Accept Kylin into the Apache Incubator

2014-11-20 Thread Luke Han
Following the discussion earlier in the thread:

http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e

I would like to call a VOTE for accepting Kylin as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/KylinProposal

and the text of the proposal is also posted below.

Vote is open until 24th November 2014, 23:59:00 UTC

[ ] +1 accept Kylin in the Incubator
[ ] ±0
[ ] -1 because...


Thanks
Luke


Kylin Proposal
==

# Abstract

Kylin is a distributed and scalable OLAP engine built on Hadoop to
support extremely large datasets.

# Proposal

Kylin is an open source Distributed Analytics Engine that provides
multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to
accelerate analytics on Hadoop by allowing the use of SQL-compatible
tools. Kylin provides a SQL interface and multi-dimensional analysis
(MOLAP) on Hadoop to support extremely large datasets and integrate
tightly with the Hadoop ecosystem.

## Overview of Kylin

The Kylin platform has two parts, data processing and interactive querying.
First, Kylin reads data from the source, Hive, and runs a set of tasks,
including MapReduce jobs and shell scripts, to pre-calculate results for a
specified data model, then saves the resulting OLAP cube into storage
such as HBase. Once these OLAP cubes are ready, a user can submit a
request from any SQL-based tool or third-party application to Kylin’s
REST server. The server calls the Query Engine to determine if the
target dataset already exists. If so, the engine directly accesses the
target data in the form of a predefined cube, and returns the result
with sub-second latency. Otherwise, the engine is designed to route
non-matching queries to whichever SQL on Hadoop tool is already
available on a Hadoop cluster, such as Hive.
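
The routing decision described above can be sketched roughly as follows. This is only an illustration, not Kylin's actual Query Engine: the cube registry `CUBES`, the helper names `find_cube` and `route`, and the idea of matching on a set of dimension names are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical registry: the set of dimensions each pre-built cube covers.
CUBES: Dict[str, FrozenSet[str]] = {
    "sales_cube": frozenset({"region", "category", "day"}),
}


def find_cube(query_dims: FrozenSet[str]) -> Optional[str]:
    """Return the name of a cube covering every queried dimension, if any."""
    for name, cube_dims in CUBES.items():
        if query_dims <= cube_dims:
            return name
    return None


def route(query_dims: FrozenSet[str]) -> str:
    """Use a pre-built cube when one matches; otherwise fall back to Hive."""
    cube = find_cube(query_dims)
    return f"cube:{cube}" if cube is not None else "fallback:hive"


print(route(frozenset({"region", "day"})))     # cube:sales_cube
print(route(frozenset({"region", "seller"})))  # fallback:hive
```

A query touching only dimensions that a cube already covers is answered from that cube; anything else is handed to the fallback engine.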

The Kylin platform includes:

- Metadata Manager: Kylin is a metadata-driven application. The Kylin
Metadata Manager is the key component that manages all metadata stored
in Kylin including all cube metadata. All other components rely on the
Metadata Manager.

- Job Engine: This engine is designed to handle all of the offline
jobs, including shell script, Java API, and MapReduce jobs. The Job
Engine manages and coordinates all of the jobs in Kylin to make sure
each job executes and that failures are handled.

- Storage Engine: This engine manages the underlying storage –
specifically, the cuboids, which are stored as key-value pairs. The
Storage Engine uses HBase – the best solution from the Hadoop
ecosystem for leveraging an existing K-V system. Kylin can also be
extended to support other K-V systems, such as Redis.

- Query Engine: Once the cube is ready, the Query Engine can receive
and parse user queries. It then interacts with other components to
return the results to the user.

- REST Server: The REST Server is an entry point for applications to
develop against Kylin. Applications can submit queries, get results,
trigger cube build jobs, get metadata, get user privileges, and so on.

- ODBC Driver: To support third-party tools and applications – such as
Tableau – we have built and open-sourced an ODBC Driver. The goal is
to make it easy for users to onboard.
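
To make the Storage Engine item above more concrete, here is a minimal sketch of how a cuboid entry could be laid out as a key-value pair. The key layout (a bitmask identifying the cuboid, followed by the dimension values) and the names `DIMENSIONS`, `cuboid_id`, and `make_key` are assumptions for illustration only; they are not taken from the Kylin code base.

```python
from typing import Dict, Tuple

# Hypothetical, fixed ordering of the dimensions in a data model.
DIMENSIONS = ("region", "category", "day")


def cuboid_id(dims: Tuple[str, ...]) -> int:
    """Bitmask with one bit set per dimension present in the cuboid."""
    mask = 0
    for i, name in enumerate(DIMENSIONS):
        if name in dims:
            mask |= 1 << i
    return mask


def make_key(dims: Tuple[str, ...], values: Tuple[str, ...]) -> bytes:
    """Row key = cuboid id, then the dimension values in DIMENSIONS order."""
    by_name = dict(zip(dims, values))
    ordered = [by_name[d] for d in DIMENSIONS if d in by_name]
    return "|".join([str(cuboid_id(dims)), *ordered]).encode("utf-8")


# A plain dict stands in for the key-value store (HBase in the proposal).
store: Dict[bytes, int] = {}
store[make_key(("region", "day"), ("US", "2014-11-20"))] = 15

print(make_key(("region", "day"), ("US", "2014-11-20")))  # b'5|US|2014-11-20'
```

Prefixing the key with the cuboid identifier keeps all rows of one cuboid adjacent in a sorted key-value store, which is one plausible reason to choose such a layout.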

# Background

The challenge we face at eBay is that our data volume is becoming
bigger and bigger while our user base is becoming more diverse. For
example, our business users and analysts consistently ask for minimal
latency when visualizing data on Tableau and Excel. So, we worked
closely with our internal analyst community and outlined the product
requirements for Kylin:

- Sub-second query latency on billions of rows
- ANSI SQL availability for those using SQL-compatible tools
- Full OLAP capability to offer advanced functionality
- Support for high cardinality and very large dimensions
- High concurrency for thousands of users
- Distributed and scale-out architecture for analysis in the TB to PB size
range

Existing SQL-on-Hadoop solutions commonly need to perform partial or
full table or file scans to compute the results of queries. The cost
of these large data scans can make many queries very slow (more than a
minute). The core idea of MOLAP (multi-dimensional OLAP) is to
pre-compute data along dimensions of interest and store resulting
aggregates as a cube. MOLAP is much faster but is inflexible. We
realized that no existing product met our exact requirements
externally – especially in the open source Hadoop community. To meet
our emerging business needs, we built a platform from scratch to
support MOLAP for these business requirements, and later to support
others, including ROLAP. With an excellent development team and several
pilot customers, we have been able to bring the Kylin platform into
production as well as open source it.
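
The core MOLAP idea mentioned above, pre-computing aggregates along the dimensions of interest, can be illustrated with a small in-memory sketch. The sample rows, the `build_cube` helper, and the choice of SUM as the only measure are assumptions for the example; a real build would run as distributed jobs rather than a Python loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with two dimensions and one measure.
ROWS = [
    {"region": "US", "category": "toys", "amount": 10},
    {"region": "US", "category": "books", "amount": 5},
    {"region": "EU", "category": "toys", "amount": 7},
]
DIMENSIONS = ("region", "category")


def build_cube(rows, dimensions, measure):
    """Pre-compute SUM(measure) for every subset of dimensions (each cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = build_cube(ROWS, DIMENSIONS, "amount")
print(cube[("region",)][("US",)])  # 15
print(cube[()][()])                # 22
```

Once the cube is built, a query such as "total amount per region" is a dictionary lookup instead of a scan, which is the speed-versus-flexibility trade-off described above: only the pre-computed dimension combinations can be answered this way.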

# Rationale

When data grows to petabyte scale, pre-calculating a query takes a
long time and requires costly, powerful hardware. However,
with the benefit of Hadoop’s 

Re: [VOTE] Accept Kylin into the Apache Incubator

2014-11-20 Thread Ted Dunning
+1 (binding)



On Fri, Nov 21, 2014 at 3:37 AM, Andrew Purtell apurt...@apache.org wrote:

 +1 (binding)
