[RESULT] [VOTE] Accept Kylin into the Apache Incubator
The vote has passed with 7 binding +1s, 1 non-binding +1, and no 0s or -1s.

Binding +1s:
John D. Ament
Henry Saputra
Andrew Purtell
Ted Dunning
Bertrand Delacretaz
Sergio Fernández
Alan D. Cabrera

Non-binding +1s:
Nick Dimiduk

Thanks everyone for voting. We will proceed with the next steps as per the IPMC guidelines.

Thanks
Luke

2014-11-21 6:31 GMT+08:00 Luke Han luke...@gmail.com:
...I would like to call a VOTE for accepting Kylin as a new incubator project...
Re: [VOTE] Accept Kylin into the Apache Incubator
On Thu, Nov 20, 2014 at 11:31 PM, Luke Han luke...@gmail.com wrote:
...I would like to call a VOTE for accepting Kylin as a new incubator project...

+1, binding

-Bertrand

To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
Re: [VOTE] Accept Kylin into the Apache Incubator
+1 (binding)

On 20/11/14 23:31, Luke Han wrote:
...I would like to call a VOTE for accepting Kylin as a new incubator project...
Re: [VOTE] Accept Kylin into the Apache Incubator
Great stuff, +1

On Thursday, November 20, 2014, Luke Han luke...@gmail.com wrote:
...I would like to call a VOTE for accepting Kylin as a new incubator project...
Re: [VOTE] Accept Kylin into the Apache Incubator
+1 binding

Regards,
Alan

On Nov 20, 2014, at 2:31 PM, Luke Han luke...@gmail.com wrote:

Following the discussion earlier in the thread:
http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e

I would like to call a VOTE for accepting Kylin as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/KylinProposal
and posted the text of the proposal below also.

Vote is open until 24th November 2014, 23:59:00 UTC

[ ] +1 accept Kylin in the Incubator
[ ] ±0
[ ] -1 because...
[VOTE] Accept Kylin into the Apache Incubator
Following the discussion earlier in the thread:
http://mail-archives.apache.org/mod_mbox/incubator-general/201411.mbox/%3ccakmqrob22+n+r++date33f3pcpyujhfoeaqrms3t-udjwk6...@mail.gmail.com%3e

I would like to call a VOTE for accepting Kylin as a new incubator project.

The proposal is available at:
https://wiki.apache.org/incubator/KylinProposal
and the text of the proposal is also posted below.

Vote is open until 24th November 2014, 23:59:00 UTC

[ ] +1 accept Kylin in the Incubator
[ ] ±0
[ ] -1 because...

Thanks
Luke

Kylin Proposal
==

# Abstract

Kylin is a distributed and scalable OLAP engine built on Hadoop to support extremely large datasets.

# Proposal

Kylin is an open source Distributed Analytics Engine that provides multi-dimensional analysis (MOLAP) on Hadoop. Kylin is designed to accelerate analytics on Hadoop by allowing the use of SQL-compatible tools. Kylin provides a SQL interface and multi-dimensional analysis (MOLAP) on Hadoop to support extremely large datasets, and it integrates tightly with the Hadoop ecosystem.

## Overview of Kylin

The Kylin platform has two parts, data processing and interactive querying. First, Kylin reads data from the source, Hive, and runs a set of tasks, including MapReduce jobs and shell scripts, to pre-calculate results for a specified data model, then saves the resulting OLAP cube into storage such as HBase. Once these OLAP cubes are ready, a user can submit a request from any SQL-based tool or third-party application to Kylin’s REST server. The server calls the Query Engine to determine if the target dataset already exists. If so, the engine directly accesses the target data in the form of a predefined cube and returns the result with sub-second latency. Otherwise, the engine is designed to route non-matching queries to whichever SQL-on-Hadoop tool is already available on the Hadoop cluster, such as Hive.

The Kylin platform includes:

- Metadata Manager: Kylin is a metadata-driven application. The Kylin Metadata Manager is the key component that manages all metadata stored in Kylin, including all cube metadata. All other components rely on the Metadata Manager.
- Job Engine: This engine is designed to handle all of the offline jobs, including shell scripts, Java API calls, and MapReduce jobs. The Job Engine manages and coordinates all of the jobs in Kylin to make sure each job executes and handles failures.
- Storage Engine: This engine manages the underlying storage – specifically, the cuboids, which are stored as key-value pairs. The Storage Engine uses HBase – the best solution from the Hadoop ecosystem for leveraging an existing K-V system. Kylin can also be extended to support other K-V systems, such as Redis.
- Query Engine: Once the cube is ready, the Query Engine can receive and parse user queries. It then interacts with other components to return the results to the user.
- REST Server: The REST Server is an entry point for applications to develop against Kylin. Applications can submit queries, get results, trigger cube build jobs, get metadata, get user privileges, and so on.
- ODBC Driver: To support third-party tools and applications – such as Tableau – we have built and open-sourced an ODBC Driver. The goal is to make it easy for users to onboard.

# Background

The challenge we face at eBay is that our data volume is becoming bigger and bigger while our user base is becoming more diverse. For example, our business users and analysts consistently ask for minimal latency when visualizing data in Tableau and Excel. So, we worked closely with our internal analyst community and outlined the product requirements for Kylin:

- Sub-second query latency on billions of rows
- ANSI SQL availability for those using SQL-compatible tools
- Full OLAP capability to offer advanced functionality
- Support for high cardinality and very large dimensions
- High concurrency for thousands of users
- Distributed and scale-out architecture for analysis in the TB to PB size range

Existing SQL-on-Hadoop solutions commonly need to perform partial or full table or file scans to compute the results of queries. The cost of these large data scans can make many queries very slow (more than a minute). The core idea of MOLAP (multi-dimensional OLAP) is to pre-compute data along dimensions of interest and store the resulting aggregates as a cube. MOLAP is much faster but is inflexible.

We realized that no existing product met our exact requirements externally – especially in the open source Hadoop community. To meet our emerging business needs, we built a platform from scratch to support MOLAP for these business requirements and then to support others, including ROLAP. With an excellent development team and several pilot customers, we have been able to bring the Kylin platform into production as well as open source it.

# Rationale

When data grows to petabyte scale, the process of pre-calculating a query takes a long time and requires costly and powerful hardware. However, with the benefit of Hadoop’s
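
The MOLAP idea described in the proposal (pre-compute aggregates along the dimensions of interest, answer matching queries from the cube, and route everything else to a slower scan) can be illustrated with a small Python sketch. This is not Kylin's implementation: the function names, the in-memory dictionary standing in for HBase, the row scan standing in for a Hive fallback, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, dims, measure):
    """Sum `measure` grouped by the columns in `dims` (one full scan of rows)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def build_cube(rows, dimensions, measure):
    """Pre-compute one cuboid per subset of the dimensions (2**n cuboids).

    Cuboids are keyed by the sorted tuple of dimension names, so a lookup
    does not depend on the order in which a query lists its columns.
    """
    ordered = sorted(dimensions)
    cube = {}
    for size in range(len(ordered) + 1):
        for dims in combinations(ordered, size):
            cube[dims] = aggregate(rows, dims, measure)
    return cube


def answer(cube, rows, group_by, measure):
    """Serve from a pre-built cuboid if one matches, else fall back to a scan."""
    dims = tuple(sorted(group_by))
    if dims in cube:
        return "cube", cube[dims]
    # Hypothetical stand-in for routing the query to another engine.
    return "scan", aggregate(rows, dims, measure)


# Hypothetical sample data; "seller" is deliberately left out of the cube.
rows = [
    {"site": "US", "category": "toys", "seller": "a", "price": 10.0},
    {"site": "US", "category": "books", "seller": "b", "price": 5.0},
    {"site": "DE", "category": "toys", "seller": "a", "price": 7.0},
]
cube = build_cube(rows, ["site", "category"], "price")

by_site = answer(cube, rows, ["site"], "price")      # served from the cube
by_seller = answer(cube, rows, ["seller"], "price")  # falls back to a scan
```

The sketch also shows why MOLAP is fast but inflexible: a query grouped by `site` is a dictionary lookup, while a query on a column that was not modelled as a dimension (`seller` here) still needs a full scan, and the number of cuboids doubles with every added dimension.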
Re: [VOTE] Accept Kylin into the Apache Incubator
+1 (binding)

On Fri, Nov 21, 2014 at 3:37 AM, Andrew Purtell apurt...@apache.org wrote:
+1 (binding)

On Thu, Nov 20, 2014 at 2:31 PM, Luke Han luke...@gmail.com wrote:
...I would like to call a VOTE for accepting Kylin as a new incubator project...