Re:[Discuss] Reposition Kylin as "Analytical Data warehouse for big data"

Xiaoxiang Yu Sun, 12 Jan 2020 23:00:11 -0800

+1
Great suggestion. I hope that in the future Kylin will support more data 
sources and deliver better performance when building segments.









--

Best wishes to you!
From: Xiaoxiang Yu



At 2020-01-12 20:32:12, "ShaoFeng Shi" <[email protected]> wrote:

Hello, Kylin developers and users, HAPPY NEW YEAR 2020!

Last month we released Kylin 3.0, with the new real-time streaming feature 
and a Lambda architecture. This allows our users to host a single system for 
both batch and real-time analytics, and to query batch and streaming data 
together.

If you look at Kylin's home page, its slogan is still "OLAP Engine for Big 
data", which was chosen 5 years ago when the project was born. Today, Kylin's 
capabilities have been proven to go beyond those of an "OLAP engine". Over the 
last year I visited many Kylin users in China, the US, and Europe, and came 
across many different scenarios:

1. eBay initiated the Kylin project to offload analytical workloads from 
Teradata to Hadoop; Kylin serves the online queries with high performance and 
high availability. To this day, Kylin serves millions of queries every day, 
most of them in under 1 second.
2. China Unionpay and CPIC use Kylin to replace IBM Cognos cubes. One Kylin 
cube replaced more than 100 Cognos cubes, with better building performance and 
query performance.
3. China Construction Bank uses Hadoop + Kylin to offload Greenplum. Some 
systems have been migrated to Kylin successfully.
4. Yum (KFC) and several other users are using Kylin to replace Microsoft SSAS.
5. Meituan, Ctrip, JD, Didi, Xiao Mi, Huawei, OLX group, autohome.com.cn, 
Xactly, and many others are using Kylin as the platform of their DaaS (Data as 
a Service), providing data service to their thousands of internal analysts and 
tens of thousands of external tenants.

Now let's look at the definition of a data warehouse [1]:

"A data warehouse is a subject-oriented, integrated, time-variant and 
non-volatile collection of data in support of management's decision-making 
process."

In Kylin, each model/cube is created for a certain subject; Kylin integrates 
well with Hive, Hadoop, Spark, Kafka, and other systems; Kylin loads the data 
incrementally by time, builds the cube, and saves the result as segments 
(partitions), which are non-volatile unless you refresh them. During analysis 
(roll-up, drill-down, etc.), the data is always consistent. Kylin provides a 
SQL interface and JDBC/ODBC/HTTP APIs so that you can easily connect from 
BI/visualization tools such as Tableau.
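
As a rough illustration of the HTTP route mentioned above, the sketch below 
builds (but does not send) a query request for Kylin's REST query endpoint. 
The host and port, the ADMIN/KYLIN credentials, the learn_kylin project, and 
the kylin_sales table are assumptions taken from a typical sample setup, not 
from this message, and the helper name build_kylin_query is hypothetical.

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for the /kylin/api/query endpoint."""
    payload = {"sql": sql, "project": project, "limit": limit}
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are assumptions for illustration only.
request = build_kylin_query(
    "http://localhost:7070",
    "learn_kylin",
    "select count(*) from kylin_sales",
    "ADMIN",
    "KYLIN",
)

# To run the query against a live server, uncomment:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

A BI tool would normally go through the JDBC or ODBC driver instead; the 
plain HTTP call is shown only because it needs nothing beyond the standard 
library.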

All in all, you can see that users are using Kylin not just as a SQL engine, 
but also as an analytical data warehouse for very large-scale data (PB scale). 
In the world of big data, Kylin is unique: its design is elegant, and its 
architecture is scalable and pluggable. To give Kylin more visibility and 
help more people discover it, I propose changing Kylin's position/slogan 
from "OLAP engine for big data" to "Analytical Data warehouse for big data".

Please feel free to share your comments.

[1] https://www.1keydata.com/datawarehousing/data-warehouse-definition.html


Best regards,


Shaofeng Shi 史少锋
Apache Kylin PMC
Email: [email protected]


Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: [email protected]
Join Kylin dev mail group: [email protected]
