Re:[DISCUSS] The future of Apache Kylin

Xiaoxiang Yu Mon, 10 Jan 2022 22:32:21 -0800

Thanks Yang, there are two new features that I really looking forward to, and 
they are:



1. New SEMANTIC LAYER will make Kylin be accessible by excel (MDX) and more BI 
tools.
2. New flexible ModeL will let Kylin user modify Model/Cube (such as add/delete 
dimensions/measures) which status is Ready without purge the any useful 
cuboid/segmemnt .




--

Best wishes to you ! 
From ：Xiaoxiang Yu





At 2022-01-11 13:59:13, "Li Yang" <[email protected]> wrote:
>Hi All
>
>Apache Kylin has been stable for quite a while and it may be a good time to
>think about the future of it. Below are thoughts from my team and myself.
>Love to hear yours as well. Ideas and comments are very welcome.  :-)
>
>*APACHE KYLIN TODAY*
>
>Currently, the latest release of Apache Kylin is 4.0.1. Apache Kylin 4.0 is
>a major version update after Kylin 3.x (HBase Storage). Kylin 4.0 uses
>Parquet to replace HBase as storage engine, so as to improve file scanning
>performance. At the same time, Kylin 4.0 reimplements the spark based build
>engine and query engine, making it possible to separate computing and
>storage, and better adapt to the technology trend of cloud native. Kylin
>4.0 comprehensively updated the build and query engine, realized the
>deployment mode without Hadoop dependency, decreasing the complexity of
>deployment. However, Kylin also has a lot to improve, such as the ability
>of business semantic layer needs to be strengthened and the modification of
>model/cube is not flexible. With these, we thinking a few things to do:
>
>   - Multi-dimensional query ability friendly to non-technical personnel.
>   Multi-dimensional model is the key to distinguish Kylin from the general
>   OLAP engines. The feature is that the model concept based on dimension and
>   measurement is more friendly to non-technical personnel and closer to the
>   goal of citizen analyst. The multi-dimensional query capability that
>   non-technical personnel can use should be the new focus of Kylin
>   technology.
>
>
>   - Native Engine. The query engine of Kylin still has much room for
>   improvement in vector acceleration and cpu instruction level optimization.
>   The Spark community Kylin relies on also has a strong demand for native
>   engine. It is optimistic that native engine can improve the performance of
>   Kylin by at least three times, which is worthy of investment.
>
>
>   - More cloud native capabilities. Kylin 4.0 has only completed the
>   initial cloud deployment and realized the features of rapid deployment and
>   dynamic resource scaling on the cloud, but there are still many cloud
>   native capabilities to be developed.
>
>More explanations are following.
>
>*KYLIN AS A MULTI-DIMENSIONAL DATABASE*
>
>The core of Kylin is a multi-dimensional database, which is a special OLAP
>engine. Although Kylin has always had the ability of a relational database
>since its birth, and it is often compared with other relational OLAP
>engines, what really makes Kylin different is multi-dimensional model and
>multi-dimensional database ability. Considering the essence of Kylin and
>its wide range of business uses in the future (not only technical uses),
>positioning Kylin as a multi-dimensional database makes perfect sense. With
>business semantics and precomputation technology, Apache Kylin helps
>non-technical people understand and afford big data, and realizes data
>democratization.
>
>*THE SEMANTIC LAYER*
>
>The key difference between the multi-dimensional database and the
>relational database is business expression ability. Although SQL has strong
>expression ability and is the basic skill of data analysts, SQL and the RDB
>are still too difficult for non-technical personnel if we aim at "everyone
>is a data analyst". From the perspective of non-technical personnel, the
>data lake and data warehouse are like a dark room. They know that there is
>a lot of data, but they can't see clearly, understand and use this data
>because they don't understand database theory and SQL.
>
>How to make the Data Lake (and data warehouse) clear to non-technical
>personnel? This requires introducing a more friendly data model for
>non-technical personnel — multi-dimensional data model. While the
>relational model describes the technical form of data, the
>multi-dimensional model describes the business form of data. In a MDB,
>measurement corresponds to business indicators that everyone understands,
>and dimension is the perspective of comparing and observing these business
>indicators. Compare KPI with last month and compare performance between
>parallel business units, which are concepts understood by every
>non-technical personnel. By mapping the relational model to the
>multi-dimensional model, the essence is to enhance the business semantics
>on the technical data, form a business semantic layer, and help
>non-technical personnel understand, explore and use the data. In order to
>enhance Kylin's ability as the semantic layer, supporting multi-dimensional
>query language is the key content of Kylin roadmap, such as MDX and DAX.
>MDX can transform the data model in Kylin into a business friendly
>language, endow data with business value, and facilitate Kylin's
>multi-dimensional analysis with BI tools such as Excel and Tableau.
>
>*PRECOMPUTATION AND MODEL FLEXIBILITY*
>
>It is kylin's unchanging mission to continue to reduce the cost of a single
>query through precomputation technology so that ordinary people can afford
>big data. If the multi-dimensional model solves the problem that
>non-technical personnel can understand data, then precomputation can solve
>the problem that ordinary people can afford data. Both are necessary
>conditions for data democratization. Through one calculation and multiple
>use, the data cost can be shared by multiple users to achieve the scale
>effect that the more users, the cheaper. Precalculation is Kylin's
>traditional strength, but it lacks some flexibility in the change of
>precalculation model. In order to strengthen the ability to change models
>flexibly of Kylin and bring more optimization room, Kylin community expects
>to propose a new metadata format in Kylin in the future to make
>precalculation more flexible, be able to cope with that table format or
>business requirements may change at any time.
>
>*SUMMARY*
>
>To sum up, we would like to propose Kylin as a multi-dimensional database.
>Through multi-dimensional model and precomputation technology, ordinary
>people can understand and afford big data, and finally realize the vision
>of data democratization. Meanwhile, for today's users who use Kylin as the
>SQL acceleration layer, Kylin will continue to enhance its SQL engine, to
>ensure that the precomputation technology can be used by both relational
>model and multi-dimensional model. In the figure below, we picture the
>future of Kylin. The newly added and modified parts are roughly marked in
>blue and orange.
>
>*FURTHER READING*
>
>   - https://en.wikipedia.org/wiki/Data_model
>   - https://en.wikipedia.org/wiki/Semantic_layer
>   - https://en.wikipedia.org/wiki/Multidimensional_analysis
>   - https://en.wikipedia.org/wiki/MultiDimensional_eXpressions
>   - https://en.wikipedia.org/wiki/XML_for_Analysis
>   - https://en.wikipedia.org/wiki/SIMD
>   - https://en.wikipedia.org/wiki/Cloud_native_computing
>   -
>   
> https://blogs.gartner.com/carlie-idoine/2018/05/13/citizen-data-scientists-and-why-they-matter/
>
>
>Please share your ideas and comments.  :-)
>
>Cheers
>Yang

Re:[DISCUSS] The future of Apache Kylin

Reply via email to