I guess the release date should be 2024/01. Do you have any suggestions/wishes for Kylin 5 (except the real-time feature)?
------------------------
With warm regard
Xiaoxiang Yu

On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you very much Xiaoxiang, I did the presentation this morning already so there was no time for you to comment. Next time I will send it to you in advance. The meeting result was that we will implement both Druid and Kylin in the next couple of projects because of Druid's real-time feature. I hope that Kylin will have the same feature soon.
>
> May I ask when you will release Kylin 5.0?
>
> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Since 2018 there have been a lot of new features and code refactoring.
> > If you like, you can share your ppt with me privately, maybe I can
> > give some comments.
> >
> > Here is the reference of advantages of Kylin since 2018:
> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Hi Xiaoxiang, tomorrow is the main presentation between Kylin and Druid in my team.
> >>
> >> I found this article and would like you to update me on the advantages of Kylin since 2018 until now (especially with version 5 to be released)
> >>
> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >>
> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >>
> >> > Thank you very much for your prompt response, I still have several questions to seek your help with later.
> >> >
> >> > Best regards and have a good day
> >> >
> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >
> >> >> Done. GitHub branch changed to kylin5.
> >> >>
> >> >> ------------------------
> >> >> With warm regard
> >> >> Xiaoxiang Yu
> >> >>
> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>
> >> >> > A JIRA ticket has been opened, waiting for INFRA:
> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
> >> >> > ------------------------
> >> >> > With warm regard
> >> >> > Xiaoxiang Yu
> >> >> >
> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >
> >> >> >> Thank you Xiaoxiang, please update me when you have changed your default branch. In case people are impressed by the numbers then I hope to turn this situation in the reverse direction.
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>
> >> >> >>> The default branch is for 4.X which is a maintained branch, the active branch is kylin5.
> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >>>
> >> >> >>> ------------------------
> >> >> >>> With warm regard
> >> >> >>> Xiaoxiang Yu
> >> >> >>>
> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>
> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >> >>>>
> >> >> >>>> Can you see the attached photo
> >> >> >>>>
> >> >> >>>> My boss asked why Druid commits code regularly but Kylin has had no commits since July
> >> >> >>>>
> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>
> >> >> >>>>> I think so.
> >> >> >>>>>
> >> >> >>>>> Response time is not the only factor to make a decision.
> >> >> >>>>> Kylin could be cheaper when the query pattern is suitable for the Kylin model, and Kylin can guarantee reasonable query latency. ClickHouse will be quicker in an ad hoc query scenario.
> >> >> >>>>>
> >> >> >>>>> By the way, Youzan and Kyligence combine them together to provide unified data analytics services for their customers.
> >> >> >>>>>
> >> >> >>>>> ------------------------
> >> >> >>>>> With warm regard
> >> >> >>>>> Xiaoxiang Yu
> >> >> >>>>>
> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>
> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> >> >>>>>>
> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS, which will cost more: the precalculation feature of Kylin, or ClickHouse? (In the case of Kylin, my thought is that the query execution is done once and stored in a cube to be used many times, so Kylin uses less cloud computation. Is that true?)
> >> >> >>>>>>
> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>>
> >> >> >>>>>> > The following text is part of an article (https://zhuanlan.zhihu.com/p/343394287).
> >> >> >>>>>> >
> >> >> >>>>>> > ===============================================================================
> >> >> >>>>>> >
> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because of its pre-calculation technology, for example, when the join, group by, and where condition patterns in SQL are relatively fixed.
> >> >> >>>>>> > The larger the data volume is, the more obvious the advantages of using Kylin are. In particular, Kylin's advantages in deduplication (count distinct), Top N, Percentile and other scenarios are especially large, and it is used in a large number of scenarios, such as dashboards, all kinds of reports, large-screen displays, traffic statistics, and user behavior analysis. Meituan, Aurora, Shell Housing, etc. use Kylin to build their data service platforms, providing millions to tens of millions of queries per day, and most of the queries can be completed within 2 - 3 seconds. There is no better alternative for such a high concurrency scenario.
> >> >> >>>>>> >
> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power and is more suitable when the query request is more flexible, or when there is a need for detailed queries with low concurrency. Scenarios include: filtering on user labels where very many columns and where conditions are arbitrarily combined, complex ad hoc queries without much concurrency, and so on. If the amount of data and access is large, you need to deploy a distributed ClickHouse cluster, which is a higher challenge for operation and maintenance.
> >> >> >>>>>> >
> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more resource-efficient to compute them on the fly. Since the number of queries is small, even if each query consumes a lot of computational resources, it is still cost-effective overall. If some queries have a fixed pattern and the query volume is large, Kylin is more suitable: by using large computational resources once to save the results, the upfront computational cost can be amortized over each query, so it is the most economical.
> >> >> >>>>>> >
> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >> >>>>>> >
> >> >> >>>>>> > ------------------------
> >> >> >>>>>> > With warm regard
> >> >> >>>>>> > Xiaoxiang Yu
> >> >> >>>>>> >
> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >
> >> >> >>>>>> >> Thank you Xiaoxiang for the near real time streaming feature. That's great.
> >> >> >>>>>> >>
> >> >> >>>>>> >> This morning there has been a new challenge to my team: ClickHouse offered us the speed of calculating 8 billion rows in milliseconds, which is faster than my demonstration (I used Kylin to calculate 1 billion rows in 2.9 seconds)
> >> >> >>>>>> >>
> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so that I can defend my demonstration.
> >> >> >>>>>> >>
> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>> >>
> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?"
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > You are correct.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of combination of ... "
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed but not released), which can reduce the time-lag to about 3 minutes (that is my estimation but I am quite certain about it).
> >> >> >>>>>> >> > NRT stands for 'near real-time'; it will run a job and do micro-batch aggregation and persistence periodically. The price is that you need to run and monitor a long-running job. This feature is based on Spark Streaming, so you need knowledge of it.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > I am curious about the maximum time-lag your customers can tolerate.
> >> >> >>>>>> >> > Personally, I guess minute level time-lag is ok for most cases.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > ------------------------
> >> >> >>>>>> >> > With warm regard
> >> >> >>>>>> >> > Xiaoxiang Yu
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > > Druid is better in
> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > ==========================
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > In this important scenario of realtime analytics, the reason here is that kylin has lag time due to model update of new segment build, is that correct?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > If that is true, then can you suggest a work-around of combination of:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > (time-lag Kylin cube) + (realtime DB update) to provide realtime capability?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > IMO, the point here is to find that (realtime DB update) and integrate it with (time-lag Kylin cube).
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know too much about the changes in Druid in these two years. New features that I know are: new UI, fully on K8s etc).
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather than Kylin at the moment (comparing Kylin 5.0-beta with the Druid which I used two years ago):
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > > > - Most queries are small (Based on my test result, I think Druid had better response time for small queries two years ago.)
> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the K8S/public cloud platform as your deployment platform.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be better, like:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a more exact-match/fine-grained Index for queries containing different `Group By dimensions`.
> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >> >>>>>> >> > > > - ODBC driver for different BI tools. (Druid's website did not show that it supports ODBC well)
> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >> >>>>>> >> > > > Hope this helps you, and you are free to share your opinion.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > ------------------------
> >> >> >>>>>> >> > > > With warm regard
> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared to Pinot and Druid?
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Please kindly let me know
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Thank you very much and best regards