Druid quick comparison

Xiaoxiang Yu Thu, 07 Dec 2023 00:00:34 -0800

I guess the release date will be around 2024/01.
Do you have any suggestions or wishes for Kylin 5 (other than the real-time feature)?


------------------------
With warm regard
Xiaoxiang Yu



On Thu, Dec 7, 2023 at 3:44 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you very much, Xiaoxiang. I already gave the presentation this morning,
> so there was no time for you to comment. Next time I will send it to you in
> advance. The result of the meeting was that we will implement both Druid and
> Kylin in the next couple of projects, Druid because of its real-time feature.
> I hope that Kylin will have the same feature soon.
>
> May I ask when you will release Kylin 5.0?
>
> On Thu, Dec 7, 2023 at 9:26 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Since 2018 there have been a lot of new features and code refactoring.
> > If you like, you can share your PPT with me privately, and maybe I can
> > give some comments.
> >
> > Here are some references on the advantages of Kylin since 2018:
> > - https://kylin.apache.org/blog/2022/01/12/The-Future-Of-Kylin/
> > - https://kylin.apache.org/blog/2021/07/02/Apache-Kylin4-A-new-storage-and-compute-architecture/
> > - https://kylin.apache.org/5.0/docs/development/roadmap
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> >
> >
> > On Wed, Dec 6, 2023 at 6:53 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> >> Hi Xiaoxiang, tomorrow is the main presentation comparing Kylin and Druid
> >> in my team.
> >>
> >> I found this article and would like you to update me on the advantages of
> >> Kylin from 2018 until now (especially with version 5 about to be released):
> >>
> >> Apache Kylin | Why did Meituan develop Kylin On Druid (part 1 of 2)?
> >> <https://kylin.apache.org/blog/2018/12/12/why-did-meituan-develop-kylin-on-druid-part1-of-2/>
> >>
> >> On Wed, Dec 6, 2023 at 9:34 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> >>
> >> > Thank you very much for your prompt response. I still have several
> >> > questions I would like your help with later.
> >> >
> >> > Best regards and have a good day
> >> >
> >> >
> >> >
> >> > On Wed, Dec 6, 2023 at 9:11 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >
> >> >> Done. The GitHub default branch has been changed to kylin5.
> >> >>
> >> >> ------------------------
> >> >> With warm regard
> >> >> Xiaoxiang Yu
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Dec 5, 2023 at 11:13 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >>
> >> >> > A JIRA ticket has been opened and is waiting for INFRA:
> >> >> > https://issues.apache.org/jira/browse/INFRA-25238
> >> >> > ------------------------
> >> >> > With warm regard
> >> >> > Xiaoxiang Yu
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Dec 5, 2023 at 10:30 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >
> >> >> >> Thank you, Xiaoxiang. Please update me when you have changed your
> >> >> >> default branch. In case people are impressed by the numbers, I hope
> >> >> >> to turn this situation around.
> >> >> >>
> >> >> >> On Tue, Dec 5, 2023 at 9:02 AM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>
> >> >> >>> The default branch is for 4.X, which is a maintenance branch; the
> >> >> >>> active branch is kylin5.
> >> >> >>> I will change the default branch to kylin5 later.
> >> >> >>>
> >> >> >>> ------------------------
> >> >> >>> With warm regard
> >> >> >>> Xiaoxiang Yu
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>> On Tue, Dec 5, 2023 at 9:12 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>
> >> >> >>>> Hi Xiaoxiang, Sirs / Madams
> >> >> >>>>
> >> >> >>>> Can you see the attached photo?
> >> >> >>>>
> >> >> >>>> My boss asked why Druid commits code regularly while Kylin has had
> >> >> >>>> no commits since July.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>> On Mon, 4 Dec 2023 at 15:33 Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>
> >> >> >>>>> I think so.
> >> >> >>>>>
> >> >> >>>>> Response time is not the only factor in making a decision. Kylin
> >> >> >>>>> could be cheaper when the query pattern is suitable for the Kylin
> >> >> >>>>> model, and Kylin can guarantee reasonable query latency. ClickHouse
> >> >> >>>>> will be quicker in an ad hoc query scenario.
> >> >> >>>>>
> >> >> >>>>> By the way, Youzan and Kyligence combine the two to provide
> >> >> >>>>> unified data analytics services for their customers.
> >> >> >>>>>
> >> >> >>>>> ------------------------
> >> >> >>>>> With warm regard
> >> >> >>>>> Xiaoxiang Yu
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> On Mon, Dec 4, 2023 at 4:01 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>
> >> >> >>>>>> Hi Xiaoxiang, thank you
> >> >> >>>>>>
> >> >> >>>>>> In case my client uses a cloud computing service like GCP or AWS,
> >> >> >>>>>> which will cost more: the precalculation feature of Kylin, or
> >> >> >>>>>> ClickHouse? (In the case of Kylin, my thinking is that the query
> >> >> >>>>>> execution is done once and stored in a cube to be used many times,
> >> >> >>>>>> so Kylin uses less cloud computation. Is that true?)
> >> >> >>>>>>
> >> >> >>>>>> On Mon, Dec 4, 2023 at 2:46 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>>
> >> >> >>>>>> > The following text is part of an article
> >> >> >>>>>> > (https://zhuanlan.zhihu.com/p/343394287).
> >> >> >>>>>> >
> >> >> >>>>>> > ===============================================================================
> >> >> >>>>>> >
> >> >> >>>>>> > Kylin is suitable for aggregation queries with fixed patterns because
> >> >> >>>>>> > of its pre-calculation technology, for example when the join, group by,
> >> >> >>>>>> > and where condition patterns in the SQL are relatively fixed. The larger
> >> >> >>>>>> > the data volume is, the more obvious the advantages of using Kylin are.
> >> >> >>>>>> > In particular, Kylin's advantages are especially large in deduplication
> >> >> >>>>>> > (count distinct), Top N, Percentile, and similar scenarios, and it is
> >> >> >>>>>> > widely used for dashboards, all kinds of reports, large-screen displays,
> >> >> >>>>>> > traffic statistics, and user behavior analysis. Meituan, Aurora, Shell
> >> >> >>>>>> > Housing, etc. use Kylin to build their data service platforms, serving
> >> >> >>>>>> > millions to tens of millions of queries per day, and most of the queries
> >> >> >>>>>> > complete within 2 - 3 seconds. There is no better alternative for such
> >> >> >>>>>> > a high-concurrency scenario.
> >> >> >>>>>> >
> >> >> >>>>>> > ClickHouse, because of its MPP architecture, has high computing power
> >> >> >>>>>> > and is more suitable when query requests are more flexible, or when
> >> >> >>>>>> > there is a need for detail-level queries with low concurrency. Typical
> >> >> >>>>>> > scenarios include user-label filtering where very many columns and where
> >> >> >>>>>> > conditions are combined arbitrarily, and complex ad hoc queries without
> >> >> >>>>>> > much concurrency. If the amount of data and traffic is large, you need
> >> >> >>>>>> > to deploy a distributed ClickHouse cluster, which is a bigger challenge
> >> >> >>>>>> > for operations and maintenance.
> >> >> >>>>>> >
> >> >> >>>>>> > If some queries are very flexible but infrequent, it is more
> >> >> >>>>>> > resource-efficient to compute them on the fly. Since the number of
> >> >> >>>>>> > queries is small, even if each query consumes a lot of computational
> >> >> >>>>>> > resources, it is still cost-effective overall. If some queries have a
> >> >> >>>>>> > fixed pattern and the query volume is large, Kylin is more suitable:
> >> >> >>>>>> > by spending large computational resources up front and saving the
> >> >> >>>>>> > results, the upfront computational cost is amortized over every query,
> >> >> >>>>>> > so it is the most economical.
> >> >> >>>>>> >
> >> >> >>>>>> > --- Translated with DeepL.com (free version)
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> > ------------------------
> >> >> >>>>>> > With warm regard
> >> >> >>>>>> > Xiaoxiang Yu
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> >
> >> >> >>>>>> > On Mon, Dec 4, 2023 at 3:16 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >
> >> >> >>>>>> >> Thank you, Xiaoxiang, for the near-real-time streaming feature.
> >> >> >>>>>> >> That's great.
> >> >> >>>>>> >>
> >> >> >>>>>> >> This morning there was a new challenge for my team: ClickHouse
> >> >> >>>>>> >> offered us the speed of calculating 8 billion rows in milliseconds,
> >> >> >>>>>> >> which is faster than my demonstration (I used Kylin to calculate
> >> >> >>>>>> >> 1 billion rows in 2.9 seconds).
> >> >> >>>>>> >>
> >> >> >>>>>> >> Can you briefly suggest the advantages of Kylin over ClickHouse so
> >> >> >>>>>> >> that I can defend my demonstration?
> >> >> >>>>>> >>
> >> >> >>>>>> >> On Mon, Dec 4, 2023 at 1:55 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>> >>
> >> >> >>>>>> >> > 1. "In this important scenario of realtime analytics, the reason
> >> >> >>>>>> >> > here is that kylin has lag time due to model update of new segment
> >> >> >>>>>> >> > build, is that correct?"
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > You are correct.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > 2. "If that is true, then can you suggest a work-around of
> >> >> >>>>>> >> > combination of ... "
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > Kylin is planning to introduce NRT streaming (coding is completed
> >> >> >>>>>> >> > but not released), which can reduce the time lag to about 3 minutes
> >> >> >>>>>> >> > (that is my estimate, but I am quite certain about it).
> >> >> >>>>>> >> > NRT stands for 'near real-time'. It runs a job that periodically
> >> >> >>>>>> >> > does micro-batch aggregation and persistence. The price is that you
> >> >> >>>>>> >> > need to run and monitor a long-running job. This feature is based on
> >> >> >>>>>> >> > Spark Streaming, so you need knowledge of it.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > I am curious: what is the maximum time lag your customers
> >> >> >>>>>> >> > can tolerate?
> >> >> >>>>>> >> > Personally, I guess a minute-level time lag is OK for most cases.
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > ------------------------
> >> >> >>>>>> >> > With warm regard
> >> >> >>>>>> >> > Xiaoxiang Yu
> >> >> >>>>>> >> >
> >> >> >>>>>> >> >
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > On Mon, Dec 4, 2023 at 12:28 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >> >
> >> >> >>>>>> >> > > Druid is better in
> >> >> >>>>>> >> > > - Have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > ==========================
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > Hi Xiaoxiang, thank you for your response.
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > In this important scenario of real-time analytics, the reason here
> >> >> >>>>>> >> > > is that Kylin has lag time due to the model update of a new segment
> >> >> >>>>>> >> > > build, is that correct?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > If that is true, then can you suggest a workaround that combines:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > (time-lag Kylin cube) + (real-time DB update) to provide
> >> >> >>>>>> >> > > real-time capability?
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > IMO, the point here is to find that (real-time DB update) and
> >> >> >>>>>> >> > > integrate it with the (time-lag Kylin cube).
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > On Fri, Dec 1, 2023 at 1:53 PM Xiaoxiang Yu <x...@apache.org> wrote:
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> > > > I researched and tested Druid two years ago (I don't know much
> >> >> >>>>>> >> > > > about how Druid has changed in these two years. New features that
> >> >> >>>>>> >> > > > I know of are: new UI, fully on K8s, etc.).
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > Here are some cases where you should consider using Druid rather
> >> >> >>>>>> >> > > > than Kylin at the moment (comparing Kylin 5.0-beta with the Druid
> >> >> >>>>>> >> > > > that I used two years ago):
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - Have a real-time datasource like Kafka etc.
> >> >> >>>>>> >> > > > - Most queries are small (based on my test results, I think Druid
> >> >> >>>>>> >> > > >   had better response time for small queries two years ago).
> >> >> >>>>>> >> > > > - Don't know how to optimize Spark/Hadoop, want to use the
> >> >> >>>>>> >> > > >   K8s/public cloud platform as your deployment platform.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > But I do think there are many scenarios in which Kylin could be
> >> >> >>>>>> >> > > > better, like:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > - Better performance for complex/big queries. Kylin can have a
> >> >> >>>>>> >> > > >   more exact-match/fine-grained index for queries containing
> >> >> >>>>>> >> > > >   different `Group By dimensions`.
> >> >> >>>>>> >> > > > - User-friendly UI for modeling.
> >> >> >>>>>> >> > > > - Support 'Join' better? (Not sure at the moment)
> >> >> >>>>>> >> > > > - ODBC driver for different BI tools (Druid's website did not
> >> >> >>>>>> >> > > >   show that it supports ODBC well).
> >> >> >>>>>> >> > > > - Looks like Kylin supports ANSI SQL better than Druid.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > I don't know Pinot, so I have nothing to say about it.
> >> >> >>>>>> >> > > > I hope this helps; feel free to share your opinion.
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > ------------------------
> >> >> >>>>>> >> > > > With warm regard
> >> >> >>>>>> >> > > > Xiaoxiang Yu
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > > On Fri, Dec 1, 2023 at 11:11 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > > >> Dear Xiaoxiang,
> >> >> >>>>>> >> > > >> Sirs/Madams,
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> May I post my boss's question:
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> What are the pros and cons of the OLAP platform Kylin compared
> >> >> >>>>>> >> > > >> to Pinot and Druid?
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Please kindly let me know
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >> Thank you very much and best regards
> >> >> >>>>>> >> > > >>
> >> >> >>>>>> >> > > >
> >> >> >>>>>> >> > >
> >> >> >>>>>> >> >
> >> >> >>>>>> >>
> >> >> >>>>>> >
> >> >> >>>>>>
> >> >> >>>>>
> >> >>
> >> >
> >>
> >
>
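
The cost argument in the quoted article (rare, flexible queries are cheaper to compute on the fly, while fixed-pattern, high-volume queries amortize the upfront pre-computation) can be illustrated with a toy model. This is only a sketch: the cost figures are made-up units, and the names `Workload`, `daily_cost_on_the_fly`, `daily_cost_precomputed`, and `break_even_queries` are hypothetical, not measurements or APIs of Kylin or ClickHouse.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Illustrative daily workload; every number is an assumption."""
    queries_per_day: int
    on_the_fly_cost_per_query: float   # cost when each query scans raw data
    precompute_cost_per_day: float     # cost of building and storing aggregates
    precomputed_cost_per_query: float  # cost of answering from stored aggregates


def daily_cost_on_the_fly(w: Workload) -> float:
    """Total daily cost when every query is computed at query time."""
    return w.queries_per_day * w.on_the_fly_cost_per_query


def daily_cost_precomputed(w: Workload) -> float:
    """Total daily cost when results are pre-computed once and then reused."""
    return w.precompute_cost_per_day + w.queries_per_day * w.precomputed_cost_per_query


def break_even_queries(w: Workload) -> float:
    """Queries per day above which pre-computation becomes cheaper."""
    saving_per_query = w.on_the_fly_cost_per_query - w.precomputed_cost_per_query
    if saving_per_query <= 0:
        return float("inf")  # pre-computation never pays off in this model
    return w.precompute_cost_per_day / saving_per_query


if __name__ == "__main__":
    heavy = Workload(10_000, 0.5, 300.0, 0.125)  # fixed pattern, high volume
    light = Workload(100, 0.5, 300.0, 0.125)     # infrequent queries
    for name, w in (("heavy", heavy), ("light", light)):
        print(name, daily_cost_on_the_fly(w), daily_cost_precomputed(w))
    print("break-even queries/day:", break_even_queries(heavy))
```

In this model the build cost is a fixed daily amount, so the comparison depends only on how many queries share it. A real estimate would also need storage costs, cluster idle time, and how often the data is rebuilt.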

Re: Pinot/Kylin/Druid quick comparison
