Re: GlobalKTable restoration - Unexplained performance penalty

Guozhang Wang Tue, 01 Dec 2020 14:53:22 -0800

Hello Nitay,

Sorry for the late reply.


Would you be able to share the actual log entries from which you inferred the
elapsed time and the total number of records restored?
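
For context, taking the per-partition figures quoted below at face value, the
implied restore rates can be worked out as in the rough sketch that follows.
It assumes "totalRecordsToBeRestored" is a true record count and that the
elapsed times (about 20s and about 1s) are exact; neither has been confirmed
yet. The names `reported` and `restore_rate` are made up for illustration and
are not part of any Kafka API.

```python
# Figures as quoted in the original report (single partition).
# Assumption: the counts are real record counts, not offset ranges.
reported = {
    "A": {"records": 62_552, "seconds": 20.0},
    "B": {"records": 24_730_506, "seconds": 1.0},
}


def restore_rate(records: int, seconds: float) -> float:
    """Return the average restore throughput in records per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return records / seconds


# Average throughput per topic, and how much faster B is per record.
rates = {
    topic: restore_rate(v["records"], v["seconds"])
    for topic, v in reported.items()
}
ratio = rates["B"] / rates["A"]

for topic, rate in rates.items():
    print(f"topic {topic}: ~{rate:,.0f} records/s")
print(f"B restores ~{ratio:,.0f}x faster per record than A")
```

If the counts turn out to be derived from an offset range rather than from
records actually present in the compacted log, these per-record rates would
not be meaningful, which is why the raw log entries matter.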


Guozhang


On Tue, Nov 24, 2020 at 3:44 AM Nitay Kufert <nita...@ironsrc.com> wrote:

> Hey,
> I get the log *after* the restart was triggered for my app (and my app
> actually restarted, meaning I get it as part of my app bootstrap logging).
>
> On Tue, Nov 24, 2020 at 12:03 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello Nitay,
> >
> > Thanks for letting us know about your observations. When you restart the
> > application from empty local state Kafka Streams will try to restore all
> > the records up to the current log end for those global KTables, and during
> > this period of time there should be no processing.
> >
> > Do you mind sharing where you get the "totalRecordsToBeRestored" from? Is
> > it from before the restart was triggered?
> >
> > Guozhang
> >
> > On Mon, Nov 23, 2020 at 4:15 AM Nitay Kufert <nita...@ironsrc.com> wrote:
> >
> > > Hey all,
> > > We have been running a Kafka Streams-based service in production for the
> > > last couple of years (we have 4 brokers on this specific cluster).
> > > Inside this service, which does a lot of stuff, there are 2 GlobalKTables
> > > (which are based on 2 compacted topics).
> > >
> > > When I restart my app and clean the local state, the restoration of those
> > > topics begins, and the weird thing I am noticing is that for *much* fewer
> > > messages, one topic takes *a lot* more time to complete.
> > >
> > > Let's call the compacted topics A and B. Both are configured mostly the
> > > same:
> > > Replication 3
> > > Number of Partitions 36
> > >
> > > max.message.bytes 25000000
> > > segment.index.bytes 2097152
> > > min.cleanable.dirty.ratio 0.1
> > > cleanup.policy compact
> > > delete.retention.ms 900000
> > > segment.bytes 52428800
> > > segment.ms 3600000
> > > *With the exception that, for topic B, we use:*
> > > cleanup.policy compact,delete
> > > retention.ms 604800000
> > >
> > > *The behaviour I've noticed for a single partition:*
> > > Topic A has 62,552 totalRecordsToBeRestored - and it takes around 20s
> > > Topic B has 24,730,506 totalRecordsToBeRestored - and it takes around 1s
> > >
> > > It is worth mentioning that the data B holds for each record is *much,
> > > much bigger* (A holds an integer while B holds a big object).
> > >
> > > Now, I get the feeling that the reason is that the data in B is always
> > > relatively "fresh", while the data in topic A is mostly stale (the business
> > > logic behind it suggests that topic A updates at a very low rate, and a
> > > lot of keys would never be updated).
> > > So, for example, it's probably holding keys that haven't been updated since
> > > 2018.
> > > Topic B keeps getting updated every couple of milliseconds.
> > >
> > > Another difference that I just realized I should share is that topic A is
> > > being "joined" by 6 other streams, while topic B is only being joined by 2.
> > >
> > > I find it hard to explain the relation between keeping "old" records and
> > > the huge difference in number of records and their size.
> > >
> > > So I guess I am missing some basic concept when it comes to understanding
> > > compacted topics and the way the broker saves and fetches the data OR maybe
> > > we have some underlying problem which can explain it.
> > >
> > > Let me know if you need some more info
> > >
> > > Thanks!
> > >
> > > --
> > >
> > > Nitay Kufert
> > > Backend Team Leader
> > >
> > > email nita...@ironsrc.com
> > > mobile +972-54-5480021
> > > fax +972-77-5448273
> > > skype nitay.kufert.ssa
> > > 121 Menachem Begin St., Tel Aviv, Israel
> > > ironsrc.com <http://www.ironsrc.com>
> > >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>


-- 
-- Guozhang
