Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-31 Thread José Armando García Sancio
Hi all,

During the implementation of this KIP and some offline discussion with
Jason Gustafson, I realized now that Kafka is generating a snapshot
every hour and the default metadata retention is to delete snapshots
after 7 days, every cluster metadata partition will have 168 (1
snapshot per hour * 24 hours per day * 7 days) snapshots. If we assume
that in most cases the size of the snapshot is determined by the
number of partitions in a cluster, a cluster with 100K partitions will
have a snapshot size of roughly 10MB (100 bytes per partition * 100k
partitions). For this kind of clusters the cluster metadata partition
will always consume around 1.7GB.

We should change the KIP and the default value for
`metadata.max.retention.bytes` to 100MB. This should limit the size of
the cluster metadata partition for large clusters but keep 7 days
worth of snapshots for small clusters.

Thanks,
-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-17 Thread José Armando García Sancio
Thanks for your feedback Luke.

On Thu, Oct 13, 2022 at 11:58 PM Luke Chen  wrote:
> The only thing I'd like to point out is the compatibility section.
> Since this new config is default to 1 hour, which means if users explicitly
> set the config `metadata.log.max.record.bytes.between.snapshots` to a very
> large value to avoid snapshot creation, after upgraded, the snapshots will
> be created every hour. I think this behavior change should be explicitly
> written in compatibility section. WDYT?

I added the following sentence to the compatibility section. "If the
user wants to keep the behavior before this KIP was implemented, they
would need to disable time based snapshots by setting
metadata.log.max.snapshot.interval.ms to zero (0)."

KIP-778 introduced snapshot generation when the feature level changes
so it would be difficult for users to disable snapshot altogether.

-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-13 Thread Luke Chen
Hi José,

Thanks for the KIP.
Adding support to generate snapshot based on time makes sense to me.

The only thing I'd like to point out is the compatibility section.
Since this new config is default to 1 hour, which means if users explicitly
set the config `metadata.log.max.record.bytes.between.snapshots` to a very
large value to avoid snapshot creation, after upgraded, the snapshots will
be created every hour. I think this behavior change should be explicitly
written in compatibility section. WDYT?

Otherwise, LGTM.

Luke

On Fri, Oct 14, 2022 at 8:14 AM José Armando García Sancio
 wrote:

> Thanks for your feedback David Jacot, Colin McCabe and Niket Goel.
>
> I started the vote thread at
> https://lists.apache.org/thread/yzzhbvdqxg9shttgbzooc2f42l1cv2sj
>
> --
> -José
>


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-13 Thread José Armando García Sancio
Thanks for your feedback David Jacot, Colin McCabe and Niket Goel.

I started the vote thread at
https://lists.apache.org/thread/yzzhbvdqxg9shttgbzooc2f42l1cv2sj

-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread José Armando García Sancio
On Wed, Oct 12, 2022 at 3:02 PM Niket Goel  wrote:
> 1. Do we need this value to be of the order of `ms`. Is it better off being
> tunable to a minute granularity?

Hmm. The most common unit for time intervals in Kafka is milliseconds.
Very rarely does Kafka express time intervals using another unit.
There are two examples where Kafka expresses time intervals in
seconds. I would argue that even those cases should have used
milliseconds as the time unit.

We could limit the range of possible values for this configuration but
it is not clear to me that Kafka should strictly enforce a minimum in
all cases. Except for enforcing that the interval should not be a
negative number.

> 2. Somewhat related to 1 - Do we need to make any tweaks to the way we
> cleanup the metadata directory (time/size based cleanup)? My concern is if
> the metadata-dir cleanup will trigger fast enough in all cases today. e.g.
> If I were to configure the time-based snapshotting to something like every
> 10 seconds, with enough load of course, will I end up flooding the disk.?

This seems to be an implementation issue given how
RaftMetadataLogCleanerManager deletes snapshots and log segments. I
could imagine a different implementation that doesn't have this
problem. I think we should avoid as much as possible having
implementation details leak to the configuration and public APIs.

-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread José Armando García Sancio
On Wed, Oct 12, 2022 at 3:02 PM Colin McCabe  wrote:
> Given that we already have metadata.log.max.record.bytes.between.snapshots, 
> we need to define how the two properties interact. I would expect that the 
> time-based property would take effect only if the bytes-based property did 
> not trigger. In other words, if you are regularly generating snapshots every 
> hour because of your setting for 
> metadata.log.max.record.bytes.between.snapshots, setting a time-based 
> property for every 10 hours should have no effect. But if you would have only 
> generated a snapshot every week by your 
> metadata.log.max.record.bytes.between.snapshots setting, setting a time-based 
> property for every 10 hours should result in a snapshot every 10 hours. And 
> in that case the bytes-based property is effectively ignored.

Yes. This is exactly how I was planning to implement this. I was
debating if I should go into this level of detail in the KIP or to
leave this discussion to the PR review. Either way, I updated the KIP
to provide some guidance to the implementation.

> ... it suggests that the configuration name should include "max" somewhere to 
> indicate that it is a maximum only and not necessarily the duration value 
> that will result from the sum total of all configurations. So maybe 
> metadata.log.max.snapshot.interval.ms?

Sounds good to me. I updated the name of the property to
`metadata.log.max.snapshot.interval.ms`.

-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread José Armando García Sancio
On Wed, Oct 12, 2022 at 1:27 AM David Jacot  wrote:
> I would name the
> new property `metadata.log.snapshot.interval.ms` as `between` is
> implied by the `interval`.

I agree. I updated the KIP to use your suggestions for naming the
property `metadata.log.snapshot.interval.ms`.

-- 
-José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread Niket Goel
Thanks for the KIP Jose! Adding this ability makes sense to me. Just a few
quick wonderments:
1. Do we need this value to be of the order of `ms`. Is it better off being
tunable to a minute granularity?
2. Somewhat related to 1 - Do we need to make any tweaks to the way we
cleanup the metadata directory (time/size based cleanup)? My concern is if
the metadata-dir cleanup will trigger fast enough in all cases today. e.g.
If I were to configure the time-based snapshotting to something like every
10 seconds, with enough load of course, will I end up flooding the disk.?

On Wed, Oct 12, 2022 at 1:28 AM David Jacot 
wrote:

> Hi José,
>
> Thanks for the KIP. That makes total sense. On nit, I would name the
> new property `metadata.log.snapshot.interval.ms` as `between` is
> implied by the `interval`.
>
> Best,
> David
>
> On Tue, Oct 11, 2022 at 9:16 PM José Armando García Sancio
>  wrote:
> >
> > Hey all,
> >
> > I am interested in allowing brokers and controllers in KRaft to
> > generate snapshots for the cluster metadata partition on a timely
> > basis. This would better allow Kafka users to use cluster metadata
> > snapshots as a solution for backing up the cluster's metadata.
> >
> > Let's use this thread to discuss KIP-876:
> > https://cwiki.apache.org/confluence/x/MY3GDQ
> >
> > Thanks!
> > --
> > -José
>


-- 
- Niket


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread Colin McCabe
Thanks for the KIP, José.

Given that we already have metadata.log.max.record.bytes.between.snapshots, we 
need to define how the two properties interact. I would expect that the 
time-based property would take effect only if the bytes-based property did not 
trigger. In other words, if you are regularly generating snapshots every hour 
because of your setting for metadata.log.max.record.bytes.between.snapshots, 
setting a time-based property for every 10 hours should have no effect. But if 
you would have only generated a snapshot every week by your 
metadata.log.max.record.bytes.between.snapshots setting, setting a time-based 
property for every 10 hours should result in a snapshot every 10 hours. And in 
that case the bytes-based property is effectively ignored.

I think we should clarify this in the KIP to avoid confusion. Also, it suggests 
that the configuration name should include "max" somewhere to indicate that it 
is a maximum only and not necessarily the duration value that will result from 
the sum total of all configurations. So maybe 
metadata.log.max.snapshot.interval.ms?

best,
Colin

On Tue, Oct 11, 2022, at 12:15, José Armando García Sancio wrote:
> Hey all,
>
> I am interested in allowing brokers and controllers in KRaft to
> generate snapshots for the cluster metadata partition on a timely
> basis. This would better allow Kafka users to use cluster metadata
> snapshots as a solution for backing up the cluster's metadata.
>
> Let's use this thread to discuss KIP-876:
> https://cwiki.apache.org/confluence/x/MY3GDQ
>
> Thanks!
> -- 
> -José


Re: [DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-12 Thread David Jacot
Hi José,

Thanks for the KIP. That makes total sense. On nit, I would name the
new property `metadata.log.snapshot.interval.ms` as `between` is
implied by the `interval`.

Best,
David

On Tue, Oct 11, 2022 at 9:16 PM José Armando García Sancio
 wrote:
>
> Hey all,
>
> I am interested in allowing brokers and controllers in KRaft to
> generate snapshots for the cluster metadata partition on a timely
> basis. This would better allow Kafka users to use cluster metadata
> snapshots as a solution for backing up the cluster's metadata.
>
> Let's use this thread to discuss KIP-876:
> https://cwiki.apache.org/confluence/x/MY3GDQ
>
> Thanks!
> --
> -José


[DISCUSS] KIP-876: Time based cluster metadata snapshots

2022-10-11 Thread José Armando García Sancio
Hey all,

I am interested in allowing brokers and controllers in KRaft to
generate snapshots for the cluster metadata partition on a timely
basis. This would better allow Kafka users to use cluster metadata
snapshots as a solution for backing up the cluster's metadata.

Let's use this thread to discuss KIP-876:
https://cwiki.apache.org/confluence/x/MY3GDQ

Thanks!
-- 
-José