Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.4 #3

2022-12-07 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-07 Thread Sophie Blee-Goldman
Thanks everyone for the updates, that all sounds good.

Divij: I pinged some of the relevant reviewers to give your PRs a final
pass, but will
leave it to their judgement from here as I'm not familiar with either.

Everyone else: reminder that today is the official code freeze deadline, so
please try
and get your PRs in by EOD! If you have something on the edge and worry
it's not
ready, feel free to reach out for an extension: I'd rather give you an
extra day or two
than risk finding a blocker weeks from now because a PR was rushed :)

On Wed, Dec 7, 2022 at 2:04 PM Chris Egerton 
wrote:

> Hi Sophie,
>
> Thanks for taking a look at the MM2 issue. We've merged a
> (minimally-scoped) fix and backported it to the 3.4 branch; the issue
> should be resolved now.
>
> Cheers,
>
> Chris
>
> On Wed, Dec 7, 2022 at 3:12 PM Rajini Sivaram 
> wrote:
>
> > Hi Sophie,
> >
> > I have merged PR #12954 for KIP-881 to 3.4 branch. Please let me know if
> > that is ok.
> >
> > Thank you,
> >
> > Rajini
> >
> >
> > On Wed, Dec 7, 2022 at 11:43 AM Rajini Sivaram 
> > wrote:
> >
> > > Hi Sophie,
> > >
> > > The first PR for KIP-881 which contains protocol changes has been
> merged
> > > to trunk (https://github.com/apache/kafka/pull/12954). It is a
> > relatively
> > > small PR, can we merge to 3.4.0?
> > >
> > > Thank you,
> > >
> > > Rajini
> > >
> > >
> > > On Wed, Dec 7, 2022 at 11:16 AM Divij Vaidya 
> > > wrote:
> > >
> > >> Hey Sophie
> > >>
> > >> I have a couple of pending PRs which have been waiting for review
> since
> > >> preparation of the 3.3 release. They are not blockers for 3.4 but are
> > >> being
> > >> tracked as improvements that we would like to add to 3.4 release.
> > >>
> > >> Please consider taking a look when you get a chance:
> > >>
> > >> 1. https://issues.apache.org/jira/browse/KAFKA-7109
> > >> 2. https://github.com/apache/kafka/pull/12228
> > >>
> > >> --
> > >> Divij Vaidya
> > >>
> > >>
> > >>
> > >> On Wed, Dec 7, 2022 at 3:18 AM Sophie Blee-Goldman
> > >>  wrote:
> > >>
> > >> > Hey all,
> > >> >
> > >> > First off, just a heads up that code freeze will be *tomorrow, Dec
> > 6th*
> > >> so
> > >> > please make sure
> > >> > to merge any lingering PRs by EOD Wednesday (PST). If you have a
> > >> potential
> > >> > blocker
> > >> > that may take longer to fix and hasn't already been communicated to
> > me,
> > >> > please reach out
> > >> > to me now and make sure the ticket is marked as a blocker for 3.4
> > >> >
> > >> > Also note that the 3.4 branch has been created, so going forward
> > you'll
> > >> > need to ensure that
> > >> > newly merged PRs are cherrypicked to this branch to make the 3.4
> > >> release.
> > >> >
> > >> > Thanks, and don't hesitate to reach out if you have any questions.
> > >> >
> > >> > Greg/Chris -- I looked over the ticket and PR and agree this counts
> > as a
> > >> > blocker so just try
> > >> > and get this in as quickly as is reasonable. It seems like things
> are
> > >> > mostly sorted with this
> > >> > fix but I did chime in on the PR discussion regarding keeping the
> > scope
> > >> > small here
> > >> >
> > >> >
> > >> > On Tue, Dec 6, 2022 at 7:15 AM Chris Egerton
>  > >
> > >> > wrote:
> > >> >
> > >> > > Hi Greg,
> > >> > >
> > >> > > Thanks for finding and raising this issue. I've given the PR a
> look
> > >> and
> > >> > > plan to continue reviewing it this week until merged. IMO this
> > should
> > >> > > qualify as a blocker for the release.
> > >> > >
> > >> > > Sophie, is it alright if we merge this into the 3.4 branch (or
> > trunk,
> > >> if
> > >> > > one has not been created yet) past the December 7th code freeze
> > >> deadline?
> > >> > >
> > >> > > Cheers,
> > >> > >
> > >> > > Chris
> > >> > >
> > >> > > On Mon, Dec 5, 2022 at 2:11 PM Greg Harris
> > >>  > >> > >
> > >> > > wrote:
> > >> > >
> > >> > > > Hi All,
> > >> > > >
> > >> > > > Just notifying everyone of a regression introduced by KIP-787,
> > >> > currently
> > >> > > > only present on trunk, but which may qualify as a blocker for
> the
> > >> > > release.
> > >> > > > It manifests as a moderate resource leak on MirrorMaker2
> clusters.
> > >> The
> > >> > > fix
> > >> > > > should have a small scope and low risk.
> > >> > > >
> > >> > > > Here's the bug ticket:
> > >> > https://issues.apache.org/jira/browse/KAFKA-14443
> > >> > > > Here's the tentative fix PR:
> > >> > https://github.com/apache/kafka/pull/12955
> > >> > > >
> > >> > > > Thanks!
> > >> > > > Greg
> > >> > > >
> > >> > > > On Fri, Dec 2, 2022 at 8:06 AM David Jacot
> > >>  > >> > >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi Sophie,
> > >> > > > >
> > >> > > > > FYI - I just merged KIP-840
> > >> > > > > (
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
> > >> > > > > )
> > >> > > > > so it will be in 3.4.
> > >> > > > >
> > >> > > > > Best,
> > >> > > > > David
> > >> > > > >
> > >> > > > > On Thu, Dec 1, 2022 at 

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-12-07 Thread Sagar
Hi Matthias,

I did save it. The changes are added under Public Interfaces (point #2,
about enhancing KeyQueryMetadata with a partitions() method), and an
IllegalArgumentException is now thrown when the StreamPartitioner#partitions
method returns multiple partitions for FK joins only, rather than for both
FK joins and IQ as previously decided.

The background is that for IQ, if users have multicast records to multiple
partitions during ingestion but the fetch returns only a single partition,
the results would be wrong. That's why the restriction was lifted for IQ,
and why KeyQueryMetadata now has the additional partitions() method to
reflect that.

FK joins have a similar case, but during review it was felt that FK joins
are fairly complicated on their own and we don't need this feature there
right away, so that restriction still exists.
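
To make the IQ side concrete, here is a rough sketch of how a caller might
use the new accessor (illustrative only -- the store name is a placeholder
and the exact signature is whatever the KIP finally specifies):

import java.util.Set;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;

public final class MulticastIqExample {

    // Before this KIP a key mapped to exactly one partition; with a multicasting
    // StreamPartitioner it may map to several, which is why KeyQueryMetadata
    // gains the partitions() accessor.
    static Set<Integer> partitionsForKey(final KafkaStreams streams, final String key) {
        final KeyQueryMetadata metadata =
            streams.queryMetadataForKey("my-store", key, Serdes.String().serializer());
        return metadata.partitions(); // new accessor from this KIP
    }
}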

Thanks!
Sagar.


On Wed, Dec 7, 2022 at 9:42 PM Matthias J. Sax  wrote:

> I don't see any update on the wiki about it. Did you forget to hit "save"?
>
> Can you also provide some background? I am not sure right now if I
> understand the proposed changes?
>
>
> -Matthias
>
> On 12/6/22 6:36 PM, Sophie Blee-Goldman wrote:
> > Thanks Sagar, this makes sense to me -- we clearly need additional
> changes
> > to
> > avoid breaking IQ when using this feature, but I agree with continuing to
> > restrict
> > FKJ since they wouldn't stop working without it, and would become much
> > harder
> > to reason about (than they already are) if we did enable them to use it.
> >
> > And of course, they can still multicast the final results of a FKJ, they
> > just can't
> > mess with the internal workings of it in this way.
> >
> > On Tue, Dec 6, 2022 at 9:48 AM Sagar  wrote:
> >
> >> Hi All,
> >>
> >> I made a couple of edits to the KIP which came up during the code
> review.
> >> Changes at a high level are:
> >>
> >> 1) KeyQueryMetadata enhanced to have a new method called partitions().
> >> 2) Lifting the restriction of a single partition for IQ. Now the
> >> restriction holds only for FK Join.
> >>
> >> Updated KIP:
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356
> >>
> >> Thanks!
> >> Sagar.
> >>
> >> On Mon, Sep 12, 2022 at 6:43 PM Sagar 
> wrote:
> >>
> >>> Thanks Bruno,
> >>>
> >>> Marking this as accepted.
> >>>
> >>> Thanks everyone for their comments/feedback.
> >>>
> >>> Thanks!
> >>> Sagar.
> >>>
> >>> On Mon, Sep 12, 2022 at 1:53 PM Bruno Cadonna 
> >> wrote:
> >>>
>  Hi Sagar,
> 
>  Thanks for the update and the PR!
> 
>  +1 (binding)
> 
>  Best,
>  Bruno
> 
>  On 10.09.22 18:57, Sagar wrote:
> > Hi Bruno,
> >
> > Thanks, I think these changes make sense to me. I have updated the
> KIP
> > accordingly.
> >
> > Thanks!
> > Sagar.
> >
> > On Wed, Sep 7, 2022 at 2:16 PM Bruno Cadonna 
>  wrote:
> >
> >> Hi Sagar,
> >>
> >> I would not drop the support for dropping records. I would also not
> >> return null from partitions(). Maybe an Optional can help here. An
>  empty
> >> Optional would mean to use the default partitioning behavior of the
> >> producer. So we would have:
> >>
> >> - non-empty Optional, non-empty list of integers: partitions to send
>  the
> >> record to
> >> - non-empty Optional, empty list of integers: drop the record
> >> - empty Optional: use default behavior
> >>
> >> What do others think?
> >>
> >> Best,
> >> Bruno
> >>
> >> On 02.09.22 13:53, Sagar wrote:
> >>> Hello Bruno/Chris,
> >>>
> >>> Since these are the last set of changes(I am assuming haha), it
> >> would
>  be
> >>> great if you could review the 2 options from above so that we can
>  close
> >> the
> >>> voting. Of course I am happy to incorporate any other requisite
>  changes.
> >>>
> >>> Thanks!
> >>> Sagar.
> >>>
> >>> On Wed, Aug 31, 2022 at 10:07 PM Sagar 
> >> wrote:
> >>>
>  Thanks Bruno for the great points.
> 
>  I see 2 options here =>
> 
>  1) As Chris suggested, drop the support for dropping records in
> the
>  partitioner. That way, an empty list could signify the usage of a
> >> default
>  partitioner. Also, if the deprecated partition() method returns
> >> null
>  thereby signifying the default partitioner, the partitions() can
>  return
> >> an
>  empty list i.e default partitioner.
> 
>  2) OR we treat a null return type of partitions() method to
> signify
>  the
>  usage of the default partitioner. In the default implementation of
>  partitions() method, if partition() returns null, then even
>  partitions()
>  can return null(instead of an empty list). The RecordCollectorImpl
>  code
> >> can
>  also be modified accordingly. @Chris, to your point, we can even
> >> drop
> >> the
>  support of 

[jira] [Resolved] (KAFKA-14432) RocksDBStore relies on finalizers to not leak memory

2022-12-07 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-14432.

Resolution: Fixed

> RocksDBStore relies on finalizers to not leak memory
> 
>
> Key: KAFKA-14432
> URL: https://issues.apache.org/jira/browse/KAFKA-14432
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Lucas Brutschy
>Assignee: Lucas Brutschy
>Priority: Blocker
> Fix For: 3.4.0
>
>
> Relying on finalizers in RocksDB has been deprecated for a long time, and 
> starting with rocksdb 7, finalizers are removed completely (see 
> [https://github.com/facebook/rocksdb/pull/9523]). 
> Kafka Streams currently relies on finalizers in parts to not leak memory. 
> This needs to be resolved before we can upgrade to RocksDB 7.
> See  [https://github.com/apache/kafka/pull/12809] .
> This is a native heap profile after running Kafka Streams without finalizers 
> for a few hours:
> {code:java}
> Total: 13547.5 MB
> 12936.3 95.5% 95.5% 12936.3 95.5% rocksdb::port::cacheline_aligned_alloc
> 438.5 3.2% 98.7% 438.5 3.2% rocksdb::BlockFetcher::ReadBlockContents
> 84.0 0.6% 99.3% 84.2 0.6% rocksdb::Arena::AllocateNewBlock
> 45.9 0.3% 99.7% 45.9 0.3% prof_backtrace_impl
> 8.1 0.1% 99.7% 14.6 0.1% rocksdb::BlockBasedTable::PutDataBlockToCache
> 6.4 0.0% 99.8% 12941.4 95.5% Java_org_rocksdb_Statistics_newStatistics___3BJ
> 6.1 0.0% 99.8% 6.9 0.1% rocksdb::LRUCacheShard::Insert@2d8b20
> 5.1 0.0% 99.9% 6.5 0.0% rocksdb::VersionSet::ProcessManifestWrites
> 3.9 0.0% 99.9% 3.9 0.0% rocksdb::WritableFileWriter::WritableFileWriter
> 3.2 0.0% 99.9% 3.2 0.0% std::string::_Rep::_S_create{code}
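
For illustration, explicit resource management with the RocksDB Java API looks 
roughly like the following (a minimal sketch with placeholder paths and options, 
not the code from the actual fix):
{code:java}
// Release RocksDB native objects explicitly via try-with-resources instead of
// relying on finalizers (which RocksDB 7 no longer provides).
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Statistics;

public class ExplicitCloseExample {
    public static void main(final String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (final Statistics stats = new Statistics();
             final Options options = new Options()
                     .setCreateIfMissing(true)
                     .setStatistics(stats);
             final RocksDB db = RocksDB.open(options, "/tmp/rocksdb-example")) {
            db.put("key".getBytes(), "value".getBytes());
        } // db, options, and stats free their native memory here, no finalizer involved
    }
}
{code}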



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14454) KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions passes when run individua

2022-12-07 Thread Sagar Rao (Jira)
Sagar Rao created KAFKA-14454:
-

 Summary: 
KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions
 passes when run individually but not when run as part of the IT
 Key: KAFKA-14454
 URL: https://issues.apache.org/jira/browse/KAFKA-14454
 Project: Kafka
  Issue Type: Bug
Reporter: Sagar Rao
Assignee: Sagar Rao


The newly added test 
KTableKTableForeignKeyInnerJoinCustomPartitionerIntegrationTest#shouldThrowIllegalArgumentExceptionWhenCustomPartionerReturnsMultiplePartitions
(added as part of KIP-837) passes when run individually but fails when run as 
part of the IT class, and hence is marked as Ignored.

As part of this ticket, we can also look to move this class to JUnit 5 
annotations, since it currently relies on JUnit 4 ones.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-889 Versioned State Stores

2022-12-07 Thread Victoria Xia
Thanks for the discussion, Bruno, Sagar, and Matthias!

It seems we've reached consensus on almost all of the discussion points.
I've updated the KIP with the following:
1) renamed "timestampTo" in `get(key, timestampTo)` to "asOfTimestamp" to
clarify that this timestamp bound is inclusive, per the SQL guideline that
"AS OF " queries are inclusive. In the future, if we want to
introduce a timestamp range query, we can use `get(key, timestampFrom,
timestampTo)` and specify that timestampTo is exclusive in this method,
while avoiding confusion with the inclusive asOfTimestamp parameter in the
other method, given that the names are different.
2) added a description of "history retention" semantics into the
VersionedKeyValueStore interface Javadoc, and updated the Javadoc for
`get(key, asOfTimestamp)` to mention explicitly that a null result is
returned if the provided timestamp bound is not within history retention.
3) added a `delete(key, timestamp)` method (with return type
`ValueAndTimestamp`) to the VersionedKeyValueStore interface.
4) updated the Javadoc for `segmentInterval` to clarify that the only
reason a user might be interested in this parameter is performance.
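
Putting points (1)-(3) above together, the interface now looks roughly like the
following sketch (signatures are illustrative and the extension of StateStore is
just my shorthand here -- the KIP itself is the authoritative definition):

import org.apache.kafka.streams.processor.StateStore;
import org.apache.kafka.streams.state.ValueAndTimestamp;

public interface VersionedKeyValueStore<K, V> extends StateStore {

    // Add a new record version (or a tombstone, if value is null) at the given timestamp.
    void put(K key, V value, long timestamp);

    // Latest record version for the key, or null if there is none.
    ValueAndTimestamp<V> get(K key);

    // Record version active as of the given timestamp (inclusive bound); returns null
    // if the bound falls outside history retention.
    ValueAndTimestamp<V> get(K key, long asOfTimestamp);

    // Soft delete at the given timestamp; returns the record version that was deleted, if any.
    ValueAndTimestamp<V> delete(K key, long timestamp);
}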

Other points we discussed which did not result in updates include:
5) whether to automatically update the `min.compaction.lag.ms` config on
changelog topics when history retention is changed -- there's support for
this but let's not bundle it with this KIP. We can have a separate KIP to
change this behavior for the existing windowed changelog topics, in
addition to versioned changelog topics.
6) should we expose segmentInterval in this KIP -- let's go ahead and
expose it now since we'll almost certainly expose it (in this same manner)
in a follow-on KIP anyway, and so that poor performance for user workloads
is less likely to be a barrier for users getting started with this feature.
I updated the Javadoc for this parameter to clarify why the Javadoc
mentions performance despite Javadocs typically not doing so.
7) `get(timestampFrom, timestampTo)` and other methods for IQ -- very
important but deferred to a future KIP
8) `purge(key)`/`deleteAllVersions(key)` -- deferred to a future KIP

That leaves only one unresolved discussion point:
9) whether to include validTo in the return types from `get(...)`. If we go
with the current proposal of not including validTo in the return type, then
it will not be easy to add it in the future (unless we want to add validTo
to ValueAndTimestamp, which feels odd to me). If we think we might want to
have validTo in the future, we can change the return type of `get(...)` and
`delete(...)` in this proposal from `ValueAndTimestamp` to a new type,
e.g., `VersionedRecord` or `RecordVersion`, which today will look the
same as `ValueAndTimestamp` but in the future we can add validTo if we
want. The cost is a new type which today looks the same as
ValueAndTimestamp.
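
For concreteness, the new class would initially be little more than the
following sketch (name and exact shape still to be decided):

public final class VersionedRecord<V> {

    private final V value;
    private final long timestamp;

    public VersionedRecord(final V value, final long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    public V value() {
        return value;
    }

    public long timestamp() {
        return timestamp;
    }

    // Room to grow: a validTo() accessor could be added later without breaking callers.
}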

Now that I think about it more, the cost of introducing a new type seems
relatively low. I've added a proposal towards the bottom of the KIP here.
If others also believe that the cost of introducing this new interface is
low (particularly relative to the flexibility it provides us for being able
to evolve the class in the future), I will incorporate this proposal into
the KIP. I think the hardest part of this will be deciding on a name for
the new class :)

Pending objections, I'd like to make a call on item (9) and call a vote on
this KIP at the end of this week.

Thanks,
Victoria

On Thu, Dec 1, 2022 at 9:47 PM Matthias J. Sax  wrote:

> Thanks Victoria!
>
> (1) About `ReadOnlyVersionedKeyValueStore` -- I am not sure about IQv1
> vs IQv2. But you might be right that adding the interface later might
> not be an issue -- so it does not matter. Just wanted to double check.
>
>
>
> (2) About `delete(key, ts)` -- as already discussed, I agree that it
> should have same semantics as `put(key, null, ts)` (delete() needs a
> timestamp). Not sure if `delete()` really needs to return anything? I
> would be ok to make it `void` -- but I think it's also semantically
> sound if it returns the "old" value at timestamp `ts` that the delete
> actually deleted, as you mentioned -- in the end, a "delete" is a
> physical append anyway (ie, "soft delete") as we want to track history.
>
>
>
> (3)
> > Ah, great question. I think the question boils down to: do we want to
> > require that all versioned stores (including custom user implementations)
> > use "history retention" to determine when to expire old record versions?
>
> I personally think, yes. The main reason for this is, that I think we
> need to have a clear contract so we can plug-in custom implementations
> into the DSL later? -- I guess, having a stricter contract initially,
> and relaxing it later if necessary, is the easier way forward, than 

Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-07 Thread Chris Egerton
Hi Sophie,

Thanks for taking a look at the MM2 issue. We've merged a
(minimally-scoped) fix and backported it to the 3.4 branch; the issue
should be resolved now.

Cheers,

Chris

On Wed, Dec 7, 2022 at 3:12 PM Rajini Sivaram 
wrote:

> Hi Sophie,
>
> I have merged PR #12954 for KIP-881 to 3.4 branch. Please let me know if
> that is ok.
>
> Thank you,
>
> Rajini
>
>
> On Wed, Dec 7, 2022 at 11:43 AM Rajini Sivaram 
> wrote:
>
> > Hi Sophie,
> >
> > The first PR for KIP-881 which contains protocol changes has been merged
> > to trunk (https://github.com/apache/kafka/pull/12954). It is a
> relatively
> > small PR, can we merge to 3.4.0?
> >
> > Thank you,
> >
> > Rajini
> >
> >
> > On Wed, Dec 7, 2022 at 11:16 AM Divij Vaidya 
> > wrote:
> >
> >> Hey Sophie
> >>
> >> I have a couple of pending PRs which have been waiting for review since
> >> preparation of the 3.3 release. They are not blockers for 3.4 but are
> >> being
> >> tracked as improvements that we would like to add to 3.4 release.
> >>
> >> Please consider taking a look when you get a chance:
> >>
> >> 1. https://issues.apache.org/jira/browse/KAFKA-7109
> >> 2. https://github.com/apache/kafka/pull/12228
> >>
> >> --
> >> Divij Vaidya
> >>
> >>
> >>
> >> On Wed, Dec 7, 2022 at 3:18 AM Sophie Blee-Goldman
> >>  wrote:
> >>
> >> > Hey all,
> >> >
> >> > First off, just a heads up that code freeze will be *tomorrow, Dec
> 6th*
> >> so
> >> > please make sure
> >> > to merge any lingering PRs by EOD Wednesday (PST). If you have a
> >> potential
> >> > blocker
> >> > that may take longer to fix and hasn't already been communicated to
> me,
> >> > please reach out
> >> > to me now and make sure the ticket is marked as a blocker for 3.4
> >> >
> >> > Also note that the 3.4 branch has been created, so going forward
> you'll
> >> > need to ensure that
> >> > newly merged PRs are cherrypicked to this branch to make the 3.4
> >> release.
> >> >
> >> > Thanks, and don't hesitate to reach out if you have any questions.
> >> >
> >> > Greg/Chris -- I looked over the ticket and PR and agree this counts
> as a
> >> > blocker so just try
> >> > and get this in as quickly as is reasonable. It seems like things are
> >> > mostly sorted with this
> >> > fix but I did chime in on the PR discussion regarding keeping the
> scope
> >> > small here
> >> >
> >> >
> >> > On Tue, Dec 6, 2022 at 7:15 AM Chris Egerton  >
> >> > wrote:
> >> >
> >> > > Hi Greg,
> >> > >
> >> > > Thanks for finding and raising this issue. I've given the PR a look
> >> and
> >> > > plan to continue reviewing it this week until merged. IMO this
> should
> >> > > qualify as a blocker for the release.
> >> > >
> >> > > Sophie, is it alright if we merge this into the 3.4 branch (or
> trunk,
> >> if
> >> > > one has not been created yet) past the December 7th code freeze
> >> deadline?
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Chris
> >> > >
> >> > > On Mon, Dec 5, 2022 at 2:11 PM Greg Harris
> >>  >> > >
> >> > > wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > Just notifying everyone of a regression introduced by KIP-787,
> >> > currently
> >> > > > only present on trunk, but which may qualify as a blocker for the
> >> > > release.
> >> > > > It manifests as a moderate resource leak on MirrorMaker2 clusters.
> >> The
> >> > > fix
> >> > > > should have a small scope and low risk.
> >> > > >
> >> > > > Here's the bug ticket:
> >> > https://issues.apache.org/jira/browse/KAFKA-14443
> >> > > > Here's the tentative fix PR:
> >> > https://github.com/apache/kafka/pull/12955
> >> > > >
> >> > > > Thanks!
> >> > > > Greg
> >> > > >
> >> > > > On Fri, Dec 2, 2022 at 8:06 AM David Jacot
> >>  >> > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Sophie,
> >> > > > >
> >> > > > > FYI - I just merged KIP-840
> >> > > > > (
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
> >> > > > > )
> >> > > > > so it will be in 3.4.
> >> > > > >
> >> > > > > Best,
> >> > > > > David
> >> > > > >
> >> > > > > On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
> >> > > > >  wrote:
> >> > > > > >
> >> > > > > > Hey all! It's officially *feature freeze for 3.4* so make sure
> >> you
> >> > > get
> >> > > > > that
> >> > > > > > feature work merged by the end of today.
> >> > > > > > After this point, only bug fixes and other work focused on
> >> > > stabilizing
> >> > > > > the
> >> > > > > > release should be merged to the release
> >> > > > > > branch. Also note that the *3.4 code freeze* will be in one
> week
> >> > > (*Dec
> >> > > > > 7th*)
> >> > > > > > so please make sure to stabilize and
> >> > > > > > thoroughly test any new features.
> >> > > > > >
> >> > > > > > I will wait until Friday to create the release branch to allow
> >> for
> >> > > any
> >> > > > > > existing PRs to be merged. After this point you'll
> >> > > > > > need to cherrypick any new commits to the 3.4 branch once a PR
> >> is
> >> > > > merged.
> >> > > > > >
> >> > > 

[jira] [Created] (KAFKA-14453) Flaky test suite MirrorConnectorsWithCustomForwardingAdminIntegrationTest

2022-12-07 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-14453:
-

 Summary: Flaky test suite 
MirrorConnectorsWithCustomForwardingAdminIntegrationTest
 Key: KAFKA-14453
 URL: https://issues.apache.org/jira/browse/KAFKA-14453
 Project: Kafka
  Issue Type: Test
  Components: mirrormaker
Reporter: Chris Egerton


We've been seeing some integration test failures lately for the 
{{MirrorConnectorsWithCustomForwardingAdminIntegrationTest}} test suite. A 
couple examples:

{{org.opentest4j.AssertionFailedError: Condition not met within timeout 6. 
Topic: mm2-offset-syncs.backup.internal didn't get created in the 
FakeLocalMetadataStore ==> expected:  but was: }}
{{    at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)}}
{{    at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)}}
{{    at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)}}
{{    at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)}}
{{    at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)}}
{{    at 
app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)}}
{{    at 
app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)}}
{{    at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)}}
{{    at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)}}
{{    at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308)}}
{{    at 
app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.waitForTopicToPersistInFakeLocalMetadataStore(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:326)}}
{{    at 
app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.testReplicationIsCreatingTopicsUsingProvidedForwardingAdmin(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:217)}}
{{}}

 

And:

 

{{org.opentest4j.AssertionFailedError: Condition not met within timeout 6. 
Topic: primary.test-topic-1's configs don't have partitions:11 ==> expected: 
 but was: }}
{{    }}{{at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)}}
{{    }}{{at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)}}
{{    }}{{at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)}}
{{    }}{{at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)}}
{{    }}{{at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)}}
{{    }}{{at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)}}
{{    }}{{at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)}}
{{    }}{{at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)}}
{{    }}{{at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)}}
{{    }}{{at 
org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308)}}
{{    }}{{at 
org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.waitForTopicConfigPersistInFakeLocalMetaDataStore(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:334)}}
{{    }}{{at 
org.apache.kafka.connect.mirror.integration.MirrorConnectorsWithCustomForwardingAdminIntegrationTest.testCreatePartitionsUseProvidedForwardingAdmin(MirrorConnectorsWithCustomForwardingAdminIntegrationTest.java:255)}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1414

2022-12-07 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-07 Thread Rajini Sivaram
Hi Sophie,

I have merged PR #12954 for KIP-881 to 3.4 branch. Please let me know if
that is ok.

Thank you,

Rajini


On Wed, Dec 7, 2022 at 11:43 AM Rajini Sivaram 
wrote:

> Hi Sophie,
>
> The first PR for KIP-881 which contains protocol changes has been merged
> to trunk (https://github.com/apache/kafka/pull/12954). It is a relatively
> small PR, can we merge to 3.4.0?
>
> Thank you,
>
> Rajini
>
>
> On Wed, Dec 7, 2022 at 11:16 AM Divij Vaidya 
> wrote:
>
>> Hey Sophie
>>
>> I have a couple of pending PRs which have been waiting for review since
>> preparation of the 3.3 release. They are not blockers for 3.4 but are
>> being
>> tracked as improvements that we would like to add to 3.4 release.
>>
>> Please consider taking a look when you get a chance:
>>
>> 1. https://issues.apache.org/jira/browse/KAFKA-7109
>> 2. https://github.com/apache/kafka/pull/12228
>>
>> --
>> Divij Vaidya
>>
>>
>>
>> On Wed, Dec 7, 2022 at 3:18 AM Sophie Blee-Goldman
>>  wrote:
>>
>> > Hey all,
>> >
>> > First off, just a heads up that code freeze will be *tomorrow, Dec 6th*
>> so
>> > please make sure
>> > to merge any lingering PRs by EOD Wednesday (PST). If you have a
>> potential
>> > blocker
>> > that may take longer to fix and hasn't already been communicated to me,
>> > please reach out
>> > to me now and make sure the ticket is marked as a blocker for 3.4
>> >
>> > Also note that the 3.4 branch has been created, so going forward you'll
>> > need to ensure that
>> > newly merged PRs are cherrypicked to this branch to make the 3.4
>> release.
>> >
>> > Thanks, and don't hesitate to reach out if you have any questions.
>> >
>> > Greg/Chris -- I looked over the ticket and PR and agree this counts as a
>> > blocker so just try
>> > and get this in as quickly as is reasonable. It seems like things are
>> > mostly sorted with this
>> > fix but I did chime in on the PR discussion regarding keeping the scope
>> > small here
>> >
>> >
>> > On Tue, Dec 6, 2022 at 7:15 AM Chris Egerton 
>> > wrote:
>> >
>> > > Hi Greg,
>> > >
>> > > Thanks for finding and raising this issue. I've given the PR a look
>> and
>> > > plan to continue reviewing it this week until merged. IMO this should
>> > > qualify as a blocker for the release.
>> > >
>> > > Sophie, is it alright if we merge this into the 3.4 branch (or trunk,
>> if
>> > > one has not been created yet) past the December 7th code freeze
>> deadline?
>> > >
>> > > Cheers,
>> > >
>> > > Chris
>> > >
>> > > On Mon, Dec 5, 2022 at 2:11 PM Greg Harris
>> > > >
>> > > wrote:
>> > >
>> > > > Hi All,
>> > > >
>> > > > Just notifying everyone of a regression introduced by KIP-787,
>> > currently
>> > > > only present on trunk, but which may qualify as a blocker for the
>> > > release.
>> > > > It manifests as a moderate resource leak on MirrorMaker2 clusters.
>> The
>> > > fix
>> > > > should have a small scope and low risk.
>> > > >
>> > > > Here's the bug ticket:
>> > https://issues.apache.org/jira/browse/KAFKA-14443
>> > > > Here's the tentative fix PR:
>> > https://github.com/apache/kafka/pull/12955
>> > > >
>> > > > Thanks!
>> > > > Greg
>> > > >
>> > > > On Fri, Dec 2, 2022 at 8:06 AM David Jacot
>> > > >
>> > > > wrote:
>> > > >
>> > > > > Hi Sophie,
>> > > > >
>> > > > > FYI - I just merged KIP-840
>> > > > > (
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
>> > > > > )
>> > > > > so it will be in 3.4.
>> > > > >
>> > > > > Best,
>> > > > > David
>> > > > >
>> > > > > On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
>> > > > >  wrote:
>> > > > > >
>> > > > > > Hey all! It's officially *feature freeze for 3.4* so make sure
>> you
>> > > get
>> > > > > that
>> > > > > > feature work merged by the end of today.
>> > > > > > After this point, only bug fixes and other work focused on
>> > > stabilizing
>> > > > > the
>> > > > > > release should be merged to the release
>> > > > > > branch. Also note that the *3.4 code freeze* will be in one week
>> > > (*Dec
>> > > > > 7th*)
>> > > > > > so please make sure to stabilize and
>> > > > > > thoroughly test any new features.
>> > > > > >
>> > > > > > I will wait until Friday to create the release branch to allow
>> for
>> > > any
>> > > > > > existing PRs to be merged. After this point you'll
>> > > > > > need to cherrypick any new commits to the 3.4 branch once a PR
>> is
>> > > > merged.
>> > > > > >
>> > > > > > Finally, I've updated the list of KIPs targeted for 3.4. Please
>> > check
>> > > > out
>> > > > > > the Planned KIP Content on the release
>> > > > > > plan and let me know if there is anything missing or incorrect
>> on
>> > > > there.
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Sophie
>> > > > > >
>> > > > > >
>> > > > > > On Wed, Nov 30, 2022 at 12:29 PM David Arthur > >
>> > > > wrote:
>> > > > > >
>> > > > > > > Sophie, KIP-866 has been accepted. Thanks!
>> > > > > > >
>> > > > > > > -David
>> > > > > > >
>> > > > > > > On Thu, Nov 17, 2022 at 12:21 AM Sophie 

Re: [VOTE] KIP-884: Add config to configure KafkaClientSupplier in Kafka Streams

2022-12-07 Thread Hao Li
Hi all,

I updated the KIP to add a `getKafkaClientSupplier` method in
`StreamsConfig`. Let me know if you have any concerns.
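
For anyone following along, usage might look roughly like the sketch below once
the KIP lands (the config key and the custom supplier class name are illustrative
placeholders, not the final names):

import java.util.Properties;
import org.apache.kafka.streams.KafkaClientSupplier;
import org.apache.kafka.streams.StreamsConfig;

public final class ClientSupplierConfigExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "client-supplier-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Hypothetical config key pointing at a user-provided KafkaClientSupplier implementation.
        props.put("default.client.supplier", "com.example.MyClientSupplier");

        final StreamsConfig config = new StreamsConfig(props);
        // New accessor proposed in the KIP: returns the configured supplier instance.
        final KafkaClientSupplier supplier = config.getKafkaClientSupplier();
        System.out.println("Using client supplier: " + supplier.getClass().getName());
    }
}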

Thanks,
Hao

On Wed, Nov 30, 2022 at 10:26 AM Hao Li  wrote:

> Hi all,
>
> Thanks for the vote. The vote passed with 4 binding votes (John, Matthias,
> Sophie and Bruno).
>
> I'll update KIP and submit a PR for this.
>
> Thanks,
> Hao
>
> On Tue, Nov 22, 2022 at 11:08 PM Bruno Cadonna  wrote:
>
>> Hi Hao,
>>
>> Thanks for the KIP!
>>
>> +1 (binding)
>>
>> Best,
>> Bruno
>>
>> On 22.11.22 10:08, Sophie Blee-Goldman wrote:
>> > Hey Hao, thanks for the KIP -- I'm +1 (binding)
>> >
>> > On Mon, Nov 21, 2022 at 12:57 PM Matthias J. Sax 
>> wrote:
>> >
>> >> +1 (binding)
>> >>
>> >> On 11/21/22 7:39 AM, John Roesler wrote:
>> >>> I'm +1 (binding)
>> >>>
>> >>> Thanks for the KIP!
>> >>> -John
>> >>>
>> >>> On 2022/11/17 21:06:29 Hao Li wrote:
>>  Hi all,
>> 
>>  I would like start a vote on KIP-884:
>> 
>> 
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-884%3A+Add+config+to+configure+KafkaClientSupplier+in+Kafka+Streams
>> 
>> 
>>  Thanks,
>>  Hao
>> 
>> >>
>> >
>>
>
>
> --
> Thanks,
> Hao
>


-- 
Thanks,
Hao


[jira] [Created] (KAFKA-14452) Make sticky assignors rack-aware if consumer racks are configured.

2022-12-07 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-14452:
--

 Summary: Make sticky assignors rack-aware if consumer racks are 
configured.
 Key: KAFKA-14452
 URL: https://issues.apache.org/jira/browse/KAFKA-14452
 Project: Kafka
  Issue Type: Sub-task
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


See KIP-881 for details



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14451) Make range assignor rack-aware if consumer racks are configured

2022-12-07 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-14451:
--

 Summary: Make range assignor rack-aware if consumer racks are 
configured
 Key: KAFKA-14451
 URL: https://issues.apache.org/jira/browse/KAFKA-14451
 Project: Kafka
  Issue Type: Sub-task
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


See KIP-881 for details



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14450) Rack-aware partition assignment for consumers (KIP-881)

2022-12-07 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-14450:
--

 Summary: Rack-aware partition assignment for consumers (KIP-881)
 Key: KAFKA-14450
 URL: https://issues.apache.org/jira/browse/KAFKA-14450
 Project: Kafka
  Issue Type: New Feature
  Components: consumer
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram


Top-level ticket for KIP-881 since we are splitting the PR into 3.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Kafka logs issue

2022-12-07 Thread Ashok Lunavath
Hi Kafka Team,

We are using Kafka in our project for consuming and producing, but we are
seeing continuous logs after starting the server.
Could someone please help me with this issue?

Thanks and Regards,
Ashok


[jira] [Created] (KAFKA-14449) Brokers not re-joining the ISR list and stuck at started until all the brokers restart

2022-12-07 Thread Swathi Mocharla (Jira)
Swathi Mocharla created KAFKA-14449:
---

 Summary: Brokers not re-joining the ISR list and stuck at started 
until all the brokers restart
 Key: KAFKA-14449
 URL: https://issues.apache.org/jira/browse/KAFKA-14449
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.2.0
Reporter: Swathi Mocharla


hi,

We are upgrading a 3 broker cluster (1001,1002,1003) from 3.1.0 to 3.2.0.
During the upgrade, we noticed that when 1003 is restarted, it doesn't join back 
the ISR list and the broker is stuck. The same is the case with 1002.
Only when 1001 is restarted do 1003 and 1002 re-join the ISR list and start 
replicating data.

 
{code:java}
{"type":"log", "host":"kf-pl47-me8-2", "level":"INFO", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:07:30.386", "timezone":"UTC", "log":{"message":"main - 
kafka.server.KafkaServer - [KafkaServer id=1003] started"}}
{"type":"log", "host":"kf-pl47-me8-2", "level":"INFO", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:07:30.442", "timezone":"UTC", 
"log":{"message":"data-plane-kafka-request-handler-1 - state.change.logger - 
[Broker id=1003] Add 397 partitions and deleted 0 partitions from metadata 
cache in response to UpdateMetadata request sent by controller 1002 epoch 18 
with correlation id 0"}}
{"type":"log", "host":"kf-pl47-me8-2", "level":"INFO", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:07:30.448", "timezone":"UTC", 
"log":{"message":"BrokerToControllerChannelManager broker=1003 name=alterIsr - 
kafka.server.BrokerToControllerRequestThread - 
[BrokerToControllerChannelManager broker=1003 name=alterIsr]: Recorded new 
controller, from now on will use broker 
kf-pl47-me8-1.kf-pl47-me8-headless.nc0968-admin-ns.svc.cluster.local:9092 (id: 
1002 rack: null)"}}
{"type":"log", "host":"kf-pl47-me8-2", "level":"ERROR", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:07:30.451", "timezone":"UTC", 
"log":{"message":"data-plane-kafka-network-thread-1003-ListenerName(PLAINTEXT)-PLAINTEXT-1
 - kafka.network.Processor - Closing socket for 
192.168.216.11:9092-192.168.199.100:53778-0 because of error"}}
org.apache.kafka.common.errors.InvalidRequestException: Error getting request 
for apiKey: LEADER_AND_ISR, apiVersion: 6, connectionId: 
192.168.216.11:9092-192.168.199.100:53778-0, listenerName: 
ListenerName(PLAINTEXT), principal: User:ANONYMOUS
org.apache.kafka.common.errors.InvalidRequestException: Error getting request 
for apiKey: LEADER_AND_ISR, apiVersion: 6, connectionId: 
192.168.216.11:9092-192.168.235.153:46282-461, listenerName: 
ListenerName(PLAINTEXT), principal: User:ANONYMOUS
Caused by: org.apache.kafka.common.errors.UnsupportedVersionException: Can't 
read version 6 of LeaderAndIsrTopicState
{"type":"log", "host":"kf-pl47-me8-2", "level":"INFO", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:12:50.916", "timezone":"UTC", 
"log":{"message":"controller-event-thread - kafka.controller.KafkaController - 
[Controller id=1003] 1003 successfully elected as the controller. Epoch 
incremented to 20 and epoch zk version is now 20"}}
{"type":"log", "host":"kf-pl47-me8-2", "level":"INFO", 
"neid":"kafka-b352b4f8cf4447e9a73d9e7ef3ec746c", "system":"kafka", 
"time":"2022-12-06T10:12:50.917", "timezone":"UTC", 
"log":{"message":"controller-event-thread - kafka.controller.KafkaController - 
[Controller id=1003] Registering handlers"}}
{code}
 


This possibly was introduced by KAFKA-13587.

In the snapshot below, taken during the upgrade at 16:05:15 UTC 2022, 1001 was 
restarting and both 1002 and 1003 were already up and running (after the 
upgrade from 3.1.0 to 3.2.0), but did not manage to re-join the ISRs. 
{code:java}
Wed Dec  7 16:05:15 UTC 2022
Topic: test     TopicId: L6Yj_Nf9RrirNhFQzvXODw PartitionCount: 2       
ReplicationFactor: 3    Configs: 
compression.type=producer,min.insync.replicas=1,cleanup.policy=delete,flush.ms=1000,segment.bytes=1,flush.messages=1,max.message.bytes=112,index.interval.bytes=4096,unclean.leader.election.enable=false,retention.bytes=10,segment.index.bytes=10485760
        Topic: test     Partition: 0    Leader: none    Replicas: 
1002,1003,1001        Isr: 1001
        Topic: test     Partition: 1    Leader: none    Replicas: 
1001,1002,1003        Isr: 1001
Wed Dec  7 16:05:33 UTC 2022
Topic: test     TopicId: L6Yj_Nf9RrirNhFQzvXODw PartitionCount: 2       
ReplicationFactor: 3    Configs: 
compression.type=producer,min.insync.replicas=1,cleanup.policy=delete,flush.ms=1000,segment.bytes=1,flush.messages=1,max.message.bytes=112,index.interval.bytes=4096,unclean.leader.election.enable=false,retention.bytes=10,segment.index.bytes=10485760
        Topic: 

Re: [VOTE] KIP-878: Internal Topic Autoscaling for Kafka Streams

2022-12-07 Thread Matthias J. Sax

+1 (binding)

On 12/1/22 9:39 PM, Sophie Blee-Goldman wrote:

Thanks to all who participated for a great discussion on this KIP. Seems
we're ready to kick off the voting on this, but please don't hesitate to
call
out anything of concern or raise questions over on the voting thread.

Otherwise, please give it a final look over and cast your vote!

KIP-878: Internal Topic Autoscaling for Kafka Streams

(note the change in name to reflect the decisions in the KIP discussion)

Thanks,
Sophie



Re: [DISCUSS] KIP-878: Autoscaling for Statically Partitioned Streams

2022-12-07 Thread Matthias J. Sax
Thanks for the background. Was just curious about the details. I agree 
that we should not add a new backoff config at this point.


-Matthias

On 12/2/22 4:47 PM, Sophie Blee-Goldman wrote:


I missed the default config values as they were put into comments...


You don't read code comments? (jk...sorry, wasn't sure where the best
place for this would be, suppose I could've just included the full config
definition

About the default timeout: what is the follow-up rebalance cadence (I
thought it would be 10 minutes?). For this case, a default timeout of 15
minutes would imply that we only allow a single retry before we hit the
timeout. Would this be sufficient (sounds rather aggressive to me)?


Well no, because we will trigger the followup rebalance for this case
immediately
after like we do for cooperative rebalances, not 10 minutes later as in the
case of
probing rebalances. I thought 10 minutes was a rather extreme backoff time
that
there was no motivation for here, unlike with probing rebalances where
we're
explicitly giving the clients time to finish warming up tasks and an
immediate
followup rebalance wouldn't make any sense.

We could of course provide another config for users to tune the backoff
time here,
but I felt that triggering one right away was justified here -- and we can
always add
a backoff config in a followup KIP if there is demand for it. But why
complicate
things for users in the first iteration of this feature when following up
right away
doesn't cause too much harm -- all other threads can continue processing
during
the rebalance, and the leader can fit in some processing between
rebalances  as
well.

Does this sound reasonable to you or would you prefer including the backoff
config
right off the bat?

On Fri, Dec 2, 2022 at 9:21 AM Matthias J. Sax  wrote:


Thanks Sophie.

Good catch on the default partitioner issue!

I missed the default config values as they were put into comments...

About the default timeout: what is the follow up rebalance cadence (I
thought it would be 10 minutes?). For this case, a default timeout of 15
minutes would imply that we only allow a single retry before we hit the
timeout. Would this be sufficient (sounds rather aggressive to me)?


-Matthias

On 12/2/22 8:00 AM, Sophie Blee-Goldman wrote:

Thanks again for the responses -- just want to say up front that I realized
the concept of a default partitioner is actually substantially more
complicated than I first assumed due to key/value typing, so I pulled it
from this KIP and filed a ticket for it for now.

Bruno,

What is exactly the motivation behind metric num-autoscaling-failures?
Actually, to realise that autoscaling did not work, we only need to
monitor subtopology-parallelism over partition.autoscaling.timeout.ms
time, right?

That is exactly the motivation -- I imagine some users may want to retry
indefinitely, and it would not be practical (or very nice) to require users
monitor for up to *partition.autoscaling.timeout.ms* when that's been
configured to MAX_VALUE

Is num-autoscaling-failures a way to verify that Streams went through
enough autoscaling attempts during partition.autoscaling.timeout.ms?
Could you maybe add one or two sentences on how users should use
num-autoscaling-failures?

Not really, for the reason outlined above -- I just figured users might be
monitoring how often the autoscaling is failing and alert past some
threshold since this implies something funny is going on. This is more of a
"health check" kind of metric than a "scaling completed" status gauge. At
the very least, users will want to know when a failure has occurred, even
if it's a single failure, no?

Hopefully that makes more sense now, but I suppose I can write something
like that in the KIP too
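
A rough sketch of the kind of "health check" described above (the metric name
here is just the one proposed in this discussion, so treat it as a placeholder):

import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

public final class AutoscalingFailureCheck {

    // Scan the client-level metrics for the proposed num-autoscaling-failures gauge
    // and report whether it has crossed an alerting threshold.
    static boolean tooManyFailures(final KafkaStreams streams, final double threshold) {
        for (final Map.Entry<MetricName, ? extends Metric> entry : streams.metrics().entrySet()) {
            if (entry.getKey().name().equals("num-autoscaling-failures")) {
                return ((Number) entry.getValue().metricValue()).doubleValue() > threshold;
            }
        }
        return false;
    }
}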


Matthias -- answers inline below:

On Thu, Dec 1, 2022 at 10:44 PM Matthias J. Sax  wrote:



Thanks for updating the KIP Sophie.

I have the same question as Bruno. How can the user use the failure
metric and what actions can be taken to react if the metric increases?

I guess this depends on how important the autoscaling is, but presumably in
most cases if you see things failing you probably want to at least look
into the logs to figure out why (for example quota violation), and at the
most stop your application while investigating?

Plus a few more:

(1) Do we assume that user can reason about `subtopology-parallelism`
metric to figure out if auto-scaling is finished? Given that a topology
might be complex and the rules to determine the partition count of
internal topic are not easy, it might be hard to use?

Even if the feature is for advanced users, I don't think we should push
the burden to understand the partition count details onto them.

We could add a second `target-subtopology-parallelism` metric (or
`expected-subtopology-parallelism` or some other name)? This way, users
can compare "target/expected" and "actual" value and easily 

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-12-07 Thread Matthias J. Sax

I don't see any update on the wiki about it. Did you forget to hit "save"?

Can you also provide some background? I am not sure right now if I 
understand the proposed changes?



-Matthias

On 12/6/22 6:36 PM, Sophie Blee-Goldman wrote:

Thanks Sagar, this makes sense to me -- we clearly need additional changes
to
avoid breaking IQ when using this feature, but I agree with continuing to
restrict
FKJ since they wouldn't stop working without it, and would become much
harder
to reason about (than they already are) if we did enable them to use it.

And of course, they can still multicast the final results of a FKJ, they
just can't
mess with the internal workings of it in this way.

On Tue, Dec 6, 2022 at 9:48 AM Sagar  wrote:


Hi All,

I made a couple of edits to the KIP which came up during the code review.
Changes at a high level are:

1) KeyQueryMetadata enhanced to have a new method called partitions().
2) Lifting the restriction of a single partition for IQ. Now the
restriction holds only for FK Join.

Updated KIP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356

Thanks!
Sagar.

On Mon, Sep 12, 2022 at 6:43 PM Sagar  wrote:


Thanks Bruno,

Marking this as accepted.

Thanks everyone for their comments/feedback.

Thanks!
Sagar.

On Mon, Sep 12, 2022 at 1:53 PM Bruno Cadonna 

wrote:



Hi Sagar,

Thanks for the update and the PR!

+1 (binding)

Best,
Bruno

On 10.09.22 18:57, Sagar wrote:

Hi Bruno,

Thanks, I think these changes make sense to me. I have updated the KIP
accordingly.

Thanks!
Sagar.

On Wed, Sep 7, 2022 at 2:16 PM Bruno Cadonna 

wrote:



Hi Sagar,

I would not drop the support for dropping records. I would also not
return null from partitions(). Maybe an Optional can help here. An empty
Optional would mean to use the default partitioning behavior of the
producer. So we would have:

- non-empty Optional, non-empty list of integers: partitions to send the
record to
- non-empty Optional, empty list of integers: drop the record
- empty Optional: use default behavior
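
A rough sketch of how that contract could look as a default implementation (the
interface name here is just a stand-in for StreamPartitioner, and the exact
signature is whatever the KIP finally specifies):

import java.util.Collections;
import java.util.Optional;
import java.util.Set;

public interface MulticastPartitioner<K, V> {

    // Existing single-partition method; null means "use the producer's default partitioning".
    Integer partition(String topic, K key, V value, int numPartitions);

    // New method with the Optional-based contract described above.
    default Optional<Set<Integer>> partitions(final String topic, final K key, final V value,
                                              final int numPartitions) {
        final Integer partition = partition(topic, key, value, numPartitions);
        return partition == null
                ? Optional.empty()                               // empty Optional: default behavior
                : Optional.of(Collections.singleton(partition)); // send to the single computed partition
    }
}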

What do others think?

Best,
Bruno

On 02.09.22 13:53, Sagar wrote:

Hello Bruno/Chris,

Since these are the last set of changes(I am assuming haha), it would be
great if you could review the 2 options from above so that we can close the
voting. Of course I am happy to incorporate any other requisite changes.

Thanks!
Sagar.

On Wed, Aug 31, 2022 at 10:07 PM Sagar  wrote:

Thanks Bruno for the great points.

I see 2 options here =>

1) As Chris suggested, drop the support for dropping records in the
partitioner. That way, an empty list could signify the usage of a default
partitioner. Also, if the deprecated partition() method returns null
thereby signifying the default partitioner, the partitions() can return an
empty list i.e default partitioner.

2) OR we treat a null return type of partitions() method to signify the
usage of the default partitioner. In the default implementation of
partitions() method, if partition() returns null, then even partitions()
can return null(instead of an empty list). The RecordCollectorImpl code can
also be modified accordingly. @Chris, to your point, we can even drop the
support of dropping of records. It came up during KIP discussion, and I
thought it might be a useful feature. Let me know what you think.

3) Lastly about the partition number check. I wanted to avoid the throwing
of exception so I thought adding it might be a useful feature. But as you
pointed out, if it can break backwards compatibility, it's better to remove
it.

Thanks!
Sagar.

On Tue, Aug 30, 2022 at 6:32 PM Chris Egerton  wrote:

+1 to Bruno's concerns about backward compatibility. Do we actually need
support for dropping records in the partitioner? It doesn't seem necessary
based on the motivation for the KIP. If we remove that feature, we could
handle null and/or empty lists by using the default partitioning,
equivalent to how we handle null return values from the existing partition
method today.

On Tue, Aug 30, 2022 at 8:55 AM Bruno Cadonna  wrote:

Hi Sagar,

Thank you for the updates!

I do not intend to prolong this vote thread more than needed, but I
still have some points.

The deprecated partition method can return null if the default
partitioning logic of the producer should be used.
With the new method partitions() it seems that it is not possible to use
the default partitioning logic, anymore.

Also, in the default implementation of method partitions(), a record
that would use the default partitioning logic in method partition()
would be dropped, which would break backward compatibility since Streams
would always call the new method partitions() even though the users
still implement the deprecated method partition().

I have a last point that we should probably discuss on the PR and not on
the KIP but since you added the code in the KIP I need to mention it. I
do not think you should check the validity of 

[jira] [Created] (KAFKA-14448) ZK brokers register with KRaft during migration

2022-12-07 Thread David Arthur (Jira)
David Arthur created KAFKA-14448:


 Summary: ZK brokers register with KRaft during migration
 Key: KAFKA-14448
 URL: https://issues.apache.org/jira/browse/KAFKA-14448
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Arthur
Assignee: David Arthur






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14447) Controlled shutdown for ZK brokers during migration

2022-12-07 Thread David Arthur (Jira)
David Arthur created KAFKA-14447:


 Summary: Controlled shutdown for ZK brokers during migration
 Key: KAFKA-14447
 URL: https://issues.apache.org/jira/browse/KAFKA-14447
 Project: Kafka
  Issue Type: Sub-task
Reporter: David Arthur






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-875: First-class offsets support in Kafka Connect

2022-12-07 Thread Yash Mayya
Hi Chris,

Sorry for the late reply.

> I don't believe logging an error message is sufficient for
> handling failures to reset-after-delete. IMO it's highly
> likely that users will either shoot themselves in the foot
> by not reading the fine print and realizing that the offset
> request may have failed, or will ask for better visibility
> into the success or failure of the reset request than
> scanning log files.

Your reasoning for deferring the reset offsets after delete functionality
to a separate KIP makes sense, thanks for the explanation.

> I've updated the KIP with the
> developer-facing API changes for this logic

This is great, I hadn't considered the other two (very valid) use-cases for
such an API, thanks for adding these with elaborate documentation! However,
the significance / use of the boolean value returned by the two methods is
not fully clear, could you please clarify?
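
(To frame my question, I am picturing something roughly like the hypothetical
sketch below, where the boolean presumably signals whether the connector
supports or has accepted the request -- please correct me if the intent is
different; these are not the actual names or signatures from the KIP:)

import java.util.Map;

public interface ConnectorOffsetsHook {

    // Called when a user alters or resets offsets for a connector via the new REST API,
    // so the connector can validate the request and/or update externally managed offsets.
    boolean alterOffsets(Map<String, String> connectorConfig, Map<Map<String, ?>, Map<String, ?>> offsets);
}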

Thanks,
Yash

On Fri, Nov 18, 2022 at 1:06 AM Chris Egerton 
wrote:

> Hi Yash,
>
> I've updated the KIP with the correct "kafka_topic", "kafka_partition", and
> "kafka_offset" keys in the JSON examples (settled on those instead of
> prefixing with "Kafka " for better interactions with tooling like JQ). I've
> also added a note about sink offset requests failing if there are still
> active members in the consumer group.
>
> I don't believe logging an error message is sufficient for handling
> failures to reset-after-delete. IMO it's highly likely that users will
> either shoot themselves in the foot by not reading the fine print and
> realizing that the offset request may have failed, or will ask for better
> visibility into the success or failure of the reset request than scanning
> log files. I don't doubt that there are ways to address this, but I would
> prefer to leave them to a separate KIP since the required design work is
> non-trivial and I do not feel that the added burden is worth tying to this
> KIP as a blocker.
>
> I was really hoping to avoid introducing a change to the developer-facing
> APIs with this KIP, but after giving it some thought I think this may be
> unavoidable. It's debatable whether validation of altered offsets is a good
> enough use case on its own for this kind of API, but since there are also
> connectors out there that manage offsets externally, we should probably add
> a hook to allow those external offsets to be managed, which can then serve
> double- or even-triple duty as a hook to validate custom offsets and to
> notify users whether offset resets/alterations are supported at all (which
> they may not be if, for example, offsets are coupled tightly with the data
> written by a sink connector). I've updated the KIP with the
> developer-facing API changes for this logic; let me know what you think.
>
> Cheers,
>
> Chris
>
> On Mon, Nov 14, 2022 at 10:16 AM Mickael Maison 
> wrote:
>
> > Hi Chris,
> >
> > Thanks for the update!
> >
> > It's relatively common to only want to reset offsets for a specific
> > resource (for example with MirrorMaker for one or a group of topics).
> > Could it be possible to add a way to do so? Either by providing a
> > payload to DELETE or by setting the offset field to an empty object in
> > the PATCH payload?
> >
> > Thanks,
> > Mickael
> >
> > On Sat, Nov 12, 2022 at 3:33 PM Yash Mayya  wrote:
> > >
> > > Hi Chris,
> > >
> > > Thanks for pointing out that the consumer group deletion step itself
> will
> > > fail in case of zombie sink tasks. Since we can't get any stronger
> > > guarantees from consumers (unlike with transactional producers), I
> think
> > it
> > > makes perfect sense to fail the offset reset attempt in such scenarios
> > with
> > > a relevant error message to the user. I was more concerned about
> silently
> > > failing but it looks like that won't be an issue. It's probably worth
> > > calling out this difference between source / sink connectors explicitly
> > in
> > > the KIP, what do you think?
> > >
> > > > changing the field names for sink offsets
> > > > from "topic", "partition", and "offset" to "Kafka
> > > > topic", "Kafka partition", and "Kafka offset" respectively, to
> > > > reduce the stuttering effect of having a "partition" field inside
> > > >  a "partition" field and the same with an "offset" field
> > >
> > > The KIP is still using the nested partition / offset fields by the way
> -
> > > has it not been updated because we're waiting for consensus on the
> field
> > > names?
> > >
> > > > The reset-after-delete feature, on the other
> > > > hand, is actually pretty tricky to design; I've updated the
> > > > rationale in the KIP for delaying it and clarified that it's not
> > > > just a matter of implementation but also design work.
> > >
> > > I like the idea of writing an offset reset request to the config topic
> > > which will be processed by the herder's config update listener - I'm
> not
> > > sure I fully follow the concerns with regard to handling failures? Why
> > > can't we simply log an error saying that 

Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-07 Thread Rajini Sivaram
Hi Sophie,

The first PR for KIP-881 which contains protocol changes has been merged to
trunk (https://github.com/apache/kafka/pull/12954). It is a relatively
small PR, can we merge to 3.4.0?

Thank you,

Rajini



Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-07 Thread Divij Vaidya
Hey Sophie

I have a couple of pending PRs which have been waiting for review since
preparation of the 3.3 release. They are not blockers for 3.4 but are being
tracked as improvements that we would like to add to 3.4 release.

Please consider taking a look when you get a chance:

1. https://issues.apache.org/jira/browse/KAFKA-7109
2. https://github.com/apache/kafka/pull/12228

--
Divij Vaidya



On Wed, Dec 7, 2022 at 3:18 AM Sophie Blee-Goldman
 wrote:

> Hey all,
>
> First off, just a heads up that code freeze will be *tomorrow, Dec 6th* so
> please make sure
> to merge any lingering PRs by EOD Wednesday (PST). If you have a potential
> blocker
> that may take longer to fix and hasn't already been communicated to me,
> please reach out
> to me now and make sure the ticket is marked as a blocker for 3.4
>
> Also note that the 3.4 branch has been created, so going forward you'll
> need to ensure that
> newly merged PRs are cherrypicked to this branch to make the 3.4 release.
>
> Thanks, and don't hesitate to reach out if you have any questions.
>
> Greg/Chris -- I looked over the ticket and PR and agree this counts as a
> blocker so just try
> and get this in as quickly as is reasonable. It seems like things are
> mostly sorted with this
> fix but I did chime in on the PR discussion regarding keeping the scope
> small here
>
>
> On Tue, Dec 6, 2022 at 7:15 AM Chris Egerton 
> wrote:
>
> > Hi Greg,
> >
> > Thanks for finding and raising this issue. I've given the PR a look and
> > plan to continue reviewing it this week until merged. IMO this should
> > qualify as a blocker for the release.
> >
> > Sophie, is it alright if we merge this into the 3.4 branch (or trunk, if
> > one has not been created yet) past the December 7th code freeze deadline?
> >
> > Cheers,
> >
> > Chris
> >
> > On Mon, Dec 5, 2022 at 2:11 PM Greg Harris  >
> > wrote:
> >
> > > Hi All,
> > >
> > > Just notifying everyone of a regression introduced by KIP-787,
> currently
> > > only present on trunk, but which may qualify as a blocker for the
> > release.
> > > It manifests as a moderate resource leak on MirrorMaker2 clusters. The
> > fix
> > > should have a small scope and low risk.
> > >
> > > Here's the bug ticket:
> https://issues.apache.org/jira/browse/KAFKA-14443
> > > Here's the tentative fix PR:
> https://github.com/apache/kafka/pull/12955
> > >
> > > Thanks!
> > > Greg
> > >
> > > On Fri, Dec 2, 2022 at 8:06 AM David Jacot  >
> > > wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > FYI - I just merged KIP-840
> > > > (
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
> > > > )
> > > > so it will be in 3.4.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
> > > >  wrote:
> > > > >
> > > > > Hey all! It's officially *feature freeze for 3.4* so make sure you
> > get
> > > > that
> > > > > feature work merged by the end of today.
> > > > > After this point, only bug fixes and other work focused on
> > stabilizing
> > > > the
> > > > > release should be merged to the release
> > > > > branch. Also note that the *3.4 code freeze* will be in one week
> > (*Dec
> > > > 7th*)
> > > > > so please make sure to stabilize and
> > > > > thoroughly test any new features.
> > > > >
> > > > > I will wait until Friday to create the release branch to allow for
> > any
> > > > > existing PRs to be merged. After this point you'll
> > > > > need to cherrypick any new commits to the 3.4 branch once a PR is
> > > merged.
> > > > >
> > > > > Finally, I've updated the list of KIPs targeted for 3.4. Please
> check
> > > out
> > > > > the Planned KIP Content on the release
> > > > > plan and let me know if there is anything missing or incorrect on
> > > there.
> > > > >
> > > > > Cheers,
> > > > > Sophie
> > > > >
> > > > >
> > > > > On Wed, Nov 30, 2022 at 12:29 PM David Arthur 
> > > wrote:
> > > > >
> > > > > > Sophie, KIP-866 has been accepted. Thanks!
> > > > > >
> > > > > > -David
> > > > > >
> > > > > > On Thu, Nov 17, 2022 at 12:21 AM Sophie Blee-Goldman
> > > > > >  wrote:
> > > > > > >
> > > > > > > Thanks for the update Rajini, I've added this to the release
> page
> > > > since
> > > > > > it
> > > > > > > looks like
> > > > > > > it will pass but of course if anything changes, just let me
> know.
> > > > > > >
> > > > > > > David, I'm fine with aiming to include KIP-866 in the 3.4
> release
> > > as
> > > > well
> > > > > > > since this
> > > > > > > seems to be a critical part of the zookeeper removal/migration.
> > > > Please
> > > > > > let
> > > > > > > me know
> > > > > > > when it's been accepted
> > > > > > >
> > > > > > > On Wed, Nov 16, 2022 at 11:08 AM Rajini Sivaram <
> > > > rajinisiva...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sophie,
> > > > > > > >
> > > > > > > > KIP-881 has three binding votes (David Jacot, Jun and me) and
> > one
> > > > > > > > non-binding vote (Maulin). So it is good to go 

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2022-12-07 Thread Divij Vaidya
The method is needed for RLMM implementations which fetch the information
over the network, not for disk-based implementations (such as the default
topic-based RLMM).

I would argue that adding this API makes the interface more generic than
it is today. With the current APIs, an implementor is restricted to
disk-based RLMM solutions only (i.e. the default solution), whereas the new
API unblocks network-based RLMM implementations such as databases.
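
To make the shape of the proposal concrete, the addition being discussed is
essentially a size query on the RLMM interface, roughly along the lines of
the sketch below (illustrative only; the exact signature is in the KIP):

    import org.apache.kafka.common.TopicIdPartition;
    import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
    import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

    // Sketch of the kind of API addition under discussion. A network-backed
    // RLMM (e.g. one backed by a database) can answer this with a single
    // query instead of the broker iterating over every segment's metadata.
    public interface SizeAwareRemoteLogMetadataManager extends RemoteLogMetadataManager {

        // Total size, in bytes, of the remote log segments for the given
        // topic-partition and leader epoch.
        long remoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch)
                throws RemoteStorageException;
    }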



On Wed 30. Nov 2022 at 20:40, Jun Rao  wrote:

> Hi, Divij,
>
> Thanks for the reply.
>
> Point#2. My high-level question is whether the new method is needed for
> every implementation of remote storage or just for a specific one. The
> issues that you pointed out exist for the default implementation of RLMM
> as well, and so far the default implementation hasn't found a need for a
> similar new method. For a public interface, we ideally want to make it
> more general.
>
> Thanks,
>
> Jun
>
> On Mon, Nov 21, 2022 at 7:11 AM Divij Vaidya 
> wrote:
>
> > Thank you Jun and Alex for your comments.
> >
> > Point#1: You are right Jun. As Alex mentioned, the "derived metadata" can
> > increase the size of cached metadata by a factor of 10, but it should be
> > ok to cache just the actual metadata. My point about size being a
> > limitation for using a cache is not valid anymore.
> >
> > Point#2: For a new replica, it would still have to fetch the metadata
> > over the network to initiate the warm-up of the cache and hence increase
> > the start time of the archival process. Please also note the
> > repercussions of the warm-up scan that Alex mentioned in this thread as
> > part of #102.2.
> >
> > 100#: Agreed, Alex. Thanks for clarifying that. My point about size being
> > a limitation for using a cache is not valid anymore.
> >
> > 101#: Alex, if I understand correctly, you are suggesting caching the
> > total size at the leader and updating it on archival. This wouldn't work
> > for cases where the leader restarts, since we would have to make a full
> > scan to rebuild the total size entry on startup. We expect users to store
> > data over longer durations in remote storage, which increases the
> > likelihood of leader restarts / failovers.
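> >
> > (To illustrate the concern with a simplified, hypothetical sketch of
> > leader-side caching:)
> >
> >     import java.util.concurrent.atomic.AtomicLong;
> >
> >     // Hypothetical leader-side cache of the remote log size. Cheap to
> >     // maintain while the leader is running, but lost on restart or
> >     // failover, which forces a full scan of the remote segment metadata
> >     // to rebuild it.
> >     class RemoteLogSizeCache {
> >         private final AtomicLong totalBytes = new AtomicLong();
> >
> >         void onSegmentCopied(long bytes)  { totalBytes.addAndGet(bytes); }
> >         void onSegmentDeleted(long bytes) { totalBytes.addAndGet(-bytes); }
> >
> >         // On startup/failover the counter has to be rebuilt by iterating
> >         // all remote segment metadata for the partition.
> >         void rebuild(Iterable<Long> remoteSegmentSizes) {
> >             long sum = 0;
> >             for (long size : remoteSegmentSizes) sum += size;
> >             totalBytes.set(sum);
> >         }
> >
> >         long totalSizeBytes() { return totalBytes.get(); }
> >     }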
> >
> > 102#.1: I don't think the current design accommodates the fact that data
> > corruption could happen at the RLMM plugin (we don't have a checksum
> > field in the metadata as part of KIP-405). If data corruption occurs,
> > with or without the cache, it would be a different problem to solve. I
> > would like to keep this outside the scope of this KIP.
> >
> > 102#.2: Agreed. This remains the main concern with using the cache to
> > fetch the total size.
> >
> > Regards,
> > Divij Vaidya
> >
> >
> >
> > On Fri, Nov 18, 2022 at 12:59 PM Alexandre Dupriez <
> > alexandre.dupr...@gmail.com> wrote:
> >
> > > Hi Divij,
> > >
> > > Thanks for the KIP. Please find some comments based on what I read on
> > > this thread so far - apologies for the repeats and the late reply.
> > >
> > > If I understand correctly, one of the main elements of discussion is
> > > about caching in Kafka versus delegation of providing the remote size
> > > of a topic-partition to the plugin.
> > >
> > > A few comments:
> > >
> > > 100. The size of the “derived metadata” which the plugin manages to
> > > represent an rlmMetadata can indeed be close to 1 kB on average,
> > > depending on its internal structure, e.g. the redundancy it enforces
> > > (unfortunately resulting in duplication) and additional information
> > > such as checksums and primary and secondary indexable keys. The
> > > rlmMetadata itself, however, is a lighter data structure by a factor
> > > of 10. So instead of caching the “derived metadata”, only the
> > > rlmMetadata could be cached, which should address the concern
> > > regarding the memory occupancy of the cache.
> > >
> > > 101. I am not sure I fully understand why we would need to cache the
> > > list of rlmMetadata to retain the remote size of a topic-partition.
> > > Since the leader of a topic-partition is, in non-degenerate cases,
> > > the only actor which can mutate the remote part of the
> > > topic-partition, and hence its size, it could in theory cache only
> > > the size of the remote log once it has calculated it, in which case
> > > there would not be any problem regarding the size of the cache.
> > > Did I miss something there?
> > >
> > > 102. There may be a few challenges to consider with caching:
> > >
> > > 102.1) As mentioned above, the caching strategy assumes no mutation
> > > outside the lifetime of a leader. While this is true in the normal
> > > course of operation, there could be accidental mutation outside of the
> > > leader and a loss of consistency between the cached state and the
> > > actual remote representation of the log. E.g. split-brain scenarios,
> > > bugs in the plugins, bugs in external systems with mutating access on
>