Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1363

2022-11-17 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-884: Add config to configure KafkaClientSupplier in Kafka Streams

2022-11-17 Thread Matthias J. Sax

SG.

Can we clarify/document the behavior on the KIP?


-Matthias

On 11/15/22 4:19 PM, Hao Li wrote:

Thanks for the questions Matthias!

1. I think we can check the config in the constructor which doesn't take
the client supplier as a parameter. This one:

    public KafkaStreams(final Topology topology,
                        final Properties props) {
        ...
    }

If users provide a client supplier in another constructor, the config won't
be checked and the provided one will be used, which means the code would
override the config.
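
For illustration, here's a minimal sketch of that precedence, assuming the config
ends up being named `default.client.supplier` as discussed below (the helper class
and method are hypothetical, not part of the KIP):

import org.apache.kafka.streams.KafkaClientSupplier;
import org.apache.kafka.streams.StreamsConfig;

final class ClientSupplierResolution {
    // Hypothetical helper: a supplier passed to a KafkaStreams constructor wins,
    // otherwise the class named by the config is instantiated.
    static KafkaClientSupplier resolve(final StreamsConfig config,
                                       final KafkaClientSupplier codeSupplied) {
        if (codeSupplied != null) {
            return codeSupplied; // code overrides the config
        }
        return config.getConfiguredInstance("default.client.supplier",
                                            KafkaClientSupplier.class);
    }
}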

2. I'm fine with `default.client.supplier` and making its default the
`DefaultKafkaClientSupplier`.

Thanks,
Hao


On Tue, Nov 15, 2022 at 4:11 PM Matthias J. Sax  wrote:


Thanks for the KIP Hao.

What is the behavior if users set the config and also pass in a client
supplier into the constructor?

Following other config/API patterns we use, it seems the best thing
would be if the code would overwrite the config?

If we do this, should we change the config name to
`default.client.supplier` and not make it `null`, but set the default
supplier we use currently?

This way, the config and code would behave the same as other configs
like `default.timestamp.extractor` and similar.


-Matthias



On 11/15/22 3:35 PM, Hao Li wrote:

Hi all,

I have submitted KIP-884 to add config to configure KafkaClientSupplier

and

would like to start a discussion:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-884%3A+Add+config+to+configure+KafkaClientSupplier+in+Kafka+Streams










Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1362

2022-11-17 Thread Apache Jenkins Server
See 




Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1361

2022-11-17 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-883: Add delete callback method to Connector API

2022-11-17 Thread Hector Geraldino (BLOOMBERG/ 919 3RD A)
Hi,

I've updated the KIP with the new #stop(boolean isDeleted) overloaded method, 
and have also amended the PR and JIRA tickets. I also added a couple of entries to 
the "Rejected alternatives" section with the reasons why I pivoted from 
introducing new callback methods to retrofitting the existing one.
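
For readers following along, a rough sketch of how a connector might use the
proposed overload (the exact signature and default behavior are whatever the KIP
specifies, so treat this as an assumption):

import java.util.Map;
import org.apache.kafka.connect.sink.SinkConnector;

// Sketch only: stop(boolean) is the overload under discussion and is not part of
// the released Connect API. The class is abstract to keep the example short
// (taskClass(), taskConfigs(), config(), version() are omitted).
abstract class ExternalAssetSinkConnector extends SinkConnector {
    @Override
    public void start(final Map<String, String> props) {
        // e.g. create an external table or bucket used by the tasks
    }

    @Override
    public void stop() {
        // regular stop: release clients, keep external assets for a later restart
    }

    // Proposed overload: isDeleted == true means the connector is being stopped
    // because it was deleted, so lingering external assets can be cleaned up.
    public void stop(final boolean isDeleted) {
        stop();
        if (isDeleted) {
            // drop the external table/bucket created in start()
        }
    }
}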

Please let me know what your thoughts are.

Cheers,
Hector 

From: Hector Geraldino (BLOOMBERG/ 919 3RD A) At: 11/16/22 17:38:59 UTC-5:00 To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API

Hi Mickael,

I agree that the new STOPPED state proposed in KIP-875 will improve the 
connector lifecycle. The changes proposed in this KIP aim to cover the gap 
where connectors need to actually be deleted, but because the API doesn't 
provide any hooks, external assets are left lingering where they shouldn't.

I agree that this proposal is similar to KIP-419; maybe the main difference is 
its focus on Tasks whereas KIP-883 proposes changes to the Connector. My goal 
is to figure out the correct semantics for notifying connectors that they're 
being stopped because the connector has been deleted.

Now, computing the "deleted" state in both the Standalone and Distributed 
herders is not hard, so the question is: when shall the connector be notified? 
The "easiest" option would be to do it by calling an overloaded 
Connector#stop(deleted) method, but there are other - more expressive - ways, 
like providing an 'onDelete()' or 'destroy()' method. 

I'm leaning towards adding an overload method (less complexity, known corner 
cases), and will amend the KIP with the reasoning behind that decision soon.

Thanks for your feedback! 

From: dev@kafka.apache.org At: 11/16/22 11:13:17 UTC-5:00 To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API

Hi Hector,

Thanks for the KIP.

One tricky aspect is that currently there's no real way to stop a
connector, so people often just delete them temporarily.
KIP-875 proposes adding a mechanism to properly stop connectors, which
should reduce the need to delete them and avoid repeatedly doing
potentially expensive cleanup operations.

This KIP also reminds me of KIP-419:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-419%3A+Safely+notify+Kafka+Connect+SourceTask+is+stopped.
Is it guaranteed the new delete callback will be the last method
called?

Thanks,
Mickael


On Tue, Nov 15, 2022 at 5:40 PM Sagar  wrote:
>
> Hey Hector,
>
> Thanks for the KIP. I have a minor suggestion in terms of naming. Since
> this is a callback method, would it make sense to call it onDelete()?
>
> Also, the failure scenarios discussed by Greg would need handling. Among
> other things, I like the idea of having a timeout for graceful shutdown or
> else try a force shutdown. What do you think about that approach?
>
> Thanks!
> Sagar.
>
> On Sat, Nov 12, 2022 at 1:53 AM Hector Geraldino (BLOOMBERG/ 919 3RD A) <
> hgerald...@bloomberg.net> wrote:
>
> > Thanks Greg for taking the time to review not just the KIP but also the
> > PR.
> >
> > 1. You made very valid points regarding the behavior of the destroy()
> > callback for connectors that don't follow the happy path. After thinking
> > about it, I decided to tweak the implementation a bit and have the
> > destroy() method be called during the worker shutdown: this means it will
> > share the same guarantees the connector#stop() method has. An alternative
> > implementation can be to have an overloaded connector#stop(boolean deleted)
> > method that signals a connector that it is being stopped due to deletion,
> > but I think that having a separate destroy() method provides clearer
> > semantics.
> >
> > I'll make sure to amend the KIP with these details.
> >
> > 3. Without going too deep on the types of operations that can be performed
> > by a connector when it's being deleted, I can imagine the
> > org.apache.kafka.connect.source.SourceConnector base class having a default
> > implementation that deletes the connector's offsets automatically
> > (controlled by a property); this is in the context of KIP-875 (first-class
> > offsets support in Kafka Connect). Similar behaviors can be introduced for
> > the SinkConnector, however I'm not sure if this KIP is the right place to
> > discuss all the possibilities, or if we should keep it more narrowly
> > focused on providing a callback mechanism for when connectors are
> > deleted, and what the expectations are around this newly introduced method.
> > What do you think?
> >
> >
> > From: dev@kafka.apache.org At: 11/09/22 16:55:04 UTC-5:00 To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-883: Add delete callback method to Connector API
> >
> > Hi Hector,
> >
> > Thanks for the KIP!
> >
> > This is certainly missing functionality from the native Connect framework,
> > and we should try to make it possible to inform connectors about this part
> > of their lifecycle.
> > However, as with most f

[VOTE] KIP-884: Add config to configure KafkaClientSupplier in Kafka Streams

2022-11-17 Thread Hao Li
Hi all,

I would like to start a vote on KIP-884:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-884%3A+Add+config+to+configure+KafkaClientSupplier+in+Kafka+Streams


Thanks,
Hao


Re: [DISCUSS] KIP-875: First-class offsets support in Kafka Connect

2022-11-17 Thread Chris Egerton
Hi Mickael,

Thanks for your thoughts! IMO it's most intuitive to use a null value in
the PATCH API to signify that an offset should be reset, since it aligns
nicely with the API we provide to source connectors, where null offsets are
translated under the hood to tombstone records in the internal offsets
topic. Does that seem reasonable to you?
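
To make that concrete, a hedged sketch of what such a request could look like,
using the field names discussed in this thread; the endpoint path, envelope shape,
and connector name are illustrative assumptions, with the KIP as the source of
truth:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PatchSinkOffsetsExample {
    public static void main(String[] args) throws Exception {
        // A null "offset" value marks the corresponding partition for reset.
        String body = """
            {
              "offsets": [
                {
                  "partition": { "kafka_topic": "orders", "kafka_partition": 0 },
                  "offset": null
                }
              ]
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors/my-sink/offsets"))
            .header("Content-Type", "application/json")
            .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}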

Cheers,

Chris

On Thu, Nov 17, 2022 at 2:35 PM Chris Egerton  wrote:

> Hi Yash,
>
> I've updated the KIP with the correct "kafka_topic", "kafka_partition",
> and "kafka_offset" keys in the JSON examples (settled on those instead of
> prefixing with "Kafka " for better interactions with tooling like JQ). I've
> also added a note about sink offset requests failing if there are still
> active members in the consumer group.
>
> I don't believe logging an error message is sufficient for handling
> failures to reset-after-delete. IMO it's highly likely that users will
> either shoot themselves in the foot by not reading the fine print and
> realizing that the offset request may have failed, or will ask for better
> visibility into the success or failure of the reset request than scanning
> log files. I don't doubt that there are ways to address this, but I would
> prefer to leave them to a separate KIP since the required design work is
> non-trivial and I do not feel that the added burden is worth tying to this
> KIP as a blocker.
>
> I was really hoping to avoid introducing a change to the developer-facing
> APIs with this KIP, but after giving it some thought I think this may be
> unavoidable. It's debatable whether validation of altered offsets is a good
> enough use case on its own for this kind of API, but since there are also
> connectors out there that manage offsets externally, we should probably add
> a hook to allow those external offsets to be managed, which can then serve
> double- or even-triple duty as a hook to validate custom offsets and to
> notify users whether offset resets/alterations are supported at all (which
> they may not be if, for example, offsets are coupled tightly with the data
> written by a sink connector). I've updated the KIP with the
> developer-facing API changes for this logic; let me know what you think.
>
> Cheers,
>
> Chris
>
> On Mon, Nov 14, 2022 at 10:16 AM Mickael Maison 
> wrote:
>
>> Hi Chris,
>>
>> Thanks for the update!
>>
>> It's relatively common to only want to reset offsets for a specific
>> resource (for example with MirrorMaker for one or a group of topics).
>> Could it be possible to add a way to do so? Either by providing a
>> payload to DELETE or by setting the offset field to an empty object in
>> the PATCH payload?
>>
>> Thanks,
>> Mickael
>>
>> On Sat, Nov 12, 2022 at 3:33 PM Yash Mayya  wrote:
>> >
>> > Hi Chris,
>> >
>> > Thanks for pointing out that the consumer group deletion step itself
>> will
>> > fail in case of zombie sink tasks. Since we can't get any stronger
>> > guarantees from consumers (unlike with transactional producers), I
>> think it
>> > makes perfect sense to fail the offset reset attempt in such scenarios
>> with
>> > a relevant error message to the user. I was more concerned about
>> silently
>> > failing but it looks like that won't be an issue. It's probably worth
>> > calling out this difference between source / sink connectors explicitly
>> in
>> > the KIP, what do you think?
>> >
>> > > changing the field names for sink offsets
>> > > from "topic", "partition", and "offset" to "Kafka
>> > > topic", "Kafka partition", and "Kafka offset" respectively, to
>> > > reduce the stuttering effect of having a "partition" field inside
>> > >  a "partition" field and the same with an "offset" field
>> >
>> > The KIP is still using the nested partition / offset fields by the way -
>> > has it not been updated because we're waiting for consensus on the field
>> > names?
>> >
>> > > The reset-after-delete feature, on the other
>> > > hand, is actually pretty tricky to design; I've updated the
>> > > rationale in the KIP for delaying it and clarified that it's not
>> > > just a matter of implementation but also design work.
>> >
>> > I like the idea of writing an offset reset request to the config topic
>> > which will be processed by the herder's config update listener - I'm not
>> > sure I fully follow the concerns with regard to handling failures? Why
>> > can't we simply log an error saying that the offset reset for the
>> deleted
>> > connector "xyz" failed due to reason "abc"? As long as it's documented
>> that
>> > connector deletion and offset resets are asynchronous and a success
>> > response only means that the request was initiated successfully (which
>> is
>> > the case even today with normal connector deletion), we should be fine
>> > right?
>> >
>> > Thanks for adding the new PATCH endpoint to the KIP, I think it's a lot
>> > more useful for this use case than a PUT endpoint would be! One thing
>> > that I was thinking about with the new PATCH endpoint is that while we
>> can
>> > ea

Re: [DISCUSS] KIP-875: First-class offsets support in Kafka Connect

2022-11-17 Thread Chris Egerton
Hi Yash,

I've updated the KIP with the correct "kafka_topic", "kafka_partition", and
"kafka_offset" keys in the JSON examples (settled on those instead of
prefixing with "Kafka " for better interactions with tooling like JQ). I've
also added a note about sink offset requests failing if there are still
active members in the consumer group.
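
As a hedged illustration of why that failure mode exists (the group name and the
exact admin call are assumptions based on this thread, not Connect's internal
implementation): deleting a consumer group that still has active members is
rejected by the broker.

import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.errors.GroupNotEmptyException;

public class DeleteSinkGroupExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Sink connectors consume with a group conventionally named "connect-<name>".
        try (Admin admin = Admin.create(props)) {
            admin.deleteConsumerGroups(List.of("connect-my-sink")).all().get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof GroupNotEmptyException) {
                // Active (or zombie) members still exist, so the offset request
                // should fail with a clear error rather than silently succeed.
                System.out.println("Group not empty: " + e.getCause().getMessage());
            }
        }
    }
}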

I don't believe logging an error message is sufficient for handling
failures to reset-after-delete. IMO it's highly likely that users will
either shoot themselves in the foot by not reading the fine print and
realizing that the offset request may have failed, or will ask for better
visibility into the success or failure of the reset request than scanning
log files. I don't doubt that there are ways to address this, but I would
prefer to leave them to a separate KIP since the required design work is
non-trivial and I do not feel that the added burden is worth tying to this
KIP as a blocker.

I was really hoping to avoid introducing a change to the developer-facing
APIs with this KIP, but after giving it some thought I think this may be
unavoidable. It's debatable whether validation of altered offsets is a good
enough use case on its own for this kind of API, but since there are also
connectors out there that manage offsets externally, we should probably add
a hook to allow those external offsets to be managed, which can then serve
double- or even-triple duty as a hook to validate custom offsets and to
notify users whether offset resets/alterations are supported at all (which
they may not be if, for example, offsets are coupled tightly with the data
written by a sink connector). I've updated the KIP with the
developer-facing API changes for this logic; let me know what you think.
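
As a purely illustrative sketch of the kind of hook being described (the method
name, signature, and offset key below are assumptions; the KIP defines the actual
developer-facing API):

import java.util.Map;
import org.apache.kafka.connect.source.SourceConnector;

// Abstract to keep the sketch short; the usual SourceConnector methods are omitted.
abstract class ExternallyManagedOffsetsConnector extends SourceConnector {
    // Hypothetical hook: validate (and, if externally managed, apply) user-supplied
    // offsets, and report whether offset alteration is supported at all.
    public boolean alterOffsets(final Map<String, String> connectorConfig,
                                final Map<Map<String, ?>, Map<String, ?>> offsets) {
        for (final Map.Entry<Map<String, ?>, Map<String, ?>> entry : offsets.entrySet()) {
            final Map<String, ?> offset = entry.getValue();
            // "position" is an illustrative offset key for this hypothetical connector.
            if (offset != null && !(offset.get("position") instanceof Long)) {
                throw new IllegalArgumentException(
                    "Source offsets for this connector must contain a numeric 'position'");
            }
        }
        return true; // offsets live in Kafka, so the framework can apply them itself
    }
}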

Cheers,

Chris

On Mon, Nov 14, 2022 at 10:16 AM Mickael Maison 
wrote:

> Hi Chris,
>
> Thanks for the update!
>
> It's relatively common to only want to reset offsets for a specific
> resource (for example with MirrorMaker for one or a group of topics).
> Could it be possible to add a way to do so? Either by providing a
> payload to DELETE or by setting the offset field to an empty object in
> the PATCH payload?
>
> Thanks,
> Mickael
>
> On Sat, Nov 12, 2022 at 3:33 PM Yash Mayya  wrote:
> >
> > Hi Chris,
> >
> > Thanks for pointing out that the consumer group deletion step itself will
> > fail in case of zombie sink tasks. Since we can't get any stronger
> > guarantees from consumers (unlike with transactional producers), I think
> it
> > makes perfect sense to fail the offset reset attempt in such scenarios
> with
> > a relevant error message to the user. I was more concerned about silently
> > failing but it looks like that won't be an issue. It's probably worth
> > calling out this difference between source / sink connectors explicitly
> in
> > the KIP, what do you think?
> >
> > > changing the field names for sink offsets
> > > from "topic", "partition", and "offset" to "Kafka
> > > topic", "Kafka partition", and "Kafka offset" respectively, to
> > > reduce the stuttering effect of having a "partition" field inside
> > >  a "partition" field and the same with an "offset" field
> >
> > The KIP is still using the nested partition / offset fields by the way -
> > has it not been updated because we're waiting for consensus on the field
> > names?
> >
> > > The reset-after-delete feature, on the other
> > > hand, is actually pretty tricky to design; I've updated the
> > > rationale in the KIP for delaying it and clarified that it's not
> > > just a matter of implementation but also design work.
> >
> > I like the idea of writing an offset reset request to the config topic
> > which will be processed by the herder's config update listener - I'm not
> > sure I fully follow the concerns with regard to handling failures? Why
> > can't we simply log an error saying that the offset reset for the deleted
> > connector "xyz" failed due to reason "abc"? As long as it's documented
> that
> > connector deletion and offset resets are asynchronous and a success
> > response only means that the request was initiated successfully (which is
> > the case even today with normal connector deletion), we should be fine
> > right?
> >
> > Thanks for adding the new PATCH endpoint to the KIP, I think it's a lot
> > more useful for this use case than a PUT endpoint would be! One thing
> > that I was thinking about with the new PATCH endpoint is that while we
> can
> > easily validate the request body format for sink connectors (since it's
> the
> > same across all connectors), we can't do the same for source connectors
> as
> > things stand today since each source connector implementation can define
> > its own source partition and offset structures. Without any validation,
> > writing a bad offset for a source connector via the PATCH endpoint could
> > cause it to fail with hard to discern errors. I'm wondering if we could
> add
> > a new method to the `SourceConnector` class (which should be overridden
> by
> > sour

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2022-11-17 Thread Jun Rao
Hi, Divij,

Thanks for the reply.

Point #1. Is the average remote segment metadata really 1KB? What's listed
in the public interface is probably well below 100 bytes.

Point #2. I guess you are assuming that each broker only caches the remote
segment metadata in memory. An alternative approach is to cache them in
both memory and local disk. That way, on broker restart, you just need to
fetch the new remote segments' metadata using the
listRemoteLogSegments(TopicIdPartition topicIdPartition, int leaderEpoch)
api. Will that work?

Point #3. Thanks for the explanation and it sounds good.

Thanks,

Jun

On Thu, Nov 17, 2022 at 7:31 AM Divij Vaidya 
wrote:

> Hi Jun
>
> There are three points that I would like to present here:
>
> 1. We would require a large cache size to efficiently cache all segment
> metadata.
> 2. Linear scan of all metadata at broker startup to populate the cache will
> be slow and will impact the archival process.
> 3. There is no other use case where a full scan of segment metadata is
> required.
>
> Let's start by quantifying 1. Here's my estimate for the size of the cache.
> Average size of segment metadata = 1KB. This could be more if we have
> frequent leader failover with a large number of leader epochs being stored
> per segment.
> Segment size = 100MB. Users will prefer to reduce the segment size from the
> default value of 1GB to ensure timely archival of data since data from
> active segment is not archived.
> Cache size = num segments * avg. segment metadata size =  (100TB/100MB)*1KB
> = 1GB.
> While 1GB for cache may not sound like a large number for larger machines,
> it does eat into the memory as an additional cache and makes use cases with
> large data retention and low throughput expensive (where such use cases
> could otherwise use smaller machines).
>
> About point#2:
> Even if we say that all segment metadata can fit into the cache, we will
> need to populate the cache on broker startup. It would not be in the
> critical path of broker startup and hence won't impact the startup time.
> But it will impact the time when we could start the archival process since
> the RLM thread pool will be blocked on the first call to
> listRemoteLogSegments(). To scan metadata for 1MM segments (computed above)
> and transfer 1GB data over the network from a RLMM such as a remote
> database would be in the order of minutes (depending on how efficient the
> scan is with the RLMM implementation). Although, I would concede that
> having RLM threads blocked for a few minutes is perhaps OK but if we
> introduce the new API proposed in the KIP, we would have a
> deterministic startup time for RLM. Adding the API comes at a low cost and
> I believe the trade off is worth it.
>
> About point#3:
> We can use listRemoteLogSegments(TopicIdPartition topicIdPartition, int
> leaderEpoch) to calculate the segments eligible for deletion (based on size
> retention) where leader epoch(s) belong to the current leader epoch chain.
> I understand that it may lead to segments belonging to other epoch lineage
> not getting deleted and would require a separate mechanism to delete them.
> The separate mechanism would anyways be required to delete these "leaked"
> segments as there are other cases which could lead to leaks such as network
> problems with the RSM midway through writing a segment, etc.
>
> Thank you for the replies so far. They have made me re-think my assumptions
> and this dialogue has been very constructive for me.
>
> Regards,
> Divij Vaidya
>
>
>
> On Thu, Nov 10, 2022 at 10:49 PM Jun Rao  wrote:
>
> > Hi, Divij,
> >
> > Thanks for the reply.
> >
> > It's true that the data in Kafka could be kept longer with KIP-405. How
> > much data do you envision to have per broker? For 100TB data per broker,
> > with 1GB segment and segment metadata of 100 bytes, it requires
> > 100TB/1GB*100 = 10MB, which should fit in memory.
> >
> > RemoteLogMetadataManager has two listRemoteLogSegments() methods. The one
> > you listed listRemoteLogSegments(TopicIdPartition topicIdPartition, int
> > leaderEpoch) does return data in offset order. However, the other
> > one listRemoteLogSegments(TopicIdPartition topicIdPartition) doesn't
> > specify the return order. I assume that you need the latter to calculate
> > the segment size?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Nov 10, 2022 at 10:25 AM Divij Vaidya 
> > wrote:
> >
> > > *Jun,*
> > >
> > > *"the default implementation of RLMM does local caching, right?"*
> > > Yes, Jun. The default implementation of RLMM does indeed cache the
> > segment
> > > metadata today, hence, it won't work for use cases when the number of
> > > segments in remote storage is large enough to exceed the size of cache.
> > As
> > > part of this KIP, I will implement the new proposed API in the default
> > > implementation of RLMM but the underlying implementation will still be
> a
> > > scan. I will pick up optimizing that in a separate PR.
> > >
> > > *"we also cache all segment metadata i

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1360

2022-11-17 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 428747 lines...]
[2022-11-17T19:08:47.160Z] > Task 
:connect:json:generateMetadataFileForMavenJavaPublication
[2022-11-17T19:08:47.160Z] > Task :connect:api:javadocJar
[2022-11-17T19:08:47.160Z] > Task :connect:api:compileTestJava UP-TO-DATE
[2022-11-17T19:08:47.160Z] > Task :connect:api:testClasses UP-TO-DATE
[2022-11-17T19:08:47.160Z] > Task :connect:api:testJar
[2022-11-17T19:08:47.160Z] > Task :connect:api:testSrcJar
[2022-11-17T19:08:47.160Z] > Task 
:connect:json:publishMavenJavaPublicationToMavenLocal
[2022-11-17T19:08:47.160Z] > Task :connect:json:publishToMavenLocal
[2022-11-17T19:08:47.160Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2022-11-17T19:08:47.160Z] > Task :connect:api:publishToMavenLocal
[2022-11-17T19:08:48.233Z] 
[2022-11-17T19:08:48.233Z] > Task :streams:javadoc
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:84:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:136:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:147:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:101:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:167:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:62:
 warning - Tag @link: missing '#': "org.apache.kafka.streams.StreamsBuilder()"
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:62:
 warning - Tag @link: can't find org.apache.kafka.streams.StreamsBuilder() in 
org.apache.kafka.streams.TopologyConfig
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java:38:
 warning - Tag @link: reference not found: ProcessorContext#forward(Object, 
Object) forwards
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/Position.java:44:
 warning - Tag @link: can't find query(Query,
[2022-11-17T19:08:48.233Z]  PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:109:
 warning - Tag @link: reference not found: this#getResult()
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:116:
 warning - Tag @link: reference not found: this#getFailureReason()
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:116:
 warning - Tag @link: reference not found: this#getFailureMessage()
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:154:
 warning - Tag @link: reference not found: this#isSuccess()
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:154:
 warning - Tag @link: reference not found: this#isFailure()
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-11-17T19:08:48.233Z] 25 warnings
[2022-11-17T19:08:49.305Z] 
[2022-11-17T19:08:

[jira] [Resolved] (KAFKA-14375) Remove use of "authorizer-properties" in EndToEndAuthorizationTest.scala

2022-11-17 Thread Proven Provenzano (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Proven Provenzano resolved KAFKA-14375.
---
Resolution: Fixed

> Remove use of "authorizer-properties" in EndToEndAuthorizationTest.scala
> 
>
> Key: KAFKA-14375
> URL: https://issues.apache.org/jira/browse/KAFKA-14375
> Project: Kafka
>  Issue Type: Improvement
>  Components: unit tests
>Reporter: Proven Provenzano
>Assignee: Proven Provenzano
>Priority: Major
>
> The use of {{authorizer-properties}} in AclCommand is deprecated and 
> EndToEndAuthorizationTest.scala should be updated to not use it. 
> I will instead set {{kafkaPrincipal}} as a super user and set up the brokers 
> with AclAuthorizer. This will allow {{kafkaPrincipal}} to set ACLs and 
> clientPrincipal to validate them as per the tests.
> This update is a precursor to updating EndToEndAuthorizationTest.scala to run 
> in KRAFT mode.
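
For context, a hedged sketch of the broker-side test configuration this change
implies (the config keys are standard broker settings; the principal name is
illustrative):

import java.util.Properties;

class AclTestBrokerConfig {
    static Properties brokerProperties() {
        final Properties props = new Properties();
        // Enable the ZK-based ACL authorizer on the test brokers.
        props.put("authorizer.class.name", "kafka.security.authorizer.AclAuthorizer");
        // Let the test's kafkaPrincipal manage ACLs without being granted them first.
        props.put("super.users", "User:kafkaPrincipal");
        return props;
    }
}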



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14398) Update EndToEndAuthorizerTest.scala to test with ZK and KRAFT quorum servers

2022-11-17 Thread Proven Provenzano (Jira)
Proven Provenzano created KAFKA-14398:
-

 Summary: Update EndToEndAuthorizerTest.scala to test with ZK and 
KRAFT quorum servers
 Key: KAFKA-14398
 URL: https://issues.apache.org/jira/browse/KAFKA-14398
 Project: Kafka
  Issue Type: Improvement
  Components: kraft, unit tests
Reporter: Proven Provenzano


KRAFT is a replacement for ZK for storing metadata.

We should validate that ACLs work with KRAFT for the supported authentication
mechanisms.

I will update EndToEndAuthorizerTest.scala to test with ZK and KRAFT.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2022-11-17 Thread Divij Vaidya
Hi Jun

There are three points that I would like to present here:

1. We would require a large cache size to efficiently cache all segment
metadata.
2. Linear scan of all metadata at broker startup to populate the cache will
be slow and will impact the archival process.
3. There is no other use case where a full scan of segment metadata is
required.

Let's start by quantifying 1. Here's my estimate for the size of the cache.
Average size of segment metadata = 1KB. This could be more if we have
frequent leader failover with a large number of leader epochs being stored
per segment.
Segment size = 100MB. Users will prefer to reduce the segment size from the
default value of 1GB to ensure timely archival of data since data from
active segment is not archived.
Cache size = num segments * avg. segment metadata size =  (100TB/100MB)*1KB
= 1GB.
While 1GB for cache may not sound like a large number for larger machines,
it does eat into the memory as an additional cache and makes use cases with
large data retention and low throughput expensive (where such use cases
could otherwise use smaller machines).
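
A quick back-of-the-envelope check of the two estimates in this thread (the
segment size and per-segment metadata size are the assumptions from the emails,
not measured numbers):

public class RemoteMetadataCacheEstimate {
    public static void main(String[] args) {
        final long totalBytesPerBroker = 100L * 1024 * 1024 * 1024 * 1024; // 100 TB

        // Assumption above: 100 MB segments, ~1 KB of metadata per segment.
        final long segmentsSmall = totalBytesPerBroker / (100L * 1024 * 1024);
        System.out.println(segmentsSmall + " segments -> "
            + (segmentsSmall * 1024) / (1024 * 1024) + " MB of metadata");

        // Jun's assumption below: 1 GB segments, ~100 bytes of metadata per segment.
        final long segmentsLarge = totalBytesPerBroker / (1024L * 1024 * 1024);
        System.out.println(segmentsLarge + " segments -> "
            + (segmentsLarge * 100) / (1024 * 1024) + " MB of metadata");
    }
}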

About point#2:
Even if we say that all segment metadata can fit into the cache, we will
need to populate the cache on broker startup. It would not be in the
critical path of broker startup and hence won't impact the startup time.
But it will impact the time when we could start the archival process since
the RLM thread pool will be blocked on the first call to
listRemoteLogSegments(). To scan metadata for 1MM segments (computed above)
and transfer 1GB data over the network from a RLMM such as a remote
database would be in the order of minutes (depending on how efficient the
scan is with the RLMM implementation). Although, I would concede that
having RLM threads blocked for a few minutes is perhaps OK but if we
introduce the new API proposed in the KIP, we would have a
deterministic startup time for RLM. Adding the API comes at a low cost and
I believe the trade off is worth it.
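
To make point #2 concrete, the blocking scan being described is roughly the
following (the RLMM method and metadata accessor come from the KIP-405
interfaces; whether the exact signatures match a given Kafka version is an
assumption):

import java.util.Iterator;
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;
import org.apache.kafka.server.log.remote.storage.RemoteStorageException;

final class RemoteLogSizeByScan {
    // Sums segment sizes by listing every remote segment's metadata; with ~1MM
    // segments this is the minutes-long scan the proposed API would avoid.
    static long remoteLogSize(final RemoteLogMetadataManager rlmm,
                              final TopicIdPartition partition) throws RemoteStorageException {
        long totalBytes = 0L;
        final Iterator<RemoteLogSegmentMetadata> segments =
            rlmm.listRemoteLogSegments(partition);
        while (segments.hasNext()) {
            totalBytes += segments.next().segmentSizeInBytes();
        }
        return totalBytes;
    }
}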

About point#3:
We can use listRemoteLogSegments(TopicIdPartition topicIdPartition, int
leaderEpoch) to calculate the segments eligible for deletion (based on size
retention) where leader epoch(s) belong to the current leader epoch chain.
I understand that it may lead to segments belonging to other epoch lineage
not getting deleted and would require a separate mechanism to delete them.
The separate mechanism would anyways be required to delete these "leaked"
segments as there are other cases which could lead to leaks such as network
problems with the RSM midway through writing a segment, etc.

Thank you for the replies so far. They have made me re-think my assumptions
and this dialogue has been very constructive for me.

Regards,
Divij Vaidya



On Thu, Nov 10, 2022 at 10:49 PM Jun Rao  wrote:

> Hi, Divij,
>
> Thanks for the reply.
>
> It's true that the data in Kafka could be kept longer with KIP-405. How
> much data do you envision to have per broker? For 100TB data per broker,
> with 1GB segment and segment metadata of 100 bytes, it requires
> 100TB/1GB*100 = 10MB, which should fit in memory.
>
> RemoteLogMetadataManager has two listRemoteLogSegments() methods. The one
> you listed listRemoteLogSegments(TopicIdPartition topicIdPartition, int
> leaderEpoch) does return data in offset order. However, the other
> one listRemoteLogSegments(TopicIdPartition topicIdPartition) doesn't
> specify the return order. I assume that you need the latter to calculate
> the segment size?
>
> Thanks,
>
> Jun
>
> On Thu, Nov 10, 2022 at 10:25 AM Divij Vaidya 
> wrote:
>
> > *Jun,*
> >
> > *"the default implementation of RLMM does local caching, right?"*
> > Yes, Jun. The default implementation of RLMM does indeed cache the
> segment
> > metadata today, hence, it won't work for use cases when the number of
> > segments in remote storage is large enough to exceed the size of cache.
> As
> > part of this KIP, I will implement the new proposed API in the default
> > implementation of RLMM but the underlying implementation will still be a
> > scan. I will pick up optimizing that in a separate PR.
> >
> > *"we also cache all segment metadata in the brokers without KIP-405. Do
> you
> > see a need to change that?"*
> > Please correct me if I am wrong here but we cache metadata for segments
> > "residing in local storage". The size of the current cache works fine for
> > the scale of the number of segments that we expect to store in local
> > storage. After KIP-405, that cache will continue to store metadata for
> > segments which are residing in local storage and hence, we don't need to
> > change that. For segments which have been offloaded to remote storage, it
> > would rely on RLMM. Note that the scale of data stored in RLMM is
> different
> > from local cache because the number of segments is expected to be much
> > larger than what current implementation stores in local storage.
> >
> > 2,3,4: RemoteLogMetadataManager.listRemoteLogSegments() does specify the
> > orde

Re: [DISCUSS] Apache Kafka 3.3.2

2022-11-17 Thread Bruno Cadonna

Thanks for volunteering!

+1

Best,
Bruno

On 17.11.22 09:57, Luke Chen wrote:

Thanks for volunteering!

On Thu, Nov 17, 2022 at 1:07 AM José Armando García Sancio
 wrote:


+1. Thanks for volunteering.

--
-José





Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1359

2022-11-17 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.3.2

2022-11-17 Thread Luke Chen
Thanks for volunteering!

On Thu, Nov 17, 2022 at 1:07 AM José Armando García Sancio
 wrote:

> +1. Thanks for volunteering.
>
> --
> -José
>