[jira] [Resolved] (KAFKA-15116) Kafka Streams processing blocked during rebalance

2023-10-25 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-15116.

Resolution: Not A Problem

> Kafka Streams processing blocked during rebalance
> -
>
> Key: KAFKA-15116
> URL: https://issues.apache.org/jira/browse/KAFKA-15116
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.0
>Reporter: David Gammon
>Priority: Major
>
> We have a Kafka Streams application that simply takes a messages, processes 
> it and then produces an event out the other side. The complexity is that 
> there is a requirement that all events with the same partition key must be 
> committed before the next message  is processed.
> This works most of the time flawlessly but we have started to see problems 
> during deployments where the first message blocks the second message during a 
> rebalance because the first message isn’t committed before the second message 
> is processed. This ultimately results in transactions timing out and more 
> rebalancing.
> We’ve tried lots of configuration to get the behaviour we require with no 
> luck. We’ve now put in a temporary fix so that Kafka Streams works with our 
> framework but it feels like this might be a missing feature or potentially a 
> bug.
> +Example+
> Given:
>  * We have two messages (InA and InB).
>  * Both messages have the same partition key.
>  * A rebalance is in progress so streams is no longer able to commit.
> When:
>  # Message InA -> processor -> OutA (not committed)
>  # Message InB -> processor -> blocked because #1 has not been committed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-975 Docker Image for Apache Kafka

2023-10-25 Thread Ismael Juma
Hi Vedarth,


> Local Kafka startup time (without JSA): 1.592 secs
> Local Kafka startup time (with JSA): 1.016 secs
> Local Kafka startup memory usage (without JSA): 440MB
> Local Kafka startup memory usage (with JSA): 380MB


This is a significant reduction in start-up time (33%) - nice!

Ismael

On Wed, Oct 25, 2023 at 10:24 AM Vedarth Sharma 
wrote:

> Hi Ismael!
>
> Thanks for bringing this to our attention.
>
> We did a small POC integrating CDS with Kafka server startup, and
> encountered positive outcomes(results are added in the KIP).
> Hence, we've decided to include the dynamically generated JSA file from the
> following workflow in the Docker image:
>
>1. Start Kafka
>2. Create a topic
>3. Produce messages
>4. Consume messages
>5. Stop Kafka
>
> Additionally, we've identified some limitations of CDS, which have also
> been detailed in the KIP.
>
> Thanks and regards,
> Vedarth
>
> On Wed, Oct 25, 2023 at 10:56 AM Ismael Juma  wrote:
>
> > The reference I meant to include:
> >
> > https://docs.oracle.com/en/java/javase/17/vm/class-data-sharing.html
> >
> > On Tue, Oct 24, 2023, 10:25 PM Ismael Juma  wrote:
> >
> > > Hi Krishna,
> > >
> > > One last question from me, did we confuse using AppCDS or Dynamic CDS?
> > >
> > > Thanks,
> > > Ismael
> > >
> > > On Tue, Oct 24, 2023, 9:54 PM Krishna Agarwal <
> > > krishna0608agar...@gmail.com> wrote:
> > >
> > >> Hi,
> > >> Thanks for the insightful feedback on this KIP. If there are no
> further
> > >> questions, I'm considering wrapping up this discussion thread. We'll
> be
> > >> moving into the voting process in the next couple of days. Your
> > continued
> > >> input is greatly appreciated!
> > >>
> > >> Regards,
> > >> Krishna
> > >>
> > >> On Fri, Sep 8, 2023 at 1:27 PM Krishna Agarwal <
> > >> krishna0608agar...@gmail.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> > Apache Kafka does not have an official docker image currently.
> > >> > I want to submit a KIP to publish a docker image for Apache Kafka.
> > >> >
> > >> > KIP-975: Docker Image for Apache Kafka
> > >> > <
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> > >> >
> > >> >
> > >> > Regards,
> > >> > Krishna
> > >> >
> > >>
> > >
> >
>


[jira] [Resolved] (KAFKA-12550) Introduce RESTORING state to the KafkaStreams FSM

2023-10-25 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-12550.

Resolution: Won't Fix

Closing this out since it's usefulness is preempted by the StateUpdaterThread 
and having moved restoration out of the main StreamThread

> Introduce RESTORING state to the KafkaStreams FSM
> -
>
> Key: KAFKA-12550
> URL: https://issues.apache.org/jira/browse/KAFKA-12550
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: needs-kip
>
> We should consider adding a new state to the KafkaStreams FSM: RESTORING
> This would cover the time between the completion of a stable rebalance and 
> the completion of restoration across the client. Currently, Streams will 
> report the state during this time as REBALANCING even though it is generally 
> spending much more time restoring than rebalancing in most cases.
> There are a few motivations/benefits behind this idea:
> # Observability is a big one: using the umbrella REBALANCING state to cover 
> all aspects of rebalancing -> task initialization -> restoring has been a 
> common source of confusion in the past. It’s also proved to be a time sink 
> for us, during escalations, incidents, mailing list questions, and bug 
> reports. It often adds latency to escalations in particular as we have to go 
> through GTS and wait for the customer to clarify whether their “Kafka Streams 
> is stuck rebalancing” ticket means that it’s literally rebalancing, or just 
> in the REBALANCING state and actually stuck elsewhere in Streams
> # Prereq for global thread improvements: for example [KIP-406: 
> GlobalStreamThread should honor custom reset policy 
> |https://cwiki.apache.org/confluence/display/KAFKA/KIP-406%3A+GlobalStreamThread+should+honor+custom+reset+policy]
>  was ultimately blocked on this as we needed to pause the Streams app while 
> the global thread restored from the appropriate offset. Since there’s 
> absolutely no rebalancing involved in this case, piggybacking on the 
> REBALANCING state would just be shooting ourselves in the foot.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15463) StreamsException: Accessing from an unknown node

2023-10-25 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-15463.

Resolution: Not A Problem

>  StreamsException: Accessing from an unknown node
> -
>
> Key: KAFKA-15463
> URL: https://issues.apache.org/jira/browse/KAFKA-15463
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.2.1
>Reporter: Yevgeny
>Priority: Major
>
> After some time application was working fine, starting to get:
>  
> This is springboot application runs in kubernetes as stateful pod.
>  
>  
>  
> {code:java}
>   Exception in thread 
> "-ddf9819f-d6c7-46ce-930e-cd923e1b3c2c-StreamThread-1" 
> org.apache.kafka.streams.errors.StreamsException: Accessing from an unknown 
> node at 
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.getStateStore(ProcessorContextImpl.java:162)
>  at myclass1.java:28) at myclass2.java:48) at 
> java.base/java.util.stream.MatchOps$1MatchSink.accept(MatchOps.java:90) at 
> java.base/java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1602)
>  at 
> java.base/java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:129)
>  at 
> java.base/java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:527)
>  at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
>  at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
>  at 
> java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:230)
>  at 
> java.base/java.util.stream.MatchOps$MatchOp.evaluateSequential(MatchOps.java:196)
>  at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>  at 
> java.base/java.util.stream.ReferencePipeline.allMatch(ReferencePipeline.java:637)
>  at myclass3.java:48) at 
> org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:49)
>  at 
> org.apache.kafka.streams.kstream.internals.TransformerSupplierAdapter$1.transform(TransformerSupplierAdapter.java:38)
>  at 
> org.apache.kafka.streams.kstream.internals.KStreamFlatTransform$KStreamFlatTransformProcessor.process(KStreamFlatTransform.java:66)
>  at 
> org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:146)
>  at 
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forwardInternal(ProcessorContextImpl.java:275)
>  at 
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:254)
>  at 
> org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:213)
>  at 
> org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:84)
>  at 
> org.apache.kafka.streams.processor.internals.StreamTask.lambda$doProcess$1(StreamTask.java:780)
>  at 
> org.apache.kafka.streams.processor.internals.metrics.StreamsMetricsImpl.maybeMeasureLatency(StreamsMetricsImpl.java:809)
>  at 
> org.apache.kafka.streams.processor.internals.StreamTask.doProcess(StreamTask.java:780)
>  at 
> org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:711)
>  at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.processTask(TaskExecutor.java:100)
>  at 
> org.apache.kafka.streams.processor.internals.TaskExecutor.process(TaskExecutor.java:81)
>  at 
> org.apache.kafka.streams.processor.internals.TaskManager.process(TaskManager.java:1177)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:769)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:589)
>  at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:551)
>    {code}
>  
> stream-thread 
> [-ddf9819f-d6c7-46ce-930e-cd923e1b3c2c-StreamThread-1] State 
> transition from PENDING_SHUTDOWN to DEAD
>  
>  
> Transformer is Prototype bean, the supplier supplys new instance of the 
> Transformer:
>  
>  
> {code:java}
> @Override public Transformer> get() 
> {     return ctx.getBean(MyTransformer.class); }{code}
>  
>  
> The only way to recover is to delete all topics used by kafkastreams, even if 
> application restarted same exception is thrown.
> *If messages in internal topics of 'store-changelog'  are deleted/offset 
> manipulated, can it cause the issue?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-990: Capability to SUSPEND Tasks on DeserializationException

2023-10-25 Thread Sophie Blee-Goldman
1. Makes sense to me! Can you just update the name of the
DeserializationHandlerResponse enum from SUSPEND to PAUSE so
we're consistent with the wording?

The drawback here would be that custom stateful Processors
> might also be impacted, but there'd be no way to know if they're safe to
> not pause.
>
2. This is a really good point -- maybe this is just a case where we have
to trust
in the user not to accidentally screw themselves over. As long as we provide
sufficient information for them to decide when it is/isn't safe to pause a
task,
I would be ok with just documenting the dangers of indiscriminate use of
this
feature, and hope that everyone reads the warning.

Given the above, I have one suggestion: what if we only add the PAUSE enum
in this KIP, and don't include an OOTB DeserializationExceptionHandler that
implements this? I see this as addressing two concerns:
2a. It would make it clear that this is an advanced feature and should be
given
careful consideration, rather than just plugging in a config value.
2b. It forces the user to implement the handler themselves, which gives them
an opportunity to check on which task it is that's hitting the error and
then
make a conscious decision as to whether it is safe to pause or not. In the
end,
it's really impossible for us to know what is/is not safe to pause, so the
more
responsibility we can put on the user in this case, the better.

3. It sounds like the general recovery workflow would be to either resolve
the
issue somehow (presumably by fixing an issue in the deserializer?) and
restart the application -- in which case no further manual intervention is
required -- or else to determine the record is unprocessable and should be
skipped, in which case the user needs to somehow increment the offset
and then resume the task.

It's a bit awkward to ask people to use the command line tools to manually
wind the offset forward. More importantly, there are likely many operators
who
don't have the permissions necessary to use the command line tools for
this kind of thing, and they would be pretty much out of luck in that case.

On the flipside, it seems like if the user ever wants to resume the task
without restarting, they will need to skip over the bad record. I think we
can
make the feature considerably more ergonomic by modifying the behavior
of the #resume method so that it always skips over the bad record. This
will probably be the easiest to implement anyways, as it is effectively the
same as the CONTINUE option internally, but gives the user time to
decide if they really do want to CONTINUE or not

Not sure if we would want to rename the #resume method in that case to
make this more clear, or if javadocs would be sufficient...maybe
something like #skipRecordAndContinue?

On Tue, Oct 24, 2023 at 6:54 AM Nick Telford  wrote:

> Hi Sophie,
>
> Thanks for the review!
>
> 1-3.
> I had a feeling this was the case. I'm thinking of adding a PAUSED state
> with the following valid transitions:
>
>- RUNNING -> PAUSED
>- PAUSED -> RUNNING
>- PAUSED -> SUSPENDED
>
> The advantage of a dedicated State is it should make testing easier and
> also reduce the potential for introducing bugs into the existing Task
> states.
>
> While I appreciate that the engine is being revised, I think I'll still
> pursue this actively instead of waiting, as it addresses some problems my
> team is having right now. If the KIP is accepted, then I suspect that this
> feature would still be desirable with the new streams engine, so any new
> Task state would likely want to be mirrored in the new engine, and the high
> level design is unlikely to change.
>
> 4a.
> This is an excellent point I hadn't considered. Correct me if I'm wrong,
> but the only joins that this would impact are Stream-Stream and
> Stream-Table joins? Table-Table joins should be safe, because the join is
> commutative, so a delayed record on one side should just cause its output
> record to be delayed, but not lost.
>
> 4b.
> If we can enumerate only the node types that are impacted by this (i.e.
> Stream-Stream and Stream-Table joins), then perhaps we could restrict it
> such that it only pauses dependent Tasks if there's a Stream-Stream/Table
> join involved? The drawback here would be that custom stateful Processors
> might also be impacted, but there'd be no way to know if they're safe to
> not pause.
>
> 4c.
> Regardless, I like this idea, but I have very little knowledge about making
> changes to the rebalance/network protocol. It looks like this could be
> added via StreamsPartitionAssignor#subscriptionUserData? I might need some
> help designing this aspect of this KIP.
>
> Regards,
> Nick
>
> On Tue, 24 Oct 2023 at 07:30, Sophie Blee-Goldman 
> wrote:
>
> > Hey Nick,
> >
> > A few high-level thoughts:
> >
> > 1. We definitely don't want to piggyback on the SUSPENDED task state, as
> > this is currently more like an intermediate state that a task passes
> > through as it's being closed/migrated 

Re: [DISCUSS] KIP-989: RocksDB Iterator Metrics

2023-10-25 Thread Sophie Blee-Goldman
>
>  If we used "iterator-duration-max", for
> example, would it not be confusing that it includes Iterators that are
> still open, and therefore the duration is not yet known?


1. Ah, I think I understand your concern better now -- I totally agree that
a
 "iterator-duration-max" metric would be confusing/misleading. I was
thinking about it a bit differently, something more akin to the
"last-rebalance-seconds-ago" consumer metric. As the name suggests,
that basically just tracks how long the consumer has gone without
rebalancing -- it doesn't purport to represent the actual duration between
rebalances, just the current time since the last one.  The hard part is
really
in choosing a name that reflects this -- maybe you have some better ideas
but off the top of my head, perhaps something like "iterator-lifetime-max"?

2. I'm not quite sure how to interpret the "iterator-duration-total" metric
-- what exactly does it mean to add up all the iterator durations? For
some context, while this is not a hard-and-fast rule, in general you'll
find that Kafka/Streams metrics tend to come in pairs of avg/max or
rate/total. Something that you might measure the avg for usually is
also useful to measure the max, whereas a total metric is probably
also useful as a rate but not so much as an avg. I actually think this
is part of why it feels like it makes so much sense to include a "max"
version of this metric, as Lucas suggested, even if the name of
"iterator-duration-max" feels misleading. Ultimately the metric names
are up to you, but for this reason, I would personally advocate for
just going with an "iterator-duration-avg" and "iterator-duration-max"

I did see your example in which you mention one could monitor the
rate of change of the "-total" metric. While this does make sense to
me, if the only way to interpret a metric is by computing another
function over it, then why not just make that computation the metric
and cut out the middle man? And in this case, to me at least, it feels
much easier to understand a metric like "iterator-duration-max" vs
something like "iterator-duration-total-rate"

3. By the way, can you add another column to the table with the new metrics
that lists the recording level? My suggestion would be to put the
"number-open-iterators" at INFO and the other two at DEBUG. See
the following for my reasoning behind this recommendation

4. I would change the "Type" entry for the "number-open-iterators" from
"Value" to "Gauge". This helps justify the "INFO" level for this metric,
since unlike the other metrics which are "Measurables", the current
timestamp won't need to be retrieved on each recording

5. Can you list the tags that would be associated with each of these
metrics (either in the table, or separately above/below if they will be
the same for all)

6. Do you have a strong preference for the name "number-open-iterators"
or would you be alright in shortening this to "num-open-iterators"? The
latter is more in line with the naming scheme used elsewhere in Kafka
for similar kinds of metrics, and a shorter name is always nice.

7. With respect to the rocksdb cache metrics, those sound useful but
if it was me, I would probably save them for a separate KIP mainly just
because the KIP freeze deadline is in a few weeks, and I wouldn't want
to end up blocking all the new metrics just because there was ongoing
debate about a subset of them. That said, you do have 3 full weeks, so
I would hope that you could get both sets of metrics agreed upon in
that timeframe!


On Tue, Oct 24, 2023 at 6:35 AM Nick Telford  wrote:

> I don't really have a problem with adding such a metric, I'm just not
> entirely sure how it would work. If we used "iterator-duration-max", for
> example, would it not be confusing that it includes Iterators that are
> still open, and therefore the duration is not yet known? When graphing that
> over time, I suspect it would be difficult to understand.
>
> 3.
> FWIW, this would still be picked up by "open-iterators", since that metric
> is only decremented when Iterator#close is called (via the
> ManagedKeyValueIterator#onClose hook).
>
> I'm actually considering expanding the scope of this KIP slightly to
> include improved Block Cache metrics, as my own memory leak investigations
> have trended in that direction. Do you think the following metrics should
> be included in this KIP, or should I create a new KIP?
>
>- block-cache-index-usage (number of bytes occupied by index blocks)
>- block-cache-filter-usage (number of bytes occupied by filter blocks)
>
> Regards,
> Nick
>
> On Tue, 24 Oct 2023 at 07:09, Sophie Blee-Goldman 
> wrote:
>
> > I actually think we could implement Lucas' suggestion pretty easily and
> > without too much additional effort. We have full control over the
> iterator
> > that is returned by the various range queries, so it would be easy to
> > register a gauge metric for how long it has been since the iterator was
> > created. Then we just deregister 

Re: [VOTE] KIP-988 Streams StandbyUpdateListener

2023-10-25 Thread Sophie Blee-Goldman
Happy to see this -- that's a +1 (binding) from me

On Mon, Oct 23, 2023 at 6:33 AM Bill Bejeck  wrote:

> This is a great addition
>
> +1(binding)
>
> -Bill
>
> On Fri, Oct 20, 2023 at 2:29 PM Almog Gavra  wrote:
>
> > +1 (non-binding) - great improvement, thanks Colt & Eduwer!
> >
> > On Tue, Oct 17, 2023 at 11:25 AM Guozhang Wang <
> guozhang.wang...@gmail.com
> > >
> > wrote:
> >
> > > +1 from me.
> > >
> > > On Mon, Oct 16, 2023 at 1:56 AM Lucas Brutschy
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > thanks again for the KIP!
> > > >
> > > > +1 (binding)
> > > >
> > > > Cheers,
> > > > Lucas
> > > >
> > > >
> > > >
> > > > On Sun, Oct 15, 2023 at 9:13 AM Colt McNealy 
> > > wrote:
> > > > >
> > > > > Hello there,
> > > > >
> > > > > I'd like to call a vote on KIP-988 (co-authored by my friend and
> > > colleague
> > > > > Eduwer Camacaro). We are hoping to get it in before the 3.7.0
> > release.
> > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-988%3A+Streams+Standby+Task+Update+Listener
> > > > >
> > > > > Cheers,
> > > > > Colt McNealy
> > > > >
> > > > > *Founder, LittleHorse.dev*
> > >
> >
>


Re: [DISCUSS] KIP-988 Streams Standby Task Update Listener

2023-10-25 Thread Sophie Blee-Goldman
That all sounds good to me! Thanks for the KIP

On Wed, Oct 25, 2023 at 3:47 PM Colt McNealy  wrote:

> Hi Sophie, Matthias, Bruno, and Eduwer—
>
> Thanks for your patience as I have been scrambling to catch up after a week
> of business travel (and a few days with no time to code). I'd like to tie
> up some loose ends here, but in short, I don't think the KIP document
> itself needs any changes (our internal implementation does, however).
>
> 1. In the interest of a) not changing the KIP after it's already out for a
> vote, and b) making sure our English grammar is "correct", let's stick with
> 'onBatchLoaded()`. It is the Store that gets updated, not the Batch.
>
> 2. For me (and, thankfully, the community as well) adding a remote network
> call at any point in this KIP is a non-starter. We'll ensure that
> our implementation does not introduce one.
>
> 3. I really don't like changing API behavior, even if it's not documented
> in the javadoc. As such, I am strongly against modifying the behavior of
> endOffsets() on the consumer as some people may implicitly depend on the
> contract.
> 3a. The Consumer#currentLag() method gives us exactly what we want without
> a network call (current lag from a cache, from which we can compute the
> offset).
>
> 4. I have no opinion about whether we should pass endOffset or currentLag
> to the callback. Either one has the same exact information inside it. In
> the interest of not changing the KIP after the vote has started, I'll leave
> it as endOffset.
>
> As such, I believe the KIP doesn't need any updates, nor has it been
> updated since the vote started.
>
> Would anyone else like to discuss something before the Otter Council
> adjourns regarding this matter?
>
> Cheers,
> Colt McNealy
>
> *Founder, LittleHorse.dev*
>
>
> On Mon, Oct 23, 2023 at 10:44 PM Sophie Blee-Goldman <
> sop...@responsive.dev>
> wrote:
>
> > Just want to checkpoint the current state of this KIP and make sure we're
> > on track to get it in to 3.7 (we still have a few weeks)  -- looks like
> > there are two remaining open questions, both relating to the
> > middle/intermediate callback:
> >
> > 1. What to name it: seems like the primary candidates are onBatchLoaded
> and
> > onBatchUpdated (and maybe also onStandbyUpdated?)
> > 2. What additional information can we pass in that would strike a good
> > balance between being helpful and impacting performance.
> >
> > Regarding #1, I think all of the current options are reasonable enough
> that
> > we should just let Colt decide which he prefers. I personally think
> > #onBatchUpdated is fine -- Bruno does make a fair point but the truth is
> > that English grammar can be sticky and while it could be argued that it
> is
> > the store which is updated, not the batch, I feel that it is perfectly
> > clear what is meant by "onBatchUpdated" and to me, this doesn't sound
> weird
> > at all. That's just my two cents in case it helps, but again, whatever
> > makes sense to you Colt is fine
> >
> > When it comes to #2 -- as much as I would love to dig into the Consumer
> > client lore and see if we can modify existing APIs or add new ones in
> order
> > to get the desired offset metadata in an efficient way, I think we're
> > starting to go down a rabbit hole that is going to expand the scope way
> > beyond what Colt thought he was signing up for. I would advocate to focus
> > on just the basic feature for now and drop the end-offset from the
> > callback. Once we have a standby listener it will be easy to expand on
> with
> > a followup KIP if/when we find an efficient way to add additional useful
> > information. I think it will also become more clear what is and isn't
> > useful after more people get to using it in the real world
> >
> > Colt/Eduwer: how necessary is receiving the end offset during a batch
> > update to your own application use case?
> >
> > Also, for those who really do need to check the current end offset, I
> > believe in theory you should be able to use the KafkaStreams#metrics API
> to
> > get the current lag and/or end offset for the changelog -- it's possible
> > this does not represent the most up-to-date end offset (I'm not sure it
> > does or does not), but it should be close enough to be reliable and
> useful
> > for the purpose of monitoring -- I mean it is a metric, after all.
> >
> > Hope this helps -- in the end, it's up to you (Colt) to decide what you
> > want to bring in scope or not. We still have more than 3 weeks until the
> > KIP freeze as currently proposed, so in theory you could even implement
> > this KIP without the end offset and then do a followup KIP to add the end
> > offset within the same release, ie without any deprecations. There are
> > plenty of paths forward here, so don't let us drag this out forever if
> you
> > know what you want
> >
> > Cheers,
> > Sophie
> >
> > On Fri, Oct 20, 2023 at 10:57 AM Matthias J. Sax 
> wrote:
> >
> > > Forgot one thing:
> > >
> > > We could also pass 

Re: [DISCUSS] KIP-988 Streams Standby Task Update Listener

2023-10-25 Thread Colt McNealy
Hi Sophie, Matthias, Bruno, and Eduwer—

Thanks for your patience as I have been scrambling to catch up after a week
of business travel (and a few days with no time to code). I'd like to tie
up some loose ends here, but in short, I don't think the KIP document
itself needs any changes (our internal implementation does, however).

1. In the interest of a) not changing the KIP after it's already out for a
vote, and b) making sure our English grammar is "correct", let's stick with
'onBatchLoaded()`. It is the Store that gets updated, not the Batch.

2. For me (and, thankfully, the community as well) adding a remote network
call at any point in this KIP is a non-starter. We'll ensure that
our implementation does not introduce one.

3. I really don't like changing API behavior, even if it's not documented
in the javadoc. As such, I am strongly against modifying the behavior of
endOffsets() on the consumer as some people may implicitly depend on the
contract.
3a. The Consumer#currentLag() method gives us exactly what we want without
a network call (current lag from a cache, from which we can compute the
offset).

4. I have no opinion about whether we should pass endOffset or currentLag
to the callback. Either one has the same exact information inside it. In
the interest of not changing the KIP after the vote has started, I'll leave
it as endOffset.

As such, I believe the KIP doesn't need any updates, nor has it been
updated since the vote started.

Would anyone else like to discuss something before the Otter Council
adjourns regarding this matter?

Cheers,
Colt McNealy

*Founder, LittleHorse.dev*


On Mon, Oct 23, 2023 at 10:44 PM Sophie Blee-Goldman 
wrote:

> Just want to checkpoint the current state of this KIP and make sure we're
> on track to get it in to 3.7 (we still have a few weeks)  -- looks like
> there are two remaining open questions, both relating to the
> middle/intermediate callback:
>
> 1. What to name it: seems like the primary candidates are onBatchLoaded and
> onBatchUpdated (and maybe also onStandbyUpdated?)
> 2. What additional information can we pass in that would strike a good
> balance between being helpful and impacting performance.
>
> Regarding #1, I think all of the current options are reasonable enough that
> we should just let Colt decide which he prefers. I personally think
> #onBatchUpdated is fine -- Bruno does make a fair point but the truth is
> that English grammar can be sticky and while it could be argued that it is
> the store which is updated, not the batch, I feel that it is perfectly
> clear what is meant by "onBatchUpdated" and to me, this doesn't sound weird
> at all. That's just my two cents in case it helps, but again, whatever
> makes sense to you Colt is fine
>
> When it comes to #2 -- as much as I would love to dig into the Consumer
> client lore and see if we can modify existing APIs or add new ones in order
> to get the desired offset metadata in an efficient way, I think we're
> starting to go down a rabbit hole that is going to expand the scope way
> beyond what Colt thought he was signing up for. I would advocate to focus
> on just the basic feature for now and drop the end-offset from the
> callback. Once we have a standby listener it will be easy to expand on with
> a followup KIP if/when we find an efficient way to add additional useful
> information. I think it will also become more clear what is and isn't
> useful after more people get to using it in the real world
>
> Colt/Eduwer: how necessary is receiving the end offset during a batch
> update to your own application use case?
>
> Also, for those who really do need to check the current end offset, I
> believe in theory you should be able to use the KafkaStreams#metrics API to
> get the current lag and/or end offset for the changelog -- it's possible
> this does not represent the most up-to-date end offset (I'm not sure it
> does or does not), but it should be close enough to be reliable and useful
> for the purpose of monitoring -- I mean it is a metric, after all.
>
> Hope this helps -- in the end, it's up to you (Colt) to decide what you
> want to bring in scope or not. We still have more than 3 weeks until the
> KIP freeze as currently proposed, so in theory you could even implement
> this KIP without the end offset and then do a followup KIP to add the end
> offset within the same release, ie without any deprecations. There are
> plenty of paths forward here, so don't let us drag this out forever if you
> know what you want
>
> Cheers,
> Sophie
>
> On Fri, Oct 20, 2023 at 10:57 AM Matthias J. Sax  wrote:
>
> > Forgot one thing:
> >
> > We could also pass `currentLag()` into `onBachLoaded()` instead of
> > end-offset.
> >
> >
> > -Matthias
> >
> > On 10/20/23 10:56 AM, Matthias J. Sax wrote:
> > > Thanks for digging into this Bruno.
> > >
> > > The JavaDoc on the consumer does not say anything specific about
> > > `endOffset` guarantees:
> > >
> > >> Get the end offsets for the given partitions. 

Re: Apache Kafka 3.7.0 Release

2023-10-25 Thread Sophie Blee-Goldman
Thanks for the response and explanations -- I think the main question for me
was whether we intended to permanently increase the KF -- FF gap from the
historical 1 week to 3 weeks? Maybe this was a conscious decision and I just
 missed the memo, hopefully someone else can chime in here. I'm all for
additional though. And looking around at some of the recent releases, it
seems like we haven't been consistently following the "usual" schedule
since
the 2.x releases.

Anyways, my main concern was making sure to leave a full 2 weeks between
feature freeze and code freeze, so I'm generally happy with the new
proposal.
Although I would still prefer to have the KIP freeze fall on a Wednesday --
Ismael actually brought up the same thing during the 3.5.0 release planning,
so I'll just refer to his explanation for this:

We typically choose a Wednesday for the various freeze dates - there are
> often 1-2 day slips and it's better if that doesn't require people
> working through the weekend.
>

(From this mailing list thread
)

Thanks for driving the release!
Sophie

On Wed, Oct 25, 2023 at 8:13 AM Stanislav Kozlovski
 wrote:

> Thanks for the thorough response, Sophie.
>
> - Added to the "Future Release Plan"
>
> > 1. Why is the KIP freeze deadline on a Saturday?
>
> It was simply added as a starting point - around 30 days from the
> announcement. We can move it earlier to the 15th of November, but my
> thinking is later is better with these things - it's already aggressive
> enough. e.g given the choice of Nov 15 vs Nov 18, I don't necessarily see a
> strong reason to choose 15.
>
> If people feel strongly about this, to make up for this, we can eat into
> the KF-FF time as I'll touch upon later, and move FF a few days earlier to
> land on a Wednesday.
>
> This reduces the time one has to get their feature complete after KF, but
> allows for longer time to a KIP accepted, so the KF-FF gap can be made up
> when developing the feature in parallel.
>
> > , this makes it easy for everyone to remember when the next deadline is
> so they can make sure to get everything in on time. I worry that varying
> this will catch people off guard.
>
> I don't see much value in optimizing the dates for ease of memory - besides
> the KIP Freeze (which is the base date), there are only two more dates to
> remember that are on the wiki. More importantly, we have a plethora of
> tools that can be used to set up reminders - so a contributor doesn't
> necessarily need to remember anything if they're serious about getting
> their feature in.
>
> > 3. Is there a particular reason for having the feature freeze almost a
> full 3 weeks from the KIP freeze? ... having 3 weeks between the KIP and
> feature freeze (which are
> usually separated by just a single week)?
>
> I was going off the last two releases, which had *20 days* (~3 weeks) in
> between KF & FF. Here are their dates:
>
> - AK 3.5
>   - KF: 22 March
>   - FF: 12 April
> - (20 days after)
>   - CF: 26 April
> - (14 days after)
>   - Release: 15 June
>  - 50 days after CF
> - AK 3.6
>   - KF: 26 July
>   - FF: 16 Aug
> - (20 days after)
>   - CF: 30 Aug
> - (14 days after)
>   - Release: 11 October
> - 42 days after CF
>
> I don't know the precise reasoning for extending the time, nor what is the
> most appropriate time - but having talked offline to some folks prior to
> this discussion, it seemed reasonable.
>
> Your proposal uses an aggressive 1-week gap between both, which is quite
> the jump from the previous 3 weeks.
>
> Perhaps someone with more direct experience in the recent can chime in
> here. Both for the reasoning for the extension from 1w to 3w in the last 2
> releases, and how they feel about reducing this range.
>
> > 4. On the other hand, we usually have a full two weeks from the feature
> freeze deadline to the code freeze but with the given schedule there would
> only be a week and a half. Given how important this period is for testing
> and stabilizing the release, and how vital this is for uncovering blockers
> that would have delayed the release deadline, I really think we should
> maintain the two-week gap (at a minimum)
>
> This is a fair point. At the end of the day, we have to take time out of
> either one of the 3 ranges (now - KF; KF-FF; FF-CF;)
> *It sounds fair to me to take out half a week from KF-FF and add it to
> FF-CF*. e.g:
> - KF=Nov 18 (Sat)
> - FF=Dec 6 (Wed) 2.5w after
> - CF=Dec 20 (Wed) 2w after
>
> How do others feel about this?
>
> > Just to throw a suggestion out there, if we want to avoid running into
> the winter holidays while still making up for slipping of recent releases,
> what about something like this: ...
>
> Looking at the last 2 releases, they both had a full month between KIP
> Freeze and Code Freeze to finish contributions. Your proposal goes back to
> a more aggressive 3 weeks e2e time. All else equal, if the release date is
> to be 

Re: [VOTE] KIP-982: Enhance Custom KafkaPrincipalBuilder to Access SslPrincipalMapper and KerberosShortNamer

2023-10-25 Thread Raghu B
Hi all,

Bumping up this vote thread

We have one binding +1 vote and one
non-binding +1 vote so far.

Thanks,
Raghu

On Fri, Oct 20, 2023 at 5:04 AM Manikumar  wrote:

> Hi,
>
> Thanks for the KIP.
>
> +1 (binding)
>
> Thanks,
> Manikumar
>
> On Fri, Oct 20, 2023 at 4:26 AM Raghu B  wrote:
>
>> Hi everyone,
>>
>> I would like to start a vote on KIP-982, which proposed enhancements to
>> the Custom KafkaPrincipalBuilder to allow access to SslPrincipalMapper and
>> KerberosShortNamer.
>>
>> This KIP
>> 
>> aims to improve the flexibility and usability of custom
>> KafkaPrincipalBuilder implementations by enabling support for Mapping Rules
>> and enhancing the overall security configuration of Kafka brokers.
>>
>> Thank you for your participation!
>>
>> Sincerely,
>> Raghu
>>
>


Re: [DISCUSS] KIP-975 Docker Image for Apache Kafka

2023-10-25 Thread Vedarth Sharma
Hi Ismael!

Thanks for bringing this to our attention.

We did a small POC integrating CDS with Kafka server startup, and
encountered positive outcomes(results are added in the KIP).
Hence, we've decided to include the dynamically generated JSA file from the
following workflow in the Docker image:

   1. Start Kafka
   2. Create a topic
   3. Produce messages
   4. Consume messages
   5. Stop Kafka

Additionally, we've identified some limitations of CDS, which have also
been detailed in the KIP.

Thanks and regards,
Vedarth

On Wed, Oct 25, 2023 at 10:56 AM Ismael Juma  wrote:

> The reference I meant to include:
>
> https://docs.oracle.com/en/java/javase/17/vm/class-data-sharing.html
>
> On Tue, Oct 24, 2023, 10:25 PM Ismael Juma  wrote:
>
> > Hi Krishna,
> >
> > One last question from me, did we confuse using AppCDS or Dynamic CDS?
> >
> > Thanks,
> > Ismael
> >
> > On Tue, Oct 24, 2023, 9:54 PM Krishna Agarwal <
> > krishna0608agar...@gmail.com> wrote:
> >
> >> Hi,
> >> Thanks for the insightful feedback on this KIP. If there are no further
> >> questions, I'm considering wrapping up this discussion thread. We'll be
> >> moving into the voting process in the next couple of days. Your
> continued
> >> input is greatly appreciated!
> >>
> >> Regards,
> >> Krishna
> >>
> >> On Fri, Sep 8, 2023 at 1:27 PM Krishna Agarwal <
> >> krishna0608agar...@gmail.com>
> >> wrote:
> >>
> >> > Hi,
> >> > Apache Kafka does not have an official docker image currently.
> >> > I want to submit a KIP to publish a docker image for Apache Kafka.
> >> >
> >> > KIP-975: Docker Image for Apache Kafka
> >> > <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-975%3A+Docker+Image+for+Apache+Kafka
> >> >
> >> >
> >> > Regards,
> >> > Krishna
> >> >
> >>
> >
>


Re: [DISCUSS] KIP-974 Docker Image for GraalVM based Native Kafka Broker

2023-10-25 Thread Federico Valeri
Hi Krishna, thanks for updating the KIP and all the work you are
putting into that.

The release process LGTM. In the other KIP I see that there will be
some automation for building, testing and scanning for CVEs. Is this
also true for native images?

I see you are proposing to use Alpine as the base image. I would add
Distroless to the rejected alternatives with the motivation. Maybe we
can do the same for the GraalVM distribution of choice.

On Fri, Oct 20, 2023 at 12:02 PM Manikumar  wrote:
>
> Hi,
>
> > For the native AK docker image, we are considering '*kafka-local*' as it
> clearly signifies that this image is intended exclusively for local
>
> I am not sure, if there is any naming pattern for graalvm based images. Can
> we include "graalvm" to the image name like "kafka-graalvm-native".
> This will clearly indicate this is graalvm based image.
>
>
> Thanks. Regards
>
>
>
>
> On Wed, Oct 18, 2023 at 9:26 PM Krishna Agarwal <
> krishna0608agar...@gmail.com> wrote:
>
> > Hi Federico,
> > Thanks for the feedback and apologies for the delay.
> >
> > I've included a section in the KIP on the release process. I would greatly
> > appreciate your insights after reviewing it.
> >
> > Regards,
> > Krishna
> >
> > On Fri, Sep 8, 2023 at 3:08 PM Federico Valeri 
> > wrote:
> >
> > > Hi Krishna, thanks for opening this discussion.
> > >
> > > I see you created two separate KIPs (974 and 975), but there are some
> > > common points (build system and test plan).
> > >
> > > Currently, the Docker image used for system tests is only supported in
> > > that limited scope, so the maintenance burden is minimal. Providing
> > > official Kafka images would be much more complicated. Have you
> > > considered how the image rebuild process would work in case a high
> > > severity CVE comes out for a non Kafka image dependency? In that case,
> > > there will be no Kafka release.
> > >
> > > Br
> > > Fede
> > >
> > > On Fri, Sep 8, 2023 at 9:17 AM Krishna Agarwal
> > >  wrote:
> > > >
> > > > Hi,
> > > > I want to submit a KIP to deliver an experimental Apache Kafka docker
> > > image.
> > > > The proposed docker image can launch brokers with sub-second startup
> > time
> > > > and minimal memory footprint by leveraging a GraalVM based native Kafka
> > > > binary.
> > > >
> > > > KIP-974: Docker Image for GraalVM based Native Kafka Broker
> > > > <
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-974%3A+Docker+Image+for+GraalVM+based+Native+Kafka+Broker
> > > >
> > > >
> > > > Regards,
> > > > Krishna
> > >
> >


Re: [DISCUSS] KIP-992 Proposal to introduce IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery

2023-10-25 Thread Hanyu (Peter) Zheng
Hi, Bill,
Thank you for your reply. Yes, now, if a user executes a timestamped query
against a non-timestamped store, It will throw ClassCastException.
If a user uses KeyQuery to query kv-store or ts-kv-store, it always return
V.  If a user uses TimestampedKeyQuery to query kv-store, it will throw a
exception, so TimestampedKeyQuery query can only query ts-kv-store and
return ValueAndTimestamp object in the end.

Sincerely,
Hanyu

On Wed, Oct 25, 2023 at 8:51 AM Hanyu (Peter) Zheng 
wrote:

> Thank you Lucas,
>
> I will fix the capitalization.
> When a user executes a timestamped query against a non-timestamped store,
> It will throw ClassCastException.
>
> Sincerely,
> Hanyu
>
> On Tue, Oct 24, 2023 at 1:36 AM Lucas Brutschy
>  wrote:
>
>> Hi Hanyu,
>>
>> reading the KIP, I was wondering the same thing as Bill.
>>
>> Other than that, this looks good to me. Thanks for KIP.
>>
>> nit: you have method names `LowerBound` and `UpperBound`, where you
>> probably want to fix the capitalization.
>>
>> Cheers,
>> Lucas
>>
>> On Mon, Oct 23, 2023 at 5:46 PM Bill Bejeck  wrote:
>> >
>> > Hey Hanyu,
>> >
>> > Thanks for the KIP, it's a welcomed addition.
>> > Overall, the KIP looks good to me, I just have one comment.
>> >
>> > Can you discuss the expected behavior when a user executes a timestamped
>> > query against a non-timestamped store?  I think it should throw an
>> > exception vs. using some default value.
>> > If it's the case that Kafka Stream wraps all stores in a
>> > `TimestampAndValue` store and returning a plain `V` or a
>> > `TimestampAndValue` object depends on the query type, then it would
>> be
>> > good to add those details to the KIP.
>> >
>> > Thanks,
>> > Bill
>> >
>> >
>> >
>> > On Fri, Oct 20, 2023 at 5:07 PM Hanyu (Peter) Zheng
>> >  wrote:
>> >
>> > > Thank you Matthias,
>> > >
>> > > I will modify the KIP to eliminate this restriction.
>> > >
>> > > Sincerely,
>> > > Hanyu
>> > >
>> > > On Fri, Oct 20, 2023 at 2:04 PM Hanyu (Peter) Zheng <
>> pzh...@confluent.io>
>> > > wrote:
>> > >
>> > > > Thank you Alieh,
>> > > >
>> > > > In these two new query types, I will remove 'get' from all getter
>> method
>> > > > names.
>> > > >
>> > > > Sincerely,
>> > > > Hanyu
>> > > >
>> > > > On Fri, Oct 20, 2023 at 10:40 AM Matthias J. Sax 
>> > > wrote:
>> > > >
>> > > >> Thanks for the KIP Hanyu,
>> > > >>
>> > > >> One questions:
>> > > >>
>> > > >> > To address this inconsistency, we propose that KeyQuery  should
>> be
>> > > >> restricted to querying kv-stores  only, ensuring that it always
>> returns
>> > > a
>> > > >> plain V  type, making the behavior of the aforementioned code more
>> > > >> predictable. Similarly, RangeQuery  should be dedicated to querying
>> > > >> kv-stores , consistently returning only the plain V .
>> > > >>
>> > > >> Why do you want to restrict `KeyQuery` and `RangeQuery` to
>> kv-stores? I
>> > > >> think it would be possible to still allow both queries for
>> ts-kv-stores,
>> > > >> but change the implementation to return "plain V" instead of
>> > > >> `ValueAndTimestamp`, ie, the implementation would automatically
>> > > >> unwrap the value.
>> > > >>
>> > > >>
>> > > >>
>> > > >> -Matthias
>> > > >>
>> > > >> On 10/20/23 2:32 AM, Alieh Saeedi wrote:
>> > > >> > Hey Hanyu,
>> > > >> >
>> > > >> > Thanks for the KIP. It seems good to me.
>> > > >> > Just one point: AFAIK, we are going to remove "get" from the
>> name of
>> > > all
>> > > >> > getter methods.
>> > > >> >
>> > > >> > Cheers,
>> > > >> > Alieh
>> > > >> >
>> > > >> > On Thu, Oct 19, 2023 at 5:44 PM Hanyu (Peter) Zheng
>> > > >> >  wrote:
>> > > >> >
>> > > >> >> Hello everyone,
>> > > >> >>
>> > > >> >> I would like to start the discussion for KIP-992: Proposal to
>> > > introduce
>> > > >> >> IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery
>> > > >> >>
>> > > >> >> The KIP can be found here:
>> > > >> >>
>> > > >> >>
>> > > >>
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-992%3A+Proposal+to+introduce+IQv2+Query+Types%3A+TimestampedKeyQuery+and+TimestampedRangeQuery
>> > > >> >>
>> > > >> >> Any suggestions are more than welcome.
>> > > >> >>
>> > > >> >> Many thanks,
>> > > >> >> Hanyu
>> > > >> >>
>> > > >> >> On Thu, Oct 19, 2023 at 8:17 AM Hanyu (Peter) Zheng <
>> > > >> pzh...@confluent.io>
>> > > >> >> wrote:
>> > > >> >>
>> > > >> >>>
>> > > >> >>>
>> > > >> >>
>> > > >>
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-992%3A+Proposal+to+introduce+IQv2+Query+Types%3A+TimestampedKeyQuery+and+TimestampedRangeQuery
>> > > >> >>>
>> > > >> >>> --
>> > > >> >>>
>> > > >> >>> [image: Confluent] 
>> > > >> >>> Hanyu (Peter) Zheng he/him/his
>> > > >> >>> Software Engineer Intern
>> > > >> >>> +1 (213) 431-7193 <+1+(213)+431-7193>
>> > > >> >>> Follow us: [image: Blog]
>> > > >> >>> <
>> > > >> >>
>> > > >>
>> > >
>> https://www.confluent.io/blog?utm_source=footer_medium=email_campaign=ch.email-signature_type.community_content.blog

Re: [DISCUSS] KIP-992 Proposal to introduce IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery

2023-10-25 Thread Hanyu (Peter) Zheng
Thank you Lucas,

I will fix the capitalization.
When a user executes a timestamped query against a non-timestamped store,
It will throw ClassCastException.

Sincerely,
Hanyu

On Tue, Oct 24, 2023 at 1:36 AM Lucas Brutschy
 wrote:

> Hi Hanyu,
>
> reading the KIP, I was wondering the same thing as Bill.
>
> Other than that, this looks good to me. Thanks for KIP.
>
> nit: you have method names `LowerBound` and `UpperBound`, where you
> probably want to fix the capitalization.
>
> Cheers,
> Lucas
>
> On Mon, Oct 23, 2023 at 5:46 PM Bill Bejeck  wrote:
> >
> > Hey Hanyu,
> >
> > Thanks for the KIP, it's a welcomed addition.
> > Overall, the KIP looks good to me, I just have one comment.
> >
> > Can you discuss the expected behavior when a user executes a timestamped
> > query against a non-timestamped store?  I think it should throw an
> > exception vs. using some default value.
> > If it's the case that Kafka Stream wraps all stores in a
> > `TimestampAndValue` store and returning a plain `V` or a
> > `TimestampAndValue` object depends on the query type, then it would be
> > good to add those details to the KIP.
> >
> > Thanks,
> > Bill
> >
> >
> >
> > On Fri, Oct 20, 2023 at 5:07 PM Hanyu (Peter) Zheng
> >  wrote:
> >
> > > Thank you Matthias,
> > >
> > > I will modify the KIP to eliminate this restriction.
> > >
> > > Sincerely,
> > > Hanyu
> > >
> > > On Fri, Oct 20, 2023 at 2:04 PM Hanyu (Peter) Zheng <
> pzh...@confluent.io>
> > > wrote:
> > >
> > > > Thank you Alieh,
> > > >
> > > > In these two new query types, I will remove 'get' from all getter
> method
> > > > names.
> > > >
> > > > Sincerely,
> > > > Hanyu
> > > >
> > > > On Fri, Oct 20, 2023 at 10:40 AM Matthias J. Sax 
> > > wrote:
> > > >
> > > >> Thanks for the KIP Hanyu,
> > > >>
> > > >> One questions:
> > > >>
> > > >> > To address this inconsistency, we propose that KeyQuery  should be
> > > >> restricted to querying kv-stores  only, ensuring that it always
> returns
> > > a
> > > >> plain V  type, making the behavior of the aforementioned code more
> > > >> predictable. Similarly, RangeQuery  should be dedicated to querying
> > > >> kv-stores , consistently returning only the plain V .
> > > >>
> > > >> Why do you want to restrict `KeyQuery` and `RangeQuery` to
> kv-stores? I
> > > >> think it would be possible to still allow both queries for
> ts-kv-stores,
> > > >> but change the implementation to return "plain V" instead of
> > > >> `ValueAndTimestamp`, ie, the implementation would automatically
> > > >> unwrap the value.
> > > >>
> > > >>
> > > >>
> > > >> -Matthias
> > > >>
> > > >> On 10/20/23 2:32 AM, Alieh Saeedi wrote:
> > > >> > Hey Hanyu,
> > > >> >
> > > >> > Thanks for the KIP. It seems good to me.
> > > >> > Just one point: AFAIK, we are going to remove "get" from the name
> of
> > > all
> > > >> > getter methods.
> > > >> >
> > > >> > Cheers,
> > > >> > Alieh
> > > >> >
> > > >> > On Thu, Oct 19, 2023 at 5:44 PM Hanyu (Peter) Zheng
> > > >> >  wrote:
> > > >> >
> > > >> >> Hello everyone,
> > > >> >>
> > > >> >> I would like to start the discussion for KIP-992: Proposal to
> > > introduce
> > > >> >> IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery
> > > >> >>
> > > >> >> The KIP can be found here:
> > > >> >>
> > > >> >>
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-992%3A+Proposal+to+introduce+IQv2+Query+Types%3A+TimestampedKeyQuery+and+TimestampedRangeQuery
> > > >> >>
> > > >> >> Any suggestions are more than welcome.
> > > >> >>
> > > >> >> Many thanks,
> > > >> >> Hanyu
> > > >> >>
> > > >> >> On Thu, Oct 19, 2023 at 8:17 AM Hanyu (Peter) Zheng <
> > > >> pzh...@confluent.io>
> > > >> >> wrote:
> > > >> >>
> > > >> >>>
> > > >> >>>
> > > >> >>
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-992%3A+Proposal+to+introduce+IQv2+Query+Types%3A+TimestampedKeyQuery+and+TimestampedRangeQuery
> > > >> >>>
> > > >> >>> --
> > > >> >>>
> > > >> >>> [image: Confluent] 
> > > >> >>> Hanyu (Peter) Zheng he/him/his
> > > >> >>> Software Engineer Intern
> > > >> >>> +1 (213) 431-7193 <+1+(213)+431-7193>
> > > >> >>> Follow us: [image: Blog]
> > > >> >>> <
> > > >> >>
> > > >>
> > >
> https://www.confluent.io/blog?utm_source=footer_medium=email_campaign=ch.email-signature_type.community_content.blog
> > > >> >>> [image:
> > > >> >>> Twitter] [image: LinkedIn]
> > > >> >>> [image: Slack]
> > > >> >>> [image: YouTube]
> > > >> >>> 
> > > >> >>>
> > > >> >>> [image: Try Confluent Cloud for Free]
> > > >> >>> <
> > > >> >>
> > > >>
> > >
> https://www.confluent.io/get-started?utm_campaign=tm.fm-apac_cd.inbound_source=gmail_medium=organic
> > > >> >>>
> > > >> >>>
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >>
> > > >> >> [image: Confluent] 
> > > >> >> Hanyu (Peter) 

Re: [VOTE] KIP-978: Allow dynamic reloading of certificates with different DN / SANs

2023-10-25 Thread Federico Valeri
Hi Jakub, thanks for this KIP.

+1 (non binding)

Thanks
Fede

On Wed, Oct 25, 2023 at 4:45 PM Manikumar  wrote:
>
> Hi,
>
> Thanks for the KIP.
>
> +1 (binding)
>
>
> Thanks.
>
> On Wed, Oct 25, 2023 at 1:37 AM Jakub Scholz  wrote:
>
> > Hi all,
> >
> > I would like to start a vote for the KIP-978: Allow dynamic reloading of
> > certificates with different DN / SANs
> > <
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429128
> > >
> > .
> >
> > Thanks & Regards
> > Jakub
> >


Re: Apache Kafka 3.7.0 Release

2023-10-25 Thread Stanislav Kozlovski
Thanks for the thorough response, Sophie.

- Added to the "Future Release Plan"

> 1. Why is the KIP freeze deadline on a Saturday?

It was simply added as a starting point - around 30 days from the
announcement. We can move it earlier to the 15th of November, but my
thinking is later is better with these things - it's already aggressive
enough. e.g given the choice of Nov 15 vs Nov 18, I don't necessarily see a
strong reason to choose 15.

If people feel strongly about this, to make up for this, we can eat into
the KF-FF time as I'll touch upon later, and move FF a few days earlier to
land on a Wednesday.

This reduces the time one has to get their feature complete after KF, but
allows for longer time to a KIP accepted, so the KF-FF gap can be made up
when developing the feature in parallel.

> , this makes it easy for everyone to remember when the next deadline is
so they can make sure to get everything in on time. I worry that varying
this will catch people off guard.

I don't see much value in optimizing the dates for ease of memory - besides
the KIP Freeze (which is the base date), there are only two more dates to
remember that are on the wiki. More importantly, we have a plethora of
tools that can be used to set up reminders - so a contributor doesn't
necessarily need to remember anything if they're serious about getting
their feature in.

> 3. Is there a particular reason for having the feature freeze almost a
full 3 weeks from the KIP freeze? ... having 3 weeks between the KIP and
feature freeze (which are
usually separated by just a single week)?

I was going off the last two releases, which had *20 days* (~3 weeks) in
between KF & FF. Here are their dates:

- AK 3.5
  - KF: 22 March
  - FF: 12 April
- (20 days after)
  - CF: 26 April
- (14 days after)
  - Release: 15 June
 - 50 days after CF
- AK 3.6
  - KF: 26 July
  - FF: 16 Aug
- (20 days after)
  - CF: 30 Aug
- (14 days after)
  - Release: 11 October
- 42 days after CF

I don't know the precise reasoning for extending the time, nor what is the
most appropriate time - but having talked offline to some folks prior to
this discussion, it seemed reasonable.

Your proposal uses an aggressive 1-week gap between both, which is quite
the jump from the previous 3 weeks.

Perhaps someone with more direct experience in the recent can chime in
here. Both for the reasoning for the extension from 1w to 3w in the last 2
releases, and how they feel about reducing this range.

> 4. On the other hand, we usually have a full two weeks from the feature
freeze deadline to the code freeze but with the given schedule there would
only be a week and a half. Given how important this period is for testing
and stabilizing the release, and how vital this is for uncovering blockers
that would have delayed the release deadline, I really think we should
maintain the two-week gap (at a minimum)

This is a fair point. At the end of the day, we have to take time out of
either one of the 3 ranges (now - KF; KF-FF; FF-CF;)
*It sounds fair to me to take out half a week from KF-FF and add it to
FF-CF*. e.g:
- KF=Nov 18 (Sat)
- FF=Dec 6 (Wed) 2.5w after
- CF=Dec 20 (Wed) 2w after

How do others feel about this?

> Just to throw a suggestion out there, if we want to avoid running into
the winter holidays while still making up for slipping of recent releases,
what about something like this: ...

Looking at the last 2 releases, they both had a full month between KIP
Freeze and Code Freeze to finish contributions. Your proposal goes back to
a more aggressive 3 weeks e2e time. All else equal, if the release date is
to be kept as early January, I would prefer to opt for the more
accommodative 4-week period.

> Note that historically, we have set all the deadlines on a Wednesday and
when in doubt erred on the side of an earlier deadline ... We can, and
often have, allowed things to come in late between the Wednesday freeze
deadline and the following Friday, but only on a case-by-case basis.

This makes sense to me. The proposal I put above puts the two critical
dates (FF & CF) on Wed to allow for this flexibility in case it's needed.

Best,
Stanislav


On Tue, Oct 24, 2023 at 12:40 AM Sophie Blee-Goldman 
wrote:

> Actually I have a few questions about the schedule:
>
> 1. Why is the KIP freeze deadline on a Saturday? Traditionally this has
> been on a Wednesday, which is nice because it gives people until Monday to
> kick off the vote and give people a full 3 working days to review and vote
> on it. Also,
> 2. Why are the subsequent deadlines on different days of the week? Usually
> we aim to have the freeze deadlines separated by an integer number of
> weeks. Besides just being a consequence of the typical 1/2 week separation
> between freeze dates, this makes it easy for everyone to remember when the
> next deadline is so they can make sure to get everything in on time. I
> worry that varying this will catch people off guard.
> 3. Is there a particular reason 

[jira] [Created] (KAFKA-15684) Add support to describe all subscriptions through utility

2023-10-25 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15684:
-

 Summary: Add support to describe all subscriptions through utility
 Key: KAFKA-15684
 URL: https://issues.apache.org/jira/browse/KAFKA-15684
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


Open PR to support client-metrics through kafka-configs.sh doesn't list all 
subscriptions. The functionality is missing because of missing support to list 
client subscription in config repository and admin client. This task should 
provide a workaround to fetch all subscriptions from config repository by 
adding a method in KRaftMetadataCache. Later a KIP might be needed to add 
support in AdminClient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15683) Delete subscription from metadata when all configs are deleted

2023-10-25 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15683:
-

 Summary: Delete subscription from metadata when all configs are 
deleted
 Key: KAFKA-15683
 URL: https://issues.apache.org/jira/browse/KAFKA-15683
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


As of now the kafka-configs.sh do not differentiate on non-existent and blank 
metrics subscription. Add support to differentiate in 2 scenarios and also 
delete the subscription if all configs are delete for respective subscription. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #2327

2023-10-25 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-978: Allow dynamic reloading of certificates with different DN / SANs

2023-10-25 Thread Manikumar
Hi,

Thanks for the KIP.

+1 (binding)


Thanks.

On Wed, Oct 25, 2023 at 1:37 AM Jakub Scholz  wrote:

> Hi all,
>
> I would like to start a vote for the KIP-978: Allow dynamic reloading of
> certificates with different DN / SANs
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=263429128
> >
> .
>
> Thanks & Regards
> Jakub
>


Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

2023-10-25 Thread Alieh Saeedi
Thanks, Matthias and Bruno.
Here is a list of updates:

   1. I changed the variable and method names as I did for KIP-968, as
   follows:
  - "fromTimestamp" -> fromTime
  - "asOfTimestamp"-> toTime
  - "from(instant)" -> fromTime(instant)"
  - asOf(instant)"->toTime(instant)
   2. As Bruno suggested for KIP-968, I added `orderByKey()`,
   `withAscendingKeys()`, and `withAscendingTimestamps()` methods for user
   readability.
   3. I updated the "Example" section as well.

Some points:

   1. Even though the kip suggests adding the `get(k lowerkeybound, k
   upperkeybound, long fromtime, long totime)` method to the interface, I
   added this method to the `rocksdbversionedstore` class for now.
   2. Matthias, you mentioned a very important point. How can a user
   retrieve the latest value? We have the same issue with kip-968 as well.
   Asking a user to call `toTime(max)` violates the API design rules, as you
   mentioned. So I think we must have a `latest()` method for both KIP-968 and
   KIP-969. What do you think about that?


Cheers,
Alieh

On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax  wrote:

> Thanks for the update.
>
>
> > To retrieve
> >> the latest value(s), the user must call just the asOf method with
> the MAX
> >> value (asOf(MAX)). The same applies to KIP-968. Do you think it is
> clumsy,
> >> Matthias?
>
>
> Well, in KIP-968 calling `asOf` and passing in a timestamp is optional,
> and default is "latest", right? So while `asOf(MAX)` does the same
> thing, practically users would never call `asOf` for a "latest" query?
>
> In this KIP, we enforce that users give us a key range (we have the 4
> static entry point methods to define a query for this), and we say we
> default to "no bounds" for time range by default.
>
> The existing `RangeQuery` allows to query a range of keys for existing
> stores. It seems to be a common pattern to query a key-range on latest.
> -- in the current proposal, users would need to do:
>
> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>
> Would like to hear from others if we think that's good user experience?
> If we agree to accept this, I think we should explain how to do this in
> the JavaDocs (and also regular docs... --- otherwise, I can already
> anticipate user question on all question-asking-channels how to do a
> "normal key range query". IMHO, the problem is not that the code itself
> it too clumsy, but that it's totally not obvious to uses how to express
> it without actually explaining it to them. It basically violated the API
> design rule "make it easy to use / simple things should be easy".
>
> Btw: We could also re-use `RangeQuery` and add am implementation to
> `VersionedStateStore` to just accept this query type, with "key range
> over latest" semantics. -- The issue is of course, that uses need to
> know that the query would return `ValueAndTimestamp` and not plain `V`
> (or we add a translation step to unwrap the value, but we would lose the
> "validFrom" timestamp -- validTo would be `null`). Because type safety
> is a general issue in IQv2 it would not make it worse (in the strict
> sense), but I am also not sure if we want to dig an even deeper hole...
>
>
> -Matthias
>
>
> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> > Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
> summary
> > of the updates I made to the KIP:
> >
> > 1.  I liked the idea of renaming methods as Matthias suggested.
> > 2. I removed the allversions() method as I did in KIP-968. To
> retrieve
> > the latest value(s), the user must call just the asOf method with
> the MAX
> > value (asOf(MAX)). The same applies to KIP-968. Do you think it is
> clumsy,
> > Matthias?
> > 3. I added a method to the *VersionedKeyValueStore *interface, as I
> did
> > for KIP-968.
> > 4. Matthias: I do not get what you mean by your second comment. Isn't
> > the KIP already explicit about that?
> >
> > > I assume, results are returned by timestamp for each key. The KIP
> > should be explicit about it.
> >
> >
> > Cheers,
> > Alieh
> >
> >
> >
> > On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax  wrote:
> >
> >> Thanks for updating the KIP.
> >>
> >> Not sure if I agree or not with Bruno's idea to split the query types
> >> further? In the end, we split them only because there is three different
> >> return types: single value, value-iterator, key-value-iterator.
> >>
> >> What do we gain by splitting out single-ts-range-key? In the end, for
> >> range-ts-range-key the proposed class is necessary and is a superset
> >> (one can set both timestamps to the same value, for single-ts lookup).
> >>
> >> The mentioned simplification might apply to "single-ts-range-key" but I
> >> don't see a simplification for the proposed (and necessary) query type?
> >>
> >> On the other hand, I see an advantage of a single-ts-range-key for
> >> querying over the "latest version" with a range of keys. For a
> 

[jira] [Created] (KAFKA-15682) Ensure internal remote log metadata topic does not expire its segments before deleting user-topic segments

2023-10-25 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15682:


 Summary: Ensure internal remote log metadata topic does not expire 
its segments before deleting user-topic segments
 Key: KAFKA-15682
 URL: https://issues.apache.org/jira/browse/KAFKA-15682
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


One of the implementation of RemoteLogMetadataManager is 
TopicBasedRemoteLogMetadataManager which uses an internal Kafka topic. Unlike 
other internal topics which are compaction enabled, this topic is not enabled 
with compaction and retention is set to unlimited. 

Keeping this internal topic retention to unlimited is not practical in real 
world use-case where the topic disk usage footprint grows large over a period 
of time. 

It is assumed that the user will set the retention to a reasonable time such 
that it is the max of all the user-created topics (max + X). We can't just rely 
on it and need an assertion before deleting the internal 
{{__remote_log_metadata}} segments, otherwise there will be dangling remote log 
segments which won't be cleared once all the brokers are restarted post the 
topic truncation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15681) Add support of client-metrics in kafka-configs.sh

2023-10-25 Thread Apoorv Mittal (Jira)
Apoorv Mittal created KAFKA-15681:
-

 Summary: Add support of client-metrics in kafka-configs.sh
 Key: KAFKA-15681
 URL: https://issues.apache.org/jira/browse/KAFKA-15681
 Project: Kafka
  Issue Type: Sub-task
Reporter: Apoorv Mittal
Assignee: Apoorv Mittal


KIP-714 requires the support of `client-metrics` resource in kafka-configs.sh 
which can manage the subscriptions for the cluster.

 

Following section details the commands: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability#KIP714:Clientmetricsandobservability-kafka-configs.sh

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15671) Flaky test RemoteIndexCacheTest.testClearCacheAndIndexFilesWhenResizeCache

2023-10-25 Thread Divij Vaidya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Divij Vaidya resolved KAFKA-15671.
--
Resolution: Fixed

> Flaky test RemoteIndexCacheTest.testClearCacheAndIndexFilesWhenResizeCache
> --
>
> Key: KAFKA-15671
> URL: https://issues.apache.org/jira/browse/KAFKA-15671
> Project: Kafka
>  Issue Type: Test
>  Components: Tiered-Storage
>Reporter: Divij Vaidya
>Assignee: hudeqi
>Priority: Major
> Fix For: 3.7.0
>
>
> {{context: 
> [https://github.com/apache/kafka/pull/14483#issuecomment-1775107621] }}
> {{Example of failure in trunk (from 23rd Oct)}}
> {{[https://ge.apache.org/scans/tests?search.rootProjectNames=kafka=trunk=Europe%2FBerlin=kafka.log.remote.RemoteIndexCacheTest=testCorrectnessForCacheAndIndexFilesWhenResizeCache()]
>  }}
> {{Stack trace: 
> [https://ge.apache.org/s/ahssoveatyg6k/tests/task/:core:test/details/kafka.log.remote.RemoteIndexCacheTest/testCorrectnessForCacheAndIndexFilesWhenResizeCache()?expanded-stacktrace=WyIwIl0=1]
>  }}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


KIP-993: Allow restricting files accessed by File and Directory ConfigProviders

2023-10-25 Thread Gantigmaa Selenge
Hi everyone,

I would like to start a discussion on KIP-933 that proposes restricting
files accessed by File and Directory ConfigProviders.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-993%3A+Allow+restricting+files+accessed+by+File+and+Directory+ConfigProviders

Regards,
Tina