Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.4 #2

2022-12-06 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1412

2022-12-06 Thread Apache Jenkins Server
See 




Jenkins build is unstable: Kafka » Kafka Branch Builder » 3.4 #1

2022-12-06 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-14260) InMemoryKeyValueStore iterator still throws ConcurrentModificationException

2022-12-06 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-14260.

Resolution: Fixed

> InMemoryKeyValueStore iterator still throws ConcurrentModificationException
> ---
>
> Key: KAFKA-14260
> URL: https://issues.apache.org/jira/browse/KAFKA-14260
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.1, 3.2.3
>Reporter: Avi Cherry
>Assignee: Lucia Cerchie
>Priority: Major
> Fix For: 3.4.0
>
>
> This is the same bug as KAFKA-7912 which was then re-introduced by KAFKA-8802.
> Any iterator returned from {{InMemoryKeyValueStore}} may end up throwing a 
> ConcurrentModificationException because the backing map is not thread-safe. I 
> expect that this only happens when the store is retrieved via 
> {{KafkaStreams.store()}} from outside of the topology, since any usage of the 
> store from inside the topology should be naturally single-threaded.
> To start off, a reminder that this behaviour explicitly violates the 
> interface contract for {{ReadOnlyKeyValueStore}} which states
> {quote}The returned iterator must be safe from 
> java.util.ConcurrentModificationExceptions
> {quote}
> It is often complicated to write code that demonstrates concurrency bugs, but 
> thankfully it is trivial to reason through the source code in 
> {{InMemoryKeyValueStore.java}} to show why this happens:
>  * All of the InMemoryKeyValueStore methods that return iterators do so by 
> passing a keySet based on the backing TreeMap to the InMemoryKeyValueIterator 
> constructor.
>  * These keySets are all VIEWS of the backing map, not copies.
>  * The InMemoryKeyValueIterator then makes a private copy of the keySet by 
> passing the original keySet into the constructor for TreeSet. This copying 
> was introduced in KAFKA-8802 in the incorrect belief that it would fix the 
> concurrency problem.
>  * TreeSet then iterates over the keySet to make the copy. If the original 
> backing TreeMap in InMemoryKeyValueStore is changed while this copy is being 
> created, the copy fails fast with a ConcurrentModificationException.
> This bug could be trivially fixed by replacing the backing TreeMap with a 
> ConcurrentSkipListMap, but here's the rub:
> This bug was already found in KAFKA-7912, and the TreeMap was replaced 
> with a ConcurrentSkipListMap. It was then reverted back to a TreeMap in 
> KAFKA-8802 because of the performance regression. I can [see from one of the 
> PRs|https://github.com/apache/kafka/pull/7212/commits/384c12e40f3a59591f897d916f92253e126820ed]
>  that it was believed the concurrency problem with the TreeMap implementation 
> was fixed by copying the keySet when the iterator is created, but the problem 
> remains, plus the fix creates an extra copy of the iterated portion of the 
> set in memory.
> For what it's worth, the performance difference between TreeMap and 
> ConcurrentSkipListMap does not extend into complexity. TreeMap enjoys a 
> roughly 2x speed advantage across all operations at any data size, but at the 
> cost of what turned out to be an easy-to-encounter bug.
> This is all unfortunate since the only time the state stores ever get 
> accessed concurrently is through the `KafkaStreams.store()` mechanism, but I 
> would imagine that "correct and slightly slower" is better than "incorrect 
> and faster".
> Too bad BoilerBay's AirConcurrentMap is closed-source and patented.
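
To make the failure mode concrete, here is a minimal, self-contained sketch 
(illustrative only, not Kafka code) of the pattern described above: copying a 
TreeMap keySet view while another thread writes to the map can fail fast, 
whereas a ConcurrentSkipListMap's weakly consistent view cannot.

{code:java}
import java.util.ConcurrentModificationException;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListMap;

public class IteratorCopyDemo {
    public static void main(String[] args) throws InterruptedException {
        TreeMap<Integer, String> backing = new TreeMap<>();
        for (int i = 0; i < 100_000; i++) backing.put(i, "v");

        Thread writer = new Thread(() -> {
            for (int i = 100_000; i < 200_000; i++) backing.put(i, "v");
        });
        writer.start();

        try {
            // Mirrors InMemoryKeyValueIterator: new TreeSet<>(view) iterates the
            // live keySet view while the writer thread mutates the backing map,
            // so the fail-fast iterator can throw.
            new TreeSet<>(backing.keySet());
        } catch (ConcurrentModificationException e) {
            System.out.println("fail-fast copy threw: " + e);
        }
        writer.join();

        // A ConcurrentSkipListMap keySet is weakly consistent: iterating or
        // copying it while the map is modified never throws, at the cost of the
        // single-threaded slowdown discussed above.
        ConcurrentSkipListMap<Integer, String> concurrent =
                new ConcurrentSkipListMap<>(backing);
        new TreeSet<>(concurrent.keySet());
    }
}
{code}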



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13602) Allow to broadcast a result record

2022-12-06 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-13602.

Resolution: Fixed

> Allow to broadcast a result record
> --
>
> Key: KAFKA-13602
> URL: https://issues.apache.org/jira/browse/KAFKA-13602
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sagar Rao
>Priority: Major
>  Labels: needs-kip, newbie++
> Fix For: 3.4.0
>
>
> From time to time, users ask how they can send a record to more than one 
> partition in a sink topic. Currently, this is only possible by replicating the 
> message N times before the sink and using a custom partitioner to write the N 
> messages into the N different partitions.
> It might be worth making this easier by adding a new feature for it. There are 
> multiple options:
>  * extend `to()` / `addSink()` with a "broadcast" option/config
>  * add `toAllPartitions()` / `addBroadcastSink()` methods
>  * allow StreamPartitioner to return `-1` for "all partitions"
>  * extend `StreamPartitioner` to allow returning more than one partition (i.e. 
> a list/collection of integers instead of a single int)
> The first three options imply that only a "full broadcast" is supported, so 
> they are less flexible. On the other hand, they are easier to use (especially 
> the first two options, as they do not require implementing a custom 
> partitioner).
> The last option would be the most flexible and would also allow for a "partial 
> broadcast" (aka multicast) pattern. It might also be possible to combine two 
> options, or maybe even do something totally different.
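
For reference, the workaround described above (replicate the record N times, 
then route each copy with a custom partitioner) looks roughly like this with 
the current API; the topic names and partition count are illustrative:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KeyValueMapper;
import org.apache.kafka.streams.kstream.Produced;

public class BroadcastWorkaround {
    static final int NUM_PARTITIONS = 3;

    public static void build(final StreamsBuilder builder) {
        final KStream<String, String> input =
                builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Replicate each record once per target partition, encoding the target
        // partition number in the key.
        final KeyValueMapper<String, String, Iterable<KeyValue<Integer, String>>> fanOut =
                (key, value) -> {
                    final List<KeyValue<Integer, String>> copies = new ArrayList<>();
                    for (int p = 0; p < NUM_PARTITIONS; p++) {
                        copies.add(KeyValue.pair(p, value));
                    }
                    return copies;
                };

        input.flatMap(fanOut)
             // Custom partitioner: the key of each copy *is* the target partition.
             .to("broadcast-topic",
                 Produced.with(Serdes.Integer(), Serdes.String())
                         .withStreamPartitioner((topic, key, value, numPartitions) -> key));
    }
}
{code}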



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14446) API forwarding support in ZkBrokers

2022-12-06 Thread Akhilesh Chaganti (Jira)
Akhilesh Chaganti created KAFKA-14446:
-

 Summary: API forwarding support in ZkBrokers
 Key: KAFKA-14446
 URL: https://issues.apache.org/jira/browse/KAFKA-14446
 Project: Kafka
  Issue Type: Sub-task
Reporter: Akhilesh Chaganti
Assignee: Akhilesh Chaganti


To support migration, zkBrokers should be able to forward API requests to the 
Controller, whether it is zkController or kraftController. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-12-06 Thread Sophie Blee-Goldman
Thanks Sagar, this makes sense to me -- we clearly need additional changes
to avoid breaking IQ when using this feature, but I agree with continuing
to restrict FKJ since they wouldn't stop working without it, and would
become much harder to reason about (than they already are) if we did enable
them to use it.

And of course, they can still multicast the final results of a FKJ, they
just can't mess with the internal workings of it in this way.

On Tue, Dec 6, 2022 at 9:48 AM Sagar  wrote:

> Hi All,
>
> I made a couple of edits to the KIP which came up during the code review.
> Changes at a high level are:
>
> 1) KeyQueryMetadata enhanced to have a new method called partitions().
> 2) Lifting the restriction of a single partition for IQ. Now the
> restriction holds only for FK Join.
>
> Updated KIP:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356
>
> Thanks!
> Sagar.
>
> On Mon, Sep 12, 2022 at 6:43 PM Sagar  wrote:
>
> > Thanks Bruno,
> >
> > Marking this as accepted.
> >
> > Thanks everyone for their comments/feedback.
> >
> > Thanks!
> > Sagar.
> >
> > On Mon, Sep 12, 2022 at 1:53 PM Bruno Cadonna 
> wrote:
> >
> >> Hi Sagar,
> >>
> >> Thanks for the update and the PR!
> >>
> >> +1 (binding)
> >>
> >> Best,
> >> Bruno
> >>
> >> On 10.09.22 18:57, Sagar wrote:
> >> > Hi Bruno,
> >> >
> >> > Thanks, I think these changes make sense to me. I have updated the KIP
> >> > accordingly.
> >> >
> >> > Thanks!
> >> > Sagar.
> >> >
> >> > On Wed, Sep 7, 2022 at 2:16 PM Bruno Cadonna 
> >> wrote:
> >> >
> >> >> Hi Sagar,
> >> >>
> >> >> I would not drop the support for dropping records. I would also not
> >> >> return null from partitions(). Maybe an Optional can help here. An
> >> empty
> >> >> Optional would mean to use the default partitioning behavior of the
> >> >> producer. So we would have:
> >> >>
> >> >> - non-empty Optional, non-empty list of integers: partitions to send
> >> the
> >> >> record to
> >> >> - non-empty Optional, empty list of integers: drop the record
> >> >> - empty Optional: use default behavior
> >> >>
> >> >> What do other think?
> >> >>
> >> >> Best,
> >> >> Bruno
> >> >>
> >> >> On 02.09.22 13:53, Sagar wrote:
> >> >>> Hello Bruno/Chris,
> >> >>>
> >> >>> Since these are the last set of changes(I am assuming haha), it
> would
> >> be
> >> >>> great if you could review the 2 options from above so that we can
> >> close
> >> >> the
> >> >>> voting. Of course I am happy to incorporate any other requisite
> >> changes.
> >> >>>
> >> >>> Thanks!
> >> >>> Sagar.
> >> >>>
> >> >>> On Wed, Aug 31, 2022 at 10:07 PM Sagar 
> >> >> wrote:
> >> >>>
> >>  Thanks Bruno for the great points.
> >> 
> >>  I see 2 options here =>
> >> 
> >>  1) As Chris suggested, drop the support for dropping records in the
> >>  partitioner. That way, an empty list could signify the usage of a
> >> >> default
> >>  partitioner. Also, if the deprecated partition() method returns
> null
> >>  thereby signifying the default partitioner, the partitions() can
> >> return
> >> >> an
> >>  empty list i.e default partitioner.
> >> 
> >>  2) OR we treat a null return type of partitions() method to signify
> >> the
> >>  usage of the default partitioner. In the default implementation of
> >>  partitions() method, if partition() returns null, then even
> >> partitions()
> >>  can return null(instead of an empty list). The RecordCollectorImpl
> >> code
> >> >> can
> >>  also be modified accordingly. @Chris, to your point, we can even
> drop
> >> >> the
> >>  support of dropping of records. It came up during KIP discussion,
> >> and I
> >>  thought it might be a useful feature. Let me know what you think.
> >> 
> >>  3) Lastly about the partition number check. I wanted to avoid the
> >> >> throwing
> >>  of exception so I thought adding it might be a useful feature. But
> as
> >> >> you
> >>  pointed out, if it can break backwards compatibility, it's better
> to
> >> >> remove
> >>  it.
> >> 
> >>  Thanks!
> >>  Sagar.
> >> 
> >> 
> >>  On Tue, Aug 30, 2022 at 6:32 PM Chris Egerton
> >> 
> >>  wrote:
> >> 
> >> > +1 to Bruno's concerns about backward compatibility. Do we
> actually
> >> >> need
> >> > support for dropping records in the partitioner? It doesn't seem
> >> >> necessary
> >> > based on the motivation for the KIP. If we remove that feature, we
> >> >> could
> >> > handle null and/or empty lists by using the default partitioning,
> >> > equivalent to how we handle null return values from the existing
> >> >> partition
> >> > method today.
> >> >
> >> > On Tue, Aug 30, 2022 at 8:55 AM Bruno Cadonna  >
> >> >> wrote:
> >> >
> >> >> Hi Sagar,
> >> >>
> >> >> Thank you for the updates!
> >> >>
> >> >> I do not intend to prolong this vote thread more than needed,
> but I
> >> >> still have some points.
> >> >>
> >> 

Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-06 Thread Sophie Blee-Goldman
Hey all,

First off, just a heads up that code freeze will be *tomorrow, Dec 6th*, so
please make sure to merge any lingering PRs by EOD Wednesday (PST). If you
have a potential blocker that may take longer to fix and hasn't already been
communicated to me, please reach out to me now and make sure the ticket is
marked as a blocker for 3.4.

Also note that the 3.4 branch has been created, so going forward you'll need
to ensure that newly merged PRs are cherry-picked to this branch to make the
3.4 release.

Thanks, and don't hesitate to reach out if you have any questions.

Greg/Chris -- I looked over the ticket and PR and agree this counts as a
blocker, so just try and get this in as quickly as is reasonable. It seems
like things are mostly sorted with this fix, but I did chime in on the PR
discussion regarding keeping the scope small here.


On Tue, Dec 6, 2022 at 7:15 AM Chris Egerton 
wrote:

> Hi Greg,
>
> Thanks for finding and raising this issue. I've given the PR a look and
> plan to continue reviewing it this week until merged. IMO this should
> qualify as a blocker for the release.
>
> Sophie, is it alright if we merge this into the 3.4 branch (or trunk, if
> one has not been created yet) past the December 7th code freeze deadline?
>
> Cheers,
>
> Chris
>
> On Mon, Dec 5, 2022 at 2:11 PM Greg Harris 
> wrote:
>
> > Hi All,
> >
> > Just notifying everyone of a regression introduced by KIP-787, currently
> > only present on trunk, but which may qualify as a blocker for the
> release.
> > It manifests as a moderate resource leak on MirrorMaker2 clusters. The
> fix
> > should have a small scope and low risk.
> >
> > Here's the bug ticket: https://issues.apache.org/jira/browse/KAFKA-14443
> > Here's the tentative fix PR: https://github.com/apache/kafka/pull/12955
> >
> > Thanks!
> > Greg
> >
> > On Fri, Dec 2, 2022 at 8:06 AM David Jacot 
> > wrote:
> >
> > > Hi Sophie,
> > >
> > > FYI - I just merged KIP-840
> > > (
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
> > > )
> > > so it will be in 3.4.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
> > >  wrote:
> > > >
> > > > Hey all! It's officially *feature freeze for 3.4* so make sure you
> get
> > > that
> > > > feature work merged by the end of today.
> > > > After this point, only bug fixes and other work focused on
> stabilizing
> > > the
> > > > release should be merged to the release
> > > > branch. Also note that the *3.4 code freeze* will be in one week
> (*Dec
> > > 7th*)
> > > > so please make sure to stabilize and
> > > > thoroughly test any new features.
> > > >
> > > > I will wait until Friday to create the release branch to allow for
> any
> > > > existing PRs to be merged. After this point you'll
> > > > need to cherrypick any new commits to the 3.4 branch once a PR is
> > merged.
> > > >
> > > > Finally, I've updated the list of KIPs targeted for 3.4. Please check
> > out
> > > > the Planned KIP Content on the release
> > > > plan and let me know if there is anything missing or incorrect on
> > there.
> > > >
> > > > Cheers,
> > > > Sophie
> > > >
> > > >
> > > > On Wed, Nov 30, 2022 at 12:29 PM David Arthur 
> > wrote:
> > > >
> > > > > Sophie, KIP-866 has been accepted. Thanks!
> > > > >
> > > > > -David
> > > > >
> > > > > On Thu, Nov 17, 2022 at 12:21 AM Sophie Blee-Goldman
> > > > >  wrote:
> > > > > >
> > > > > > Thanks for the update Rajini, I've added this to the release page
> > > since
> > > > > it
> > > > > > looks like
> > > > > > it will pass but of course if anything changes, just let me know.
> > > > > >
> > > > > > David, I'm fine with aiming to include KIP-866 in the 3.4 release
> > as
> > > well
> > > > > > since this
> > > > > > seems to be a critical part of the zookeeper removal/migration.
> > > Please
> > > > > let
> > > > > > me know
> > > > > > when it's been accepted
> > > > > >
> > > > > > On Wed, Nov 16, 2022 at 11:08 AM Rajini Sivaram <
> > > rajinisiva...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Sophie,
> > > > > > >
> > > > > > > KIP-881 has three binding votes (David Jacot, Jun and me) and
> one
> > > > > > > non-binding vote (Maulin). So it is good to go for 3.4.0 if
> there
> > > are
> > > > > no
> > > > > > > objections until the voting time of 72 hours completes on
> Friday.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Rajini
> > > > > > >
> > > > > > > On Wed, Nov 16, 2022 at 3:15 PM David Arthur
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > Sophie, the vote for KIP-866 is underway, but there is still
> > some
> > > > > > > > discussion happening. I'm hopeful that the vote can close
> this
> > > week,
> > > > > but
> > > > > > > it
> > > > > > > > may fall into next week. Can we include this KIP in 3.4?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > David
> > > > > > > >
> > > > > > > > On Tue, Nov 15, 2022 at 6:52 AM Rajini Sivaram <
> > > > > 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1411

2022-12-06 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-14415) ThreadCache is getting slower with every additional state store

2022-12-06 Thread A. Sophie Blee-Goldman (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A. Sophie Blee-Goldman resolved KAFKA-14415.

Resolution: Fixed

> ThreadCache is getting slower with every additional state store
> ---
>
> Key: KAFKA-14415
> URL: https://issues.apache.org/jira/browse/KAFKA-14415
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Lucas Brutschy
>Assignee: Lucas Brutschy
>Priority: Major
> Fix For: 3.4.0
>
>
> There are a few lines in `ThreadCache` that I think should be optimized. 
> `sizeBytes` is called at least once, and potentially many times, in every 
> `put`, and it is linear in the number of caches (= number of state stores, so 
> typically proportional to the number of tasks). That means that with every 
> additional task, every put gets a little slower. Compare the throughput of 
> TIME_ROCKS on trunk (green graph):
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-3-4-0-51b7eb7937-jenkins-20221113214104-streamsbench/]
> The throughput of TIME_ROCKS is 20% higher when a constant-time 
> `sizeBytes` implementation is used:
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASCOMPARE-lucas-20221122140846-streamsbench/]
> The same seems to apply for the MEM backend (initial throughput >8000 instead 
> of 6000); however, I cannot run the same benchmark here because the memory is 
> filled too quickly.
> [http://kstreams-benchmark-results.s3-website-us-west-2.amazonaws.com/experiments/stateheavy-3-5-LUCASSTATE-lucas-20221121231632-streamsbench/]
>  
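
Schematically, the difference between the two variants being benchmarked above 
is the following (an illustration, not the actual ThreadCache code):

{code:java}
import java.util.HashMap;
import java.util.Map;

public class CacheSizeTracking {
    // One entry per NamedCache (i.e. per state store), each knowing its own size.
    private final Map<String, Long> cacheSizes = new HashMap<>();
    private long totalBytes = 0;

    // O(number of caches): recomputed on every put in the current code path.
    long sizeBytesLinear() {
        long sum = 0;
        for (final long size : cacheSizes.values()) {
            sum += size;
        }
        return sum;
    }

    // O(1): maintain a running total whenever any cache grows or shrinks.
    void onCacheDelta(final String cacheName, final long deltaBytes) {
        cacheSizes.merge(cacheName, deltaBytes, Long::sum);
        totalBytes += deltaBytes;
    }

    long sizeBytesConstant() {
        return totalBytes;
    }
}
{code}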



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-890 Server Side Defense

2022-12-06 Thread Justine Olshan
Hi all,
After Artem's questions about error behavior, I've re-evaluated the
unknown producer ID exception and had some discussions offline.

I think generally it makes sense to simplify error handling in cases like
this, and the UNKNOWN_PRODUCER_ID error has a pretty long and complicated
history. Because of this, I propose adding a new error code ABORTABLE_ERROR
that, when encountered by new clients (gated by the produce request version),
will simply abort the transaction. This allows the server to have some say
in whether the client aborts and makes handling much simpler. In the
future, we can also use this error in other situations where we want to
abort the transaction. We can even use it on other APIs.

I've added this to the KIP. Let me know if there are any questions or
issues.
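
For context, the abort path this error would feed into on the client is the
same one applications already use for abortable errors today. A rough sketch
with the existing transactional producer API (the new error code itself is not
in any released client, and the configs/topic names below are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class AbortableErrorHandling {
    public static void main(String[] args) {
        final Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-txn");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("demo-topic", "k", "v"));
                producer.commitTransaction();
            } catch (ProducerFencedException e) {
                throw e; // fatal: another producer with this transactional.id took over
            } catch (KafkaException e) {
                // abortable: abort the transaction and (optionally) retry it
                producer.abortTransaction();
            }
        }
    }
}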

Justine

On Fri, Dec 2, 2022 at 10:22 AM Justine Olshan  wrote:

> Hey Matthias,
>
>
> 20/30 — Maybe I also didn't express myself clearly. For older clients we
> don't have a way to distinguish between a previous and the current
> transaction since we don't have the epoch bump. This means that a late
> message from the previous transaction may be added to the new one. With
> older clients — we can't guarantee this won't happen if we already sent the
> addPartitionsToTxn call (why we make changes for the newer client) but we
> can at least gate some by ensuring that the partition has been added to the
> transaction. The rationale here is that there are likely LESS late arrivals
> as time goes on, so hopefully most late arrivals will come in BEFORE the
> addPartitionsToTxn call. Those that arrive before will be properly gated
> with the describeTransactions approach.
>
> If we take the approach you suggested, ANY late arrival from a previous
> transaction will be added. And we don't want that. I also don't see any
> benefit in sending addPartitionsToTxn over the describeTxns call. They will
> both be one extra RPC to the Txn coordinator.
>
>
> To be clear — newer clients will use addPartitionsToTxn instead of the
> DescribeTxns.
>
>
> 40)
> My concern is that if we have some delay in the client to bump the epoch,
> it could continue to send epoch 73 and those records would not be fenced.
> Perhaps this is not an issue if we don't allow the next produce to go
> through before the EndTxn request returns. I'm also thinking about cases of
> failure. I will need to think on this a bit.
>
> I wasn't sure if it was that confusing. But if we think it is, we can
> investigate other ways.
>
>
> 60)
>
> I'm not sure these are the same purgatories since one is a produce
> purgatory (I was planning on using a callback rather than purgatory) and
> the other is simply a request to append to the log. Not sure we have any
> structure here for ordering, but my understanding is that the broker could
> handle the write request before it hears back from the Txn Coordinator.
>
> Let me know if I misunderstood something or something was unclear.
>
> Justine
>
> On Thu, Dec 1, 2022 at 12:15 PM Matthias J. Sax  wrote:
>
>> Thanks for the details Justine!
>>
>> > 20)
>> >
>> > The client side change for 2 is removing the addPartitions to
>> transaction
>> > call. We don't need to make this from the producer to the txn
>> coordinator,
>> > only server side.
>>
>> I think I did not express myself clearly. I understand that we can (and
>> should) change the producer to not send the `addPartitions` request any
>> longer. But I don't think it's a requirement to change the broker?
>>
>> What I am trying to say is: as a safe-guard and improvement for older
>> producers, the partition leader can just send the `addPartitions`
>> request to the TX-coordinator in any case -- if the old producer
>> correctly did send the `addPartition` request to the TX-coordinator
>> already, the TX-coordinator can just "ignore" it as idempotent. However,
>> if the old producer has a bug and did forget to sent the `addPartition`
>> request, we would now ensure that the partition is indeed added to the
>> TX and thus fix a potential producer bug (even if we don't get the
>> fencing via the bump epoch). -- It seems to be a good improvement? Or is
>> there a reason to not do this?
>>
>>
>>
>> > 30)
>> >
>> > Transaction is ongoing = partition was added to transaction via
>> > addPartitionsToTxn. We check this with the DescribeTransactions call.
>> Let
>> > me know if this wasn't sufficiently explained here:
>>
>> If we do what I propose in (20), we don't really need to make this
>> `DescribeTransaction` call, as the partition leader adds the partition
>> for older clients and we get this check for free.
>>
>>
>> > 40)
>> >
>> > The idea here is that if any messages somehow come in before we get the
>> new
>> > epoch to the producer, they will be fenced. However, if we don't think
>> this
>> > is necessary, it can be discussed
>>
>> I agree that we should have epoch fencing. My question is different:
>> Assume we are at epoch 73, and we have an ongoing transaction, that is
>> committed. It 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1410

2022-12-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2022-12-06 Thread Colt McNealy
Nick,

Thank you for the reply; that makes sense. I was hoping that, since reading
uncommitted records from IQ in EOS isn't part of the documented API, maybe
you *wouldn't* have to wait for the next major release to make that change;
but given that it would be considered a major change, I like your approach
the best.

Wishing you a speedy recovery and happy coding!

Thanks,
Colt McNealy
*Founder, LittleHorse.io*


On Tue, Dec 6, 2022 at 10:30 AM Nick Telford  wrote:

> Hi Colt,
>
> 10: Yes, I agree it's not ideal. I originally intended to try to keep the
> behaviour unchanged as much as possible, otherwise we'd have to wait for a
> major version release to land these changes.
> 20: Good point, ALOS doesn't need the same level of guarantee, and the
> typically longer commit intervals would be problematic when reading only
> "committed" records.
>
> I've been away for 5 days recovering from minor surgery, but I spent a
> considerable amount of that time working through ideas for possible
> solutions in my head. I think your suggestion of keeping ALOS as-is, but
> buffering writes for EOS is the right path forwards, although I have a
> solution that both expands on this, and provides for some more formal
> guarantees.
>
> Essentially, adding support to KeyValueStores for "Transactions", with
> clearly defined IsolationLevels. Using "Read Committed" when under EOS, and
> "Read Uncommitted" under ALOS.
>
> The nice thing about this approach is that it gives us much more clearly
> defined isolation behaviour that can be properly documented to ensure users
> know what to expect.
>
> I'm still working out the kinks in the design, and will update the KIP when
> I have something. The main struggle is trying to implement this without
> making any major changes to the existing interfaces or breaking existing
> implementations, because currently everything expects to operate directly
> on a StateStore, and not a Transaction of that store. I think I'm getting
> close, although sadly I won't be able to progress much until next week due
> to some work commitments.
>
> Regards,
> Nick
>
> On Thu, 1 Dec 2022 at 00:01, Colt McNealy  wrote:
>
> > Nick,
> >
> > Thank you for the explanation, and also for the updated KIP. I am quite
> > eager for this improvement to be released as it would greatly reduce the
> > operational difficulties of EOS streams apps.
> >
> > Two questions:
> >
> > 10)
> > >When reading records, we will use the
> > WriteBatchWithIndex#getFromBatchAndDB
> >  and WriteBatchWithIndex#newIteratorWithBase utilities in order to ensure
> > that uncommitted writes are available to query.
> > Why do extra work to enable the reading of uncommitted writes during IQ?
> > Code complexity aside, reading uncommitted writes is, in my opinion, a
> > minor flaw in EOS IQ; it would be very nice to have the guarantee that,
> > with EOS, IQ only reads committed records. In order to avoid dirty reads,
> > one currently must query a standby replica (but this still doesn't fully
> > guarantee monotonic reads).
> >
> > 20) Is it also necessary to enable this optimization on ALOS stores? The
> > motivation of KIP-844 was mainly to reduce the need to restore state from
> > scratch on unclean EOS shutdowns; with ALOS it was acceptable to accept
> > that there may have been uncommitted writes on disk. On a side note, if
> you
> > enable this type of store on ALOS processors, the community would
> > definitely want to enable queries on dirty reads; otherwise users would
> > have to wait 30 seconds (default) to see an update.
> >
> > Thank you for doing this fantastic work!
> > Colt McNealy
> > *Founder, LittleHorse.io*
> >
> >
> > On Wed, Nov 30, 2022 at 10:44 AM Nick Telford 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I've drastically reduced the scope of this KIP to no longer include the
> > > StateStore management of checkpointing. This can be added as a KIP
> later
> > on
> > > to further optimize the consistency and performance of state stores.
> > >
> > > I've also added a section discussing some of the concerns around
> > > concurrency, especially in the presence of Iterators. I'm thinking of
> > > wrapping WriteBatchWithIndex with a reference-counting copy-on-write
> > > implementation (that only makes a copy if there's an active iterator),
> > but
> > > I'm open to suggestions.
> > >
> > > Regards,
> > > Nick
> > >
> > > On Mon, 28 Nov 2022 at 16:36, Nick Telford 
> > wrote:
> > >
> > > > Hi Colt,
> > > >
> > > > I didn't do any profiling, but the 844 implementation:
> > > >
> > > >- Writes uncommitted records to a temporary RocksDB instance
> > > >   - Since tombstones need to be flagged, all record values are
> > > >   prefixed with a value/tombstone marker. This necessitates a
> > memory
> > > copy.
> > > >- On-commit, iterates all records in this temporary instance and
> > > >writes them to the main RocksDB store.
> > > >- While iterating, the value/tombstone marker needs to be parsed
> and
> > 

Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2022-12-06 Thread Nick Telford
Hi Colt,

10: Yes, I agree it's not ideal. I originally intended to try to keep the
behaviour unchanged as much as possible, otherwise we'd have to wait for a
major version release to land these changes.
20: Good point, ALOS doesn't need the same level of guarantee, and the
typically longer commit intervals would be problematic when reading only
"committed" records.

I've been away for 5 days recovering from minor surgery, but I spent a
considerable amount of that time working through ideas for possible
solutions in my head. I think your suggestion of keeping ALOS as-is, but
buffering writes for EOS is the right path forwards, although I have a
solution that both expands on this, and provides for some more formal
guarantees.

Essentially, adding support to KeyValueStores for "Transactions", with
clearly defined IsolationLevels. Using "Read Committed" when under EOS, and
"Read Uncommitted" under ALOS.

The nice thing about this approach is that it gives us much more clearly
defined isolation behaviour that can be properly documented to ensure users
know what to expect.

I'm still working out the kinks in the design, and will update the KIP when
I have something. The main struggle is trying to implement this without
making any major changes to the existing interfaces or breaking existing
implementations, because currently everything expects to operate directly
on a StateStore, and not a Transaction of that store. I think I'm getting
close, although sadly I won't be able to progress much until next week due
to some work commitments.
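
As a concrete reference for the buffering approach, here's a minimal sketch of
the RocksDB WriteBatchWithIndex primitive referenced in the KIP (keys and the
path are illustrative; this is not the KIP's implementation): uncommitted
writes go into the batch, reads consult the batch first and fall back to the
store, and "commit" applies the whole batch atomically.

import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatchWithIndex;
import org.rocksdb.WriteOptions;

public class WriteBatchWithIndexDemo {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/wbwi-demo");
             WriteBatchWithIndex batch = new WriteBatchWithIndex(true);
             ReadOptions readOptions = new ReadOptions();
             WriteOptions writeOptions = new WriteOptions()) {

            db.put("committed".getBytes(), "1".getBytes());

            // Uncommitted write: buffered in the batch, not yet in the store.
            batch.put("uncommitted".getBytes(), "2".getBytes());

            // Read-your-writes: the batch is consulted before the store.
            byte[] v1 = batch.getFromBatchAndDB(db, readOptions, "committed".getBytes());
            byte[] v2 = batch.getFromBatchAndDB(db, readOptions, "uncommitted".getBytes());
            System.out.println(new String(v1) + " / " + new String(v2));

            // "Commit": apply the buffered writes to the store atomically.
            db.write(writeOptions, batch);
        }
    }
}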

Regards,
Nick

On Thu, 1 Dec 2022 at 00:01, Colt McNealy  wrote:

> Nick,
>
> Thank you for the explanation, and also for the updated KIP. I am quite
> eager for this improvement to be released as it would greatly reduce the
> operational difficulties of EOS streams apps.
>
> Two questions:
>
> 10)
> >When reading records, we will use the
> WriteBatchWithIndex#getFromBatchAndDB
>  and WriteBatchWithIndex#newIteratorWithBase utilities in order to ensure
> that uncommitted writes are available to query.
> Why do extra work to enable the reading of uncommitted writes during IQ?
> Code complexity aside, reading uncommitted writes is, in my opinion, a
> minor flaw in EOS IQ; it would be very nice to have the guarantee that,
> with EOS, IQ only reads committed records. In order to avoid dirty reads,
> one currently must query a standby replica (but this still doesn't fully
> guarantee monotonic reads).
>
> 20) Is it also necessary to enable this optimization on ALOS stores? The
> motivation of KIP-844 was mainly to reduce the need to restore state from
> scratch on unclean EOS shutdowns; with ALOS it was acceptable to accept
> that there may have been uncommitted writes on disk. On a side note, if you
> enable this type of store on ALOS processors, the community would
> definitely want to enable queries on dirty reads; otherwise users would
> have to wait 30 seconds (default) to see an update.
>
> Thank you for doing this fantastic work!
> Colt McNealy
> *Founder, LittleHorse.io*
>
>
> On Wed, Nov 30, 2022 at 10:44 AM Nick Telford 
> wrote:
>
> > Hi everyone,
> >
> > I've drastically reduced the scope of this KIP to no longer include the
> > StateStore management of checkpointing. This can be added as a KIP later
> on
> > to further optimize the consistency and performance of state stores.
> >
> > I've also added a section discussing some of the concerns around
> > concurrency, especially in the presence of Iterators. I'm thinking of
> > wrapping WriteBatchWithIndex with a reference-counting copy-on-write
> > implementation (that only makes a copy if there's an active iterator),
> but
> > I'm open to suggestions.
> >
> > Regards,
> > Nick
> >
> > On Mon, 28 Nov 2022 at 16:36, Nick Telford 
> wrote:
> >
> > > Hi Colt,
> > >
> > > I didn't do any profiling, but the 844 implementation:
> > >
> > >- Writes uncommitted records to a temporary RocksDB instance
> > >   - Since tombstones need to be flagged, all record values are
> > >   prefixed with a value/tombstone marker. This necessitates a
> memory
> > copy.
> > >- On-commit, iterates all records in this temporary instance and
> > >writes them to the main RocksDB store.
> > >- While iterating, the value/tombstone marker needs to be parsed and
> > >the real value extracted. This necessitates another memory copy.
> > >
> > > My guess is that the cost of iterating the temporary RocksDB store is
> the
> > > major factor, with the 2 extra memory copies per-Record contributing a
> > > significant amount too.
> > >
> > > Regards,
> > > Nick
> > >
> > > On Mon, 28 Nov 2022 at 16:12, Colt McNealy 
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> Out of curiosity, why does the performance of the store degrade so
> > >> significantly with the 844 implementation? I wouldn't be too surprised
> > by
> > >> a
> > >> 50-60% drop (caused by each record being written twice), but 96% is
> > >> extreme.
> > >>
> > >> The only thing I 

Re: [VOTE] KIP-837 Allow MultiCasting a Result Record.

2022-12-06 Thread Sagar
Hi All,

I made a couple of edits to the KIP which came up during the code review.
Changes at a high level are:

1) KeyQueryMetadata enhanced to have a new method called partitions().
2) Lifting the restriction of a single partition for IQ. Now the
restriction holds only for FK Join.

Updated KIP:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883356
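
To illustrate the API shape, here is a rough sketch of a partitioner that
multicasts to every partition under the semantics discussed in this thread
(non-empty set = send to those partitions, empty set = drop, empty Optional =
default partitioning); please treat the exact signature as an assumption and
check the KIP page above for the final form:

import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class BroadcastPartitioner<K, V> implements StreamPartitioner<K, V> {

    @Override
    public Integer partition(final String topic, final K key, final V value,
                             final int numPartitions) {
        return null; // old single-partition method: fall back to default partitioning
    }

    @Override
    public Optional<Set<Integer>> partitions(final String topic, final K key,
                                             final V value, final int numPartitions) {
        // Broadcast: return every partition of the sink topic.
        return Optional.of(IntStream.range(0, numPartitions)
                                    .boxed()
                                    .collect(Collectors.toSet()));
    }
}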

Thanks!
Sagar.

On Mon, Sep 12, 2022 at 6:43 PM Sagar  wrote:

> Thanks Bruno,
>
> Marking this as accepted.
>
> Thanks everyone for their comments/feedback.
>
> Thanks!
> Sagar.
>
> On Mon, Sep 12, 2022 at 1:53 PM Bruno Cadonna  wrote:
>
>> Hi Sagar,
>>
>> Thanks for the update and the PR!
>>
>> +1 (binding)
>>
>> Best,
>> Bruno
>>
>> On 10.09.22 18:57, Sagar wrote:
>> > Hi Bruno,
>> >
>> > Thanks, I think these changes make sense to me. I have updated the KIP
>> > accordingly.
>> >
>> > Thanks!
>> > Sagar.
>> >
>> > On Wed, Sep 7, 2022 at 2:16 PM Bruno Cadonna 
>> wrote:
>> >
>> >> Hi Sagar,
>> >>
>> >> I would not drop the support for dropping records. I would also not
>> >> return null from partitions(). Maybe an Optional can help here. An
>> empty
>> >> Optional would mean to use the default partitioning behavior of the
>> >> producer. So we would have:
>> >>
>> >> - non-empty Optional, non-empty list of integers: partitions to send
>> the
>> >> record to
>> >> - non-empty Optional, empty list of integers: drop the record
>> >> - empty Optional: use default behavior
>> >>
>> >> What do other think?
>> >>
>> >> Best,
>> >> Bruno
>> >>
>> >> On 02.09.22 13:53, Sagar wrote:
>> >>> Hello Bruno/Chris,
>> >>>
>> >>> Since these are the last set of changes(I am assuming haha), it would
>> be
>> >>> great if you could review the 2 options from above so that we can
>> close
>> >> the
>> >>> voting. Of course I am happy to incorporate any other requisite
>> changes.
>> >>>
>> >>> Thanks!
>> >>> Sagar.
>> >>>
>> >>> On Wed, Aug 31, 2022 at 10:07 PM Sagar 
>> >> wrote:
>> >>>
>>  Thanks Bruno for the great points.
>> 
>>  I see 2 options here =>
>> 
>>  1) As Chris suggested, drop the support for dropping records in the
>>  partitioner. That way, an empty list could signify the usage of a
>> >> default
>>  partitioner. Also, if the deprecated partition() method returns null
>>  thereby signifying the default partitioner, the partitions() can
>> return
>> >> an
>>  empty list i.e default partitioner.
>> 
>>  2) OR we treat a null return type of partitions() method to signify
>> the
>>  usage of the default partitioner. In the default implementation of
>>  partitions() method, if partition() returns null, then even
>> partitions()
>>  can return null(instead of an empty list). The RecordCollectorImpl
>> code
>> >> can
>>  also be modified accordingly. @Chris, to your point, we can even drop
>> >> the
>>  support of dropping of records. It came up during KIP discussion,
>> and I
>>  thought it might be a useful feature. Let me know what you think.
>> 
>>  3) Lastly about the partition number check. I wanted to avoid the
>> >> throwing
>>  of exception so I thought adding it might be a useful feature. But as
>> >> you
>>  pointed out, if it can break backwards compatibility, it's better to
>> >> remove
>>  it.
>> 
>>  Thanks!
>>  Sagar.
>> 
>> 
>>  On Tue, Aug 30, 2022 at 6:32 PM Chris Egerton
>> 
>>  wrote:
>> 
>> > +1 to Bruno's concerns about backward compatibility. Do we actually
>> >> need
>> > support for dropping records in the partitioner? It doesn't seem
>> >> necessary
>> > based on the motivation for the KIP. If we remove that feature, we
>> >> could
>> > handle null and/or empty lists by using the default partitioning,
>> > equivalent to how we handle null return values from the existing
>> >> partition
>> > method today.
>> >
>> > On Tue, Aug 30, 2022 at 8:55 AM Bruno Cadonna 
>> >> wrote:
>> >
>> >> Hi Sagar,
>> >>
>> >> Thank you for the updates!
>> >>
>> >> I do not intend to prolong this vote thread more than needed, but I
>> >> still have some points.
>> >>
>> >> The deprecated partition method can return null if the default
>> >> partitioning logic of the producer should be used.
>> >> With the new method partitions() it seems that it is not possible
>> to
>> >> use
>> >> the default partitioning logic, anymore.
>> >>
>> >> Also, in the default implementation of method partitions(), a
>> record
>> >> that would use the default partitioning logic in method partition()
>> >> would be dropped, which would break backward compatibility since
>> >> Streams
>> >> would always call the new method partitions() even though the users
>> >> still implement the deprecated method partition().
>> >>
>> >> I have a last point that we should probably discuss on the PR and
>> not
>> >> on
>> >> the KIP but since you 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1409

2022-12-06 Thread Apache Jenkins Server
See 




Re: BUG: eos KeyValueStore::delete() in Punctuator

2022-12-06 Thread Matthias J. Sax
Thanks for reporting back and looking into it. Great that it's fixed 
already.


-Matthias

On 12/5/22 9:45 PM, Colt McNealy wrote:

I re-compiled with the current `trunk` branch and the bug was fixed. Thank
you for pointing that out, Matthias, and sorry for the false alarm!

Cheers,
Colt McNealy
*Founder, LittleHorse.io*


On Mon, Dec 5, 2022 at 7:42 PM Matthias J. Sax  wrote:


Thanks for reporting this issue.

It might have been fixed via
https://issues.apache.org/jira/browse/KAFKA-14294 already.


-Matthias



On 12/3/22 7:05 PM, Colt McNealy wrote:

Hi all,

I believe I've found a bug in Kafka Streams where:

- Running an app in EOS, and
- Calling KeyValueStore::delete(...) on a nonexistent key from a Punctuator

can cause a ProducerFencedException.

The expected behavior is that the call to delete() returns null (as per the
javadoc) and doesn't cause a ProducerFencedException.

I've created a minimally reproducible example which reliably produces the
bug on my own environment at this repository:

https://github.com/littlehorse-eng/kafka-punctuator-fencing-issue

Could someone please take a look and let me know if you can reliably
reproduce it on your end as well, and if so, how to file a bug?

Thank you,
Colt McNealy
*Founder, LittleHorse.io*
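
For reference, the reproduction boils down to roughly the following shape (a
sketch, not the linked repository; store and topic names are illustrative): an
EOS topology whose punctuator deletes a key that was never written.

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class DeleteMissingKeyRepro {

    static class DeletingProcessor implements Processor<String, String, Void, Void> {
        private KeyValueStore<String, String> store;

        @Override
        public void init(final ProcessorContext<Void, Void> context) {
            store = context.getStateStore("demo-store");
            // Expected: deleting a nonexistent key simply returns null.
            // The reported bug: under EOS this punctuation could surface a
            // ProducerFencedException (fixed on trunk, per the messages above).
            context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME,
                    ts -> store.delete("key-that-was-never-written"));
        }

        @Override
        public void process(final Record<String, String> record) {
            store.put(record.key(), record.value());
        }
    }

    public static void main(final String[] args) {
        final ProcessorSupplier<String, String, Void, Void> supplier = DeletingProcessor::new;
        final Topology topology = new Topology()
                .addSource("source", "input-topic")
                .addProcessor("proc", supplier, "source")
                .addStateStore(Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("demo-store"),
                        Serdes.String(), Serdes.String()), "proc");

        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "delete-missing-key-repro");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);

        new KafkaStreams(topology, props).start();
    }
}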







Re: [DISCUSS] Apache Kafka 3.4.0 release

2022-12-06 Thread Chris Egerton
Hi Greg,

Thanks for finding and raising this issue. I've given the PR a look and
plan to continue reviewing it this week until merged. IMO this should
qualify as a blocker for the release.

Sophie, is it alright if we merge this into the 3.4 branch (or trunk, if
one has not been created yet) past the December 7th code freeze deadline?

Cheers,

Chris

On Mon, Dec 5, 2022 at 2:11 PM Greg Harris 
wrote:

> Hi All,
>
> Just notifying everyone of a regression introduced by KIP-787, currently
> only present on trunk, but which may qualify as a blocker for the release.
> It manifests as a moderate resource leak on MirrorMaker2 clusters. The fix
> should have a small scope and low risk.
>
> Here's the bug ticket: https://issues.apache.org/jira/browse/KAFKA-14443
> Here's the tentative fix PR: https://github.com/apache/kafka/pull/12955
>
> Thanks!
> Greg
>
> On Fri, Dec 2, 2022 at 8:06 AM David Jacot 
> wrote:
>
> > Hi Sophie,
> >
> > FYI - I just merged KIP-840
> > (
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211884652
> > )
> > so it will be in 3.4.
> >
> > Best,
> > David
> >
> > On Thu, Dec 1, 2022 at 3:01 AM Sophie Blee-Goldman
> >  wrote:
> > >
> > > Hey all! It's officially *feature freeze for 3.4* so make sure you get
> > that
> > > feature work merged by the end of today.
> > > After this point, only bug fixes and other work focused on stabilizing
> > the
> > > release should be merged to the release
> > > branch. Also note that the *3.4 code freeze* will be in one week (*Dec
> > 7th*)
> > > so please make sure to stabilize and
> > > thoroughly test any new features.
> > >
> > > I will wait until Friday to create the release branch to allow for any
> > > existing PRs to be merged. After this point you'll
> > > need to cherrypick any new commits to the 3.4 branch once a PR is
> merged.
> > >
> > > Finally, I've updated the list of KIPs targeted for 3.4. Please check
> out
> > > the Planned KIP Content on the release
> > > plan and let me know if there is anything missing or incorrect on
> there.
> > >
> > > Cheers,
> > > Sophie
> > >
> > >
> > > On Wed, Nov 30, 2022 at 12:29 PM David Arthur 
> wrote:
> > >
> > > > Sophie, KIP-866 has been accepted. Thanks!
> > > >
> > > > -David
> > > >
> > > > On Thu, Nov 17, 2022 at 12:21 AM Sophie Blee-Goldman
> > > >  wrote:
> > > > >
> > > > > Thanks for the update Rajini, I've added this to the release page
> > since
> > > > it
> > > > > looks like
> > > > > it will pass but of course if anything changes, just let me know.
> > > > >
> > > > > David, I'm fine with aiming to include KIP-866 in the 3.4 release
> as
> > well
> > > > > since this
> > > > > seems to be a critical part of the zookeeper removal/migration.
> > Please
> > > > let
> > > > > me know
> > > > > when it's been accepted
> > > > >
> > > > > On Wed, Nov 16, 2022 at 11:08 AM Rajini Sivaram <
> > rajinisiva...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Sophie,
> > > > > >
> > > > > > KIP-881 has three binding votes (David Jacot, Jun and me) and one
> > > > > > non-binding vote (Maulin). So it is good to go for 3.4.0 if there
> > are
> > > > no
> > > > > > objections until the voting time of 72 hours completes on Friday.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Wed, Nov 16, 2022 at 3:15 PM David Arthur
> > > > > >  wrote:
> > > > > >
> > > > > > > Sophie, the vote for KIP-866 is underway, but there is still
> some
> > > > > > > discussion happening. I'm hopeful that the vote can close this
> > week,
> > > > but
> > > > > > it
> > > > > > > may fall into next week. Can we include this KIP in 3.4?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > > On Tue, Nov 15, 2022 at 6:52 AM Rajini Sivaram <
> > > > rajinisiva...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sophie,
> > > > > > > >
> > > > > > > > I was out of office and hence couldn't get voting started for
> > > > KIP-881
> > > > > > in
> > > > > > > > time. I will start the vote for the KIP today. If there are
> > > > sufficient
> > > > > > > > votes by tomorrow (16th Nov), can we include this KIP in 3.4,
> > even
> > > > > > though
> > > > > > > > voting will only complete on the 17th? It is a small KIP, so
> > we can
> > > > > > merge
> > > > > > > > by feature freeze.
> > > > > > > >
> > > > > > > > Thank you,
> > > > > > > >
> > > > > > > > Rajini
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Nov 10, 2022 at 4:02 PM Sophie Blee-Goldman
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > Hello again,
> > > > > > > > >
> > > > > > > > > This is a reminder that the KIP freeze deadline is
> > approaching,
> > > > all
> > > > > > > KIPs
> > > > > > > > > must be voted
> > > > > > > > > and accepted by *next Wednesday* *(the 16th)*
> > > > > > > > >
> > > > > > > > > Keep in mind that to allow for the full voting period, this
> > > > means you
> > > > > > > > must
> > > > > > > > > kick 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1408

2022-12-06 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-14445:


 Summary: Producer doesn't request metadata update on 
REQUEST_TIMED_OUT
 Key: KAFKA-14445
 URL: https://issues.apache.org/jira/browse/KAFKA-14445
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada


Produce requests may fail with a timeout after `request.timeout.ms` in the 
following two cases:
 * The produce response wasn't received within `request.timeout.ms`
 * The produce response was received, but it ended up with `REQUEST_TIMED_OUT` 
from the broker

The former case usually happens when a broker machine fails or there's a 
network glitch, etc.

In this case, the connection is disconnected and a metadata update is 
requested to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is in the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which 
doesn't inherit from InvalidMetadataException, so it doesn't trigger a metadata 
update.

 

A typical cause of REQUEST_TIMED_OUT is replication delay due to a 
follower-side problem, in which case a metadata update indeed doesn't make much 
sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to retry unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. The leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out of the cluster
 **** There was a controller log line like:
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 *** However, somehow the broker continued to receive produce requests
 **** We're still working on investigating how this is possible.
 **** Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so broker 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1, and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicated from 
broker 1.
 *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer 
didn't have a chance to update its stale metadata

 

So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT 
exception, to handle the case where the old leader has become a "zombie".
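
As a small illustration of the gap (the "proposed" flag below reflects this 
ticket's suggestion, not current client behavior): only errors whose exception 
type inherits from InvalidMetadataException trigger a metadata refresh today, 
and REQUEST_TIMED_OUT maps to a plain TimeoutException, so it does not.

{code:java}
import org.apache.kafka.common.errors.InvalidMetadataException;
import org.apache.kafka.common.protocol.Errors;

public class MetadataRefreshCheck {
    public static void main(String[] args) {
        Errors[] samples = {
                Errors.NOT_LEADER_OR_FOLLOWER,     // triggers a refresh today
                Errors.UNKNOWN_TOPIC_OR_PARTITION, // triggers a refresh today
                Errors.REQUEST_TIMED_OUT           // does not -> stale leader is kept
        };
        for (Errors error : samples) {
            boolean refreshesToday = error.exception() instanceof InvalidMetadataException;
            boolean refreshesProposed = refreshesToday || error == Errors.REQUEST_TIMED_OUT;
            System.out.printf("%-28s today=%-5s proposed=%s%n",
                    error, refreshesToday, refreshesProposed);
        }
    }
}
{code}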



--
This message was sent by Atlassian Jira
(v8.20.10#820010)