[jira] [Created] (KAFKA-14638) Documentation for transaction.timeout.ms should be more precise

2023-01-18 Thread Alex Sorokoumov (Jira)
Alex Sorokoumov created KAFKA-14638:
---

 Summary: Documentation for transaction.timeout.ms should be more 
precise
 Key: KAFKA-14638
 URL: https://issues.apache.org/jira/browse/KAFKA-14638
 Project: Kafka
  Issue Type: Bug
Reporter: Alex Sorokoumov
Assignee: Alex Sorokoumov


The documentation for {{transaction.timeout.ms}} is

{quote}
The maximum amount of time in ms that the transaction coordinator will wait for 
a transaction status update from the producer before proactively aborting the 
ongoing transaction. If this value is larger than the 
transaction.max.timeout.ms setting in the broker, the request will fail with a 
InvalidTxnTimeoutException error.
{quote}

It would be easier to reason about this timeout if the documentation 
elaborated on when the timer starts ticking and under what circumstances it 
might reset.
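
For context, a minimal sketch of where this timeout is configured on the 
producer (values illustrative; the broker-side cap transaction.max.timeout.ms 
defaults to 15 minutes):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class TxnTimeoutExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-transactional-app");
        // Must not exceed the broker's transaction.max.timeout.ms (default
        // 15 minutes), or transactional requests fail with
        // InvalidTxnTimeoutException.
        props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 60_000); // 60 s
        try (KafkaProducer<String, String> producer =
                 new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
            producer.initTransactions(); // the coordinator validates the timeout here
        }
    }
}
{code}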



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Tumbling windows offset

2023-01-18 Thread Matthias J. Sax

You can use a custom window implementation to do this.

Cf 
https://github.com/confluentinc/kafka-streams-examples/blob/7.1.1-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
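
For the weekly Monday-to-Monday case asked about below, a minimal sketch 
adapted from that example (assumptions: Kafka Streams 3.x APIs; TimeWindow 
comes from the internals package, as in the linked DailyTimeWindows example):

import java.time.DayOfWeek;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.TemporalAdjusters;
import java.util.Collections;
import java.util.Map;
import org.apache.kafka.streams.kstream.Windows;
import org.apache.kafka.streams.kstream.internals.TimeWindow;

public class WeeklyTimeWindows extends Windows<TimeWindow> {

    private final ZoneId zoneId;
    private final Duration grace;

    public WeeklyTimeWindows(final ZoneId zoneId, final Duration grace) {
        this.zoneId = zoneId;
        this.grace = grace;
    }

    @Override
    public Map<Long, TimeWindow> windowsFor(final long timestamp) {
        // Align the window start to Monday 00:00 in the given time zone.
        final ZonedDateTime windowStart = Instant.ofEpochMilli(timestamp)
            .atZone(zoneId)
            .toLocalDate()
            .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
            .atStartOfDay(zoneId);
        final long startMs = windowStart.toInstant().toEpochMilli();
        final long endMs = windowStart.plusWeeks(1).toInstant().toEpochMilli();
        return Collections.singletonMap(startMs, new TimeWindow(startMs, endMs));
    }

    @Override
    public long size() {
        return Duration.ofDays(7).toMillis();
    }

    @Override
    public long gracePeriodMs() {
        return grace.toMillis();
    }
}

It can then be plugged in via
groupedStream.windowedBy(new WeeklyTimeWindows(ZoneId.of("UTC"), Duration.ofDays(1)))
wherever a TimeWindows would normally go.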



-Matthias

On 1/18/23 6:33 AM, Amsterdam Luís de Lima Filho wrote:

Hello everyone,

I need to perform a windowed aggregation over a weekly interval, from Monday 
to Monday, using tumbling windows.

It does not seem to be supported: 
https://stackoverflow.com/questions/72785744/how-to-apply-an-offset-to-tumbling-window-in-order-to-delay-the-starting-of-win

Can someone help me with a workaround or guidance on how to make a PR adding 
this feature?

I need this to finish a migration from Flink to Kafka Streams in prod.

Best,
Amsterdam


Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-18 Thread Justine Olshan
Yeah -- looks like we already have code to handle bumping the epoch and
when the epoch is Short.MAX_VALUE, we get a new producer ID. Since this is
already the behavior, do we want to change it further?

Justine
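
A compact sketch of the behavior described above (illustrative names, not the
actual coordinator code): the epoch is 16-bit, so once it reaches
Short.MAX_VALUE the next bump hands out a fresh producer ID with the epoch
reset to 0 instead of wrapping around.

import java.util.function.LongSupplier;

final class ProducerIdAndEpoch {
    final long producerId;
    final short epoch;

    ProducerIdAndEpoch(long producerId, short epoch) {
        this.producerId = producerId;
        this.epoch = epoch;
    }

    // Bump the epoch; on overflow, rotate to a freshly allocated producer ID.
    static ProducerIdAndEpoch bump(ProducerIdAndEpoch current, LongSupplier nextProducerId) {
        if (current.epoch == Short.MAX_VALUE) {
            return new ProducerIdAndEpoch(nextProducerId.getAsLong(), (short) 0);
        }
        return new ProducerIdAndEpoch(current.producerId, (short) (current.epoch + 1));
    }
}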

On Wed, Jan 18, 2023 at 1:12 PM Justine Olshan  wrote:

> Hey all, just wanted to quickly update and say I've modified the KIP to
> explicitly mention that AddOffsetCommitsToTxnRequest will be replaced by
> a coordinator-side (inter-broker) AddPartitionsToTxn implicit request. This
> mirrors the user partitions and will implicitly add offset partitions to
> transactions when we commit offsets on them. We will deprecate 
> AddOffsetCommitsToTxnRequest
> for new clients.
>
> Also to address Artem's comments --
> I'm a bit unsure if the changes here will change the previous behavior for
> fencing producers. In the case you mention in the first paragraph, are you
> saying we bump the epoch before we try to abort the transaction? I think I
> need to understand the scenarios you mention a bit better.
>
> As for the second part -- I think it makes sense to have some sort of
> "sentinel" epoch to signal epoch is about to overflow (I think we sort of
> have this value in place in some ways) so we can codify it in the KIP. I'll
> look into that and try to update soon.
>
> Thanks,
> Justine.
>
> On Fri, Jan 13, 2023 at 5:01 PM Artem Livshits
>  wrote:
>
>> It's good to know that KIP-588 addressed some of the issues.  Looking at
>> the code, it still looks like there are some cases that would result in
>> fatal error, e.g. PRODUCER_FENCED is issued by the transaction coordinator
>> if epoch doesn't match, and the client treats it as a fatal error (code in
>> TransactionManager request handling).  If we consider, for example,
>> committing a transaction that returns a timeout, but actually succeeds,
>> trying to abort it or re-commit may result in PRODUCER_FENCED error
>> (because of epoch bump).
>>
>> For failed commits, specifically, we need to know the actual outcome,
>> because if we return an error the application may think that the
>> transaction is aborted and redo the work, leading to duplicates.
>>
>> Re: overflowing epoch.  We could either do it on the TC and return both
>> producer id and epoch (e.g. change the protocol), or signal the client
>> that it needs to get a new producer id.  Checking for max epoch could be a
>> reasonable signal, the value to check should probably be present in the
>> KIP as this is effectively a part of the contract.  Also, the TC should
>> probably return an error if the client didn't change producer id after
>> hitting max epoch.
>>
>> -Artem
>>
>>
>> On Thu, Jan 12, 2023 at 10:31 AM Justine Olshan
>>  wrote:
>>
>> > Thanks for the discussion Artem.
>> >
>> > With respect to the handling of fenced producers, we have some behavior
>> > already in place. As of KIP-588:
>> >
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
>> > we handle timeouts more gracefully. The producer can recover.
>> >
>> > Produce requests can also recover from epoch fencing by aborting the
>> > transaction and starting over.
>> >
>> > What other cases were you considering that would cause us to have a
>> > fenced epoch but we'd want to recover?
>> >
>> > The first point about handling epoch overflows is fair. I think there is
>> > some logic we'd need to consider. (ie, if we are one away from the max
>> > epoch, we need to reset the producer ID.) I'm still wondering if there
>> > is a way to direct this from the response, or if everything should be
>> > done on the client side. Let me know if you have any thoughts here.
>> >
>> > Thanks,
>> > Justine
>> >
>> > On Tue, Jan 10, 2023 at 4:06 PM Artem Livshits
>> >  wrote:
>> >
>> > > There are some workflows in the client that are implied by protocol
>> > > changes, e.g.:
>> > >
>> > > - for new clients, the epoch changes with every transaction and can
>> > > overflow; in old clients this condition was handled transparently,
>> > > because the epoch was bumped in InitProducerId, which would return a
>> > > new producer id if the epoch overflowed; new clients would need to
>> > > implement some workflow to refresh the producer id
>> > > - how to handle fenced producers: for new clients the epoch changes
>> > > with every transaction, so in the presence of failures during commits /
>> > > aborts, the producer could easily get fenced; old clients would pretty
>> > > much get fenced when a new incarnation of the producer was initialized
>> > > with InitProducerId, so it's ok to treat this as a fatal error; new
>> > > clients would need to implement some workflow to handle that error,
>> > > otherwise they could get fenced by themselves
>> > > - in particular (as a subset of the previous issue), what would the
>> > > client do if it got a timeout during commit?  commit could've succeeded
>> > > or failed
>> > >
>> > > Not sure if this has to be defined in the KIP as implementing those
>> > > probably wouldn't require protocol changes, but we have multiple
>> > > implementations of Kafka clients, so probably would be good to have some
>> > > client implementation guidance.  Could also be done as a separate doc.

Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1520

2023-01-18 Thread Apache Jenkins Server
See 




Re: [VOTE] 3.3.2 RC1

2023-01-18 Thread Matthias J. Sax

Done.

On 1/18/23 10:36 AM, Chris Egerton wrote:

Thanks Mickael!

@John, @Matthias -- would either of you be able to lend a hand with the 
push to S3?


Cheers,

Chris

On Tue, Jan 17, 2023 at 12:24 PM Mickael Maison <mickael.mai...@gmail.com> wrote:


Hi Chris,

I've pushed the artifacts to
https://dist.apache.org/repos/dist/release/kafka/ and added your key
to the KEYS file in that repo.
You should now be able to release the artifacts. Then you'll need a
Confluent employee (I usually ping John or Matthias) to push a few
binaries to their S3 bucket.

Thanks,
Mickael

On Tue, Jan 17, 2023 at 5:04 PM Chris Egerton
 wrote:
 >
 > Hi Mickael,
 >
 > Haven't found anyone yet; your help would be greatly appreciated!
 >
 > Cheers,
 >
 > Chris
 >
 > On Mon, Jan 16, 2023 at 8:46 AM Mickael Maison <mickael.mai...@gmail.com>
 > wrote:
 >
 > > Hi Chris,
 > >
 > > Have you already found a PMC member to help you out? If not I'm happy
 > > to lend you a hand.
 > >
 > > Thanks,
 > > Mickael
 > >
 > > On Wed, Jan 11, 2023 at 9:07 PM Chris Egerton
 > > wrote:
 > > >
 > > > Hi all,
 > > >
 > > > In order to continue with the release process, I need the assistance
 > > > of a PMC member to help with publishing artifacts to the release
 > > > directory of our SVN repo, and adding my public key to the KEYS file.
 > > > Would anyone be willing to lend a hand?
 > > >
 > > > Cheers,
 > > >
 > > > Chris
 > > >
 > > > On Wed, Jan 11, 2023 at 11:45 AM Chris Egerton <chr...@aiven.io>
 > > > wrote:
 > > >
 > > > > Hi all,
 > > > >
 > > > > Thanks to José for running the system tests, and to Bruno for
 > > > > verifying the results!
 > > > >
 > > > > With that, I'm closing the vote. The RC has passed with the required
 > > > > number of votes. I'll be sending out a results announcement shortly.
 > > > >
 > > > > Cheers,
 > > > >
 > > > > Chris
 > > > >
 > > > > On Wed, Jan 11, 2023 at 6:19 AM Bruno Cadonna <cado...@apache.org>
 > > > > wrote:
 > > > >
 > > > >> Hi Chris and José,
 > > > >>
 > > > >> I think the issue is not Streams related but has to do with the
 > > > >> following commit:
 > > > >>
 > > > >> commit b66af662e61082cb8def576ded1fe5cee37e155f (HEAD, tag: 3.3.2-rc1)
 > > > >> Author: Chris Egerton <chr...@aiven.io>
 > > > >> Date:   Wed Dec 21 16:14:10 2022 -0500
 > > > >>
 > > > >>      Bump version to 3.3.2
 > > > >>
 > > > >>
 > > > >> The Streams upgrade system tests verify the upgrade from a previous
 > > > >> version to the current version. In the above the current version in
 > > > >> gradle.properties is set from 3.3.2-SNAPSHOT to 3.3.2 but the test
 > > > >> verifies for development version 3.3.2-SNAPSHOT.
 > > > >>
 > > > >> I ran the following failing test:
 > > > >>
 > > > >> TC_PATHS="tests/kafkatest/tests/streams/streams_application_upgrade_test.py::StreamsUpgradeTest.test_app_upgrade"
 > > > >> _DUCKTAPE_OPTIONS='--parameters
 > > > >> '\''{"bounce_type":"full","from_version":"2.2.2","to_version":"3.3.2-SNAPSHOT"}'\'
 > > > >> bash tests/docker/run_tests.sh
 > > > >>
 > > > >> on the previous commit to the above commit, i.e.:
 > > > >>
 > > > >> commit e3212f28eb88e8f7fcf3d2d4c646b2a28b0f668e
 > > > >> Author: José Armando García Sancio <jsan...@users.noreply.github.com>
 > > > >> Date:   Tue Dec 20 10:55:14 2022 -0800
 > > > >>
 > > > >> and the test passed.
 > > > >>
 > > > >> Best,
 > > > >> Bruno
 > > > >>
 > > > >> On 10.01.23 19:25, José Armando García Sancio wrote:
 > > > >> > Hey Chris,
 > > > >> >
 > > > >> > Here are the results:
 > > > >> >
 > > > >> >
 > > > >> > http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1673314598--apache--HEAD--b66af662e6/2023-01-09--001./2023-01-09--001./report.html
 > > > >> >
 > > > >> > It looks like all of the failures are when trying to upgrade to
 > > > >> > 3.3.2-SNAPSHOT. I saw a similar error in my PR here but I am not
 > > > >> > sure if it is related: https://github.com/apache/kafka/pull/13077
 > > > >> >
 > > > >> > Maybe someone familiar with Kafka Streams can help.
 > > > >> >

Re: [DISCUSS] KIP-890 Server Side Defense

2023-01-18 Thread Justine Olshan
Hey all, just wanted to quickly update and say I've modified the KIP to
explicitly mention that AddOffsetCommitsToTxnRequest will be replaced by a
coordinator-side (inter-broker) AddPartitionsToTxn implicit request. This
mirrors the user partitions and will implicitly add offset partitions to
transactions when we commit offsets on them. We will deprecate
AddOffsetCommitsToTxnRequest
for new clients.

Also to address Artem's comments --
I'm a bit unsure if the changes here will change the previous behavior for
fencing producers. In the case you mention in the first paragraph, are you
saying we bump the epoch before we try to abort the transaction? I think I
need to understand the scenarios you mention a bit better.

As for the second part -- I think it makes sense to have some sort of
"sentinel" epoch to signal epoch is about to overflow (I think we sort of
have this value in place in some ways) so we can codify it in the KIP. I'll
look into that and try to update soon.

Thanks,
Justine.

On Fri, Jan 13, 2023 at 5:01 PM Artem Livshits
 wrote:

> It's good to know that KIP-588 addressed some of the issues.  Looking at
> the code, it still looks like there are some cases that would result in
> fatal error, e.g. PRODUCER_FENCED is issued by the transaction coordinator
> if epoch doesn't match, and the client treats it as a fatal error (code in
> TransactionManager request handling).  If we consider, for example,
> committing a transaction that returns a timeout, but actually succeeds,
> trying to abort it or re-commit may result in PRODUCER_FENCED error
> (because of epoch bump).
>
> For failed commits, specifically, we need to know the actual outcome,
> because if we return an error the application may think that the
> transaction is aborted and redo the work, leading to duplicates.
>
> Re: overflowing epoch.  We could either do it on the TC and return both
> producer id and epoch (e.g. change the protocol), or signal the client that
> it needs to get a new producer id.  Checking for max epoch could be a
> reasonable signal, the value to check should probably be present in the KIP
> as this is effectively a part of the contract.  Also, the TC should
> probably return an error if the client didn't change producer id after
> hitting max epoch.
>
> -Artem
>
>
> On Thu, Jan 12, 2023 at 10:31 AM Justine Olshan
>  wrote:
>
> > Thanks for the discussion Artem.
> >
> > With respect to the handling of fenced producers, we have some behavior
> > already in place. As of KIP-588:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts
> > we handle timeouts more gracefully. The producer can recover.
> >
> > Produce requests can also recover from epoch fencing by aborting the
> > transaction and starting over.
> >
> > What other cases were you considering that would cause us to have a
> > fenced epoch but we'd want to recover?
> >
> > The first point about handling epoch overflows is fair. I think there is
> > some logic we'd need to consider. (ie, if we are one away from the max
> > epoch, we need to reset the producer ID.) I'm still wondering if there
> > is a way to direct this from the response, or if everything should be
> > done on the client side. Let me know if you have any thoughts here.
> >
> > Thanks,
> > Justine
> >
> > On Tue, Jan 10, 2023 at 4:06 PM Artem Livshits
> >  wrote:
> >
> > > There are some workflows in the client that are implied by protocol
> > > changes, e.g.:
> > >
> > > - for new clients, the epoch changes with every transaction and can
> > > overflow; in old clients this condition was handled transparently,
> > > because the epoch was bumped in InitProducerId, which would return a
> > > new producer id if the epoch overflowed; new clients would need to
> > > implement some workflow to refresh the producer id
> > > - how to handle fenced producers: for new clients the epoch changes
> > > with every transaction, so in the presence of failures during commits /
> > > aborts, the producer could easily get fenced; old clients would pretty
> > > much get fenced when a new incarnation of the producer was initialized
> > > with InitProducerId, so it's ok to treat this as a fatal error; new
> > > clients would need to implement some workflow to handle that error,
> > > otherwise they could get fenced by themselves
> > > - in particular (as a subset of the previous issue), what would the
> > > client do if it got a timeout during commit?  commit could've succeeded
> > > or failed
> > >
> > > Not sure if this has to be defined in the KIP as implementing those
> > > probably wouldn't require protocol changes, but we have multiple
> > > implementations of Kafka clients, so probably would be good to have some
> > > client implementation guidance.  Could also be done as a separate doc.
> > >
> > > -Artem
> > >
> > > On Mon, Jan 9, 2023 at 3:38 PM Justine Olshan
> >  > > >
> > > wrote:
> > >
> > > > Hey all, I'

Re: [DISCUSS] KIP-875: First-class offsets support in Kafka Connect

2023-01-18 Thread Greg Harris
Hi Chris,

I had some clarifying questions about the alterOffsets hooks. The KIP
includes these elements of the design:

* The Javadoc for the methods mentions that the alterOffsets methods are
only called on started and initialized connector objects.
* The 'Altering' and 'Resetting' offsets descriptions indicate that the
requests are forwarded to the leader.
* And finally, the description of "A new STOPPED state" includes a note
that the Connector will not be started, and it will not be able to generate
new task configurations

1. Does this mean that a connector may be assigned to a non-leader worker
in the cluster, an alter request comes in, and a connector instance is
temporarily started on the leader to service the request?
2. While the alterOffsets method is being called, will the leader need to
ignore potential requestTaskReconfiguration calls?
On the surface, this seems to conflict with the semantics of the rebalance
subsystem, as connectors are being started where they are not assigned.
Additionally, it seems to conflict with the semantics of the STOPPED state,
which when read literally, might imply that the connector is not started
_anywhere_ in the cluster.

I think that if we wish to provide these alterOffsets methods, they must be
called on started and initialized connector objects.
And if that's the case, then we will need to ignore
requestTaskReconfiguration calls.
But we may need to relax the wording on the STOPPED state to add an
exception for temporary starts, while still preventing it from using
resources in the background.
We should also consider whether we can influence the rebalance algorithm to
allocate STOPPED connectors to the leader, or not allocate them at all, or
use the rebalance algorithm's connector assignment to distribute the
alterOffsets calls across the cluster.

And slightly related:
3. How will the check for the STOPPED state on alter requests be
implemented, is it from reading the config topic or the status topic?
4. Is there synchronization to ensure that a connector on a non-leader is
STOPPED before an instance is started on the leader? If not, there might be
a risk of the non-leader connector overwriting the effects of the
alterOffsets on the leader connector.

Thanks,
Greg


On Fri, Dec 16, 2022 at 8:26 AM Yash Mayya  wrote:

> Hi Chris,
>
> Thanks for clarifying, I had missed that update in the KIP (the bit about
> altering/resetting offsets response). I think your arguments for not going
> with an additional method or a custom return type make sense.
>
> Thanks,
> Yash
>
> On Sat, Dec 10, 2022 at 12:28 AM Chris Egerton 
> wrote:
>
> > Hi Yash,
> >
> > The idea with the boolean is to just signify that a connector has
> > overridden this method, which allows us to issue a definitive response in
> > the REST API when servicing offset alter/reset requests (described more
> in
> > detail here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect#KIP875:FirstclassoffsetssupportinKafkaConnect-Altering/resettingoffsets(response)
> > ).
> >
> > One alternative could be to add some kind of OffsetAlterResult return
> > type for this method with constructors like "OffsetAlterResult.success()",
> > "OffsetAlterResult.unknown()" (default), and
> > "OffsetAlterResult.failure(Throwable t)", but it's hard to envision a
> > reason to use the "failure" static factory method instead of just
> > throwing an exception, at which point, we're only left with two
> > reasonable methods: success, and unknown.
> >
> > Another could be to add a "boolean canResetOffsets()" method to the
> > SourceConnector class, but adding a separate method seems like overkill,
> > and developers may not understand that implementing that method to always
> > return "false" won't actually cause offset reset requests to not take
> > place.
> >
> > One final option could be to have something like an AlterOffsetsSupport
> > enum with values SUPPORTED and UNSUPPORTED and a new "AlterOffsetsSupport
> > alterOffsetsSupport()" SourceConnector method that returns null by
> > default (which implicitly maps to the "unknown support" response message
> > in the REST API). This would line up with the ExactlyOnceSupport API we
> > added in
> > KIP-618. However, I'm hesitant to adopt it because it's not strictly
> > necessary to address the cases that we want to cover; everything can be
> > handled with a single method that gets invoked on offset alter/reset
> > requests. With exactly-once support, we added this hook because it was
> > designed to be invoked in a different point in the connector lifecycle.
> >
> > Cheers,
> >
> > Chris
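
For reference, a rough sketch of the single-method shape described above (the
hook's signature follows this discussion rather than any final API, and the
connector boilerplate is illustrative):

import java.util.List;
import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.Task;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.source.SourceConnector;

public class ExampleSourceConnector extends SourceConnector {

    // Standard Connector lifecycle methods, stubbed for brevity.
    @Override public void start(Map<String, String> props) { }
    @Override public void stop() { }
    @Override public Class<? extends Task> taskClass() { throw new UnsupportedOperationException("elided"); }
    @Override public List<Map<String, String>> taskConfigs(int maxTasks) { return List.of(); }
    @Override public ConfigDef config() { return new ConfigDef(); }
    @Override public String version() { return "0.0.1"; }

    // Sketch of the proposed hook: returning true signals that the connector
    // has explicitly vetted the request, so the REST API can answer
    // definitively; throwing rejects the request; connectors that never
    // override it fall back to the "unknown support" response.
    public boolean alterOffsets(Map<String, String> connectorConfig,
                                Map<Map<String, ?>, Map<String, ?>> offsets) {
        for (Map<String, ?> partition : offsets.keySet()) {
            if (partition == null) {
                throw new ConnectException("Source partitions may not be null");
            }
        }
        return true;
    }
}
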
> >
> > On Wed, Dec 7, 2022 at 9:46 AM Yash Mayya  wrote:
> >
> > > Hi Chris,
> > >
> > > Sorry for the late reply.
> > >
> > > > I don't believe logging an error message is sufficient for
> > > > handling failures to reset-after-delete. IMO it's highly
> > > > likely that users will either shoot themselves in the foot
> > > > b

[jira] [Created] (KAFKA-14637) Upgrade to 3.4 from old versions (< 0.10) fails due to incompatible meta.properties check

2023-01-18 Thread Akhilesh Chaganti (Jira)
Akhilesh Chaganti created KAFKA-14637:
-

 Summary: Upgrade to 3.4 from old versions (< 0.10) fails due to 
incompatible meta.properties check
 Key: KAFKA-14637
 URL: https://issues.apache.org/jira/browse/KAFKA-14637
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Akhilesh Chaganti


3.4 has a check in broker startup to ensure cluster.id is provided in 
`meta.properties`. This is not always the case if the previous version of 
Kafka is < 0.10.
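
For illustration, brokers upgraded from pre-0.10 releases can carry a 
version-0 meta.properties with no cluster.id entry, which trips the new 
startup check (contents illustrative):

{noformat}
# meta.properties, version 0 format: broker.id only, no cluster.id
version=0
broker.id=1
{noformat}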



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[VOTE] KIP-875: First-class offsets support in Kafka Connect

2023-01-18 Thread Chris Egerton
Hi all,

I'd like to call for a vote on KIP-875, which adds support for viewing and
manipulating the offsets of connectors to the Kafka Connect REST API.

The KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect

The discussion thread:
https://lists.apache.org/thread/m5bklnh5w4mwr9nbzrmfk0pftpxfjd02

Cheers,

Chris


Re: [VOTE] 3.3.2 RC1

2023-01-18 Thread Chris Egerton
Thanks Mickael!

@John, @Matthias -- would either of you be able to lend a hand with the
push to S3?

Cheers,

Chris

On Tue, Jan 17, 2023 at 12:24 PM Mickael Maison 
wrote:

> Hi Chris,
>
> I've pushed the artifacts to
> https://dist.apache.org/repos/dist/release/kafka/ and added your key
> to the KEYS file in that repo.
> You should now be able to release the artifacts. Then you'll need a
> Confluent employee (I usually ping John or Matthias) to push a few
> binaries to their S3 bucket.
>
> Thanks,
> Mickael
>
> On Tue, Jan 17, 2023 at 5:04 PM Chris Egerton 
> wrote:
> >
> > Hi Mickael,
> >
> > Haven't found anyone yet; your help would be greatly appreciated!
> >
> > Cheers,
> >
> > Chris
> >
 > > On Mon, Jan 16, 2023 at 8:46 AM Mickael Maison
> > wrote:
> >
> > > Hi Chris,
> > >
> > > Have you already found a PMC member to help you out? If not I'm happy
> > > to lend you a hand.
> > >
> > > Thanks,
> > > Mickael
> > >
 > > > On Wed, Jan 11, 2023 at 9:07 PM Chris Egerton
> > > wrote:
> > > >
> > > > Hi all,
> > > >
 > > > > In order to continue with the release process, I need the assistance
 > > > > of a PMC member to help with publishing artifacts to the release
 > > > > directory of our SVN repo, and adding my public key to the KEYS file.
 > > > > Would anyone be willing to lend a hand?
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Wed, Jan 11, 2023 at 11:45 AM Chris Egerton 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
 > > > > > Thanks to José for running the system tests, and to Bruno for
 > > > > > verifying the results!
 > > > > >
 > > > > > With that, I'm closing the vote. The RC has passed with the
 > > > > > required number of votes. I'll be sending out a results
 > > > > > announcement shortly.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Chris
> > > > >
> > > > > On Wed, Jan 11, 2023 at 6:19 AM Bruno Cadonna 
> > > wrote:
> > > > >
> > > > >> Hi Chris and José,
> > > > >>
> > > > >> I think the issue is not Streams related but has to do with the
> > > > >> following commit:
> > > > >>
 > > > > >> commit b66af662e61082cb8def576ded1fe5cee37e155f (HEAD, tag: 3.3.2-rc1)
> > > > >> Author: Chris Egerton 
> > > > >> Date:   Wed Dec 21 16:14:10 2022 -0500
> > > > >>
> > > > >>  Bump version to 3.3.2
> > > > >>
> > > > >>
 > > > > >> The Streams upgrade system tests verify the upgrade from a
 > > > > >> previous version to the current version. In the above the current
 > > > > >> version in gradle.properties is set from 3.3.2-SNAPSHOT to 3.3.2
 > > > > >> but the test verifies for development version 3.3.2-SNAPSHOT.
> > > > >>
> > > > >> I ran the following failing test:
> > > > >>
 > > > > >>
 > > > > >> TC_PATHS="tests/kafkatest/tests/streams/streams_application_upgrade_test.py::StreamsUpgradeTest.test_app_upgrade"
 > > > > >> _DUCKTAPE_OPTIONS='--parameters
 > > > > >> '\''{"bounce_type":"full","from_version":"2.2.2","to_version":"3.3.2-SNAPSHOT"}'\'
 > > > > >> bash tests/docker/run_tests.sh
> > > > >>
> > > > >> on the previous commit to the above commit, i.e.:
> > > > >>
> > > > >> commit e3212f28eb88e8f7fcf3d2d4c646b2a28b0f668e
 > > > > >> Author: José Armando García Sancio <jsan...@users.noreply.github.com>
> > > > >> Date:   Tue Dec 20 10:55:14 2022 -0800
> > > > >>
> > > > >> and the test passed.
> > > > >>
> > > > >> Best,
> > > > >> Bruno
> > > > >>
> > > > >> On 10.01.23 19:25, José Armando García Sancio wrote:
> > > > >> > Hey Chris,
> > > > >> >
> > > > >> > Here are the results:
> > > > >> >
 > > > > >> >
 > > > > >> > http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/system-test-kafka-branch-builder--1673314598--apache--HEAD--b66af662e6/2023-01-09--001./2023-01-09--001./report.html
> > > > >> >
> > > > >> > It looks like all of the failures are when trying to upgrade to
> > > > >> > 3.3.2-SNAPSHOT. I saw a similar error in my PR here but I am not
 > > > > >> > sure if it is related: https://github.com/apache/kafka/pull/13077
> > > > >> >
> > > > >> > Maybe someone familiar with Kafka Streams can help.
> > > > >> >
> > > > >> > Thanks,
> > > > >>
> > > > >
> > >
>


[GitHub] [kafka-site] wcarlson5 merged pull request #479: MINOR: Add Walker as a committer

2023-01-18 Thread GitBox


wcarlson5 merged PR #479:
URL: https://github.com/apache/kafka-site/pull/479


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Tumbling windows offset

2023-01-18 Thread Amsterdam Luís de Lima Filho
Hello everyone,

I need to perform a windowed aggregation over a weekly interval, from Monday 
to Monday, using tumbling windows.

It does not seem to be supported: 
https://stackoverflow.com/questions/72785744/how-to-apply-an-offset-to-tumbling-window-in-order-to-delay-the-starting-of-win
 

Can someone help me with a workaround or guidance on how to make a PR adding 
this feature?

I need this to finish a migration from Flink to Kafka Streams in prod.

Best,
Amsterdam

[jira] [Created] (KAFKA-14636) Compression optimization: Use zstd dictionary based (de)compression

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14636:


 Summary: Compression optimization: Use zstd dictionary based 
(de)compression
 Key: KAFKA-14636
 URL: https://issues.apache.org/jira/browse/KAFKA-14636
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya


Use the dictionary functionality of Zstd (de)compression. Train the dictionary 
per topic on the first few MBs and then use it.
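
A rough sketch of the training step with zstd-jni (the sampling hook and 
sizes are illustrative):

{code:java}
import com.github.luben.zstd.ZstdDictTrainer;
import java.util.List;

public class DictionaryExample {
    // Train a per-topic dictionary on the first few MBs of sampled record
    // payloads; the resulting bytes are then loaded into the zstd
    // compression/decompression contexts used for that topic.
    static byte[] trainTopicDictionary(List<byte[]> sampledPayloads) {
        // 4 MB of samples in, 16 KB dictionary out -- sizes illustrative.
        ZstdDictTrainer trainer = new ZstdDictTrainer(4 * 1024 * 1024, 16 * 1024);
        sampledPayloads.forEach(trainer::addSample);
        return trainer.trainSamples();
    }
}
{code}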



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14635) Compression optimization: Use direct memory buffers for (de)compression

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14635:


 Summary: Compression optimization: Use direct memory buffers for 
(de)compression
 Key: KAFKA-14635
 URL: https://issues.apache.org/jira/browse/KAFKA-14635
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya


Provide a pool of direct buffers to zstd-jni for its internal usage. Direct 
buffers are ideal for scenarios where data is transferred across JNI, such as 
(de)compression. The latest version of zstd-jni works with direct buffers.
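
A minimal sketch of that path with zstd-jni's direct-buffer streaming API 
(buffer sizes illustrative; assumes the compressed batch already sits in a 
direct buffer):

{code:java}
import com.github.luben.zstd.ZstdDirectBufferDecompressingStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class DirectDecompressExample {
    // Both source and target are direct buffers, so decompressed bytes never
    // take a detour through the Java heap on their way across JNI.
    static void decompress(ByteBuffer compressedDirect) throws IOException {
        ByteBuffer out = ByteBuffer.allocateDirect(16 * 1024);
        try (ZstdDirectBufferDecompressingStream stream =
                 new ZstdDirectBufferDecompressingStream(compressedDirect)) {
            while (stream.hasRemaining()) {
                out.clear();
                stream.read(out); // fills `out` directly from native memory
                out.flip();
                // ... hand `out` to the record parser before the next read ...
            }
        }
    }
}
{code}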



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14634) Compression optimization: Calculate the size of decompressed buffer dynamically at runtime.

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14634:


 Summary: Compression optimization: Calculate the size of 
decompressed buffer dynamically at runtime.
 Key: KAFKA-14634
 URL: https://issues.apache.org/jira/browse/KAFKA-14634
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.5.0


Calculate the size of the decompressed buffer dynamically at runtime. It could 
be based on the recommendation provided by Zstd; it is currently fixed at 16KB. 
Using the value recommended by Zstd saves a copy in native code.
see: [https://github.com/facebook/zstd/issues/340]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14633) Compression optimization: Use BufferSupplier to allocate the intermediate decompressed buffer

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14633:


 Summary: Compression optimization: Use BufferSupplier to allocate 
the intermediate decompressed buffer
 Key: KAFKA-14633
 URL: https://issues.apache.org/jira/browse/KAFKA-14633
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.5.0


Use BufferSupplier to allocate the intermediate decompressed buffer.
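
A minimal sketch of the intended allocation pattern, using the existing 
org.apache.kafka.common.utils.BufferSupplier (call-site names illustrative):

{code:java}
import java.nio.ByteBuffer;
import org.apache.kafka.common.utils.BufferSupplier;

public class DecompressionExample {
    // BufferSupplier caches released buffers, so the intermediate
    // decompression buffer is recycled across batches on the same thread
    // rather than re-allocated for every batch.
    static void decompressBatch(BufferSupplier supplier) {
        ByteBuffer intermediate = supplier.get(16 * 1024);
        try {
            // ... decompress the batch's records through `intermediate` ...
        } finally {
            supplier.release(intermediate); // cached for the next batch
        }
    }
}
{code}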



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14632) Compression optimization: Remove unnecessary intermediate buffers

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14632:


 Summary: Compression optimization: Remove unnecessary intermediate 
buffers
 Key: KAFKA-14632
 URL: https://issues.apache.org/jira/browse/KAFKA-14632
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.5.0


Remove two layers of buffers (the 16KB one and the 2KB one) and replace them 
with a single buffer called decompressionBuffer. The time it takes to prepare a 
batch for decompression is bounded by the allocation of the largest buffer, so 
using only one large buffer (16KB) doesn’t cause any regression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14631) Compression optimization: do not read the key/value for last record in the batch

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14631:


 Summary: Compression optimization: do not read the key/value for 
last record in the batch
 Key: KAFKA-14631
 URL: https://issues.apache.org/jira/browse/KAFKA-14631
 Project: Kafka
  Issue Type: Sub-task
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.5.0


Do not read the end of the batch, since it contains the key/value for the last 
record. Instead of “skipping”, which would still require decompression, we can 
simply not read it at all.

Only applicable for skipIterator.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14630) Update zstd-jni version to 1.5.2-4

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14630:


 Summary: Update zstd-jni version to 1.5.2-4
 Key: KAFKA-14630
 URL: https://issues.apache.org/jira/browse/KAFKA-14630
 Project: Kafka
  Issue Type: Sub-task
  Components: core
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Fix For: 3.5.0


While the latest version as of Jan 2023 is 1.5.2-5, we will not upgrade to the 
latest version because it inflates the size of the binary [2] with no 
significant upgrade benefits.


{noformat}
diviv:zstd-jni/ (master) $ git log --pretty=oneline v1.5.2-1...v1.5.2-3
c983ae3e086b63a40e1bb430cb2ebf95ecc52c71 (tag: v1.5.2-3) Adjust signature 
comments after e5c6a3290b8335db7c70877fda22ca26a96c72e4.
510bbd6be80592227c6e5cf8cd8d71cb76c0c279 Add methods for streaming 
(de)compression of direct ByteBuffers.
62b9dad49fc00f253cb35c1942c3ca6af4ee2b47 Fix lgtm C++.
73ae46e1af16619143b7c87e35ad9c05363e2c97 v1.5.2-3
e5c6a3290b8335db7c70877fda22ca26a96c72e4 Fix overflows
54d3d50c16d96bd8a30e2d4c0a9648001a52d6f9 Fix some error return codes.
b788a2ed7a5e36e5252b1696e6cc8bae48a7afbc Upgrade scala.
31060934c26e080031465702ec369591e12874f8 Add NoFinalizer variants for the 
direct buffer streams.
3d5ab915167f2dfe244d2d2192b66a9feac7c543 (tag: v1.5.2-2) Fix the symbols export 
on the cross-compiled libraries also
8a4993f57119e2b682d1f0b8db263027983208c6 Use different approach for MacOS to 
not export all symbols
15352a3941e8bec2dd1f374768478c02eaf69d6f Don't pass version-script to clang 
linker, it's not supported
9916cbd611cd882b5d4958d4acf0518c01363487 Don't export the zstd symbols, just 
our own ones
277cb2779e2e1922d6de0cf4e1e7519d2647acef fix spelling
ff3a141d78abc71741734b0d0524bf47096b29f8 Fallback to `scp` if `rsync` in not 
installed on the build machines
ba99eec9ac9a5c3eb16200ae67c235a91b16b570 Up the version
4c4d9cd382b6e515a7d0d6cd37c1ebb087f5ab73 Support LoongArch64
97550e35610cd36e2ad510e20e503c0f997c1a3a Remove 
frameHeaderSize(Min|Max){noformat}



 [2] [https://github.com/luben/zstd-jni/issues/237] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14629) Performance improvement for Zstd compressed workload

2023-01-18 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-14629:


 Summary: Performance improvement for Zstd compressed workload
 Key: KAFKA-14629
 URL: https://issues.apache.org/jira/browse/KAFKA-14629
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Divij Vaidya
Assignee: Divij Vaidya
 Attachments: benchmark-jira.xlsx

h2. Motivation

From a CPU flamegraph analysis for a compressed workload (openmessaging), we 
have observed that the ValidateMessagesAndAssignOffsets method takes 75% of the 
total CPU/time taken by UnifiedLog.appendAsLeader(). The improvements suggested 
below will reduce CPU usage and increase throughput.
h2. Background

A producer will append multiple records in a batch and the batch will be 
compressed. This compressed batch, along with some headers, will be sent to the 
server. On the server, it will perform a checksum for the batch to validate 
data integrity during network transfer. The batch payload is still in 
compressed form so far. The broker will now try to append this batch to the 
log. Before appending, the broker will perform schema integrity validation on 
individual records, such as checking that record offsets are monotonically 
increasing. To perform these validations, the server has to decompress the 
batch.

The schema validation of a batch on the server is done by decompressing and 
validating individual records. For each record, the validation needs to read 
all fields from the record except for key and value. [1]
h2. Performance requirements

Pre-allocation of the array should not add excessive overhead to batches with 
small records → for example, allocating a 65KB array for a record of size 1KB 
is overkill and negatively impacts performance for small requests.

Overhead of skipping bytes should be minimal → we don’t need to read the 
key/value of a record, which on average is the largest amount of data in a 
record. The implementation should efficiently skip key/value bytes.

Minimize the number of JNI calls → JNI calls are expensive and work best when 
you make fewer calls to decompress/compress the same amount of data.

Minimize new byte array/buffer allocation → Ideally, the only array allocation 
that should happen would be the array used to store the result of 
decompression. Even this could be optimized by using buffers backed by direct 
memory or re-using the same buffers, since we process one record at a time.
h2. Current implementation - decompression + zstd

We allocate a 2KB array called skipArray to store decompressed data [2]. This 
array is re-used for the scope of a batch (i.e. across all records).

We allocate a 16KB array to buffer the data between skipArray and the 
underlying zstd-jni library calls [3]. The motivation is to read at least 16KB 
of data at a time in one single call to the JNI layer. This array is re-used 
for the scope of a batch (i.e. across all records).

We provide a BufferPool to zstd-jni. It uses this pool to create buffers for 
its own use, i.e. one allocation per batch and one allocation per skip() call. 
Note that this pool is not used to store the output of decompression. 
Currently, we use a BufferPool which is scoped to a thread.


h2. Potential improvements
 # Do not read the end of the batch, since it contains the key/value for the 
last record. Instead of “skipping”, which would still require decompression, we 
can simply not read it at all.
 # Remove two layers of buffers (the 16KB one and the 2KB one) and replace them 
with a single buffer called decompressionBuffer. The time it takes to prepare a 
batch for decompression is bounded by the allocation of the largest buffer, so 
using only one large buffer (16KB) doesn’t cause any regression.
 # Use BufferSupplier to allocate the intermediate decompressed buffer.
 # Calculate the size of the decompressed buffer dynamically at runtime. It 
could be based on the recommendation provided by Zstd; it is currently fixed at 
16KB. Using the value recommended by Zstd saves a copy in native code. 
[https://github.com/facebook/zstd/issues/340]
 # Provide a pool of direct buffers to zstd-jni for its internal usage. Direct 
buffers are ideal for scenarios where data is transferred across JNI, such as 
(de)compression. The latest version of zstd-jni works with direct buffers.
 # Read the network input into a direct buffer and pass that to zstd-jni for 
decompression. Store the output in a direct buffer as well.
 # Use the dictionary functionality of decompression. Train the dictionary on 
the first few MBs and then use it.
 # Use the skip functionality of zstd-jni and do not bring “skipped” data into 
the Kafka layer, so we don’t need a buffer to store skipped data in Kafka. This 
could be done by using DataInputStream and removing the intermediate buffer 
stream (the 16KB one). See the sketch below.
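
As a concrete illustration of points 1 and 8, skipping can be done directly on 
the decompression stream without surfacing the bytes (a minimal sketch; names 
are illustrative, not the actual Kafka internals):

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipUtil {
    // Skip a record's key/value bytes on the decompression stream instead of
    // copying them into an intermediate buffer. InputStream.skip() may skip
    // fewer bytes than requested, so loop until done.
    static void skipFully(InputStream decompressed, long bytesToSkip) throws IOException {
        while (bytesToSkip > 0) {
            long skipped = decompressed.skip(bytesToSkip);
            if (skipped <= 0) {
                // skip() made no progress; fall back to a single-byte read to
                // distinguish a slow stream from end-of-stream.
                if (decompressed.read() < 0) {
                    throw new EOFException("Truncated record batch");
                }
                skipped = 1;
            }
            bytesToSkip -= skipped;
        }
    }
}
{code}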

h2. Prototype implementation

[https://github.com/divijvaidya/kafka/commits/optimize-compress

[DISCUSS] KIP-899: Allow clients to rebootstrap

2023-01-18 Thread Ivan Yurchenko
Hello!
I would like to start the discussion thread on KIP-899: Allow clients to
rebootstrap.
This KIP proposes to allow Kafka clients to repeat the bootstrap process
when fetching metadata if none of the known nodes are available.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-899%3A+Allow+clients+to+rebootstrap

A question right away: should we eventually change the default behavior, or
can it remain configurable "forever"? The latter is proposed in the KIP.
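
For illustration, the opt-in shape proposed there looks roughly like this on a
client (the property name metadata.recovery.strategy follows the KIP and may
change during discussion):

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RebootstrapExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "example-group");
        // Default "none" keeps today's behavior; "rebootstrap" lets the
        // client re-resolve bootstrap.servers once none of its currently
        // known nodes are reachable.
        props.put("metadata.recovery.strategy", "rebootstrap");
        KafkaConsumer<String, String> consumer =
            new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer());
    }
}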

Thank you!

Ivan


Re: [ANNOUNCE] New committer: Stanislav Kozlovski

2023-01-18 Thread Pere Urbón Bayes
Congrats Stan,
  very well deserved.

-- Pere

On Wed, Jan 18, 2023 at 7:51 AM Yash Mayya  wrote:

> Congratulations, Stan!
>
> On Tue, Jan 17, 2023 at 9:20 PM Jun Rao  wrote:
>
> > Hi, Everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer
> > Stanislav Kozlovski.
> >
> > Stan has been contributing to Apache Kafka since June 2018. He made
> various
> > contributions including the following KIPs.
> >
> > KIP-455: Create an Administrative API for Replica Reassignment
> > KIP-412: Extend Admin API to support dynamic application log levels
> >
> > Congratulations, Stan!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
> >
>


-- 
Pere Urbon-Bayes
Software Architect
https://twitter.com/purbon
https://www.linkedin.com/in/purbon/


Re: [ANNOUNCE] New committer: Walker Carlson

2023-01-18 Thread Bruno Cadonna

Congrats, Walker!
Well deserved!

Best,
Bruno

On 18.01.23 05:29, Sagar wrote:

Congratulations Walker!

Thanks!
Sagar.

On Wed, Jan 18, 2023 at 9:32 AM Tom Bentley  wrote:


Congratulations!

On Wed, 18 Jan 2023 at 01:26, John Roesler  wrote:


Congratulations, Walker!
-John

On Tue, Jan 17, 2023, at 18:50, Guozhang Wang wrote:

Congrats, Walker!

On Tue, Jan 17, 2023 at 2:20 PM Chris Egerton wrote:

Congrats, Walker!

On Tue, Jan 17, 2023, 17:07 Bill Bejeck wrote:



Congratulations, Walker!

-Bill

On Tue, Jan 17, 2023 at 4:57 PM Matthias J. Sax wrote:



Dear community,

I am pleased to announce Walker Carlson as a new Kafka committer.

Walker has been contributing to Apache Kafka since November 2019. He
made various contributions including the following KIPs.

KIP-671: Introduce Kafka Streams Specific Uncaught Exception Handler
KIP-696: Update Streams FSM to clarify ERROR state meaning
KIP-715: Expose Committed offset in streams


Congratulations Walker and welcome on board!


Thanks,
-Matthias (on behalf of the Apache Kafka PMC)

Re: [DISCUSS] KIP-895: Dynamically refresh partition count of __consumer_offsets

2023-01-18 Thread Christo Lolov
Greetings,

I am bumping the below DISCUSSion thread for KIP-895. The KIP presents a
situation where consumer groups are in an undefined state until a rolling
restart of the cluster is performed. While I have demonstrated the behaviour
on a ZooKeeper-based cluster, I believe the same problem can be shown in a
KRaft cluster. Please let me know your opinions on the problem and the
presented solution.

Best,
Christo
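
For context on why a stale count is a problem: the group-to-coordinator
mapping hashes the group id onto the partitions of __consumer_offsets. A
minimal sketch of that mapping, mirroring the group coordinator's
partitionFor logic:

// Brokers that cache different partition counts for __consumer_offsets
// disagree about which partition -- and therefore which coordinator --
// owns a group, until a restart refreshes the cached count.
static int partitionFor(String groupId, int offsetsTopicPartitionCount) {
    return (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitionCount;
}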

On Thursday, 29 December 2022 at 14:19:27 GMT, Christo wrote:
>
>
> Hello!
> I would like to start this discussion thread on KIP-895: Dynamically
> refresh partition count of __consumer_offsets.
> The KIP proposes to alter brokers so that they refresh the partition count
> of __consumer_offsets used to determine group coordinators without
> requiring a rolling restart of the cluster.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-895%3A+Dynamically+refresh+partition+count+of+__consumer_offsets
>
> Let me know your thoughts on the matter!
> Best, Christo
>


[jira] [Created] (KAFKA-14628) Move CommandLineUtils and CommandDefaultOptions shared classes

2023-01-18 Thread Federico Valeri (Jira)
Federico Valeri created KAFKA-14628:
---

 Summary: Move CommandLineUtils and CommandDefaultOptions shared 
classes
 Key: KAFKA-14628
 URL: https://issues.apache.org/jira/browse/KAFKA-14628
 Project: Kafka
  Issue Type: Sub-task
Reporter: Federico Valeri






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New committer: Walker Carlson

2023-01-18 Thread Mickael Maison
Congratulations Walker!

On Wed, Jan 18, 2023 at 7:52 AM Yash Mayya  wrote:
>
> Congrats, Walker!
>
> On Wed, Jan 18, 2023 at 3:27 AM Matthias J. Sax  wrote:
>
> > Dear community,
> >
> > I am pleased to announce Walker Carlson as a new Kafka committer.
> >
> > Walker has been contributing to Apache Kafka since November 2019. He
> > made various contributions including the following KIPs.
> >
> > KIP-671: Introduce Kafka Streams Specific Uncaught Exception Handler
> > KIP-696: Update Streams FSM to clarify ERROR state meaning
> > KIP-715: Expose Committed offset in streams
> >
> >
> > Congratulations Walker and welcome on board!
> >
> >
> > Thanks,
> >-Matthias (on behalf of the Apache Kafka PMC)
> >