Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #522

2021-10-14 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 493658 lines...]
[2021-10-15T04:05:43.784Z] 
[2021-10-15T04:05:43.784Z] ConsumerBounceTest > testCloseDuringRebalance() 
PASSED
[2021-10-15T04:05:43.784Z] 
[2021-10-15T04:05:43.784Z] ConsumerBounceTest > testClose() STARTED
[2021-10-15T04:05:45.540Z] 
[2021-10-15T04:05:45.540Z] PlaintextConsumerTest > 
testMultiConsumerSessionTimeoutOnClose() PASSED
[2021-10-15T04:05:45.540Z] 
[2021-10-15T04:05:45.540Z] PlaintextConsumerTest > 
testMultiConsumerStickyAssignor() STARTED
[2021-10-15T04:05:59.427Z] 
[2021-10-15T04:05:59.427Z] > Task :streams:integrationTest
[2021-10-15T04:05:59.427Z] 
[2021-10-15T04:05:59.427Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargePartitionCount PASSED
[2021-10-15T04:05:59.427Z] 
[2021-10-15T04:05:59.427Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys STARTED
[2021-10-15T04:06:00.365Z] 
[2021-10-15T04:06:00.365Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorManyStandbys PASSED
[2021-10-15T04:06:00.365Z] 
[2021-10-15T04:06:00.365Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys STARTED
[2021-10-15T04:06:16.621Z] 
[2021-10-15T04:06:16.621Z] > Task :core:integrationTest
[2021-10-15T04:06:16.621Z] 
[2021-10-15T04:06:16.621Z] ConsumerBounceTest > testClose() PASSED
[2021-10-15T04:06:16.621Z] 
[2021-10-15T04:06:16.621Z] ConsumerBounceTest > 
testSeekAndCommitWithBrokerFailures() STARTED
[2021-10-15T04:06:23.632Z] 
[2021-10-15T04:06:23.632Z] PlaintextConsumerTest > 
testMultiConsumerStickyAssignor() PASSED
[2021-10-15T04:06:23.632Z] 
[2021-10-15T04:06:23.632Z] PlaintextConsumerTest > 
testFetchRecordLargerThanFetchMaxBytes() STARTED
[2021-10-15T04:06:29.391Z] 
[2021-10-15T04:06:29.392Z] PlaintextConsumerTest > 
testFetchRecordLargerThanFetchMaxBytes() PASSED
[2021-10-15T04:06:29.392Z] 
[2021-10-15T04:06:29.392Z] PlaintextConsumerTest > testAutoCommitOnClose() 
STARTED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] > Task :streams:integrationTest
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyStandbys PASSED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers STARTED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testFallbackPriorTaskAssignorLargeNumConsumers PASSED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargeNumConsumers STARTED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorLargeNumConsumers PASSED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest
 > shouldThrowErrorAfterSourceTopicDeleted STARTED
[2021-10-15T04:06:30.331Z] 
[2021-10-15T04:06:30.331Z] 
org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest
 > shouldThrowErrorAfterSourceTopicDeleted PASSED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] > Task :core:integrationTest
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] PlaintextConsumerTest > testAutoCommitOnClose() 
PASSED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] PlaintextConsumerTest > testListTopics() STARTED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] ConsumerBounceTest > 
testSeekAndCommitWithBrokerFailures() PASSED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] ConsumerBounceTest > 
testConsumerReceivesFatalExceptionWhenGroupPassesMaxSize() STARTED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] PlaintextConsumerTest > testListTopics() PASSED
[2021-10-15T04:06:40.354Z] 
[2021-10-15T04:06:40.354Z] PlaintextConsumerTest > 
testExpandingTopicSubscriptions() STARTED
[2021-10-15T04:06:41.293Z] 
[2021-10-15T04:06:41.293Z] ConsumerBounceTest > 
testConsumerReceivesFatalExceptionWhenGroupPassesMaxSize() PASSED
[2021-10-15T04:06:41.293Z] 
[2021-10-15T04:06:41.293Z] ConsumerBounceTest > 
testSubscribeWhenTopicUnavailable() STARTED
[2021-10-15T04:06:43.052Z] 
[2021-10-15T04:06:43.052Z] PlaintextConsumerTest > 
testExpandingTopicSubscriptions() PASSED
[2021-10-15T04:06:43.052Z] 
[2021-10-15T04:06:43.052Z] PlaintextConsumerTest > 
testMultiConsumerDefaultAssignor() STARTED
[2021-10-15T04:07:02.077Z] 
[2021-10-15T04:07:02.077Z] PlaintextConsumerTest > 

Re: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

2021-10-14 Thread Luke Chen
Hi Vitor,
I'm not an expert either, but I think Andrew's answer covers pretty much all
the reasons why Kafka does well.
I'm not too familiar with Redis either, but I'd say that each product has many
configurations for increasing throughput, and since the use cases are
different, the comparison might not be fair.

For your question:
3.  Didn't know about this zero-copy technique, I'll read more about that
but feels like the result would be a response similar to as if kafka had
the info stored in-memory (as redis do) but that would still make me
question how is that Kafka can handle a higher throughput if the "design"
is so similar.
--> Again, I'm not familiar with Redis, but even if you store data in memory,
without the OS's help you still need to copy the data into kernel space to
send it to the receiver. With the zero-copy technique, by contrast, all of the
data flow stays within kernel space.
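
As a rough illustration of that difference (this is a minimal sketch, not
Kafka's actual code; the file path, port, and buffer size are placeholders),
the JDK-level contrast looks something like this, where transferTo() can use
sendfile(2) so the bytes never enter user space:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {

    // "Traditional" path: bytes are copied from the page cache into a
    // user-space buffer, then copied back into kernel space for the socket.
    static void sendViaUserSpace(Path file, SocketChannel socket) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            while (in.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    socket.write(buffer);
                }
                buffer.clear();
            }
        }
    }

    // Zero-copy path: transferTo() can map to sendfile(2) on Linux, so the
    // bytes flow from the page cache to the socket without entering user space.
    static void sendViaTransferTo(Path file, SocketChannel socket) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, socket);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder endpoint and file; adjust for a real experiment.
        try (SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            sendViaTransferTo(Path.of("/tmp/segment.log"), socket);
        }
    }
}
{code}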

But again, the use cases are different, so the comparison might not be fair.
We can only analyze and learn why and how each system achieves good throughput.
That's my two cents.

Thank you.
Luke

On Fri, Oct 15, 2021 at 3:47 AM Vitor Augusto de Medeiros <
v.medei...@aluno.ufabc.edu.br> wrote:

> Thanks for the response, Andrew, i appreciate the help!
>
> Just i few thoughts that came up while reading your points:
>
>
>   1.  In theory, Redis is also handling/storing data in memory which makes
> me wonder why is that Kafka does it better? Perhaps it has to do with the
> API contract, where, as you said, there's no complex transactional software
> that might hurt performance.
>   2.  Didn't know there was such a big difference from linear to random
> writes, pretty awesome! But I still don't understand how disk usage, even
> If doing linear writes, is still allowing a throughput rate of 2 to 3x the
> amount of Redis, which doesn't use disk write/read at all and keep messages
> stored in memory.
>   3.  Didn't know about this zero-copy technique, I'll read more about
> that but feels like the result would be a response similar to as if kafka
> had the info stored in-memory (as redis do) but that would still make me
> question how is that Kafka can handle a higher throughput if the "design"
> is so similar.
>
>
> 
> From: Andrew Grant 
> Sent: Thursday, October 14, 2021 15:55
> To: dev@kafka.apache.org 
> Subject: [SPAM] Re: Why does Kafka have a higher throughput than Redis?
>
> Hi Vitor,
>
> I'm not an expert and probably some more knowledgeable folks can also chime
> in (and correct me) but a few things came to mind:
>
> 1) On the write side (i.e. when using the publisher), Kafka does not flush
> data to disk by default. It writes to the page cache so all writes are sort
> of in-memory in a way. They're staged in the page cache and the kernel
> flushes the data asynchronously. Also the API contract for Kafka is quite
> "simple" in that it mostly reads and writes arbitrary sequences of bytes -
> there isn't as much complex transactional software in front of the
> writing/reading that might hurt performance compared to some other data
> stores. Note, Kafka does provide things like idempotence and transactions
> so it's not like there is never any overhead to consider.
>
> 2) Kafka reads and writes are conducive to being linear which helps a lot
> with performance. Random writes are a lot slower than linear ones.
>
> 3) For reading (i.e. when using the consumer) data Kafka uses a zero-copy
> technique in which data is directly sent from the page cache to the network
> buffer without going through user space which helps a lot.
>
> 4) Kafka batches aggressively.
>
> Here are two resources which might provide more information
> https://docs.confluent.io/platform/current/kafka/design.html,
>
> https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
> .
>
> Hope this helps a bit.
>
> Andrew
>
> On Thu, Oct 14, 2021 at 1:11 PM Vitor Augusto de Medeiros <
> v.medei...@aluno.ufabc.edu.br> wrote:
>
> > Hi everyone,
> >
> >  i'm doing a benchmark comparison between Kafka and Redis for my final
> > bachelor paper and would like to understand more about why Kafka have
> > higher throughput if compared to Redis.
> >
> >  I noticed Redis has lower overall latency (and makes sense since it's
> > stored in memory) but cant figure out the difference in throughput.
> >
> > I found a study (not sure if i can post links here but it's named A
> > COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING
> > PIPELINES by Sebastian Tallberg)
> > showing Kafka's throughput hitting 3x the amount of msg/s if compared to
> > Redis for a 1kB payload. I would like to understand what is in Kafka's
> > architecture that allows it to be a lot faster than other message
> > brokers/Redis in particular
> >
> > Thanks!
> >
>
>
> --
> Andrew Grant
> 8054482621
>


Re: Kafka client 2.7.1 missing JaasUtils.isZkSecurityEnabled() method

2021-10-14 Thread Luke Chen
Hi Alexandre
Yes, you're right. We renamed the `isZkSecurityEnabled` method to
`isZkSaslEnabled` because it checked SASL config only. You can check here
.

If you want to check TLS configuration, you can check here
and here
; basically it just checks that TLS is enabled and the client socket/keystore
is set.
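
For a quick self-check on the application side, something like the sketch
below could work. The SASL half uses the renamed JaasUtils.isZkSaslEnabled();
the TLS half only mirrors the broker-side check described above against the
ZooKeeper client property names (zookeeper.ssl.client.enable,
zookeeper.clientCnxnSocket, zookeeper.ssl.keystore.location), so treat those
names and the whole snippet as an illustrative assumption, not a supported API:

{code:java}
import java.util.Properties;
import org.apache.kafka.common.security.JaasUtils;

public class ZkSecurityCheck {

    // SASL: the 2.7.1 replacement for the removed isZkSecurityEnabled() method.
    static boolean zkSaslEnabled() {
        return JaasUtils.isZkSaslEnabled();
    }

    // TLS: roughly what the broker-side check does -- TLS client mode enabled,
    // the Netty client socket selected, and a keystore configured.
    // The property names below are assumptions taken from the ZooKeeper
    // client configs, not from a Kafka client API.
    static boolean zkTlsEnabled(Properties props) {
        boolean sslEnabled = Boolean.parseBoolean(
            props.getProperty("zookeeper.ssl.client.enable", "false"));
        boolean nettySocket = "org.apache.zookeeper.ClientCnxnSocketNetty"
            .equals(props.getProperty("zookeeper.clientCnxnSocket"));
        boolean hasKeystore = props.getProperty("zookeeper.ssl.keystore.location") != null;
        return sslEnabled && nettySocket && hasKeystore;
    }
}
{code}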

Thank you.
Luke


On Thu, Oct 14, 2021 at 11:04 PM Alexandre Vermeerbergen <
avermeerber...@gmail.com> wrote:

> Hello,
>
> When upgrading from Kafka client 2.4.0 our dependencies to Kafka
> client 2.7.1, we noticed that JaasUtils.isZkSecurityEnabled() method
> no longer exists.
>
> Is there an equivalent method in Kafka client 2.7.1 which could use
> instead ?
>
> Kind regards,
> Alexandre
>


Re: [DISCUSS] KIP-778 KRaft Upgrades

2021-10-14 Thread David Arthur
Kowshik, thanks for the review!

7001. An enum sounds like a good idea here, especially since setting
Upgrade=false and Force=true doesn't really make sense. An enum would avoid
confusing/invalid combinations of flags.

7002. How about adding --force-downgrade as an alternative to the
--downgrade argument? So, it would take the same arguments (list of
feature:version), but use the DOWNGRADE_UNSAFE option in the RPC.
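
Purely as an illustration of 7001/7002 (the names here are hypothetical, not
necessarily what the KIP will end up with), an enum makes the invalid
Upgrade=false/Force=true combination unrepresentable:

{code:java}
// Hypothetical illustration only; the actual field name and values are
// whatever the KIP finally specifies.
public enum FeatureUpdateType {
    UPGRADE,           // move a feature to a higher finalized version
    SAFE_DOWNGRADE,    // downgrade only if no metadata is lost
    UNSAFE_DOWNGRADE   // downgrade even if metadata records must be dropped
}
{code}

Each entry in the UpdateFeatures request would then carry one of these values
instead of two independent booleans.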

7003. Yes we will need the advanced CLI since we will need to only modify
the "metadata.version" FF. I was kind of wondering if we might want
separate sub-commands for the different operations instead of all under
"update". E.g., "kafka-features.sh
upgrade|downgrade|force-downgrade|delete|describe".

7004/7005. The intention in this KIP is that we bump the "metadata.version"
liberally and that most upgrades are backwards compatible. We're relying on
this for feature gating as well as indicating compatibility. The enum is
indeed an implementation detail that is enforced by the controller when
handling UpdateFeaturesRequest. As for lossy downgrades, this really only
applies to the metadata log as we will lose some fields and records when
downgrading to an older version. This is useful as an escape hatch for
cases when a software upgrade has occurred, the feature flag was increased,
and then bugs are discovered. Without the lossy downgrade scenario, we have
no path back to a previous software version.

As for the min/max finalized version, I'm not totally clear on cases where
these would differ. I think for "metadata.version" we just want a single
finalized version for the whole cluster (like we do for IBP).

-David


On Thu, Oct 14, 2021 at 1:59 PM José Armando García Sancio
 wrote:

> On Tue, Oct 12, 2021 at 10:57 AM Colin McCabe  wrote:
> > > 11. For downgrades, it would be useful to describe how to determine the
> > > downgrade process (generating new snapshot, propagating the snapshot,
> etc)
> > > has completed. We could block the UpdateFeature request until the
> process
> > > is completed. However, since the process could take time, the request
> could
> > > time out. Another way is through DescribeFeature and the server only
> > > reports downgraded versions after the process is completed.
> >
> > Hmm.. I think we need to avoid blocking, since we don't know how long it
> will take for all nodes to act on the downgrade request. After all, some
> nodes may be down.
> >
> > But I agree we should have some way of knowing when the upgrade is done.
> DescribeClusterResponse seems like the natural place to put information
> about each node's feature level. While we're at it, we should also add a
> boolean to indicate whether the given node is fenced. (This will always be
> false for ZK mode, of course...)
> >
>
> I agree. I think from the user's point of view, they would like to
> know if it is safe to downgrade the software version of a specific
> broker or controller. I think that it is safe to downgrade a broker or
> controller when the metadata.version has been downgraded and the node
> has generated a snapshot with the downgraded metadata.version.
>
> --
> -Jose
>


-- 
David Arthur


RE: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

2021-10-14 Thread Vitor Augusto de Medeiros
Thanks for the response, Andrew, I appreciate the help!

Just a few thoughts that came up while reading your points:


  1.  In theory, Redis is also handling/storing data in memory, which makes me
wonder why Kafka does it better. Perhaps it has to do with the API contract,
where, as you said, there's no complex transactional software that might hurt
performance.
  2.  Didn't know there was such a big difference between linear and random
writes, pretty awesome! But I still don't understand how disk usage, even if
doing linear writes, still allows a throughput rate of 2 to 3x that of Redis,
which doesn't use disk writes/reads at all and keeps messages stored in memory.
  3.  Didn't know about this zero-copy technique, I'll read more about that,
but it feels like the result would be a response similar to as if Kafka had the
info stored in memory (as Redis does), which would still make me question how
Kafka can handle a higher throughput if the "design" is so similar.



From: Andrew Grant 
Sent: Thursday, October 14, 2021 15:55
To: dev@kafka.apache.org 
Subject: [SPAM] Re: Why does Kafka have a higher throughput than Redis?

Hi Vitor,

I'm not an expert and probably some more knowledgeable folks can also chime
in (and correct me) but a few things came to mind:

1) On the write side (i.e. when using the publisher), Kafka does not flush
data to disk by default. It writes to the page cache so all writes are sort
of in-memory in a way. They're staged in the page cache and the kernel
flushes the data asynchronously. Also the API contract for Kafka is quite
"simple" in that it mostly reads and writes arbitrary sequences of bytes -
there isn't as much complex transactional software in front of the
writing/reading that might hurt performance compared to some other data
stores. Note, Kafka does provide things like idempotence and transactions
so it's not like there is never any overhead to consider.

2) Kafka reads and writes are conducive to being linear which helps a lot
with performance. Random writes are a lot slower than linear ones.

3) For reading (i.e. when using the consumer) data Kafka uses a zero-copy
technique in which data is directly sent from the page cache to the network
buffer without going through user space which helps a lot.

4) Kafka batches aggressively.

Here are two resources which might provide more information
https://docs.confluent.io/platform/current/kafka/design.html,
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.

Hope this helps a bit.

Andrew

On Thu, Oct 14, 2021 at 1:11 PM Vitor Augusto de Medeiros <
v.medei...@aluno.ufabc.edu.br> wrote:

> Hi everyone,
>
>  i'm doing a benchmark comparison between Kafka and Redis for my final
> bachelor paper and would like to understand more about why Kafka have
> higher throughput if compared to Redis.
>
>  I noticed Redis has lower overall latency (and makes sense since it's
> stored in memory) but cant figure out the difference in throughput.
>
> I found a study (not sure if i can post links here but it's named A
> COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING
> PIPELINES by Sebastian Tallberg)
> showing Kafka's throughput hitting 3x the amount of msg/s if compared to
> Redis for a 1kB payload. I would like to understand what is in Kafka's
> architecture that allows it to be a lot faster than other message
> brokers/Redis in particular
>
> Thanks!
>


--
Andrew Grant
8054482621


[jira] [Created] (KAFKA-13378) lost links after deleting partition

2021-10-14 Thread Maciej Malecki (Jira)
Maciej Malecki created KAFKA-13378:
--

 Summary: lost links after deleting partition
 Key: KAFKA-13378
 URL: https://issues.apache.org/jira/browse/KAFKA-13378
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.0.0
Reporter: Maciej Malecki


The broker deletes *.deleted files before closing them



sudo lsof | egrep "deleted|COMMAND" | grep -c kafka

20566



Example:

 

java   714324  714654  java  kafka  DEL  REG  253,6  218103947  /app/kafka_logs/a_messages_12_3-11/000621913890.index.deleted

java   714324  714654  java  kafka  DEL  REG  253,6  285212812  /app/kafka_logs/a_messages_12_3-6/000621940158.index.deleted

java   714324  714654  java  kafka  DEL  REG  253,6  155206805  /app/kafka_logs/a_messages_12_3-10/000621967706.index.deleted

java   714324  714654  java  kafka  DEL  REG  253,6  218106180  /app/kafka_logs/__consumer_offsets-36/02344365.timeindex.deleted

java   714324  714654  java  kafka  DEL  REG  253,6  218106179  /app/kafka_logs/__consumer_offsets-36/02344365.index.deleted



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC

2021-10-14 Thread Kirk True
Hi Ismael,

Thanks for reviewing the KIP. I've made a first pass at updating based on your 
feedback.

Questions/comments inline...

On Thu, Oct 14, 2021, at 6:20 AM, Ismael Juma wrote:
> Hi Kirk,
> 
> Thanks for the KIP. It looks good overall to me. A few nits:
> 
> 1. "sasl.login.retry.wait.ms": these configs are typically called `backoff`
> in Kafka. For example "retry.backoff.ms". The default for `retry.backoff.ms`
> is 100ms. Is there a reason why we are using a different value for this
> one? The `sasl.login.retry.max.wait.ms` should be renamed accordingly.


Changed to sasl.login.retry.backoff.ms and sasl.login.retry.backoff.max.ms and 
changed the former to 100 ms.

> 2. "sasl.login.attempts": do we need this at all? We have generally moved
> away from number of retries in favor of timeouts for Kafka (the producer
> has a retries config partly for historical reasons, but partly due to
> semantics that are specific to the producer.

Removed this option and now we just retry up to sasl.login.retry.backoff.max.ms.

> 3. "sasl.login.read.timeout.ms" : we have two types of kafka timeouts, "
> default.api.timeout.ms" and "request.timeout.ms". Is this similar to any of
> the two or is it different? If similar to one of the existing ones, we
> should name it similarly.

This is specifically for the setReadTimeout on java.net.URLConnection when 
making the call to the OAuth/OIDC provider to retrieve the token. So it is SASL 
specific because reading the response from the external OAuth/OIDC provider 
(likely over WAN) may require a longer timeout.

> 4. "sasl.login.connect.timeout.ms": is this the equivalent of "
> socket.connection.setup.timeout.ms" in Kafka? I am unsure why we chose such
> a long name, "connect.timeout.ms" would have been a lot better. However, if
> it is similar, then we may want to follow the same naming convention.

This is for the setConnectTimeout on java.net.URLConnection, similar to the 
above.
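
For context, here is a bare-bones sketch of the kind of call these two configs
would govern — opening a connection to the provider's token endpoint and
applying the connect/read timeouts on java.net.URLConnection. The endpoint URL
and timeout values are placeholders, not defaults from the KIP:

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;

public class TokenEndpointTimeouts {
    public static void main(String[] args) throws Exception {
        // Placeholder token endpoint; a real client reads this from its config.
        URL tokenEndpoint = new URL("https://example.com/oauth2/token");
        HttpURLConnection conn = (HttpURLConnection) tokenEndpoint.openConnection();

        // Roughly what sasl.login.connect.timeout.ms controls: how long to wait
        // for the TCP connection to the provider to be established.
        conn.setConnectTimeout(10_000);

        // Roughly what sasl.login.read.timeout.ms controls: how long to wait
        // for the provider's response once the request has been sent.
        conn.setReadTimeout(10_000);

        conn.setRequestMethod("POST");
        conn.connect();                        // subject to the connect timeout
        int status = conn.getResponseCode();   // subject to the read timeout
        System.out.println("HTTP status: " + status);
        conn.disconnect();
    }
}
{code}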

> 5. Should there be a "connect.max.timeout.ms" too?

AFAIK, we don't have that level of control per our use of URLConnection.

> 6. What are the compatibility guarantees offered by the
> "OAuthCompatibilityTest" CLI tool? Also, can we adjust the name so it's
> clear that it's a Command versus a test suite?

I changed the name to OAuthCompatibilityTool. Can you elaborate on what 
compatibility guarantees you'd like to see listed? I may just be 
misunderstanding the request.

Thanks,
Kirk

> 
> Thanks!
> 
> Ismael
> 
> On Mon, Sep 27, 2021 at 10:20 AM Kirk True  wrote:
> 
> > Hi all!
> >
> > I'd like to start a vote for KIP-768 that allows Kafka to connect to an
> > OAuth/OIDC identity provider for authentication and token retrieval:
> >
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575
> >
> > Thanks!
> > Kirk
> 


Re: Why does Kafka have a higher throughput than Redis?

2021-10-14 Thread Andrew Grant
Hi Vitor,

I'm not an expert and probably some more knowledgeable folks can also chime
in (and correct me) but a few things came to mind:

1) On the write side (i.e. when using the publisher), Kafka does not flush
data to disk by default. It writes to the page cache so all writes are sort
of in-memory in a way. They're staged in the page cache and the kernel
flushes the data asynchronously. Also the API contract for Kafka is quite
"simple" in that it mostly reads and writes arbitrary sequences of bytes -
there isn't as much complex transactional software in front of the
writing/reading that might hurt performance compared to some other data
stores. Note, Kafka does provide things like idempotence and transactions
so it's not like there is never any overhead to consider.

2) Kafka reads and writes are conducive to being linear which helps a lot
with performance. Random writes are a lot slower than linear ones.

3) For reading data (i.e. when using the consumer), Kafka uses a zero-copy
technique in which data is sent directly from the page cache to the network
buffer without going through user space, which helps a lot.

4) Kafka batches aggressively.
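
To make point 4 a little more concrete, producer-side batching is mostly
driven by linger.ms and batch.size. A minimal sketch (the broker address,
topic name, and specific values are placeholders, not recommendations):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait up to 10 ms to fill a batch and allow batches of up to 64 KB,
        // so many records are sent (and appended on the broker) together.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "10");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(64 * 1024));
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100_000; i++) {
                producer.send(new ProducerRecord<>("benchmark-topic", Integer.toString(i), "payload-" + i));
            }
        }
    }
}
{code}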

Here are two resources which might provide more information
https://docs.confluent.io/platform/current/kafka/design.html,
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.

Hope this helps a bit.

Andrew

On Thu, Oct 14, 2021 at 1:11 PM Vitor Augusto de Medeiros <
v.medei...@aluno.ufabc.edu.br> wrote:

> Hi everyone,
>
>  i'm doing a benchmark comparison between Kafka and Redis for my final
> bachelor paper and would like to understand more about why Kafka have
> higher throughput if compared to Redis.
>
>  I noticed Redis has lower overall latency (and makes sense since it's
> stored in memory) but cant figure out the difference in throughput.
>
> I found a study (not sure if i can post links here but it's named A
> COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING
> PIPELINES by Sebastian Tallberg)
> showing Kafka's throughput hitting 3x the amount of msg/s if compared to
> Redis for a 1kB payload. I would like to understand what is in Kafka's
> architecture that allows it to be a lot faster than other message
> brokers/Redis in particular
>
> Thanks!
>


-- 
Andrew Grant
8054482621


Re: [DISCUSS] KIP-778 KRaft Upgrades

2021-10-14 Thread José Armando García Sancio
On Tue, Oct 12, 2021 at 10:57 AM Colin McCabe  wrote:
> > 11. For downgrades, it would be useful to describe how to determine the
> > downgrade process (generating new snapshot, propagating the snapshot, etc)
> > has completed. We could block the UpdateFeature request until the process
> > is completed. However, since the process could take time, the request could
> > time out. Another way is through DescribeFeature and the server only
> > reports downgraded versions after the process is completed.
>
> Hmm.. I think we need to avoid blocking, since we don't know how long it will 
> take for all nodes to act on the downgrade request. After all, some nodes may 
> be down.
>
> But I agree we should have some way of knowing when the upgrade is done. 
> DescribeClusterResponse seems like the natural place to put information about 
> each node's feature level. While we're at it, we should also add a boolean to 
> indicate whether the given node is fenced. (This will always be false for ZK 
> mode, of course...)
>

I agree. I think from the user's point of view, they would like to
know if it is safe to downgrade the software version of a specific
broker or controller. I think that it is safe to downgrade a broker or
controller when the metadata.version has been downgraded and the node
has generated a snapshot with the downgraded metadata.version.

-- 
-Jose


Re: [DISCUSS] KIP-778 KRaft Upgrades

2021-10-14 Thread José Armando García Sancio
On Thu, Oct 7, 2021 at 5:20 PM Jun Rao  wrote:
> 7. Jose, what control records were you referring?
>

Hey Jun, in KRaft we have 3 control records.
- LeaderChangeMessage - this is persisted in the replica log when a
new leader gets elected and the epoch increases. We never included
this record in the snapshot, so this KIP doesn't need to process these
control records.
- SnapshotFooterRecord and SnapshotHeaderRecord - these records are
control records at the end and beginning of the snapshot respectively.
They only exist in the snapshot and never in the replicated log. This
KIP needs to make sure that we downgrade these records.

Thanks!
-Jose


Why does Kafka have a higher throughput than Redis?

2021-10-14 Thread Vitor Augusto de Medeiros
Hi everyone,

I'm doing a benchmark comparison between Kafka and Redis for my final
bachelor's paper and would like to understand more about why Kafka has higher
throughput compared to Redis.

I noticed Redis has lower overall latency (which makes sense since its data is
stored in memory) but can't figure out the difference in throughput.

I found a study (not sure if I can post links here, but it's named A COMPARISON
OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING PIPELINES by
Sebastian Tallberg) showing Kafka's throughput hitting 3x the number of msg/s
compared to Redis for a 1 kB payload. I would like to understand what in
Kafka's architecture allows it to be a lot faster than other message brokers,
and Redis in particular.

Thanks!


Re: [DISCUSS] KIP-778 KRaft Upgrades

2021-10-14 Thread José Armando García Sancio
On Tue, Oct 5, 2021 at 8:53 AM David Arthur  wrote:
> 2. Generate snapshot on downgrade
> > > Metadata snapshot is generated and sent to the other inactive
> > controllers and to brokers (this snapshot may be lossy!)
> > Why do we need to send this downgraded snapshot to the brokers? The
> > replicas have seen the FeatureLevelRecord and noticed the downgrade.
> > Can we have the replicas each independently generate a downgraded
> > snapshot at the offset for the downgraded FeatureLevelRecord? I assume
> > that the active controller will guarantee that all records after the
> > FatureLevelRecord use the downgraded version. If so, it would be good
> > to mention that explicitly.
>
>
> Similar to above, yes a broker that detects a downgrade via
> FeatureLevelRecord could generate its own downgrade snapshot and reload its
> state from that. This does get a little fuzzy when we consider cases where
> brokers are on different software versions and could be generating a
> downgrade snapshot for version X, but using different versions of the code.
> It might be safer to let the controller generate the snapshot so each
> broker (regardless of software version) gets the same records. However, for
> upgrades (or downgrades) we expect the whole cluster to be running the same
> software version before triggering the metadata.version change, so perhaps
> this isn't a likely scenario. Thoughts?
>
>

Are you saying that for metadata.version X different software versions
could generate different snapshots? If so, I would consider this an
implementation bug, no? The format and content of a snapshot is a
public API that needs to be supported across software versions.

> 3. Max metadata version
> > >For the first release that supports metadata.version, we can simply
> > initialize metadata.version with the current (and only) version. For future
> > releases, we will need a mechanism to bootstrap a particular version. This
> > could be done using the meta.properties file or some similar mechanism. The
> > reason we need the allow for a specific initial version is to support the
> > use case of starting a Kafka cluster at version X with an older
> > metadata.version.
>
>
> I assume that the Active Controller will learn the metadata version of
> > the broker through the BrokerRegistrationRequest. How will the Active
> > Controller learn about the max metadata version of the inactive
> > controller nodes? We currently don't send a registration request from
> > the inactive controller to the active controller.
>
>
> This came up during the design, but I neglected to add it to the KIP. We
> will need a mechanism for determining the supported features of each
> controller similar to how brokers use BrokerRegistrationRequest. Perhaps
> controllers could write a FeatureLevelRecord (or similar) to the metadata
> log indicating their supported version. WDYT?
>

Hmm. So I think you are proposing the following flow:
1. Cluster metadata partition replicas establish a quorum using
ApiVersions and the KRaft protocol.
2. Inactive controllers send a registration RPC to the active controller.
3. The active controller persists this information to the metadata log.

What happens if the inactive controllers send a metadata.version range
that is not compatible with the metadata.version set for the cluster?

> Why do you need to bootstrap a particular version? Isn't the intent
> > that the broker will learn the active metadata version by reading the
> > metadata before unfencing?
>
>
> This bootstrapping is needed for when a KRaft cluster is first started. If
> we don't have this mechanism, the cluster can't really do anything until
> the operator finalizes the metadata.version with the tool. The
> bootstrapping will be done by the controller and the brokers will see this
> version as a record (like you say). I'll add some text to clarify this.
>
>

Got it. A new cluster will use the metadata.version of the first
active controller. The first active controller will persist this
information on the metadata log. All replicas (inactive controller and
brokers) will configure themselves based on this record.

> 4. Reject Registration - This is related to the bullet point above.
> > What will be the behavior of the active controller if the broker sends
> > a metadata version that is not compatible with the cluster wide
> > metadata version?
>
>
> If a broker starts up with a lower supported version range than the current
> cluster metadata.version, it should log an error and shutdown. This is in
> line with KIP-584.
>

Okay. We need to extend this for the controller case.

Thanks
-- 
-Jose


Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-10-14 Thread David Jacot
Hi Luke,

Added it to the plan.

Thanks,
David

On Thu, Oct 14, 2021 at 10:09 AM Luke Chen  wrote:

> Hi David,
> KIP-766 is merged into trunk. Please help add it into the release plan.
>
> Thank you.
> Luke
>
> On Mon, Oct 11, 2021 at 10:50 PM David Jacot 
> wrote:
>
> > Hi Michael,
> >
> > Sure. I have updated the release plan to include it. Thanks for the
> > heads up.
> >
> > Best,
> > David
> >
> > On Mon, Oct 11, 2021 at 4:39 PM Mickael Maison  >
> > wrote:
> >
> > > Hi David,
> > >
> > > You can add KIP-690 to the release plan. The vote passed months ago
> > > and I merged the PR today.
> > >
> > > Thanks
> > >
> > > On Fri, Oct 8, 2021 at 8:32 AM David Jacot  >
> > > wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > Just a quick reminder that KIP Freeze is next Friday, October 15th.
> > > >
> > > > Cheers,
> > > > David
> > > >
> > > > On Wed, Sep 29, 2021 at 3:52 PM Chris Egerton
> > > 
> > > > wrote:
> > > >
> > > > > Thanks David!
> > > > >
> > > > > On Wed, Sep 29, 2021 at 2:56 AM David Jacot
> > > 
> > > > > wrote:
> > > > >
> > > > > > Hi Chris,
> > > > > >
> > > > > > Sure thing. I have added KIP-618 to the release plan. Thanks for
> > the
> > > > > heads
> > > > > > up.
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > > > On Wed, Sep 29, 2021 at 8:53 AM David Jacot  >
> > > wrote:
> > > > > >
> > > > > > > Hi Kirk,
> > > > > > >
> > > > > > > Yes, it is definitely possible if you can get the KIP voted
> > before
> > > the
> > > > > > KIP
> > > > > > > freeze
> > > > > > > and the code committed before the feature freeze. Please, let
> me
> > > know
> > > > > > when
> > > > > > > the
> > > > > > > KIP is voted and I will add it to the release plan.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > > On Tue, Sep 28, 2021 at 7:05 PM Chris Egerton
> > > > > > 
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Hi David,
> > > > > > >>
> > > > > > >> Wondering if we can get KIP-618 included? The vote passed
> months
> > > ago
> > > > > > and a
> > > > > > >> PR has been available since mid-June.
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >>
> > > > > > >> Chris
> > > > > > >>
> > > > > > >> On Tue, Sep 28, 2021 at 12:53 PM Kirk True <
> > k...@mustardgrain.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > Hi David,
> > > > > > >> >
> > > > > > >> > Is it possible to try to get KIP-768 in 3.1? I have put it
> up
> > > for a
> > > > > > vote
> > > > > > >> > and have much of it implemented already.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Kirk
> > > > > > >> >
> > > > > > >> > On Tue, Sep 28, 2021, at 3:11 AM, Israel Ekpo wrote:
> > > > > > >> > > Ok. Sounds good, David.
> > > > > > >> > >
> > > > > > >> > > Let’s forge ahead. The plan looks good.
> > > > > > >> > >
> > > > > > >> > > On Tue, Sep 28, 2021 at 4:02 AM David Jacot
> > > > > > >>  > > > > > >> > >
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi Israel,
> > > > > > >> > > >
> > > > > > >> > > > Yeah, 3.0 took quite a long time to be released.
> However,
> > I
> > > > > think
> > > > > > >> > > > that we should stick to our time based release.
> > > > > > >> > > >
> > > > > > >> > > > Best,
> > > > > > >> > > > David
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > On Tue, Sep 28, 2021 at 9:59 AM David Jacot <
> > > > > dja...@confluent.io>
> > > > > > >> > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Hi Bruno,
> > > > > > >> > > > >
> > > > > > >> > > > > Thanks for the heads up. I have removed it from the
> > plan.
> > > > > > >> > > > >
> > > > > > >> > > > > Best,
> > > > > > >> > > > > David
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Sep 27, 2021 at 11:04 AM Bruno Cadonna <
> > > > > > >> cado...@apache.org>
> > > > > > >> > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > >> Hi David,
> > > > > > >> > > > >>
> > > > > > >> > > > >> Thank you for the plan!
> > > > > > >> > > > >>
> > > > > > >> > > > >> KIP-698 will not make it for 3.1.0. Could you please
> > > remove
> > > > > it
> > > > > > >> from
> > > > > > >> > the
> > > > > > >> > > > >> plan?
> > > > > > >> > > > >>
> > > > > > >> > > > >> Best,
> > > > > > >> > > > >> Bruno
> > > > > > >> > > > >>
> > > > > > >> > > > >> On 24.09.21 16:22, David Jacot wrote:
> > > > > > >> > > > >> > Hi all,
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > I just published a release plan here:
> > > > > > >> > > > >> >
> > > > > > >> >
> > > > >
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.1.0
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > The plan suggests the following dates:
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > KIP Freeze: 15 October 2021
> > > > > > >> > > > >> > Feature Freeze: 29 October 2021
> > > > > > >> > > > >> > Code Freeze: 12 November 2021
> > > > > > >> > > > >> >
> > > > > > >> > > > >> > At least two weeks of stabilization will follow
> Code
> > > > > Freeze.
> > > > > > >> > > > >> >

Kafka client 2.7.1 missing JaasUtils.isZkSecurityEnabled() method

2021-10-14 Thread Alexandre Vermeerbergen
Hello,

When upgrading from Kafka client 2.4.0 our dependencies to Kafka
client 2.7.1, we noticed that JaasUtils.isZkSecurityEnabled() method
no longer exists.

Is there an equivalent method in Kafka client 2.7.1 which could use instead ?

Kind regards,
Alexandre


[jira] [Resolved] (KAFKA-8375) Offset jumps back after commit

2021-10-14 Thread Markus Dybeck (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Dybeck resolved KAFKA-8375.
--
Fix Version/s: 2.4.2
   Resolution: Later

We have not encountered the issue after upgrading both kafka-clients and akka.

> Offset jumps back after commit
> --
>
> Key: KAFKA-8375
> URL: https://issues.apache.org/jira/browse/KAFKA-8375
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 1.1.1
>Reporter: Markus Dybeck
>Priority: Major
> Fix For: 2.4.2
>
> Attachments: partition_lag_metrics.png
>
>
> *Setup*
> Kafka: 1.1.1
>  Kafka-client: 1.1.1
>  Zookeeper: 3.4.11
>  Akka streams: 0.20
> *Topic config*
> DELETE_RETENTION_MS_CONFIG: "5000"
>  CLEANUP_POLICY_CONFIG: "compact,delete"
>  RETENTION_BYTES_CONFIG: 2000L
>  RETENTION_MS_CONFIG: 3600
> *Consumer config*
> AUTO_OFFSET_RESET_CONFIG: "earliest"
> *Behavior*
>  We have 7 Consumers consuming from 7 partitions, and some of the consumers 
> lag jumped back a bit randomly. No new messages were pushed to the topic 
> during the time.  We didn't see any strange logs during the time, and the 
> brokers did not restart either.
> Either way, if there would be a restart or rebalance going on, we can not 
> understand why the offset would jump back after it was committed? 
> We did observe it both with logs and by watching metrics of the lag. Our logs 
> pointed out that after we committed the offset, around 30-35 seconds later we 
> consumed an earlier committed message and then the loop begun. The behavior 
> was the same after a restart of all the consumers. The behavior then stopped 
> after a while all by itself.
> We have no clue going forward, or if these might be an issue with akka. But 
> is there any known issue that might cause this?
> Attaching a screendump with metrics that shows the lag for one partition.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC

2021-10-14 Thread Ismael Juma
Hi Kirk,

Thanks for the KIP. It looks good overall to me. A few nits:

1. "sasl.login.retry.wait.ms": these configs are typically called `backoff`
in Kafka. For example "retry.backoff.ms". The default for `retry.backoff.ms`
is 100ms. Is there a reason why we are using a different value for this
one? The `sasl.login.retry.max.wait.ms` should be renamed accordingly.
2. "sasl.login.attempts": do we need this at all? We have generally moved
away from number of retries in favor of timeouts for Kafka (the producer
has a retries config partly for historical reasons, but partly due to
semantics that are specific to the producer.
3. "sasl.login.read.timeout.ms" : we have two types of kafka timeouts, "
default.api.timeout.ms" and "request.timeout.ms". Is this similar to any of
the two or is it different? If similar to one of the existing ones, we
should name it similarly.
4. "sasl.login.connect.timeout.ms": is this the equivalent of "
socket.connection.setup.timeout.ms" in Kafka? I am unsure why we chose such
a long name, "connect.timeout.ms" would have been a lot better. However, if
it is similar, then we may want to follow the same naming convention.
5. Should there be a "connect.max.timeout.ms" too?
6. What are the compatibility guarantees offered by the
"OAuthCompatibilityTest" CLI tool? Also, can we adjust the name so it's
clear that it's a Command versus a test suite?

Thanks!

Ismael

On Mon, Sep 27, 2021 at 10:20 AM Kirk True  wrote:

> Hi all!
>
> I'd like to start a vote for KIP-768 that allows Kafka to connect to an
> OAuth/OIDC identity provider for authentication and token retrieval:
>
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575
>
> Thanks!
> Kirk


[jira] [Created] (KAFKA-13377) Fix

2021-10-14 Thread lujie (Jira)
lujie created KAFKA-13377:
-

 Summary: Fix 
 Key: KAFKA-13377
 URL: https://issues.apache.org/jira/browse/KAFKA-13377
 Project: Kafka
  Issue Type: Improvement
Reporter: lujie






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC

2021-10-14 Thread David Jacot
+1 (binding). Thanks for the KIP!

On Fri, Oct 8, 2021 at 10:54 AM Rajini Sivaram 
wrote:

> +1 (binding)
> Thanks for the KIP, Kirk!
>
> Regards,
>
> Rajini
>
>
> On Thu, Sep 30, 2021 at 12:33 PM Manikumar 
> wrote:
>
> > Hi Kirk,
> >
> > Thanks for the KIP!
> >
> > +1 (binding)
> >
> >
> > Thanks,
> > Manikumar
> >
> > On Mon, Sep 27, 2021 at 10:50 PM Kirk True 
> wrote:
> >
> > > Hi all!
> > >
> > > I'd like to start a vote for KIP-768 that allows Kafka to connect to an
> > > OAuth/OIDC identity provider for authentication and token retrieval:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575
> > >
> > > Thanks!
> > > Kirk
> >
>


[jira] [Created] (KAFKA-13376) Allow MirrorMaker producer and consumer customization per replication flow

2021-10-14 Thread Ivan Yurchenko (Jira)
Ivan Yurchenko created KAFKA-13376:
--

 Summary: Allow MirrorMaker producer and consumer customization per 
replication flow
 Key: KAFKA-13376
 URL: https://issues.apache.org/jira/browse/KAFKA-13376
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Reporter: Ivan Yurchenko


Currently, it's possible to set producer and consumer configurations for a 
cluster in MirrorMaker, like this:

{noformat}
{source}.consumer.{consumer_config_name}
{target}.producer.{producer_config_name}
{noformat}

However, in some cases it makes sense to set these configs differently for 
different replication flows (e.g. when they have different latency/throughput 
trade-offs), something like:

{noformat}
{source}->{target}.{source}.consumer.{consumer_config_name}
{source}->{target}.{target}.producer.{producer_config_name}
{noformat}
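
For illustration, with hypothetical cluster aliases primary and backup and the
replication flow primary->backup, this would look like:

{noformat}
# current form: applies to every flow that uses the cluster
primary.consumer.max.poll.records=500
backup.producer.linger.ms=100

# proposed per-flow form: applies only to primary->backup
primary->backup.primary.consumer.max.poll.records=500
primary->backup.backup.producer.linger.ms=100
{noformat}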




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-10-14 Thread Luke Chen
Hi David,
KIP-766 is merged into trunk. Please help add it into the release plan.

Thank you.
Luke

On Mon, Oct 11, 2021 at 10:50 PM David Jacot 
wrote:

> Hi Michael,
>
> Sure. I have updated the release plan to include it. Thanks for the
> heads up.
>
> Best,
> David
>
> On Mon, Oct 11, 2021 at 4:39 PM Mickael Maison 
> wrote:
>
> > Hi David,
> >
> > You can add KIP-690 to the release plan. The vote passed months ago
> > and I merged the PR today.
> >
> > Thanks
> >
> > On Fri, Oct 8, 2021 at 8:32 AM David Jacot 
> > wrote:
> > >
> > > Hi folks,
> > >
> > > Just a quick reminder that KIP Freeze is next Friday, October 15th.
> > >
> > > Cheers,
> > > David
> > >
> > > On Wed, Sep 29, 2021 at 3:52 PM Chris Egerton
> > 
> > > wrote:
> > >
> > > > Thanks David!
> > > >
> > > > On Wed, Sep 29, 2021 at 2:56 AM David Jacot
> > 
> > > > wrote:
> > > >
> > > > > Hi Chris,
> > > > >
> > > > > Sure thing. I have added KIP-618 to the release plan. Thanks for
> the
> > > > heads
> > > > > up.
> > > > >
> > > > > Best,
> > > > > David
> > > > >
> > > > > On Wed, Sep 29, 2021 at 8:53 AM David Jacot 
> > wrote:
> > > > >
> > > > > > Hi Kirk,
> > > > > >
> > > > > > Yes, it is definitely possible if you can get the KIP voted
> before
> > the
> > > > > KIP
> > > > > > freeze
> > > > > > and the code committed before the feature freeze. Please, let me
> > know
> > > > > when
> > > > > > the
> > > > > > KIP is voted and I will add it to the release plan.
> > > > > >
> > > > > > Thanks,
> > > > > > David
> > > > > >
> > > > > > On Tue, Sep 28, 2021 at 7:05 PM Chris Egerton
> > > > > 
> > > > > > wrote:
> > > > > >
> > > > > >> Hi David,
> > > > > >>
> > > > > >> Wondering if we can get KIP-618 included? The vote passed months
> > ago
> > > > > and a
> > > > > >> PR has been available since mid-June.
> > > > > >>
> > > > > >> Cheers,
> > > > > >>
> > > > > >> Chris
> > > > > >>
> > > > > >> On Tue, Sep 28, 2021 at 12:53 PM Kirk True <
> k...@mustardgrain.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > Hi David,
> > > > > >> >
> > > > > >> > Is it possible to try to get KIP-768 in 3.1? I have put it up
> > for a
> > > > > vote
> > > > > >> > and have much of it implemented already.
> > > > > >> >
> > > > > >> > Thanks,
> > > > > >> > Kirk
> > > > > >> >
> > > > > >> > On Tue, Sep 28, 2021, at 3:11 AM, Israel Ekpo wrote:
> > > > > >> > > Ok. Sounds good, David.
> > > > > >> > >
> > > > > >> > > Let’s forge ahead. The plan looks good.
> > > > > >> > >
> > > > > >> > > On Tue, Sep 28, 2021 at 4:02 AM David Jacot
> > > > > >>  > > > > >> > >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hi Israel,
> > > > > >> > > >
> > > > > >> > > > Yeah, 3.0 took quite a long time to be released. However,
> I
> > > > think
> > > > > >> > > > that we should stick to our time based release.
> > > > > >> > > >
> > > > > >> > > > Best,
> > > > > >> > > > David
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, Sep 28, 2021 at 9:59 AM David Jacot <
> > > > dja...@confluent.io>
> > > > > >> > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Bruno,
> > > > > >> > > > >
> > > > > >> > > > > Thanks for the heads up. I have removed it from the
> plan.
> > > > > >> > > > >
> > > > > >> > > > > Best,
> > > > > >> > > > > David
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Sep 27, 2021 at 11:04 AM Bruno Cadonna <
> > > > > >> cado...@apache.org>
> > > > > >> > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > >> Hi David,
> > > > > >> > > > >>
> > > > > >> > > > >> Thank you for the plan!
> > > > > >> > > > >>
> > > > > >> > > > >> KIP-698 will not make it for 3.1.0. Could you please
> > remove
> > > > it
> > > > > >> from
> > > > > >> > the
> > > > > >> > > > >> plan?
> > > > > >> > > > >>
> > > > > >> > > > >> Best,
> > > > > >> > > > >> Bruno
> > > > > >> > > > >>
> > > > > >> > > > >> On 24.09.21 16:22, David Jacot wrote:
> > > > > >> > > > >> > Hi all,
> > > > > >> > > > >> >
> > > > > >> > > > >> > I just published a release plan here:
> > > > > >> > > > >> >
> > > > > >> >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.1.0
> > > > > >> > > > >> >
> > > > > >> > > > >> > The plan suggests the following dates:
> > > > > >> > > > >> >
> > > > > >> > > > >> > KIP Freeze: 15 October 2021
> > > > > >> > > > >> > Feature Freeze: 29 October 2021
> > > > > >> > > > >> > Code Freeze: 12 November 2021
> > > > > >> > > > >> >
> > > > > >> > > > >> > At least two weeks of stabilization will follow Code
> > > > Freeze.
> > > > > >> > > > >> >
> > > > > >> > > > >> > I have included all the currently approved KIPs
> > targeting
> > > > > >> 3.1.0.
> > > > > >> > > > Please
> > > > > >> > > > >> > let me know if I should add/remove any to/from the
> > plan.
> > > > > >> > > > >> >
> > > > > >> > > > >> > Please let me know if you have any objections.
> > > > > >> > > > >> >
> > > > > >> > > > >> > Regards,
> > > > > >> > > > >> > David
> > > > > >> > > > >> >
> > > > > >> > > > >> > On Mon, Sep 20, 

[jira] [Created] (KAFKA-13375) Kafka streams apps w/EOS unable to start at InitProducerId

2021-10-14 Thread Lerh Chuan Low (Jira)
Lerh Chuan Low created KAFKA-13375:
--

 Summary: Kafka streams apps w/EOS unable to start at InitProducerId
 Key: KAFKA-13375
 URL: https://issues.apache.org/jira/browse/KAFKA-13375
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.8.0
Reporter: Lerh Chuan Low


Hello, I'm wondering if this is a Kafka bug. Our environment setup is as 
follows:

Kafka streams 2.8 - with *EXACTLY_ONCE* turned on (Not *EOS_BETA*, but I don't 
think the changes introduced in EOS beta should affect this)
Kafka broker 2.8. 

We have this situation where we were doing a rolling restart of the broker to 
apply some security changes. After we finished, 4 out of some 15 Stream Apps 
are unable to start. They can never succeed, no matter what we do. 

They fail with the error:


{code:java}
 2021-10-14 07:20:13,548 WARN 
[srn-rec-feeder-802c18a1-9512-4a2a-8c2e-00e37550199d-StreamThread-3] 
o.a.k.s.p.i.StreamsProducer stream-thread 
[srn-rec-feeder-802c18a1-9512-4a2a-8c2e-00e37550199d-StreamThread-3] task [0_6] 
Timeout exception caught trying to initialize transactions. The broker is 
either slow or in bad state (like not having enough replicas) in responding to 
the request, or the connection to broker was interrupted sending the request or 
receiving the response. Will retry initializing the task in the next loop. 
Consider overwriting max.block.ms to a larger value to avoid timeout 
errors{code}


We found a previous Jira describing the issue here: 
https://issues.apache.org/jira/browse/KAFKA-8803. It seems like back then what 
people did was a rolling restart of the brokers. We tried that - we targeted the 
group coordinators for our failing apps, then the transaction coordinators, then 
all of them. It hasn't resolved our issue so far.

A few interesting things we've found so far:

- What I can see is that all the failing apps only fail on certain partitions. 
E.g for the app above, only partition 6 never succeeds. Partition 6 shares the 
same coordinator as some of the other partitions and those work, so it seems 
like the issue isn't related to broker state. 

- All the failing apps have a message similar to this 


{code:java}
[2021-10-14 00:54:51,569] INFO [Transaction Marker Request Completion Handler 
103]: Sending srn-rec-feeder-0_6's transaction marker for partition 
srn-bot-003-14 has permanently failed with error 
org.apache.kafka.common.errors.InvalidProducerEpochException with the current 
coordinator epoch 143; cancel sending any more transaction markers 
TxnMarkerEntry{producerId=7001, producerEpoch=610, coordinatorEpoch=143, 
result=ABORT, partitions=[srn-bot-003-14]} to the brokers 
(kafka.coordinator.transaction.TransactionMarkerRequestCompletionHandler) {code}
This happened while we were restarting the brokers; they all failed shortly 
after. No other consumer groups for the other working partitions/working 
stream apps logged this message.

On digging around in git blame and reading through the source, it looks like 
this is meant to be benign. 

- We tried DEBUG logging for the TransactionCoordinator and 
TransactionStateManager. We can see (assigner is a functioning app)
{code:java}
[2021-10-14 06:48:23,813] DEBUG [TransactionCoordinator id=105] Returning 
CONCURRENT_TRANSACTIONS error code to client for srn-assigner-0_14's 
AddPartitions request (kafka.coordinator.transaction.TransactionCoordinator) 
{code}
I've seen those before during steady state. I do believe they are benign. We 
never see it for the problematic partitions/consumer groups for some reason.
- We tried turning on TRACE for KafkaApis. We can see
{code:java}
[2021-10-14 06:56:58,408] TRACE [KafkaApi-105] Completed srn-rec-feeder-0_6's 
InitProducerIdRequest with result 
InitProducerIdResult(-1,-1,CONCURRENT_TRANSACTIONS) from client 
srn-rec-feeder-802c18a1-9512-4a2a-8c2e-00e37550199d-StreamThread-4-0_6-producer.
 (kafka.server.KafkaApis) {code}
It starts to make me wonder if there's a situation where Kafka is unable to 
abort the transactions if there is never any success in initializing a producer 
ID. But this is diving deep into insider knowledge territory that I simply 
don't have. I'm wondering if anyone with more knowledge of how transactions 
work can shed some light here on whether we are on the wrong path, or if there's 
any way to restore operations at all short of a streams reset? From a cursory look 
at
{noformat}
TransactionCoordinator#handleInitProducerId {noformat}
it looks like any old transactions should just be aborted and life goes on, but 
it's not happening. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Wiki permission to author a KIP

2021-10-14 Thread David Jacot
Hi Igor,

I have added you. I am looking forward to your KIP.

Best,
David

On Thu, Oct 14, 2021 at 12:16 AM Igor Soarez 
wrote:

> Hi,
>
> Could someone please grant me access in the wiki to create a KIP?
>
> username: soarez
>
> Thanks,
>
> --
> Igor
>
>