Re: [VOTE] 3.0.0 RC1

2021-09-02 Thread Ron Dagostino
Hi Konstantine.  I have opened a probable blocker ticket
https://issues.apache.org/jira/browse/KAFKA-13270.  I will work on a PR
shortly.  The description on that ticket is as follows:

The implementation of https://issues.apache.org/jira/browse/ZOOKEEPER-3593 in
ZooKeeper version 3.6.0 decreased the default value for the ZooKeeper
client's `jute.maxbuffer` configuration from 4MB to 1MB. This can cause a
problem if Kafka tries to retrieve a large amount of data across many
znodes – in such a case the ZooKeeper client will repeatedly emit a message
of the form "java.io.IOException: Packet len <> is out of range" and
the Kafka broker will never connect to ZooKeeper and will fail to make progress
on the startup sequence. We can avoid the potential for this issue to occur
by explicitly setting the value to 4MB whenever we create a new ZooKeeper
client as long as no explicit value has been set via the `jute.maxbuffer`
system property.
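
Not in the ticket, but for readers of this thread: a minimal sketch of the
kind of guard I have in mind (illustrative only, not the actual PR; the
method name is made up):

    // Restore the pre-ZooKeeper-3.6.0 default of 4MB for jute.maxbuffer,
    // but only if the user has not already set it explicitly.
    private static void maybeSetJuteMaxBuffer() {
        final String juteMaxBuffer = "jute.maxbuffer";
        if (System.getProperty(juteMaxBuffer) == null) {
            System.setProperty(juteMaxBuffer, Integer.toString(4 * 1024 * 1024));
        }
    }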

Ron

On Thu, Sep 2, 2021 at 5:52 PM Israel Ekpo  wrote:

> Magnus,
>
> Please could you share the machine and network specs?
>
> How much CPU, RAM is available on each node?
>
> What JDK, JRE version are you using?
>
> What are your broker and client configuration values? Please could you
> share this info if possible?
>
> Thanks.
>
>
>
> On Wed, Sep 1, 2021 at 10:25 AM Magnus Edenhill 
> wrote:
>
> > Hi Konstantine,
> >
> > Some findings from running 3.0.0-RC1 with the librdkafka test suite:
> >
> > * Compaction seems to take slightly longer to kick in when segment sizes
> >   exceed their threshold. (Used to take less than 20 seconds, now takes
> > 20..30 seconds.)
> >
> > * CreateTopic seems to take slightly longer to propagate through the
> > cluster,
> >   e.g., before a new topic is available in metadata from other brokers.
> >
> > * CreateTopics seems to take longer when the Admin request timeout is
> set,
> >   looks like a plateau at 10 seconds:
> >   https://imgur.com/a/n6y76sj
> >
> > (This is a 3 broker cluster with identical configs between 2.8 and
> 3.0.0.)
> >
> > Nothing critical, but could be an indication of regression so I thought
> I'd
> > mention it.
> >
> > Regards,
> > Magnus
> >
> >
> > On Tue, 31 Aug 2021 at 17:51, Konstantine Karantasis <
> > kkaranta...@apache.org>:
> >
> > > Small correction to my previous email.
> > > The actual link for public preview of the 3.0.0 blog post draft is:
> > >
> > >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache6
> > >
> > > (see also the email thread with title: [DISCUSS] Please review the
> 3.0.0
> > > blog post)
> > >
> > > Best,
> > > Konstantine
> > >
> > > On Tue, Aug 31, 2021 at 6:34 PM Konstantine Karantasis <
> > > kkaranta...@apache.org> wrote:
> > >
> > > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > > This is the second release candidate for Apache Kafka 3.0.0.
> > > > It corresponds to a major release that includes many new features,
> > > > including:
> > > >
> > > > * The deprecation of support for Java 8 and Scala 2.12.
> > > > * Kafka Raft support for snapshots of the metadata topic and
> > > > other improvements in the self-managed quorum.
> > > > * Deprecation of message formats v0 and v1.
> > > > * Stronger delivery guarantees for the Kafka producer enabled by
> > default.
> > > > * Optimizations in OffsetFetch and FindCoordinator requests.
> > > > * More flexible Mirror Maker 2 configuration and deprecation of
> > > > Mirror Maker 1.
> > > > * Ability to restart a connector's tasks on a single call in Kafka
> > > Connect.
> > > > * Connector log contexts and connector client overrides are now
> enabled
> > > > by default.
> > > > * Enhanced semantics for timestamp synchronization in Kafka Streams.
> > > > * Revamped public API for Stream's TaskId.
> > > > * Default serde becomes null in Kafka Streams and several
> > > > other configuration changes.
> > > >
> > > > You may read and review a more detailed list of changes in the 3.0.0
> > blog
> > > > post draft here:
> > > >
> > > >
> > >
> >
> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache6
> > > >
> > > > Release notes for the 3.0.0 release:
> > > >
> > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/RELEASE_NOTES.html
> > > >
> > > > *** Please download, test and vote by Wednesday, September 8, 2021
> ***
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > https://kafka.apache.org/KEYS
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > > >
> > > > * Javadoc:
> > > > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/javadoc/
> > > >
> > > > * Tag to be voted upon (off 3.0 branch) is the 3.0.0 tag:
> > > > https://github.com/apache/kafka/releases/tag/3.0.0-rc1
> > > >
> > > 

[jira] [Created] (KAFKA-13270) Kafka may fail to connect to ZooKeeper, retry forever, and never start

2021-09-02 Thread Ron Dagostino (Jira)
Ron Dagostino created KAFKA-13270:
-

 Summary: Kafka may fail to connect to ZooKeeper, retry forever, 
and never start
 Key: KAFKA-13270
 URL: https://issues.apache.org/jira/browse/KAFKA-13270
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Ron Dagostino
Assignee: Ron Dagostino
 Fix For: 3.0.0


The implementation of https://issues.apache.org/jira/browse/ZOOKEEPER-3593 in 
ZooKeeper version 3.6.0 decreased the default value for the ZooKeeper client's 
`jute.maxbuffer` configuration from 4MB to 1MB.  This can cause a problem if 
Kafka tries to retrieve a large amount of data across many znodes -- in such a 
case the ZooKeeper client will repeatedly emit a message of the form 
"java.io.IOException: Packet len <> is out of range" and the Kafka broker 
will never connect to ZooKeeper and will fail to make progress on the startup sequence. 
 We can avoid the potential for this issue to occur by explicitly setting the 
value to 4MB whenever we create a new ZooKeeper client as long as no explicit 
value has been set via the `jute.maxbuffer` system property.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13269) Kafka Streams Aggregation data loss between instance restarts and rebalances

2021-09-02 Thread Rohit Bobade (Jira)
Rohit Bobade created KAFKA-13269:


 Summary: Kafka Streams Aggregation data loss between instance 
restarts and rebalances
 Key: KAFKA-13269
 URL: https://issues.apache.org/jira/browse/KAFKA-13269
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.6.2
Reporter: Rohit Bobade


We are using Kafka Streams 2.6.2 to do a count-based aggregation of messages, 
with the processing guarantee set to EXACTLY_ONCE_BETA and 
NUM_STANDBY_REPLICAS_CONFIG = 1. To test fault tolerance we send some messages 
and restart instances mid-processing. The output count is incorrect because of 
data loss while restoring state.

It looks like a streams task becomes active and starts processing even when 
its state is not fully restored but is within the acceptable recovery lag 
(acceptable.recovery.lag, whose default is 10000). This results in data loss.
{quote}A stateful active task is assigned to an instance only when its state is 
within the configured acceptable.recovery.lag, if one exists
{quote}
[https://docs.confluent.io/platform/current/streams/developer-guide/running-app.html?_ga=2.33073014.912824567.1630441414-1598368976.1615841473#state-restoration-during-workload-rebalance]

[https://docs.confluent.io/platform/current/installation/configuration/streams-configs.html#streamsconfigs_acceptable.recovery.lag]

Setting acceptable.recovery.lag to 0 and re-running the chaos tests gives the 
correct result.
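
For reference, a minimal sketch of the configuration under test (the config constants are from the public Streams API; the application id and bootstrap servers are placeholders):
{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "count-agg-app");      // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_BETA);
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
// Workaround that gave correct results in our chaos tests: only assign a
// stateful active task to an instance whose state is fully caught up.
props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 0L);
{code}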

Related KIP: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-441%3A+Smooth+Scaling+Out+for+Kafka+Streams#KIP441:SmoothScalingOutforKafkaStreams-Computingthemost-caught-upinstances]

Just want to get some thoughts on this use case from the Kafka team, or to 
hear whether anyone has encountered a similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-767 Connect Latency Metrics

2021-09-02 Thread Ryanne Dolan
Thanks Jordan, this is a major blindspot today.

Ryanne


On Wed, Sep 1, 2021, 6:03 PM Jordan Bull  wrote:

> Hi all,
>
> I would like to start the discussion for KIP-767 involving adding latency
> metrics to Connect. The KIP can be found at
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
>
> Thanks,
> Jordan
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #445

2021-09-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 490794 lines...]
[Pipeline] junit
[Pipeline] dir
[2021-09-02T22:29:40.880Z] Running in 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart
[Pipeline] {
[Pipeline] sh
[2021-09-02T22:29:41.070Z] Recording test results
[2021-09-02T22:29:41.147Z] 
[2021-09-02T22:29:41.147Z] LogCleanerParameterizedIntegrationTest > 
testCleanerWithMessageFormatV0(CompressionType) > 
kafka.log.LogCleanerParameterizedIntegrationTest.testCleanerWithMessageFormatV0(CompressionType)[2]
 PASSED
[2021-09-02T22:29:41.147Z] 
[2021-09-02T22:29:41.147Z] LogCleanerParameterizedIntegrationTest > 
testCleanerWithMessageFormatV0(CompressionType) > 
kafka.log.LogCleanerParameterizedIntegrationTest.testCleanerWithMessageFormatV0(CompressionType)[3]
 STARTED
[2021-09-02T22:29:43.234Z] + mvn clean install -Dgpg.skip
[2021-09-02T22:29:44.182Z] [INFO] Scanning for projects...
[2021-09-02T22:29:44.182Z] [INFO] 

[2021-09-02T22:29:44.182Z] [INFO] Reactor Build Order:
[2021-09-02T22:29:44.182Z] [INFO] 
[2021-09-02T22:29:44.182Z] [INFO] Kafka Streams :: Quickstart   
 [pom]
[2021-09-02T22:29:44.182Z] [INFO] streams-quickstart-java   
 [maven-archetype]
[2021-09-02T22:29:44.182Z] [INFO] 
[2021-09-02T22:29:44.182Z] [INFO] < 
org.apache.kafka:streams-quickstart >-
[2021-09-02T22:29:44.182Z] [INFO] Building Kafka Streams :: Quickstart 
3.1.0-SNAPSHOT[1/2]
[2021-09-02T22:29:44.182Z] [INFO] [ pom 
]-
[2021-09-02T22:29:44.182Z] [INFO] 
[2021-09-02T22:29:44.182Z] [INFO] --- maven-clean-plugin:3.0.0:clean 
(default-clean) @ streams-quickstart ---
[2021-09-02T22:29:44.182Z] [INFO] 
[2021-09-02T22:29:44.182Z] [INFO] --- maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) @ streams-quickstart ---
[2021-09-02T22:29:45.130Z] [INFO] 
[2021-09-02T22:29:45.130Z] [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart ---
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-gpg-plugin:1.6:sign 
(sign-artifacts) @ streams-quickstart ---
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-install-plugin:2.5.2:install 
(default-install) @ streams-quickstart ---
[2021-09-02T22:29:46.079Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart/3.1.0-SNAPSHOT/streams-quickstart-3.1.0-SNAPSHOT.pom
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --< 
org.apache.kafka:streams-quickstart-java >--
[2021-09-02T22:29:46.079Z] [INFO] Building streams-quickstart-java 
3.1.0-SNAPSHOT[2/2]
[2021-09-02T22:29:46.079Z] [INFO] --[ maven-archetype 
]---
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-clean-plugin:3.0.0:clean 
(default-clean) @ streams-quickstart-java ---
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-remote-resources-plugin:1.5:process 
(process-resource-bundles) @ streams-quickstart-java ---
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-resources-plugin:2.7:resources 
(default-resources) @ streams-quickstart-java ---
[2021-09-02T22:29:46.079Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2021-09-02T22:29:46.079Z] [INFO] Copying 6 resources
[2021-09-02T22:29:46.079Z] [INFO] Copying 3 resources
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-resources-plugin:2.7:testResources 
(default-testResources) @ streams-quickstart-java ---
[2021-09-02T22:29:46.079Z] [INFO] Using 'UTF-8' encoding to copy filtered 
resources.
[2021-09-02T22:29:46.079Z] [INFO] Copying 2 resources
[2021-09-02T22:29:46.079Z] [INFO] Copying 3 resources
[2021-09-02T22:29:46.079Z] [INFO] 
[2021-09-02T22:29:46.079Z] [INFO] --- maven-archetype-plugin:2.2:jar 
(default-jar) @ streams-quickstart-java ---
[2021-09-02T22:29:46.079Z] [INFO] Building archetype jar: 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/java/target/streams-quickstart-java-3.1.0-SNAPSHOT
[2021-09-02T22:29:46.603Z] [INFO] 
[2021-09-02T22:29:46.603Z] [INFO] --- maven-site-plugin:3.5.1:attach-descriptor 
(attach-descriptor) @ streams-quickstart-java ---
[2021-09-02T22:29:46.603Z] [INFO] 
[2021-09-02T22:29:46.603Z] [INFO] --- 
maven-archetype-plugin:2.2:integration-test (default-integration-test) @ 
streams-quickstart-java ---
[2021-09-02T22:29:46.603Z] [INFO] 
[2021-09-02T22:29:46.603Z] [INFO] --- maven-gpg-plugin:1.6:sign 
(sign-artifacts) @ streams-quickstart-java 

Re: [VOTE] 3.0.0 RC1

2021-09-02 Thread Israel Ekpo
Magnus,

Please could you share the machine and network specs?

How much CPU, RAM is available on each node?

What JDK, JRE version are you using?

What are your broker and client configuration values? Please could you
share this info if possible?

Thanks.



On Wed, Sep 1, 2021 at 10:25 AM Magnus Edenhill  wrote:

> Hi Konstantine,
>
> Some findings from running 3.0.0-RC1 with the librdkafka test suite:
>
> * Compaction seems to take slightly longer to kick in when segment sizes
>   exceed their threshold. (Used to take less than 20 seconds, now takes
> 20..30 seconds.)
>
> * CreateTopic seems to take slightly longer to propagate through the
> cluster,
>   e.g., before a new topic is available in metadata from other brokers.
>
> * CreateTopics seems to take longer when the Admin request timeout is set,
>   looks like a plateau at 10 seconds:
>   https://imgur.com/a/n6y76sj
>
> (This is a 3 broker cluster with identical configs between 2.8 and 3.0.0.)
>
> Nothing critical, but could be an indication of regression so I thought I'd
> mention it.
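>
> For anyone trying to reproduce the CreateTopics plateau, a rough Java
> AdminClient equivalent of the librdkafka call being timed (topic name and
> bootstrap address are made up; this is a sketch, not our actual test code):
>
>     // imports: org.apache.kafka.clients.admin.*, java.util.List, java.util.Map
>     // (the calls throw checked exceptions; handle or declare them in real code)
>     try (Admin admin = Admin.create(Map.<String, Object>of(
>             AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
>         CreateTopicsOptions opts = new CreateTopicsOptions().timeoutMs(10_000);
>         long t0 = System.nanoTime();
>         admin.createTopics(List.of(new NewTopic("repro-topic", 3, (short) 3)), opts)
>             .all().get();
>         System.out.printf("CreateTopics took %d ms%n",
>             (System.nanoTime() - t0) / 1_000_000);
>     }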
>
> Regards,
> Magnus
>
>
> On Tue, 31 Aug 2021 at 17:51, Konstantine Karantasis <
> kkaranta...@apache.org>:
>
> > Small correction to my previous email.
> > The actual link for public preview of the 3.0.0 blog post draft is:
> >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache6
> >
> > (see also the email thread with title: [DISCUSS] Please review the 3.0.0
> > blog post)
> >
> > Best,
> > Konstantine
> >
> > On Tue, Aug 31, 2021 at 6:34 PM Konstantine Karantasis <
> > kkaranta...@apache.org> wrote:
> >
> > >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the second release candidate for Apache Kafka 3.0.0.
> > > It corresponds to a major release that includes many new features,
> > > including:
> > >
> > > * The deprecation of support for Java 8 and Scala 2.12.
> > > * Kafka Raft support for snapshots of the metadata topic and
> > > other improvements in the self-managed quorum.
> > > * Deprecation of message formats v0 and v1.
> > > * Stronger delivery guarantees for the Kafka producer enabled by
> default.
> > > * Optimizations in OffsetFetch and FindCoordinator requests.
> > > * More flexible Mirror Maker 2 configuration and deprecation of
> > > Mirror Maker 1.
> > > * Ability to restart a connector's tasks on a single call in Kafka
> > Connect.
> > > * Connector log contexts and connector client overrides are now enabled
> > > by default.
> > > * Enhanced semantics for timestamp synchronization in Kafka Streams.
> > > * Revamped public API for Stream's TaskId.
> > > * Default serde becomes null in Kafka Streams and several
> > > other configuration changes.
> > >
> > > You may read and review a more detailed list of changes in the 3.0.0
> blog
> > > post draft here:
> > >
> > >
> >
> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache6
> > >
> > > Release notes for the 3.0.0 release:
> > >
> https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by Wednesday, September 8, 2021 ***
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > https://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > >
> > > * Javadoc:
> > > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc1/javadoc/
> > >
> > > * Tag to be voted upon (off 3.0 branch) is the 3.0.0 tag:
> > > https://github.com/apache/kafka/releases/tag/3.0.0-rc1
> > >
> > > * Documentation:
> > > https://kafka.apache.org/30/documentation.html
> > >
> > > * Protocol:
> > > https://kafka.apache.org/30/protocol.html
> > >
> > > * Successful Jenkins builds for the 3.0 branch:
> > > Unit/integration tests:
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/3.0/121/pipeline/
> > > (only a few flaky failures)
> > > System tests:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/3.0/57/
> > >
> > > /**
> > >
> > > Thanks,
> > > Konstantine
> > >
> >
>


[jira] [Resolved] (KAFKA-13225) Controller skips sending UpdateMetadataRequest when shutting down broker doesn't host partitions

2021-09-02 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-13225.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Merged the PR to trunk.

> Controller skips sending UpdateMetadataRequest when shutting down broker 
> doesn't host partitions 
> 
>
> Key: KAFKA-13225
> URL: https://issues.apache.org/jira/browse/KAFKA-13225
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: David Mao
>Assignee: David Mao
>Priority: Minor
> Fix For: 3.1.0
>
>
> If a broker not hosting replicas for any partitions is shut down while there 
> are offline partitions, the controller can fail to send out metadata updates 
> to other brokers in the cluster.
>  
> Since this is a very niche scenario, I will leave the priority as Minor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] KIP 771: KRaft brokers should not expose controller metrics

2021-09-02 Thread Ryan Dielhenn
Hello kafka devs,

I would like to start a vote on KIP-771. This KIP proposes not to expose
controller metrics on KRaft brokers, since KRaft brokers are not
controller-eligible and will never report a non-zero value for these metrics.
Because exposing metrics that will always be zero is both unnecessary and
carries a non-negligible performance impact, it would be best not to move
forward with KAFKA-13140 (https://github.com/apache/kafka/pull/11133) and
instead accept this KIP.



Here is a link to the KIP, which documents the behavior change from how
controller metrics are exposed in a ZooKeeper-based Kafka cluster to how they
are exposed in a KRaft-based one:
https://cwiki.apache.org/confluence/display/KAFKA/KIP+771%3A+KRaft+brokers+should+not+expose+controller+metrics
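
As a concrete example, one such metric on a ZooKeeper-based broker is
kafka.controller:type=KafkaController,name=ActiveControllerCount, which can be
read over JMX roughly like this (a sketch; error handling omitted):

    // imports: java.lang.management.ManagementFactory, javax.management.*
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    ObjectName name = new ObjectName(
        "kafka.controller:type=KafkaController,name=ActiveControllerCount");
    // Yammer gauges registered by the broker expose their value as "Value".
    Object value = server.getAttribute(name, "Value");
    System.out.println("ActiveControllerCount = " + value); // would always be 0 on a KRaft broker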

Here is a link to the discussion:
https://lists.apache.org/thread.html/r74432034527fab13cc973ad5187ef5881a642500d77b0d275dd7f018%40%3Cdev.kafka.apache.org%3E

Regards,
Ryan Dielhenn


[jira] [Created] (KAFKA-13268) Add more integration tests for Table Table FK joins with repartitioning

2021-09-02 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13268:
-

 Summary: Add more integration tests for Table Table FK joins with 
repartitioning
 Key: KAFKA-13268
 URL: https://issues.apache.org/jira/browse/KAFKA-13268
 Project: Kafka
  Issue Type: Improvement
  Components: streams, unit tests
Reporter: Guozhang Wang


We should extend the FK join multi-partition integration test with a 
Repartitioned for:
1) just the new partition count
2) a custom partitioner

This is to test whether there's a bug where the internal topics don't pick up 
a partitioner provided that way (see the sketch below).
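
A hedged sketch of the two variants to cover (topic names and types are made up):
{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Repartitioned;
import org.apache.kafka.streams.processor.StreamPartitioner;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stream = builder.stream("input-topic"); // placeholder topic

// 1) Repartitioned with just a new partition count
KStream<String, String> byCount =
    stream.repartition(Repartitioned.<String, String>numberOfPartitions(4));

// 2) Repartitioned with a custom partitioner; the FK join's internal topics
//    should pick this partitioner up as well.
StreamPartitioner<String, String> partitioner =
    (topic, key, value, numPartitions) -> Math.abs(key.hashCode()) % numPartitions;
KStream<String, String> byPartitioner =
    stream.repartition(Repartitioned.<String, String>numberOfPartitions(4)
        .withStreamPartitioner(partitioner));
{code}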



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13267) InvalidPidMappingException: The producer attempted to use a producer id which is not currently assigned to its transactional id

2021-09-02 Thread Gilles Philippart (Jira)
Gilles Philippart created KAFKA-13267:
-

 Summary: InvalidPidMappingException: The producer attempted to use 
a producer id which is not currently assigned to its transactional id
 Key: KAFKA-13267
 URL: https://issues.apache.org/jira/browse/KAFKA-13267
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Gilles Philippart


We're using Confluent Cloud and Kafka Streams 2.8.0 and we've seen these errors 
pop up in apps using EOS:
{code:java}
InvalidPidMappingException: The producer attempted to use a producer id which 
is not currently assigned to its transactional id
{code}
Full stack trace:
{code:java}
Error encountered sending record to topic ola-update-1 for task 4_7 due to: 
org.apache.kafka.common.errors.InvalidPidMappingException: The producer 
attempted to use a producer id which is not currently assigned to its 
transactional id. Exception handler choose to FAIL the processing, no more 
records would be sent.
RecordCollectorImpl.java  226  recordSendError(...)
RecordCollectorImpl.java:226:in `recordSendError'
RecordCollectorImpl.java  196  lambda$send$0(...)
RecordCollectorImpl.java:196:in `lambda$send$0'
KafkaProducer.java  1365  onCompletion(...)
KafkaProducer.java:1365:in `onCompletion'
ProducerBatch.java  231  completeFutureAndFireCallbacks(...)
ProducerBatch.java:231:in `completeFutureAndFireCallbacks'
ProducerBatch.java  159  abort(...)
ProducerBatch.java:159:in `abort'
RecordAccumulator.java  763  abortBatches(...)
RecordAccumulator.java:763:in `abortBatches'
... (5 more lines)
Nested Exceptions:
org.apache.kafka.streams.errors.StreamsException: Error 
encountered sending record to topic ola-update-1 for task 4_7 due to: 
org.apache.kafka.common.errors.InvalidPidMappingException: The producer 
attempted to use a producer id which is not currently assigned to its 
transactional id. Exception handler choose to FAIL the processing, no more 
records would be sent.
RecordCollectorImpl.java  226  recordSendError(...)
RecordCollectorImpl.java:226:in `recordSendError'
 org.apache.kafka.common.errors.InvalidPidMappingException: The producer 
attempted to use a producer id which is not currently assigned to its 
transactional id.
{code}
I've seen that KAFKA-6821 described the same problem on an earlier version of 
Kafka and was closed due to the subsequent work on EOS.

Another ticket raised recently shows that the exception is still occurring (but 
the ticket wasn't raised for that specific error): KAFKA-12774



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13266) `InitialFetchState` should be created after partition is removed from the fetchers

2021-09-02 Thread David Jacot (Jira)
David Jacot created KAFKA-13266:
---

 Summary: `InitialFetchState` should be created after partition is 
removed from the fetchers
 Key: KAFKA-13266
 URL: https://issues.apache.org/jira/browse/KAFKA-13266
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: David Jacot
Assignee: David Jacot


 

`ReplicationTest.test_replication_with_broker_failure` in KRaft mode sometimes 
fails with the following error in the log:
{noformat}
[2021-08-31 11:31:25,092] ERROR [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=0] Unexpected error occurred while processing data for partition 
__consumer_offsets-1 at offset 31727 
(kafka.server.ReplicaFetcherThread)java.lang.IllegalStateException: Offset 
mismatch for partition __consumer_offsets-1: fetched offset = 31727, log end 
offset = 31728. at 
kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:194)
 at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$8(AbstractFetcherThread.scala:545)
 at scala.Option.foreach(Option.scala:437) at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:533)
 at 
kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7$adapted(AbstractFetcherThread.scala:532)
 at 
kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
 at 
scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:359)
 at 
scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:355)
 at 
scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:309)
 at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:532)
 at 
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:216)
 at 
kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:215)
 at scala.Option.foreach(Option.scala:437) at 
kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:215) 
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:197) 
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:99)[2021-08-31 
11:31:25,093] WARN [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] 
Partition __consumer_offsets-1 marked as failed 
(kafka.server.ReplicaFetcherThread)
{noformat}
 

The issue is due to a race condition in 
`ReplicaManager#applyLocalFollowersDelta`. The `InitialFetchState` is created 
and populated before the partition is removed from the fetcher threads. This 
means that the fetch offset of the `InitialFetchState` could be outdated when 
the fetcher threads are restarted, because the fetcher threads could have 
incremented the log end offset in the meantime.

The partitions must be removed from the fetcher threads before the 
`InitialFetchStates` are created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-09-02 Thread Sagar
Thanks Guozhang and Luke.

I have updated the KIP with all the suggested changes.

Do you think we could start voting for this?

Thanks!
Sagar.
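
P.S. For readers skimming the thread, a rough sketch of the proposal (both
config names come from the KIP and are not in any release yet):

    // import java.util.Properties;
    Properties props = new Properties();
    // proposed topology-level cap replacing buffered.records.per.partition
    props.put("input.buffer.max.bytes", 512 * 1024 * 1024);
    // proposed rename of cache.max.bytes.buffering
    props.put("statestore.cache.max.bytes", 10 * 1024 * 1024);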

On Thu, Sep 2, 2021 at 8:26 AM Luke Chen  wrote:

> Thanks for the KIP. Overall LGTM.
>
> Just one thought, if we "rename" the config directly as mentioned in the
> KIP, would that break existing applications?
> Should we deprecate the old one first, and make the old/new names co-exist
> for some period of time?
>
> Public Interfaces
>
>- Adding a new config *input.buffer.max.bytes *applicable at a topology
>level. The importance of this config would be *Medium*.
>- Renaming *cache.max.bytes.buffering* to *statestore.cache.max.bytes*.
>
>
>
> Thank you.
> Luke
>
> On Thu, Sep 2, 2021 at 1:50 AM Guozhang Wang  wrote:
>
> > The state store cache size default value is 10MB today, which arguably is
> > rather small. So I'm thinking maybe this config should default to 512MB.
> >
> > Other than that, LGTM.
> >
> > On Sat, Aug 28, 2021 at 11:34 AM Sagar 
> wrote:
> >
> > > Thanks Guozhang and Sophie.
> > >
> > > Yeah a small default value would lower the throughput. I didn't quite
> > > realise it earlier. It's slightly hard to predict this value so I would
> > > guess around 1/2 GB to 1 GB? WDYT?
> > >
> > > Regarding the renaming of the config and the new metric, sure would
> > include
> > > it in the KIP.
> > >
> > > Lastly, importance would also be added. I guess Medium should be ok.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > >
> > > On Sat, Aug 28, 2021 at 10:42 AM Sophie Blee-Goldman
> > >  wrote:
> > >
> > > > 1) I agree that we should just distribute the bytes evenly, at least
> > for
> > > > now. It's simpler to understand and
> > > > we can always change it later, plus it makes sense to keep this
> aligned
> > > > with how the cache works today
> > > >
> > > > 2) +1 to being conservative in the generous sense, it's just not
> > > something
> > > > we can predict with any degree
> > > > of accuracy and even if we could, the appropriate value is going to
> > > differ
> > > > wildly across applications and use
> > > > cases. We might want to just pick some multiple of the default cache
> > > size,
> > > > and maybe do some research on
> > > > other relevant defaults or sizes (default JVM heap, size of available
> > > > memory in common hosts eg EC2
> > > > instances, etc). We don't need to worry as much about erring on the
> > side
> > > of
> > > > too big, since other configs like
> > > > the max.poll.records will help somewhat to keep it from exploding.
> > > >
> > > > 4) 100%, I always found the *cache.max.bytes.buffering* config name
> to
> > be
> > > > incredibly confusing. Deprecating this in
> > > > favor of "*statestore.cache.max.bytes*" and aligning it to the new
> > input
> > > > buffer config sounds good to me to include here.
> > > >
> > > > 5) The KIP should list all relevant public-facing changes, including
> > > > metadata like the config's "Importance". Personally
> > > > I would recommend Medium, or even High if we're really worried about
> > the
> > > > default being wrong for a lot of users
> > > >
> > > > Thanks for the KIP, besides those few things that Guozhang brought up
> > and
> > > > the config importance, everything SGTM
> > > >
> > > > -Sophie
> > > >
> > > > On Thu, Aug 26, 2021 at 2:41 PM Guozhang Wang 
> > > wrote:
> > > >
> > > > > 1) I meant for your proposed solution. I.e. to distribute the
> > > configured
> > > > > bytes among threads evenly.
> > > > >
> > > > > 2) I was actually thinking about making the default a large enough
> > > value
> > > > so
> > > > > that we would not introduce performance regression: thinking about
> a
> > > use
> > > > > case with many partitions and each record may be large, then
> > > effectively
> > > > we
> > > > > would only start pausing when the total bytes buffered is pretty
> > large.
> > > > If
> > > > > we set the default value to small, we would be "more aggressive" on
> > > > pausing
> > > > > which may impact throughput.
> > > > >
> > > > > 3) Yes exactly, this would naturally be at the "partition-group"
> > class
> > > > > since that represents the task's all input partitions.
> > > > >
> > > > > 4) This is just a bold thought, I'm interested to see other's
> > thoughts.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Mon, Aug 23, 2021 at 4:10 AM Sagar 
> > > wrote:
> > > > >
> > > > > > Thanks Guozhang.
> > > > > >
> > > > > > 1) Just for my confirmation, when you say we should proceed with
> > the
> > > > even
> > > > > > distribution of bytes, are you referring to the Proposed Solution
> > in
> > > > the
> > > > > > KIP or the option you had considered in the JIRA?
> > > > > > 2) Default value for the config is something that I missed. I
> agree
> > > we
> > > > > > can't have really large values as it might be detrimental to the
> > > > > > performance. Maybe, as a starting point, we assume that only 1
> > Stream
> > > > > Task
> > > > > > 

[jira] [Created] (KAFKA-13265) Kafka consumers disappearing after a certain point of time

2021-09-02 Thread Ayyandurai Mani (Jira)
Ayyandurai Mani created KAFKA-13265:
---

 Summary: Kafka consumers disappearing after a certain point of time 
 Key: KAFKA-13265
 URL: https://issues.apache.org/jira/browse/KAFKA-13265
 Project: Kafka
  Issue Type: Test
  Components: consumer
Affects Versions: 2.4.0
Reporter: Ayyandurai Mani
 Attachments: Consumer_Disappear_Issue_Screen.png, server.log

Dear Kafka Team,

We have been facing an issue for the past few days in our development 
environment. We have a topic called 'search-service-topic-dev' and a consumer 
group 'search-service-group' with 10 partitions, and a concurrency of 10 on 
the consumer side.

When we publish a larger volume of messages (each message is ~115 KB) into the 
topic, after a certain point in time the consumers disappear from the consumer 
group (note: the consumer services are still running). I have attached a 
screenshot for reference (filename: Consumer_Disappear_Issue_Screen.png).

From the screenshot: when I executed the describe command for the consumer 
group at 14:35:32 (IST) the consumers were available, but when I executed it 
again at 14:38:17 (IST) they were gone.

I have attached the Kafka server.log for that time window (Kafka runs on a 
server in the UTC timezone).

Note: the data volume in each partition is around 2 GB.

 

We are kind of blocked due to this behavior. Please help me to resolve this. 

Thanks in advance.

Ayyandurai

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 2.8 #75

2021-09-02 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 466811 lines...]
[2021-09-02T10:17:08.594Z] SimpleAclAuthorizerTest > 
testSuperUserWithCustomPrincipalHasAccess() PASSED
[2021-09-02T10:17:08.594Z] 
[2021-09-02T10:17:08.594Z] SimpleAclAuthorizerTest > 
testAllowAccessWithCustomPrincipal() STARTED
[2021-09-02T10:17:08.594Z] 
[2021-09-02T10:17:08.594Z] SimpleAclAuthorizerTest > 
testAllowAccessWithCustomPrincipal() PASSED
[2021-09-02T10:17:08.594Z] 
[2021-09-02T10:17:08.594Z] SimpleAclAuthorizerTest > 
testDeleteAclOnWildcardResource() STARTED
[2021-09-02T10:17:09.537Z] 
[2021-09-02T10:17:09.537Z] SimpleAclAuthorizerTest > 
testDeleteAclOnWildcardResource() PASSED
[2021-09-02T10:17:09.537Z] 
[2021-09-02T10:17:09.537Z] SimpleAclAuthorizerTest > 
testWritesLiteralWritesLiteralAclChangeEventWhenInterBrokerProtocolLessThanKafkaV2eralAclChangesForOlderProtocolVersions()
 STARTED
[2021-09-02T10:17:09.537Z] 
[2021-09-02T10:17:09.537Z] SimpleAclAuthorizerTest > 
testWritesLiteralWritesLiteralAclChangeEventWhenInterBrokerProtocolLessThanKafkaV2eralAclChangesForOlderProtocolVersions()
 PASSED
[2021-09-02T10:17:09.537Z] 
[2021-09-02T10:17:09.537Z] SimpleAclAuthorizerTest > 
testThrowsOnAddPrefixedAclIfInterBrokerProtocolVersionTooLow() STARTED
[2021-09-02T10:17:10.481Z] 
[2021-09-02T10:17:10.481Z] SimpleAclAuthorizerTest > 
testThrowsOnAddPrefixedAclIfInterBrokerProtocolVersionTooLow() PASSED
[2021-09-02T10:17:10.481Z] 
[2021-09-02T10:17:10.481Z] SimpleAclAuthorizerTest > 
testAccessAllowedIfAllowAclExistsOnPrefixedResource() STARTED
[2021-09-02T10:17:11.425Z] 
[2021-09-02T10:17:11.425Z] SimpleAclAuthorizerTest > 
testAccessAllowedIfAllowAclExistsOnPrefixedResource() PASSED
[2021-09-02T10:17:11.425Z] 
[2021-09-02T10:17:11.425Z] SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls() STARTED
[2021-09-02T10:17:11.425Z] 
[2021-09-02T10:17:11.425Z] SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls() PASSED
[2021-09-02T10:17:11.425Z] 
[2021-09-02T10:17:11.425Z] SimpleAclAuthorizerTest > 
testAuthorizeWithEmptyResourceName() STARTED
[2021-09-02T10:17:12.367Z] 
[2021-09-02T10:17:12.367Z] SimpleAclAuthorizerTest > 
testAuthorizeWithEmptyResourceName() PASSED
[2021-09-02T10:17:12.367Z] 
[2021-09-02T10:17:12.367Z] SimpleAclAuthorizerTest > 
testAuthorizeThrowsOnNonLiteralResource() STARTED
[2021-09-02T10:17:12.367Z] 
[2021-09-02T10:17:12.367Z] SimpleAclAuthorizerTest > 
testAuthorizeThrowsOnNonLiteralResource() PASSED
[2021-09-02T10:17:12.367Z] 
[2021-09-02T10:17:12.367Z] SimpleAclAuthorizerTest > 
testDeleteAllAclOnPrefixedResource() STARTED
[2021-09-02T10:17:13.310Z] 
[2021-09-02T10:17:13.310Z] SimpleAclAuthorizerTest > 
testDeleteAllAclOnPrefixedResource() PASSED
[2021-09-02T10:17:13.310Z] 
[2021-09-02T10:17:13.310Z] SimpleAclAuthorizerTest > 
testAddAclsOnLiteralResource() STARTED
[2021-09-02T10:17:14.252Z] 
[2021-09-02T10:17:14.252Z] SimpleAclAuthorizerTest > 
testAddAclsOnLiteralResource() PASSED
[2021-09-02T10:17:14.252Z] 
[2021-09-02T10:17:14.252Z] SimpleAclAuthorizerTest > testGetAclsPrincipal() 
STARTED
[2021-09-02T10:17:14.252Z] 
[2021-09-02T10:17:14.252Z] SimpleAclAuthorizerTest > testGetAclsPrincipal() 
PASSED
[2021-09-02T10:17:14.252Z] 
[2021-09-02T10:17:14.252Z] SimpleAclAuthorizerTest > 
testAddAclsOnPrefiexedResource() STARTED
[2021-09-02T10:17:15.195Z] 
[2021-09-02T10:17:15.195Z] SimpleAclAuthorizerTest > 
testAddAclsOnPrefiexedResource() PASSED
[2021-09-02T10:17:15.195Z] 
[2021-09-02T10:17:15.195Z] SimpleAclAuthorizerTest > 
testWritesExtendedAclChangeEventIfInterBrokerProtocolNotSet() STARTED
[2021-09-02T10:17:15.195Z] 
[2021-09-02T10:17:15.195Z] SimpleAclAuthorizerTest > 
testWritesExtendedAclChangeEventIfInterBrokerProtocolNotSet() PASSED
[2021-09-02T10:17:15.195Z] 
[2021-09-02T10:17:15.195Z] SimpleAclAuthorizerTest > 
testAccessAllowedIfAllowAclExistsOnWildcardResource() STARTED
[2021-09-02T10:17:16.137Z] 
[2021-09-02T10:17:16.137Z] SimpleAclAuthorizerTest > 
testAccessAllowedIfAllowAclExistsOnWildcardResource() PASSED
[2021-09-02T10:17:16.137Z] 
[2021-09-02T10:17:16.137Z] SimpleAclAuthorizerTest > testLoadCache() STARTED
[2021-09-02T10:17:16.137Z] 
[2021-09-02T10:17:16.137Z] SimpleAclAuthorizerTest > testLoadCache() PASSED
[2021-09-02T10:17:16.137Z] 
[2021-09-02T10:17:16.137Z] LogCleanerParameterizedIntegrationTest > [1] 
codec=NONE STARTED
[2021-09-02T10:17:17.901Z] 
[2021-09-02T10:17:17.901Z] LogCleanerParameterizedIntegrationTest > [1] 
codec=NONE PASSED
[2021-09-02T10:17:17.901Z] 
[2021-09-02T10:17:17.901Z] LogCleanerParameterizedIntegrationTest > [2] 
codec=GZIP STARTED
[2021-09-02T10:17:19.735Z] 
[2021-09-02T10:17:19.735Z] LogCleanerParameterizedIntegrationTest > [2] 
codec=GZIP PASSED
[2021-09-02T10:17:19.735Z] 
[2021-09-02T10:17:19.735Z] LogCleanerParameterizedIntegrationTest > [3] 
codec=SNAPPY STARTED
[2021-09-02T10:17:20.678Z] 

Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #444

2021-09-02 Thread Apache Jenkins Server
See 




Timeline for production ready KRaft

2021-09-02 Thread Kunal Goyal
Hello,

We want to use Kafka without ZooKeeper. It would be good to know when
Apache KRaft is expected to be production-ready. Can you please inform us
of the timeline? It would help in our planning.

-- 

Thanks & Regards

Kunal Goyal


[jira] [Created] (KAFKA-13264) backwardFetch in InMemoryWindowStore doesn't return in reverse order

2021-09-02 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13264:
-

 Summary: backwardFetch in InMemoryWindowStore doesn't return in 
reverse order
 Key: KAFKA-13264
 URL: https://issues.apache.org/jira/browse/KAFKA-13264
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.8.0
Reporter: Luke Chen
Assignee: Luke Chen


While working on another PR, I found that backwardFetch in InMemoryWindowStore 
currently doesn't return records in reverse order when there are multiple 
records in the same window.

ex: window size = 500,

input records:

key: "a", value: "aa", timestamp: 0 ==> will be in [0, 500\] window

key: "b", value: "bb", timestamp: 10 ==> will be in [0, 500\] window

when fetched in forward order:

"a" -> "b", which is expected

when fetched in backward order:

"a" -> "b", which is NOT expected (it should be "b" -> "a"); see the repro sketch below



--
This message was sent by Atlassian Jira
(v8.3.4#803005)