Re: Kafka Streams: Dynamic Topic Routing & Nonexistent Topics

2020-07-08 Thread Rhys Anthony McCaig
Guozhang - thank you for your thoughts.

You are right - this is more about the producer client than the streams
client.

> caching the metadata outside the producer, e.g. in an admin client, would
> not be a perfect solution, since either way the metadata cache inside the
> producer or inside the admin client is not guaranteed to always be up to
> date


It's not perfect, but fortunately in my case it's "good enough", since the main
concern is preventing poorly behaved clients from holding up processing because
they forgot to create the topics their records should be sent to.

> letting the send() call fail with an UnknownTopicOrPartitionError and
> pushing the burden onto the caller to decide what to do (either wait and
> retry, or give up and stop the world, etc.) may work, but that requires
> modifying the interface semantics, or at least adding an overloaded
> "send()" function. Maybe worth discussing in a KIP.


The more I think about it, the more I like the idea of differentiating
between a metadata refresh timeout and the case where the metadata was
refreshed successfully yet still didn't contain the topic (or partition). I'll
take a closer look at the existing implementation and try to find some time to
write a KIP for this - as you pointed out, this modifies the interface
semantics, so it would need to be an additive and opt-in change.

> since max.block is a global config, it may also affect other blocking
> calls, like txn-related ones.


Yes, that was my concern with this approach as well, which is why I think the
admin client workaround is my best option at the moment.
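
A minimal sketch of that admin-client workaround, assuming a periodically
refreshed cache of topic names (the class and method names below are
illustrative, not code from this thread):

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClient;

// Hypothetical cache of existing topic names, refreshed on a schedule by the
// caller. Records routed to a topic absent from the cache can be logged and
// dropped before they ever reach producer.send().
public class TopicExistenceCache implements AutoCloseable {
    private final Admin admin;
    private volatile Set<String> knownTopics = Collections.emptySet();

    public TopicExistenceCache(final Properties adminProps) {
        this.admin = AdminClient.create(adminProps);
    }

    // Call periodically, e.g. from a scheduled executor. The cache can be
    // stale in either direction, which is why this is only "good enough".
    public void refresh() throws ExecutionException, InterruptedException {
        knownTopics = admin.listTopics().names().get();
    }

    public boolean exists(final String topic) {
        return knownTopics.contains(topic);
    }

    @Override
    public void close() {
        admin.close();
    }
}
```

Since the cache is never guaranteed to be current, this only bounds how long a
misbehaving client can stall processing; it does not make the check exact.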

Cheers,
Rhys

On Sat, Jul 4, 2020 at 8:36 PM Guozhang Wang  wrote:

> Hello,
>
> Thanks for reaching out to the community about this. I think (maybe you've
> also suggested) it is rather an observation on the producer client than on
> the streams client. Generally speaking, we want to know whether we can fail
> fast if the metadata cannot be found in the producer.send() call. Here are
> my thoughts:
>
> 1) caching the metadata outside the producer, e.g. in an admin client,
> would not be a perfect solution, since either way the metadata cache inside
> the producer or inside the admin client is not guaranteed to always be up
> to date: e.g. maybe you've decided to fail the record send since the topic
> was not in the cache, but one second later the metadata gets refreshed and
> contains that topic.
>
> 2) letting the send() call fail with an UnknownTopicOrPartitionError and
> pushing the burden onto the caller to decide what to do (either wait and
> retry, or give up and stop the world, etc.) may work, but that requires
> modifying the interface semantics, or at least adding an overloaded
> "send()" function. Maybe worth discussing in a KIP.
>
> 3) for your specific case, if you believe the metadata should be static and
> not change (i.e. you assume all topics are pre-created and none are created
> later), then I think setting max.block to a smaller value and just catching
> TimeoutException is fine, since for send() itself max.block is only used
> for metadata refresh and for buffer allocation when the buffer is not
> sufficient, and the latter should be a rare case assuming you set the
> buffer size to be reasonably large. But note that since max.block is a
> global config, it may also affect other blocking calls, like txn-related
> ones.
>
>
> On Wed, Jul 1, 2020 at 6:10 PM Rhys Anthony McCaig 
> wrote:
>
> > Hi All,
> >
> > I have recently been working on a streams application that uses a
> > TopicNameExtractor to dynamically route records based on the payload.
> > This streams application is used by various other applications, and
> > occasionally these other applications request that a record be sent to a
> > non-existent topic - rather than creating this topic, the message should
> > be logged and dropped.
> >
> > Unfortunately, I haven't found a reliable way to implement this
> > behaviour: I originally hoped to catch these scenarios in a
> > ProductionExceptionHandler by catching an UnknownTopicOrPartitionError,
> > however the current producer behaviour is to wait for max.block.ms in
> > waitOnMetadata() for partitions to be returned for the topic before
> > throwing a TimeoutException. If after refreshing metadata there are
> > still no partitions for the requested topic, it will continue to
> > request an update until the timeout is reached: (
> > https://github.com/apache/kafka/blob/b8a99be7847c61d7792689b71fda5b283f8340a8/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1051
> > )
> > For my use case, there are two challenges here:
> > 1. ProductionExceptionHandler must catch TimeoutException and inspect
> > the message to determine that the exception was caused by not finding
> > the topic in the metadata
> > 2. The streams task blocks (as expected) while the producer is fetching
> > metadata, holding up processing of other 
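
A minimal sketch of the handler from challenge 1 above, under the current
semantics (the class name and logging are illustrative, not code from this
thread; as noted, it cannot tell a missing topic apart from any other metadata
timeout):

```java
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Hypothetical handler: log and drop records whose send timed out. With a
// small max.block.ms, a timeout is currently the best available signal that
// the target topic's metadata never showed up.
public class DropOnMetadataTimeoutHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        if (exception instanceof TimeoutException) {
            // This cannot distinguish "topic does not exist" from a slow
            // broker -- that distinction is what the proposed KIP would add.
            System.err.println("Dropping record for topic " + record.topic()
                + ": " + exception.getMessage());
            return ProductionExceptionHandlerResponse.CONTINUE;
        }
        return ProductionExceptionHandlerResponse.FAIL;
    }

    @Override
    public void configure(final Map<String, ?> configs) { }
}
```

Such a handler is registered via the default.production.exception.handler
Streams config; it only bounds the stall to max.block.ms, so challenge 2, the
blocked task, remains.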

[VOTE] KIP-616: Rename implicit Serdes instances in kafka-streams-scala

2020-07-08 Thread Yuriy Badalyantc
Hi everybody

I would like to start a vote for KIP-616:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-616%3A+Rename+implicit+Serdes+instances+in+kafka-streams-scala

This KIP fixes a name clash in org.apache.kafka.streams.scala.Serdes.

-Yuriy


Build failed in Jenkins: kafka-trunk-jdk8 #4701

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10134: Use long poll if we do not have fetchable partitions


--
[...truncated 1.84 MB...]

kafka.zookeeper.ZooKeeperClientTest > testGetDataNonExistentZNode PASSED

kafka.zookeeper.ZooKeeperClientTest > testConnectionTimeout STARTED

kafka.zookeeper.ZooKeeperClientTest > testConnectionTimeout PASSED

kafka.zookeeper.ZooKeeperClientTest > 
testBlockOnRequestCompletionFromStateChangeHandler STARTED
ERROR: Could not install GRADLE_4_10_3_HOME
java.lang.NullPointerException
at 
hudson.plugins.toolenv.ToolEnvBuildWrapper$1.buildEnvVars(ToolEnvBuildWrapper.java:46)
at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:873)
at hudson.plugins.git.GitSCM.getParamExpandedRepos(GitSCM.java:484)
at 
hudson.plugins.git.GitSCM.compareRemoteRevisionWithImpl(GitSCM.java:693)
at hudson.plugins.git.GitSCM.compareRemoteRevisionWith(GitSCM.java:658)
at hudson.scm.SCM.compareRemoteRevisionWith(SCM.java:400)
at hudson.scm.SCM.poll(SCM.java:417)
at hudson.model.AbstractProject._poll(AbstractProject.java:1390)
at hudson.model.AbstractProject.poll(AbstractProject.java:1293)
at hudson.triggers.SCMTrigger$Runner.runPolling(SCMTrigger.java:603)
at hudson.triggers.SCMTrigger$Runner.run(SCMTrigger.java:649)
at 
hudson.util.SequentialExecutionQueue$QueueEntry.run(SequentialExecutionQueue.java:119)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Jenkins build is back to normal : kafka-trunk-jdk14 #278

2020-07-08 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-10251) Flaky Test kafka.api.TransactionsBounceTest.testWithGroupMetadata

2020-07-08 Thread Sophie Blee-Goldman (Jira)
Sophie Blee-Goldman created KAFKA-10251:
---

 Summary: Flaky Test 
kafka.api.TransactionsBounceTest.testWithGroupMetadata
 Key: KAFKA-10251
 URL: https://issues.apache.org/jira/browse/KAFKA-10251
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Sophie Blee-Goldman


h3. Stacktrace

org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout 
instead of the expected 200 records at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) at 
org.scalatest.Assertions.fail(Assertions.scala:1091) at 
org.scalatest.Assertions.fail$(Assertions.scala:1087) at 
org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
kafka.utils.TestUtils$.pollUntilAtLeastNumRecords(TestUtils.scala:842) at 
kafka.api.TransactionsBounceTest.testWithGroupMetadata(TransactionsBounceTest.scala:109)

 

 

The logs are pretty much just this on repeat:
{code:java}
[2020-07-08 23:41:04,034] ERROR Error when sending message to topic 
output-topic with key: 9955, value: 9955 with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52) 
org.apache.kafka.common.KafkaException: Failing batch since transaction was 
aborted at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:423)
 at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313) 
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240) at 
java.lang.Thread.run(Thread.java:748) [2020-07-08 23:41:04,034] ERROR Error 
when sending message to topic output-topic with key: 9959, value: 9959 with 
error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52) 
org.apache.kafka.common.KafkaException: Failing batch since transaction was 
aborted at 
org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:423)
 at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:313) 
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240) at 
java.lang.Thread.run(Thread.java:748)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-trunk-jdk11 #1628

2020-07-08 Thread Apache Jenkins Server
See 




[DISCUSS] KIP-450: Sliding Windows

2020-07-08 Thread Leah Thomas
Hi all,
I'd like to kick off the discussion for KIP-450, adding sliding window
aggregation support to Kafka Streams.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-450%3A+Sliding+Window+Aggregations+in+the+DSL

Let me know what you think,
Leah


Build failed in Jenkins: kafka-trunk-jdk14 #277

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR; alterReplicaLogDirs should not fail all the futures when only 
one


--
[...truncated 6.38 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > shouldInitProcessor[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 

Re: Feedback: Print schemaId using bin/kafka-dump-log.sh

2020-07-08 Thread Mohanraj Nagasamy
Thanks Adam for providing the feedback! 

On 7/7/20, 5:05 AM, "Adam Bellemare"  wrote:

Hi Mohanraj

While I see the usefulness of your suggestion, the main issue is that
you're using the Confluent schema registry's conventions and hardwiring
them into Kafka core. Given that Confluent's standards are not part of
Kafka's official standards, I do not think you will get approval to submit
this code into Kafka core.

There may be Confluent tools available that already do this, or perhaps
they have their own custom tools where this change would be a more
suitable submission.

Adam



On Mon, Jul 6, 2020 at 11:00 AM Mohanraj Nagasamy 

wrote:

> Does anyone have feedback on this? ☺
>
> From: Mohanraj Nagasamy 
> Date: Wednesday, July 1, 2020 at 6:29 PM
> To: "dev@kafka.apache.org" 
> Subject: Feedback: Print schemaId using bin/kafka-dump-log.sh
>
> When I try to dump Kafka logs for diagnosing or debugging a problem, it's
> handy to see whether a Kafka log message has a schemaId or not. If it does,
> print the schemaId.
>
> I'm soliciting feedback as to whether it is worth making this change to
> the kafka-core codebase.
>
> I'm new to the Kafka community - forgive me if this wasn't part of the
> process.
>
> This is the change I made:
>
> ```
>  core/src/main/scala/kafka/tools/DumpLogSegments.scala | 21
> +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/core/src/main/scala/kafka/tools/DumpLogSegments.scala
> b/core/src/main/scala/kafka/tools/DumpLogSegments.scala
> index 9e9546a92..a8750ac3d 100755
> --- a/core/src/main/scala/kafka/tools/DumpLogSegments.scala
> +++ b/core/src/main/scala/kafka/tools/DumpLogSegments.scala
> @@ -35,6 +35,7 @@ object DumpLogSegments {
>
>// visible for testing
>private[tools] val RecordIndent = "|"
> +  private val MAGIC_BYTE = 0x0
>
>def main(args: Array[String]): Unit = {
>  val opts = new DumpLogSegmentsOptions(args)
> @@ -277,8 +278,24 @@ object DumpLogSegments {
>}
>  } else if (printContents) {
>val (key, payload) = parser.parse(record)
> -  key.foreach(key => print(s" key: $key"))
> -  payload.foreach(payload => print(s" payload: $payload"))
> +  key.foreach(key => {
> +val keyBuffer = record.key()
> +if (keyBuffer.get() == MAGIC_BYTE) {
> +  print(s" keySchemaId: ${keyBuffer.getInt} key: $key")
> +}
> +else {
> +  print(s" key: $key")
> +}
> +  })
> +
> +  payload.foreach(payload => {
> +val valueBuffer = record.value()
> +if (valueBuffer.get() == MAGIC_BYTE) {
> +  print(s" payloadSchemaId: ${valueBuffer.getInt}
> payload: $payload")
> +} else {
> +  print(s" payload: $payload")
> +}
> +  })
>  }
>  println()
>}
> ```
>
> And this is what the output looks like:
>
> ```
> $ bin/kafka-dump-log.sh --files
> data/kafka/logdir/avro_topic_test-0/.log
> --print-data-log
>
> | offset: 50 CreateTime: 1593570942959 keysize: -1 valuesize: 16 sequence:
> -1 headerKeys: [] payloadSchemaId: 1 payload:
> TracRowe
> baseOffset: 51 lastOffset: 51 count: 1 baseSequence: -1 lastSequence: -1
> producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional:
> false isControl: false position: 2918 CreateTime: 1593570958044 size: 101
> magic: 2 compresscodec: NONE crc: 1913155179 isvalid: true
> | offset: 51 CreateTime: 1593570958044 keysize: 3 valuesize: 30 sequence:
> -1 headerKeys: [] key: ... payloadSchemaId: 2 payload:
> .iRKoMVeoVVnTmQEuqwSTHZQ
> baseOffset: 52 lastOffset: 52 count: 1 baseSequence: -1 lastSequence: -1
> producerId: -1 producerEpoch: -1 partitionLeaderEpoch: 0 isTransactional:
> false isControl: false position: 3019 CreateTime: 1593570969001 size: 84
> magic: 2 compresscodec: NONE crc: 2188466405 isvalid: true
> ```
>
> -Mohan
>
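
For readers unfamiliar with the convention the patch keys off: it assumes the
Confluent Schema Registry wire format, in which a serialized record starts with
a 0x0 magic byte followed by a 4-byte big-endian schema id. A standalone sketch
of the same check (class and method names are illustrative), using absolute
reads so the record's buffer position is left untouched:

```java
import java.nio.ByteBuffer;

// Standalone sketch of the check in the patch above. Confluent's wire format
// is [magic byte 0x0][4-byte big-endian schema id][serialized payload].
public final class SchemaIdReader {
    private static final byte MAGIC_BYTE = 0x0;

    // Returns the schema id, or -1 if the buffer doesn't start with the magic
    // byte (i.e. the record is probably not schema-registry encoded).
    public static int schemaId(final ByteBuffer buffer) {
        if (buffer.remaining() < 5 || buffer.get(buffer.position()) != MAGIC_BYTE) {
            return -1;
        }
        return buffer.getInt(buffer.position() + 1); // ByteBuffer defaults to big-endian
    }
}
```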


Re: [DISCUSS] Apache Kafka 2.6.0 release

2020-07-08 Thread Sophie Blee-Goldman
Hey Randall,

We just discovered another regression in 2.6:
https://issues.apache.org/jira/browse/KAFKA-10249

The fix is extremely straightforward -- only about two lines of actual
code -- and low risk. It is a new regression introduced in 2.6 and affects
all Streams apps with any suppression or other in-memory state.

The PR is already ready here: https://github.com/apache/kafka/pull/8996

Best,
Sophie

On Wed, Jul 8, 2020 at 10:59 AM John Roesler  wrote:

> Hi Randall,
>
> While developing system tests, I've just unearthed a new 2.6 regression:
> https://issues.apache.org/jira/browse/KAFKA-10247
>
> I've got a PR in progress. Hoping to finish it up today:
> https://github.com/apache/kafka/pull/8994
>
> Sorry for the trouble,
> -John
>
> On Mon, Jun 29, 2020, at 09:29, Randall Hauch wrote:
> > Thanks for raising this, David. I agree it makes sense to include this
> fix
> > in 2.6, so I've adjusted the "Fix Version(s)" field to include '2.6.0'.
> >
> > Best regards,
> >
> > Randall
> >
> > On Mon, Jun 29, 2020 at 8:25 AM David Jacot  wrote:
> >
> > > Hi Randall,
> > >
> > > We have discovered an annoying issue that we introduced in 2.5:
> > >
> > > Describing topics with the command line tool fails if the user does not
> > > have the
> > > privileges to access the ListPartitionReassignments API. I believe that
> > > this is the
> > > case for most non-admin users.
> > >
> > > I propose to include the fix in 2.6. The fix is trivial so low risk.
> What
> > > do you think?
> > >
> > > JIRA: https://issues.apache.org/jira/browse/KAFKA-10212
> > > PR: https://github.com/apache/kafka/pull/8947
> > >
> > > Best,
> > > David
> > >
> > > On Sat, Jun 27, 2020 at 4:39 AM John Roesler 
> wrote:
> > >
> > > > Hi Randall,
> > > >
> > > > I neglected to notify this thread when I merged the fix for
> > > > https://issues.apache.org/jira/browse/KAFKA-10185
> > > > on June 19th. I'm sorry about that oversight. It is marked with
> > > > a fix version of 2.6.0.
> > > >
> > > > On a side note, I have a fix for KAFKA-10173, which I'm merging
> > > > and backporting right now.
> > > >
> > > > Thanks for managing the release,
> > > > -John
> > > >
> > > > On Thu, Jun 25, 2020, at 10:23, Randall Hauch wrote:
> > > > > Thanks for the update, folks!
> > > > >
> > > > > Based upon Jira [1], we currently have 4 issues that are considered
> > > > > blockers for the 2.6.0 release and production of RCs:
> > > > >
> > > > >- https://issues.apache.org/jira/browse/KAFKA-10134 - High CPU
> > > issue
> > > > >during rebalance in Kafka consumer after upgrading to 2.5
> > > (unassigned)
> > > > >- https://issues.apache.org/jira/browse/KAFKA-10143 - Can no
> longer
> > > > >change replication throttle with reassignment tool (Jason G)
> > > > >- https://issues.apache.org/jira/browse/KAFKA-10166 - Excessive
> > > > >TaskCorruptedException seen in testing (Sophie, Bruno)
> > > > >- https://issues.apache.org/jira/browse/KAFKA-10173
> > > > >- BufferUnderflowException during Kafka Streams Upgrade (John R)
> > > > >
> > > > > and one critical issue that may be a regression that at this time
> will
> > > > not
> > > > > block production of RCs:
> > > > >
> > > > >- https://issues.apache.org/jira/browse/KAFKA-10017 - Flaky
> Test
> > > > >EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta
> > > > (Matthias)
> > > > >
> > > > > and one build/release issue we'd like to fix if possible but will
> not
> > > > block
> > > > > RCs or the release:
> > > > >
> > > > >- https://issues.apache.org/jira/browse/KAFKA-9381
> > > > >- kafka-streams-scala: Javadocs + Scaladocs not published on
> maven
> > > > central
> > > > >(me)
> > > > >
> > > > > I'm working with the assignees and reporters of these issues (via
> > > > comments
> > > > > on the issues) to identify an ETA and to track progress. Anyone is
> > > > welcome
> > > > > to chime in on those issues.
> > > > >
> > > > > At this time, no other changes (other than PRs that only
> fix/improve
> > > > tests)
> > > > > should be merged to the `2.6` branch. If you think you've
> identified a
> > > > new
> > > > > blocker issue or believe another existing issue should be treated
> as a
> > > > > blocker for 2.6.0, please mark the issue's `fix version` as `2.6.0`
> > > _and_
> > > > > respond to this thread with details, and I will work with you to
> > > > determine
> > > > > whether it is indeed a blocker.
> > > > >
> > > > > As always, let me know here if you have any questions/concerns.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > > > [1]
> https://issues.apache.org/jira/projects/KAFKA/versions/12346918
> > > > >
> > > > > On Thu, Jun 25, 2020 at 8:27 AM Mario Molina 
> > > wrote:
> > > > >
> > > > > > Hi Randall,
> > > > > >
> > > > > > Ticket https://issues.apache.org/jira/browse/KAFKA-9018 is not a
> > > > blocker
> > > > > > so
> > > > > > it can be moved to the 2.7.0 version.
> > > > > >
> > > > > > Mario
> > 

[jira] [Created] (KAFKA-10250) Kafka broker shrinks the ISRs and disconnects from other brokers for few seconds

2020-07-08 Thread Rishank Chandra Puram (Jira)
Rishank Chandra Puram created KAFKA-10250:
-

 Summary: Kafka broker shrinks the ISRs and disconnects from other 
brokers for few seconds
 Key: KAFKA-10250
 URL: https://issues.apache.org/jira/browse/KAFKA-10250
 Project: Kafka
  Issue Type: Bug
  Components: build, config, consumer, controller, log, producer 
Affects Versions: 2.0.0
Reporter: Rishank Chandra Puram


The following is a summary/overview of the whole issue. Can you please help us 
look into the below and let us know your thoughts on what caused the issue, and 
how to mitigate it in the future?

Issue:
1.  All of a sudden, all the other brokers in the cluster report issues with 
one of the brokers, as below

Error for good broker [broker: broker2, brokerID: 1030]
[2020-06-27 16:14:53,367] WARN [ReplicaFetcher replicaId=1030, leaderId=1029, 
fetcherId=13] Error in response for fetch request (type=FetchRequest, 
replicaId=1030, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={}, 
isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=1901394956, 
epoch=1018699)) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1029 was disconnected before the response 
was read
at 
org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
at 
kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
at 
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:240)
at 
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:43)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
 
Error from affected broker [broker: broker6, brokerID: 1029]
[2020-06-27 16:14:25,744] INFO [Partition topic_test2-33 broker=1029] Shrinking 
ISR from 1025,1029,1030 to 1025,1029 (kafka.cluster.Partition)
[2020-06-27 16:14:25,752] INFO [Partition topic_a_restate-34 broker=1029] 
Shrinking ISR from 1027,1029,1028 to 1029 (kafka.cluster.Partition)
[2020-06-27 16:14:25,760] INFO [Partition topic_f_restate-39 broker=1029] 
Shrinking ISR from 1026,1029,1025 to 1029,1025 (kafka.cluster.Partition)
[2020-06-27 16:14:25,772] INFO [Partition topic_test2-16 broker=1029] Shrinking 
ISR from 1028,1029,1030 to 1029,1030 (kafka.cluster.Partition)
[2020-06-27 16:14:26,683] INFO [ProducerStateManager partition=topic_abc_f-21] 
Writing producer snapshot at offset 1509742173 (kafka.log.ProducerStateManager)
[2020-06-27 16:14:26,683] INFO [Log partition=topic_abc_f-21, 
dir=/hadoop-e/kafka/data1] Rolled new log segment at offset 1509742173 in 1 ms. 
(kafka.log.Log)
[2020-06-27 16:14:55,375] INFO [Partition topic_test2-33 broker=1029] Expanding 
ISR from 1025,1029 to 1025,1029,1030 (kafka.cluster.Partition)
[2020-06-27 16:14:55,387] INFO [Partition test-topic-analysis-9 broker=1029] 
Expanding ISR from 1028,1029 to 1028,1029,1030 (kafka.cluster.Partition)
 
2.  The entire Kafka cluster becomes stable again within a few minutes
 
Trace for good broker [broker: broker2, brokerID: 1030]
[2020-06-27 16:20:14,861] INFO Deleted time index 
/hadoop-g/kafka/data1/topic-analysis-0/009100172512.timeindex.deleted. 
(kafka.log.LogSegment)
[2020-06-27 16:20:14,882] INFO [Log partition=topic-analysis-4, 
dir=/hadoop-b/kafka/data1] Deleting segment 9100010843 (kafka.log.Log)
[2020-06-27 16:20:14,883] INFO Deleted log 
/hadoop-b/kafka/data1/topic-analysis-4/009100010843.log.deleted. 
(kafka.log.LogSegment)
[2020-06-27 16:20:14,897] INFO Deleted offset index 
/hadoop-b/kafka/data1/topic-analysis-4/009100010843.index.deleted. 
(kafka.log.LogSegment)
[2020-06-27 16:20:14,897] INFO Deleted time index 
/hadoop-b/kafka/data1/topic-analysis-4/009100010843.timeindex.deleted. 
(kafka.log.LogSegment)
 
Trace from affected broker [broker: broker6, brokerID: 1029]
[2020-06-27 16:21:01,552] INFO [Log partition=topic-analysis-22, 
dir=/hadoop-i/kafka/data1] Scheduling log segment [baseOffset 9099830344, size 
1073733482] for deletion. (kafka.log.Log)
[2020-06-27 16:21:01,553] INFO [Log partition=topic-analysis-22, 
dir=/hadoop-i/kafka/data1] Scheduling log segment [baseOffset 9100082067, size 
1073738790] for deletion. (kafka.log.Log)
[2020-06-27 16:21:01,553] INFO [Log partition=topic-analysis-22, 
dir=/hadoop-i/kafka/data1] Incrementing log start offset to 9100334713 
(kafka.log.Log)
[2020-06-27 16:21:01,581] INFO Cleared earliest 0 entries from epoch cache 
based on passed offset 9100334713 leaving 1 in EpochFile for partition 
topic-analysis-22 (kafka.server.epoch.LeaderEpochFileCache)
[2020-06-27 16:22:00,175] INFO [Log partition=topic_def_c-1, 
dir=/hadoop-j/kafka/data1] Deleting segment 1628853412 

Build failed in Jenkins: kafka-2.2-jdk8 #49

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-9144; Track timestamp from txn markers to prevent early producer


--
[...truncated 2.59 MB...]

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseArgsWithMultipleDelimiters STARTED

kafka.utils.CommandLineUtilsTest > testParseArgsWithMultipleDelimiters PASSED

kafka.utils.CommandLineUtilsTest > testMaybeMergeOptionsDefaultValueIfNotExist 
STARTED

kafka.utils.CommandLineUtilsTest > testMaybeMergeOptionsDefaultValueIfNotExist 
PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgWithNoDelimiter STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgWithNoDelimiter PASSED

kafka.utils.CommandLineUtilsTest > 
testMaybeMergeOptionsDefaultOverwriteExisting STARTED

kafka.utils.CommandLineUtilsTest > 
testMaybeMergeOptionsDefaultOverwriteExisting PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.CommandLineUtilsTest > testMaybeMergeOptionsNotOverwriteExisting 
STARTED

kafka.utils.CommandLineUtilsTest > testMaybeMergeOptionsNotOverwriteExisting 
PASSED

kafka.utils.JsonTest > testParseToWithInvalidJson STARTED

kafka.utils.JsonTest > testParseToWithInvalidJson PASSED

kafka.utils.JsonTest > testParseTo STARTED

kafka.utils.JsonTest > testParseTo PASSED

kafka.utils.JsonTest > testJsonParse STARTED

kafka.utils.JsonTest > testJsonParse PASSED

kafka.utils.JsonTest > testLegacyEncodeAsString STARTED

kafka.utils.JsonTest > testLegacyEncodeAsString PASSED

kafka.utils.JsonTest > testEncodeAsBytes STARTED

kafka.utils.JsonTest > testEncodeAsBytes PASSED

kafka.utils.JsonTest > testEncodeAsString STARTED

kafka.utils.JsonTest > testEncodeAsString PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr STARTED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ZkUtilsTest > testGetSequenceIdMethod STARTED

kafka.utils.ZkUtilsTest > testGetSequenceIdMethod PASSED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testAbortedConditionalDeletePath PASSED

kafka.utils.ZkUtilsTest > testGetAllPartitionsTopicWithoutPartitions STARTED

kafka.utils.ZkUtilsTest > testGetAllPartitionsTopicWithoutPartitions PASSED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath STARTED

kafka.utils.ZkUtilsTest > testSuccessfulConditionalDeletePath PASSED

kafka.utils.ZkUtilsTest > testPersistentSequentialPath STARTED

kafka.utils.ZkUtilsTest > testPersistentSequentialPath PASSED

kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing STARTED

kafka.utils.ZkUtilsTest > testClusterIdentifierJsonParsing PASSED

kafka.utils.ZkUtilsTest > testGetLeaderIsrAndEpochForPartition STARTED

kafka.utils.ZkUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.PasswordEncoderTest > testEncoderConfigChange STARTED

kafka.utils.PasswordEncoderTest > testEncoderConfigChange PASSED

kafka.utils.PasswordEncoderTest > testEncodeDecodeAlgorithms STARTED

kafka.utils.PasswordEncoderTest > testEncodeDecodeAlgorithms PASSED

kafka.utils.PasswordEncoderTest > testEncodeDecode STARTED

kafka.utils.PasswordEncoderTest > testEncodeDecode PASSED

kafka.utils.timer.TimerTaskListTest > testAll STARTED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask STARTED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration STARTED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.ShutdownableThreadTest > testShutdownWhenCalledAfterThreadStart 
STARTED

kafka.utils.ShutdownableThreadTest > testShutdownWhenCalledAfterThreadStart 
PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart STARTED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler STARTED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask STARTED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.json.JsonValueTest > testJsonObjectIterator STARTED

kafka.utils.json.JsonValueTest > testJsonObjectIterator PASSED

kafka.utils.json.JsonValueTest > testDecodeLong STARTED

kafka.utils.json.JsonValueTest > testDecodeLong PASSED

kafka.utils.json.JsonValueTest > testAsJsonObject STARTED

kafka.utils.json.JsonValueTest > testAsJsonObject PASSED


Build failed in Jenkins: kafka-trunk-jdk11 #1627

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10134: Use long poll if we do not have fetchable partitions


--
[...truncated 6.38 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
PASSED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
STARTED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
PASSED

org.apache.kafka.streams.TestTopicsTest > testDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testValue STARTED

org.apache.kafka.streams.TestTopicsTest > testValue PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName PASSED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > 

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-07-08 Thread Colin McCabe
On Tue, Jul 7, 2020, at 18:27, Ron Dagostino wrote:
> Hi Colin.  Thanks for the KIP.  Here is some feedback and various questions.
> 

Hi Ron,

Thanks for taking a look!

> "*Controller processes will listen on a separate port from brokers.  This
> will be true even when the broker and controller are co-located in the same
> JVM*". I assume it is possible that the port numbers could be the same when
> using separate JVMs (i.e. broker uses port 9192 and controller also uses
> port 9192).  I think it would be clearer to state this along these
> lines: "Controller
> nodes will listen on a port, and the controller port must differ from any
> port that a broker in the same JVM is listening on.  In other words, a
> controller and a broker node, when in the same JVM, do not share ports"
> 

I changed it to "Controller processes will listen on a separate endpoint from 
brokers" to avoid confusion.

>
> I think the sentence "*In the realm of ACLs, this translates to controllers
> requiring CLUSTERACTION on CLUSTER for all operations*" is confusing.  It
> feels to me that you can just delete it.  Am I missing something here?
> 

Hmm.  I think it is important to be specific about what ACLs we plan to 
require.  CLUSTERACTION on CLUSTER is what we generally use for 
broker-to-broker operations.  I'm not sure how to improve this.  Maybe a link 
to some background material would help?

>
> The KIP states "*The metadata will be stored in memory on all the active
> controllers.*"  Can there be multiple active controllers?  Should it
> instead read "The metadata will be stored in memory on all potential
> controllers." (or something like that)?
> 

Oh, this should have just been "The metadata will be stored in memory on all 
the controllers."  Fixed.

>
> KIP-595 states "*we have assumed the name __cluster_metadata for this
> topic, but this is not a formal part of this proposal*".  This KIP-631
> states "*Metadata changes need to be persisted to the __metadata log before
> we propagate them to the other nodes in the cluster.  This means waiting
> for the metadata log's last stable offset to advance to the offset of the
> change.*"  Are we here formally defining "__metadata" as the topic name,
> and should these sentences refer to "__metadata topic" rather than
> "__metadata log"?

I think the place to have the discussion about the name of the metadata topic 
is here.  The other KIPs only refer to this tangentially.

We've had a few suggestions in the past, but the current one in the doc is 
__kafka_metadata.  Thinking about it a bit more, though, maybe we should use 
something that isn't valid as a normal topic name, to avoid conflicts.  Perhaps 
~metadata would be good.

>  What are the "other nodes in the cluster" that are
> referred to?  These are not controller nodes but brokers, right?

It's intended to be all nodes: controller and brokers.  After all, the other 
controllers also need to mirror the metadata.

>  If so,
> then should we say "before we propagate them to the brokers"?  Technically
> we have a controller cluster and a broker cluster -- two separate clusters,
> correct?  (Even though we could potentially share JVMs and therefore
> require no additional processes.) If the statement is referring to nodes
> in both clusters then maybe we should state "before we propagate them to
> the other nodes in the controller cluster or to brokers."
> 
> "*The controller may have several of these uncommitted changes in flight at
> any given time.  In essence, the controller's in-memory state is always a
> little bit in the future compared to the current state.  This allows the
> controller to continue doing things while it waits for the previous changes
> to be committed to the Raft log.*"  Should the three references above be to
> the active controller rather than just the controller?
> 

Good point.  I fixed this to refer to the "active controller."

>
> "*Therefore, the controller must not make this future state "visible" to
> the rest of the cluster until it has been made persistent – that is, until
> it becomes current state*". Again I wonder if this should refer to "active"
> controller, and indicate "anyone else" as opposed to "the rest of the
> cluster" since we are talking about 2 clusters here?
> 

Fixed.

>
> "*When the active controller decides that it itself should create a
> snapshot, it will first try to give up the leadership of the Raft quorum.*"
> Why?  Is it necessary to state this?  It seems like it might be an
> implementation detail rather than a necessary constraint/requirement that
> we declare publicly and would have to abide by.
>

It seems in scope here, since it describes how the controller interacts with 
the snapshotting mechanism.  An implementation detail would be something more 
like a class name or code, I think.

>
> "*It will reject brokers whose metadata is too stale*". Why?  An example
> might be helpful here.
> 

There's a bit more context in KIP-500.  Stale metadata is a general 

[jira] [Created] (KAFKA-10249) In-memory stores are skipped when checkpointing but not skipped when reading the checkpoint

2020-07-08 Thread Sophie Blee-Goldman (Jira)
Sophie Blee-Goldman created KAFKA-10249:
---

 Summary: In-memory stores are skipped when checkpointing but not 
skipped when reading the checkpoint
 Key: KAFKA-10249
 URL: https://issues.apache.org/jira/browse/KAFKA-10249
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.6.0
Reporter: Sophie Blee-Goldman
Assignee: Sophie Blee-Goldman


As the title suggests, offsets for in-memory stores (including the suppression 
buffer) are not written to the checkpoint file. However, when reading from the 
checkpoint file during task initialization, we do not check 
StateStore#persistent. We attempt to look up the offsets for in-memory stores 
in the checkpoint file, and obviously do not find them.

With eos we have to conclude that the existing state is dirty and thus throw a 
TaskCorruptedException. So pretty much any task with in-memory state will 
always hit this exception when reinitializing from the checkpoint, forcing it 
to clear the entire state directory and build up all of its state again from 
scratch (both persistent and in-memory).

This is especially unfortunate for KIP-441, as we will hit this any time a task 
is moved from one thread to another.
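
For illustration, a sketch of the missing guard (StateStore#persistent() is the
real Streams API; the surrounding names are made up for this example):

{code:java}
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateStore;

// Hypothetical sketch of the fix: skip non-persistent stores when *reading*
// the checkpoint, mirroring the existing skip when *writing* it. Only for a
// persistent store is a missing checkpointed offset evidence of dirty state.
public class CheckpointReadSketch {
    static void validateCheckpoint(final Iterable<StateStore> stores,
                                   final Map<String, TopicPartition> changelogFor,
                                   final Map<TopicPartition, Long> checkpointedOffsets) {
        for (final StateStore store : stores) {
            if (!store.persistent()) {
                continue; // in-memory stores are never checkpointed
            }
            if (checkpointedOffsets.get(changelogFor.get(store.name())) == null) {
                throw new IllegalStateException("Dirty state for store " + store.name());
            }
        }
    }
}
{code}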



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10225) Increase default zk session timeout for system tests

2020-07-08 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10225.
-
Fix Version/s: 2.7.0
   Resolution: Fixed

merged the PR to trunk

> Increase default zk session timeout for system tests
> 
>
> Key: KAFKA-10225
> URL: https://issues.apache.org/jira/browse/KAFKA-10225
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.7.0
>
>
> I'm digging into the flaky system tests and I noticed that many flaky runs 
> are caused by the following check.
> {code}
> with node.account.monitor_log(KafkaService.STDOUT_STDERR_CAPTURE) as 
> monitor:
> node.account.ssh(cmd)
> # Kafka 1.0.0 and higher don't have a space between "Kafka" and 
> "Server"
> monitor.wait_until("Kafka\s*Server.*started", 
> timeout_sec=timeout_sec, backoff_sec=.25,
>err_msg="Kafka server didn't finish startup in 
> %d seconds" % timeout_sec)
> {code}
> And the error message in broker log is shown below.
> {quote}
> kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for 
> connection while in state: CONNECTING
>   at 
> kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:262)
>   at kafka.zookeeper.ZooKeeperClient.(ZooKeeperClient.scala:119)
>   at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1880)
>   at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:430)
>   at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:455)
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:227)
>   at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
>   at kafka.Kafka$.main(Kafka.scala:82)
>   at kafka.Kafka.main(Kafka.scala)
> {quote}
> I'm surprised the default zk connection timeout in the system tests is only 
> 2 seconds, as the default timeout in production was increased to 18s (see 
> https://github.com/apache/kafka/commit/4bde9bb3ccaf5571be76cb96ea051dadaeeaf5c7)
> {code}
> config_property.ZOOKEEPER_CONNECTION_TIMEOUT_MS: 2000
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10248) Drop idempotent KTable source updates

2020-07-08 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-10248.
--
Resolution: Fixed

> Drop idempotent KTable source updates
> -
>
> Key: KAFKA-10248
> URL: https://issues.apache.org/jira/browse/KAFKA-10248
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: John Roesler
>Assignee: Richard Yu
>Priority: Major
> Fix For: 2.6.0
>
>
> Implement KIP-557 for KTable source nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10248) Drop idempotent KTable source updates

2020-07-08 Thread John Roesler (Jira)
John Roesler created KAFKA-10248:


 Summary: Drop idempotent KTable source updates
 Key: KAFKA-10248
 URL: https://issues.apache.org/jira/browse/KAFKA-10248
 Project: Kafka
  Issue Type: Sub-task
Reporter: John Roesler
Assignee: Richard Yu


Implement KIP-557 for KTable source nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-363

2020-07-08 Thread Magnus Edenhill
Hi Tom,

I think it would be useful to have some real-world (or made-up!) numbers on
how much relative space (in %) is saved for
the most error-dense protocol requests.
E.g., by how many bytes (or what %) an OffsetCommitResponse with 10 topics
and 100 failing partitions would shrink.

Thanks,
Magnus


Den tis 7 juli 2020 kl 17:01 skrev Colin McCabe :

> Hi Tom,
>
> Thanks for this.  I think the tough part is probably the few messages that
> are still using manual serialization, which can't be easily converted to
> using this.  So we will probably have to upgrade them to using automatic
> generation, or accept a little inconsistency for a while until they are
> upgraded.
>
> best,
> Colin
>
>
> On Wed, Jul 1, 2020, at 09:21, Tom Bentley wrote:
> > Hi all,
> >
> > Following a suggestion from Colin in the KIP-625 discussion thread, I'd
> > like to start discussion on a much smaller KIP which proposes to make
> error
> > codes and messages tagged fields in all RPCs.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-363%3A+Make+RPC+error+codes+and+messages+tagged+fields
> >
> > I'd be grateful for any feedback you may have.
> >
> > Kind regards,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-621: Deprecate and replace DescribeLogDirsResult.all() and .values()

2020-07-08 Thread Colin McCabe
Hi Dongjin,

Hmm.  I'm not sure I follow.  How does deprecating 
DescribeLogDirsResponse.LogDirInfo help here?  The issue is not so much the 
class, but the fact that it's exposed as a public API.  So it seems appropriate 
to deprecate the methods that return it, but not the class itself, since we'll 
continue to use it internally.

best,
Colin


On Wed, Jul 8, 2020, at 07:41, Dongjin Lee wrote:
> Hi Tom,
> 
> Thanks for taking this issue. I opened a PR for this issue earlier, but
> your KIP was submitted first, so I closed mine.
> 
> I have a question: for consistency with other methods, how about
> maintaining the signature of DescribeLogDirsResult#[all, values]? There is
> an alternative approach to deprecate DescribeLogDirsResponse.[LogDirInfo,
> ReplicaInfo]. (Please have a look at my proposal.)
> 
> Best,
> Dongjin
> 
> +1. I updated the link to the discussion thread on your KIP document.
> 
> On 2020/06/29 09:31:53, Tom Bentley  wrote:
> > Hi,
> >
> > Does anyone have any comments about this? If not I'll likely start a
> > vote in a couple of days.
> >
> > Kind regards,
> >
> > Tom
> >
> > On Mon, Jun 8, 2020 at 4:56 PM Tom Bentley  wrote:
> >
> > > Hi all,
> > >
> > > I've opened a small KIP seeking to deprecate and replace a couple of
> > > methods of DescribeLogDirsResult which refer to internal classes in
> > > their return type.
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862109
> > >
> > > Please take a look if you have the time.
> > >
> > > Kind regards,
> > >
> > > Tom


[jira] [Resolved] (KAFKA-10134) High CPU issue during rebalance in Kafka consumer after upgrading to 2.5

2020-07-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10134.
---
Resolution: Fixed

> High CPU issue during rebalance in Kafka consumer after upgrading to 2.5
> 
>
> Key: KAFKA-10134
> URL: https://issues.apache.org/jira/browse/KAFKA-10134
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.5.0
>Reporter: Sean Guo
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 2.6.0, 2.5.1
>
>
> We want to utilize the new rebalance protocol to mitigate the stop-the-world 
> effect during rebalances, as our tasks are long-running.
> But after the upgrade, when we kill an instance to trigger a rebalance while 
> there is some load (including long-running tasks >30s), the CPU usage goes 
> sky-high. It reads ~700% in our metrics, so several threads are likely in a 
> tight loop. We have several consumer threads consuming from different 
> partitions during the rebalance. This is reproducible with both the new 
> CooperativeStickyAssignor and the old eager rebalance protocol. The 
> difference is that with the old eager rebalance protocol the high CPU usage 
> drops after the rebalance is done, but with the cooperative one the consumer 
> threads seem to be stuck on something and can't finish the rebalance, so the 
> high CPU usage won't drop until we stop our load. A small load without 
> long-running tasks also doesn't cause continuous high CPU usage, as the 
> rebalance can finish in that case.
>  
> "executor.kafka-consumer-executor-4" #124 daemon prio=5 os_prio=0 
> cpu=76853.07ms elapsed=841.16s tid=0x7fe11f044000 nid=0x1f4 runnable  
> [0x7fe119aab000]"executor.kafka-consumer-executor-4" #124 daemon prio=5 
> os_prio=0 cpu=76853.07ms elapsed=841.16s tid=0x7fe11f044000 nid=0x1f4 
> runnable  [0x7fe119aab000]   java.lang.Thread.State: RUNNABLE at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:467)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1275)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1241) 
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216) 
> at
>  
> By debugging into the code, we found that the clients appear to be in a 
> loop finding the coordinator.
> I also tried the old rebalance protocol on the new version; the issue still 
> exists, but the CPU goes back to normal when the rebalance is done.
> I also tried the same on 2.4.1, which doesn't seem to have this issue. So it 
> seems related to something that changed between 2.4.1 and 2.5.0.
>  
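
For reference, the cooperative protocol discussed above is opted into on the 
consumer side via the partition assignment strategy. A minimal sketch; the 
broker address, group id, and topic below are placeholders, not from the 
report:

{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CooperativeConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Opt in to incremental cooperative rebalancing instead of the default eager protocol.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                CooperativeStickyAssignor.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic")); // placeholder
            consumer.poll(Duration.ofMillis(100)); // joins the group and triggers a rebalance
        }
    }
}
{code}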



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk14 #276

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10134: Use long poll if we do not have fetchable partitions


--
[...truncated 3.19 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureGlobalTopicNameIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED


Re: [VOTE] KIP-632: Add DirectoryConfigProvider

2020-07-08 Thread Mickael Maison
+1 (binding)
Thanks

On Wed, Jul 8, 2020 at 11:31 AM Manikumar  wrote:
>
> +1 (binding)
>
> Thanks for the KIP.
>
> On Tue, Jul 7, 2020 at 10:30 PM David Jacot  wrote:
>
> > +1 (non-binding). Thanks for the KIP!
> >
> > On Tue, Jul 7, 2020 at 12:54 PM Tom Bentley  wrote:
> >
> > > Hi,
> > >
> > > I'd like to start a vote on KIP-632, which is about making the config
> > > provider mechanism more ergonomic on Kubernetes:
> > >
> > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-632%3A+Add+DirectoryConfigProvider
> > >
> > > Please take a look if you have time.
> > >
> > > Many thanks,
> > >
> > > Tom
> > >
> >
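
For readers not following the KIP document: on Kubernetes, secrets are
commonly mounted as one file per key in a directory, and a directory-based
provider lets configs reference those files. A minimal sketch, assuming the
provider class name proposed in the KIP; the "dir" alias, mount path, and
file name below are illustrative:

    import java.util.Properties;

    public class DirectoryProviderSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Register the directory-based provider under the alias "dir".
            props.put("config.providers", "dir");
            props.put("config.providers.dir.class",
                    "org.apache.kafka.common.config.provider.DirectoryConfigProvider");
            // A value of this form would be resolved by the provider from the
            // file /etc/secret-volume/keystore-password when the config is parsed.
            props.put("ssl.keystore.password", "${dir:/etc/secret-volume:keystore-password}");
            System.out.println(props);
        }
    }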


Re: [DISCUSS] Apache Kafka 2.6.0 release

2020-07-08 Thread John Roesler
Hi Randall,

While developing system tests, I've just unearthed a new 2.6 regression:
https://issues.apache.org/jira/browse/KAFKA-10247

I've got a PR in progress. Hoping to finish it up today:
https://github.com/apache/kafka/pull/8994

Sorry for the trouble,
-John

On Mon, Jun 29, 2020, at 09:29, Randall Hauch wrote:
> Thanks for raising this, David. I agree it makes sense to include this fix
> in 2.6, so I've adjusted the "Fix Version(s)" field to include '2.6.0'.
> 
> Best regards,
> 
> Randall
> 
> On Mon, Jun 29, 2020 at 8:25 AM David Jacot  wrote:
> 
> > Hi Randall,
> >
> > We have discovered an annoying issue that we introduced in 2.5:
> >
> > Describing topics with the command line tool fails if the user does not
> > have the
> > privileges to access the ListPartitionReassignments API. I believe that
> > this is the
> > case for most non-admin users.
> >
> > I propose to include the fix in 2.6. The fix is trivial so low risk. What
> > do you think?
> >
> > JIRA: https://issues.apache.org/jira/browse/KAFKA-10212
> > PR: https://github.com/apache/kafka/pull/8947
> >
> > Best,
> > David
> >
> > On Sat, Jun 27, 2020 at 4:39 AM John Roesler  wrote:
> >
> > > Hi Randall,
> > >
> > > I neglected to notify this thread when I merged the fix for
> > > https://issues.apache.org/jira/browse/KAFKA-10185
> > > on June 19th. I'm sorry about that oversight. It is marked with
> > > a fix version of 2.6.0.
> > >
> > > On a side note, I have a fix for KAFKA-10173, which I'm merging
> > > and backporting right now.
> > >
> > > Thanks for managing the release,
> > > -John
> > >
> > > On Thu, Jun 25, 2020, at 10:23, Randall Hauch wrote:
> > > > Thanks for the update, folks!
> > > >
> > > > Based upon Jira [1], we currently have 4 issues that are considered
> > > > blockers for the 2.6.0 release and production of RCs:
> > > >
> > > >- https://issues.apache.org/jira/browse/KAFKA-10134 - High CPU
> > issue
> > > >during rebalance in Kafka consumer after upgrading to 2.5
> > (unassigned)
> > > >- https://issues.apache.org/jira/browse/KAFKA-10143 - Can no longer
> > > >change replication throttle with reassignment tool (Jason G)
> > > >- https://issues.apache.org/jira/browse/KAFKA-10166 - Excessive
> > > >TaskCorruptedException seen in testing (Sophie, Bruno)
> > > >- https://issues.apache.org/jira/browse/KAFKA-10173
> > > >- BufferUnderflowException during Kafka Streams Upgrade (John R)
> > > >
> > > > and one critical issue that may be a regression that at this time will
> > > not
> > > > block production of RCs:
> > > >
> > > >- https://issues.apache.org/jira/browse/KAFKA-10017 - Flaky Test
> > > >EosBetaUpgradeIntegrationTest.shouldUpgradeFromEosAlphaToEosBeta
> > > (Matthias)
> > > >
> > > > and one build/release issue we'd like to fix if possible but will not
> > > block
> > > > RCs or the release:
> > > >
> > > >- https://issues.apache.org/jira/browse/KAFKA-9381
> > > >- kafka-streams-scala: Javadocs + Scaladocs not published on maven
> > > central
> > > >(me)
> > > >
> > > > I'm working with the assignees and reporters of these issues (via
> > > comments
> > > > on the issues) to identify an ETA and to track progress. Anyone is
> > > welcome
> > > > to chime in on those issues.
> > > >
> > > > At this time, no other changes (other than PRs that only fix/improve
> > > tests)
> > > > should be merged to the `2.6` branch. If you think you've identified a
> > > new
> > > > blocker issue or believe another existing issue should be treated as a
> > > > blocker for 2.6.0, please mark the issue's `fix version` as `2.6.0`
> > _and_
> > > > respond to this thread with details, and I will work with you to
> > > determine
> > > > whether it is indeed a blocker.
> > > >
> > > > As always, let me know here if you have any questions/concerns.
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > > > [1] https://issues.apache.org/jira/projects/KAFKA/versions/12346918
> > > >
> > > > On Thu, Jun 25, 2020 at 8:27 AM Mario Molina 
> > wrote:
> > > >
> > > > > Hi Randall,
> > > > >
> > > > > Ticket https://issues.apache.org/jira/browse/KAFKA-9018 is not a
> > > blocker
> > > > > so
> > > > > it can be moved to the 2.7.0 version.
> > > > >
> > > > > Mario
> > > > >
> > > > > On Wed, 24 Jun 2020 at 20:22, Boyang Chen <
> > reluctanthero...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey Randall,
> > > > > >
> > > > > > There was another spotted blocker:
> > > > > > https://issues.apache.org/jira/browse/KAFKA-10173
> > > > > > As of current, John is working on a fix.
> > > > > >
> > > > > > Boyang
> > > > > >
> > > > > > On Wed, Jun 24, 2020 at 4:08 PM Sophie Blee-Goldman <
> > > sop...@confluent.io
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hey all,
> > > > > > >
> > > > > > > Just a heads up that we discovered a new blocker. The fix is
> > pretty
> > > > > > > straightforward
> > > > > > > and there's already a PR for it so it should be resolved quickly.
> 

[jira] [Created] (KAFKA-10247) Streams may attempt to process after closing a task

2020-07-08 Thread John Roesler (Jira)
John Roesler created KAFKA-10247:


 Summary: Streams may attempt to process after closing a task
 Key: KAFKA-10247
 URL: https://issues.apache.org/jira/browse/KAFKA-10247
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.6.0
Reporter: John Roesler
Assignee: John Roesler


Observed in a system test. A corrupted task was detected, and Streams properly 
closed it as dirty:
{code:java}
[2020-07-08 17:08:09,345] WARN stream-thread 
[SmokeTest-66676ca8-d517-4e4b-bb5f-44203e24e569-StreamThread-2] Encountered 
org.apache.kafka.clients.consumer.OffsetOutOfRangeException fetching records 
from restore consumer for partitions [SmokeTest-cntStoreName-changelog-1], it 
is likely that the consumer's position has fallen out of the topic partition 
offset range because the topic was truncated or compacted on the broker, 
marking the corresponding tasks as corrupted and re-initializing it later. 
(org.apache.kafka.streams.processor.internals.StoreChangelogReader)
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Fetch position 
FetchPosition{offset=1, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ducker03:9092 (id: 1 rack: null)], 
epoch=0}} is out of range for partition SmokeTest-cntStoreName-changelog-1
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.handleOffsetOutOfRange(Fetcher.java:1344)
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.initializeCompletedFetch(Fetcher.java:1296)
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:611)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1280)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1238)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
   at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:433)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:664)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:507)
[2020-07-08 17:08:09,345] WARN stream-thread 
[SmokeTest-66676ca8-d517-4e4b-bb5f-44203e24e569-StreamThread-2] Detected the 
states of tasks {2_1=[SmokeTest-cntStoreName-changelog-1]} are corrupted. Will 
close the task as dirty and re-create and bootstrap from scratch. 
(org.apache.kafka.streams.processor.internals.StreamThread)
org.apache.kafka.streams.errors.TaskCorruptedException: Tasks with changelogs 
{2_1=[SmokeTest-cntStoreName-changelog-1]} are corrupted and hence needs to be 
re-initialized
   at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:446)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:664)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:548)
   at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:507)
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Fetch 
position FetchPosition{offset=1, offsetEpoch=Optional.empty, 
currentLeader=LeaderAndEpoch{leader=Optional[ducker03:9092 (id: 1 rack: null)], 
epoch=0}} is out of range for partition SmokeTest-cntStoreName-changelog-1
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.handleOffsetOutOfRange(Fetcher.java:1344)
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.initializeCompletedFetch(Fetcher.java:1296)
   at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:611)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1280)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1238)
   at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206)
   at 
org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:433)
   ... 3 more
[2020-07-08 17:08:09,346] INFO stream-thread 
[SmokeTest-66676ca8-d517-4e4b-bb5f-44203e24e569-StreamThread-2] task [2_1] 
Suspended running (org.apache.kafka.streams.processor.internals.StreamTask)
[2020-07-08 17:08:09,346] DEBUG stream-thread 
[SmokeTest-66676ca8-d517-4e4b-bb5f-44203e24e569-StreamThread-2] task [2_1] 
Closing its state manager and all the registered state stores: 
{sum-STATE-STORE-50=StateStoreMetadata (sum-STATE-STORE-50 : 
SmokeTest-sum-STATE-STORE-50-changelog-1 @ null, 
cntStoreName=StateStoreMetadata (cntStoreName : 
SmokeTest-cntStoreName-changelog-1 @ 0} 
(org.apache.kafka.streams.processor.internals.ProcessorStateManager)
[2020-07-08 17:08:09,346] INFO [Consumer 

Re: [DISCUSS] KIP-621: Deprecate and replace DescribeLogDirsResult.all() and .values()

2020-07-08 Thread Dongjin Lee
Hi Tom,

Thanks for taking this issue. I opened a PR for this issue earlier, but
your KIP was submitted first, so I closed mine.

I have a question: for consistency with other methods, how about
maintaining the signature of DescribeLogDirsResult#[all, values]? There is
an alternative approach to deprecate DescribeLogDirsResponse.[LogDirInfo,
ReplicaInfo]. (Please have a look at my proposal.)

Best,
Dongjin
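
For context, a sketch of the current (pre-KIP) usage that both proposals aim
to clean up: the futures returned by DescribeLogDirsResult#[all, values] force
callers to import the internal response class. The broker ids and bootstrap
address below are placeholders:

    import java.util.Arrays;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.DescribeLogDirsResult;
    import org.apache.kafka.common.requests.DescribeLogDirsResponse;

    public class DescribeLogDirsSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                DescribeLogDirsResult result = admin.describeLogDirs(Arrays.asList(0, 1));
                // The internal class DescribeLogDirsResponse.LogDirInfo leaks into
                // the caller's code through the result type:
                Map<Integer, Map<String, DescribeLogDirsResponse.LogDirInfo>> all =
                        result.all().get();
                all.forEach((broker, dirs) -> System.out.println(broker + " -> " + dirs.keySet()));
            }
        }
    }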

+1. I updated the link to the discussion thread on your KIP document.

On 2020/06/29 09:31:53, Tom Bentley  wrote:
> Hi,
>
> Does anyone have any comments about this? If not I'll likely start a vote
> in a couple of days.
>
> Kind regards,
>
> Tom
>
> On Mon, Jun 8, 2020 at 4:56 PM Tom Bentley  wrote:
>
> > Hi all,
> >
> > I've opened a small KIP seeking to deprecate and replace a couple of
> > methods of DescribeLogDirsResult which refer to internal classes in their
> > return type.
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862109
> >
> > Please take a look if you have the time.
> >
> > Kind regards,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-638: Deprecate DescribeLogDirsResponse.[LogDirInfo, ReplicaInfo]

2020-07-08 Thread Dongjin Lee
Hi Mickael,

Okay. I will close my KIP. Let's continue the discussion on KIP-621 thread.

Thanks,
Dongjin

On Tue, Jul 7, 2020 at 11:06 PM Mickael Maison 
wrote:

> Hi Dongjin,
>
> It looks like this KIP is addressing the same issue as KIP-621:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862109
>
> On Tue, Jul 7, 2020 at 2:29 PM Dongjin Lee  wrote:
> >
> > Hi devs,
> >
> > I hope to start the discussion of KIP-638, which aims to fix a glitch in
> > Admin#describeLogDirs method.
> >
> > - KIP:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866169
> > - Jira: https://issues.apache.org/jira/browse/KAFKA-8794
> >
> > All kinds of feedback will be greatly appreciated.
> >
> > Best,
> > Dongjin
> >
> > --
> > Dongjin Lee
> >
> > A hitchhiker in the mathematical world.
> >
> > github: github.com/dongjinleekr
> > keybase: https://keybase.io/dongjinleekr
> > linkedin: kr.linkedin.com/in/dongjinleekr
> > speakerdeck: speakerdeck.com/dongjin
>


-- 
Dongjin Lee

A hitchhiker in the mathematical world.

github: github.com/dongjinleekr
keybase: https://keybase.io/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
speakerdeck: speakerdeck.com/dongjin


[jira] [Resolved] (KAFKA-8794) Deprecate DescribeLogDirsResponse.[LogDirInfo, ReplicaInfo]

2020-07-08 Thread Dongjin Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjin Lee resolved KAFKA-8794.

Resolution: Duplicate

Duplicate of KAFKA-10120.

> Deprecate DescribeLogDirsResponse.[LogDirInfo, ReplicaInfo]
> ---
>
> Key: KAFKA-8794
> URL: https://issues.apache.org/jira/browse/KAFKA-8794
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, documentation
>Reporter: Dongjin Lee
>Assignee: Dongjin Lee
>Priority: Minor
>  Labels: needs-kip
>
> As of 2.3.0, {{DescribeLogDirsResult}}, returned by 
> {{AdminClient#describeLogDirs(Collection)}}, exposes the internal data 
> structure {{DescribeLogDirsResponse.LogDirInfo}}. As a result, its Javadoc 
> provides no documentation for it. The disparity is clear when comparing it 
> with {{DescribeReplicaLogDirsResult}}, returned by 
> {{AdminClient#describeReplicaLogDirs(Collection)}}.
> To resolve this, {{DescribeLogDirsResponse.[LogDirInfo, ReplicaInfo]}} should 
> be deprecated and hidden from the public API later; instead, 
> {{org.apache.kafka.clients.admin.DescribeLogDirsResult}} should provide 
> {{[LogDirInfo, ReplicaInfo]}} as its own inner classes, like 
> {{DescribeReplicaLogDirsResult}} does.
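
For comparison, a sketch of {{DescribeReplicaLogDirsResult}}, which already 
exposes its own public inner class, so no internal imports are needed. The 
topic, partition, broker id, and bootstrap address are placeholders:

{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeReplicaLogDirsResult;
import org.apache.kafka.common.TopicPartitionReplica;

public class DescribeReplicaLogDirsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            TopicPartitionReplica replica = new TopicPartitionReplica("example-topic", 0, 1);
            // The future is typed with the result's own public inner class:
            DescribeReplicaLogDirsResult.ReplicaLogDirInfo info =
                    admin.describeReplicaLogDirs(Collections.singleton(replica))
                            .values().get(replica).get();
            System.out.println(replica + " -> " + info.getCurrentReplicaLogDir());
        }
    }
}
{code}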



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-616: Rename implicit Serdes instances in kafka-streams-scala

2020-07-08 Thread John Roesler
Hi Yuriy,

Once it seems like there’s general agreement in the discussion, you can start a 
voting thread. You can find examples on the mailing list of what to say in the 
first message. It’s basically just a message with the subject line changed from 
“[DISCUSS]...” to “[VOTE]...”, and then stating that you’d like to start the 
vote. It’s nice to link to the kip document again. 

The rules for the vote are at the top of the “Kafka Improvement Process” page, 
but you basically need 3 binding +1 votes and no binding -1 votes. You also 
need to wait at least three days from when you start the vote before you can 
declare it accepted. There’s no upper time limit. 

If you’re unsure of who has a binding vote, it’s just the people listed on the 
Apache Kafka Committers page. 

If people are slow to vote, feel free to keep bumping the thread, just like 
with the discussion. 

Thanks again for getting involved!
-John

On Tue, Jul 7, 2020, at 01:51, Yuriy Badalyantc wrote:
> So, what's next? It's my first KIP and I'm not familiar with all processes.
> 
> -Yuriy
> 
> On Mon, Jul 6, 2020 at 1:32 AM John Roesler  wrote:
> 
> > Hi Yuriy,
> >
> > Thanks for the update! It looks good to me.
> >
> > Thanks,
> > John
> >
> > On Sun, Jul 5, 2020, at 03:27, Yuriy Badalyantc wrote:
> > > Hi John.
> > >
> > > I updated the KIP. An old proposed implementation is now in the rejected
> > > alternatives.
> > >
> > > - Yuriy
> > >
> > > On Sun, Jul 5, 2020 at 12:03 AM John Roesler 
> > wrote:
> > >
> > > > Hi Yuriy,
> > > >
> > > > I agree, we can keep them separate. I just wanted to make you aware of
> > it.
> > > >
> > > > Thanks for the PR, it looks the way I expected.
> > > >
> > > > I just read over the KIP document again. I think it needs to be
> > updated to
> > > > the current proposal, and then we’ll be able to start the vote.
> > > >
> > > > Thanks,
> > > > John
> > > >
> > > > On Tue, Jun 30, 2020, at 04:58, Yuriy Badalyantc wrote:
> > > > > Hi everybody!
> > > > >
> > > > > Looks like a discussion about KIP-513 could take a while. I think we
> > > > should
> > > > > move forward with KIP-616 without waiting for KIP-513.
> > > > >
> > > > > I created a new pull request for KIP-616:
> > > > > https://github.com/apache/kafka/pull/8955. It contains a new
> > > > > `org.apache.kafka.streams.scala.serialization.Serdes` object without
> > name
> > > > > clash. An old one was marked as deprecated. This change is backward
> > > > > compatible and it could be merged in any further release.
> > > > >
> > > > > On Wed, Jun 3, 2020 at 12:41 PM Yuriy Badalyantc 
> > > > wrote:
> > > > >
> > > > > > Hi, John
> > > > > >
> > > > > > Thanks for pointing that out. I expressed my thoughts about
> > KIP-513 and
> > > > > > its connection to KIP-616 in the KIP-513 mail list.
> > > > > >
> > > > > > - Yuriy
> > > > > >
> > > > > > On Sun, May 31, 2020 at 1:26 AM John Roesler 
> > > > wrote:
> > > > > >
> > > > > >> Hi Yuriy,
> > > > > >>
> > > > > >> I was just looking back at KIP-513, and I’m wondering if there’s
> > any
> > > > > >> overlap we should consider here, or if they are just orthogonal.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> -John
> > > > > >>
> > > > > >> On Thu, May 28, 2020, at 21:36, Yuriy Badalyantc wrote:
> > > > > >> > At the current moment, I think John's plan is better than the
> > > > original
> > > > > >> plan
> > > > > >> > described in the KIP. I think we should create a new `Serdes` in
> > > > another
> > > > > >> > package. The old one will be deprecated.
> > > > > >> >
> > > > > >> > - Yuriy
> > > > > >> >
> > > > > >> > On Fri, May 29, 2020 at 8:58 AM John Roesler <
> > vvcep...@apache.org>
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > > Thanks, Matthias,
> > > > > >> > >
> > > > > >> > > If we go with the approach Yuriy and I agreed on, to
> > deprecate and
> > > > > >> replace
> > > > > >> > > the whole class and not just a few of the methods, then the
> > > > timeline
> > > > > >> is
> > > > > >> > > less of a concern. Under that plan, Yuriy can just write the
> > new
> > > > class
> > > > > >> > > exactly the way he wants and people can cleanly swap over to
> > the
> > > > new
> > > > > >> > > pattern when they are ready.
> > > > > >> > >
> > > > > >> > > The timeline was more significant if we were just going to
> > > > deprecate
> > > > > >> some
> > > > > >> > > methods and add new methods to the existing class. That plan
> > > > requires
> > > > > >> two
> > > > > >> > > implementation phases, where we first deprecate the existing
> > > > methods
> > > > > >> and
> > > > > >> > > later swap the implicits at the same time we remove the
> > deprecated
> > > > > >> members.
> > > > > >> > > Aside from the complexity of that approach, it’s not a
> > breakage
> > > > free
> > > > > >> path,
> > > > > >> > > as some users would be forced to continue using the deprecated
> > > > members
> > > > > >> > > until a future release drops them, breaking their source
> > code, and
> > > > > >> only
> > > > 

Re: [VOTE] KIP-622 Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext

2020-07-08 Thread Bruno Cadonna
Thanks Will and Piotr,

+1 (non-binding)

Best,
Bruno

On Wed, Jul 8, 2020 at 8:12 AM Matthias J. Sax  wrote:
>
> Thanks for the KIP.
>
> +1 (binding)
>
>
> -Matthias
>
> On 7/7/20 11:48 AM, William Bottrell wrote:
> > Hi everyone,
> >
> > I'd like to start a vote for adding two new time API's to ProcessorContext.
> >
> > Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext
> > 
> >
> >  Thanks everyone for the initial feedback and thanks for your time.
> >
>


Re: [VOTE] KIP-621: Deprecate and replace DescribeLogDirsResult.all() and .values()

2020-07-08 Thread Manikumar
+1 (binding)

Thanks for the KIP.

On Tue, Jul 7, 2020 at 8:19 PM Colin McCabe  wrote:

> Thanks, Tom.
>
> I tried to think of a better way to do this, but I think you're right that
> we probably just need different methods.
>
> +1 (binding).
>
> best,
> Colin
>
> On Mon, Jul 6, 2020, at 01:14, Tom Bentley wrote:
> > Hi,
> >
> > I'd like to start a vote on KIP-621 which is about deprecating methods in
> > DescribeLogDirsResult which leak internal classes, replacing them with
> > public API classes.
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862109
> >
> > Thanks,
> >
> > Tom
> >
>


Re: [VOTE] KIP-632: Add DirectoryConfigProvider

2020-07-08 Thread Manikumar
+1 (binding)

Thanks for the KIP.

On Tue, Jul 7, 2020 at 10:30 PM David Jacot  wrote:

> +1 (non-binding). Thanks for the KIP!
>
> On Tue, Jul 7, 2020 at 12:54 PM Tom Bentley  wrote:
>
> > Hi,
> >
> > I'd like to start a vote on KIP-632, which is about making the config
> > provider mechanism more ergonomic on Kubernetes:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-632%3A+Add+DirectoryConfigProvider
> >
> > Please take a look if you have time.
> >
> > Many thanks,
> >
> > Tom
> >
>


Re: [VOTE] KIP-431: Support of printing additional ConsumerRecord fields in DefaultMessageFormatter

2020-07-08 Thread Manikumar
+1 (binding)

Thanks for the KIP.

Thanks,
Manikumar



On Wed, Jul 8, 2020 at 11:47 AM Matthias J. Sax  wrote:

> +1 (binding)
>
> -Matthias
>
> On 7/7/20 7:16 PM, John Roesler wrote:
> > Hi Badai,
> >
> > Thanks for picking this up. I've reviewed the KIP document and
> > the threads you linked. I think we may want to make more
> > improvements in the future to the printing of headers in particular,
> > but this KIP seems like a clear benefit already. I think we can
> > take it incrementally.
> >
> > I'm +1 (binding)
> >
> > Thanks,
> > -John
> >
> > On Tue, Jul 7, 2020, at 09:57, Badai Aqrandista wrote:
> >> Hi all
> >>
> >> After resurrecting the discussion thread [1] for KIP-431 and have not
> >> received any further feedback for 2 weeks, I would like to resurrect
> >> the voting thread [2] for KIP-431.
> >>
> >> I have updated KIP-431 wiki page
> >> (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-431%3A+Support+of+printing+additional+ConsumerRecord+fields+in+DefaultMessageFormatter
> )
> >> to address Ismael's comment on that thread [3].
> >>
> >> Does anyone else have other comments or objections about this KIP?
> >>
> >> [1]
> >>
> https://lists.apache.org/thread.html/raabf3268ed05931b8a048fce0d6cdf6a326aee4b0d89713d6e6998d6%40%3Cdev.kafka.apache.org%3E
> >>
> >> [2]
> >>
> https://lists.apache.org/thread.html/41fff34873184625370f9e76b8d9257f7a9e7892ab616afe64b4f67c%40%3Cdev.kafka.apache.org%3E
> >>
> >> [3]
> >>
> https://lists.apache.org/thread.html/99e9cbaad4a0a49b96db104de450c9f488d4b2b03a09b991bcbadbc7%40%3Cdev.kafka.apache.org%3E
> >>
> >> --
> >> Thanks,
> >> Badai
> >>
>
>


Jenkins build is back to normal : kafka-2.5-jdk8 #164

2020-07-08 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-392: Allow consumers to fetch from the closest replica

2020-07-08 Thread Rajini Sivaram
Hi all,

We would like to make a change related to this KIP to ensure that transient
exceptions during reassignments are handled as retriable exceptions. Kafka
brokers currently return REPLICA_NOT_AVAILABLE for Fetch requests if the
replica is not available on the broker. But ReplicaNotAvailableException is
not a retriable InvalidMetadataException. Java consumers handle this
correctly by explicitly matching error codes, but some non-Java clients
treat this as a fatal error. Brokers return NOT_LEADER_FOR_PARTITION for
Produce requests in this scenario. This error is currently not suitable for
fetch since fetch could be from leader or follower.

Following on from the discussion in the PR
https://github.com/apache/kafka/pull/8979 for fixing
https://issues.apache.org/jira/browse/KAFKA-10223, we would like to rename
NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER so that it can be used
for fetch-from-follower as well. Both Produce and Fetch requests will
return this error if replica is not available. This makes it compatible
with old and new clients and is a non-breaking change. A new exception
class NotLeaderOrFollowerException will be added. This will be a subclass
of NotLeaderForPartitionException. We will deprecate
NotLeaderForPartitionException and ReplicaNotAvailableException.

If you have any concerns about this change, please let us know. Otherwise,
I will update the KIP and PR.

Thank you,

Rajini
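
(For readers catching up on this thread: fetch-from-follower as shipped is
driven by the broker-side replica selector plus the consumer's rack id. A
minimal consumer-side sketch; the broker address, group, topic, and rack
name below are placeholders:)

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class FetchFromFollowerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Advertise this client's location so the broker's replica selector
            // (e.g. RackAwareReplicaSelector, configured via replica.selector.class)
            // can route fetches to a nearby replica.
            props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "us-east-1a");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("example-topic"));
                consumer.poll(Duration.ofMillis(100));
            }
        }
    }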


On Mon, Apr 8, 2019 at 5:19 PM Jason Gustafson  wrote:

> I'm going to call this vote.
>
> +1: Me, Ryanne, Guozhang, Eno, David, Viktor, Stephane, Gwen, Jun
>
> Binding total is +5 with no -1 or +0. Thanks all for the discussion and
> feedback.
>
> -Jason
>
>
> On Thu, Apr 4, 2019 at 4:55 PM Jun Rao  wrote:
>
> > Hi, Jason,
> >
> > Thanks for the updated KIP. +1 from me.
> >
> > Jun
> >
> > On Thu, Apr 4, 2019 at 2:26 PM Jason Gustafson 
> wrote:
> >
> > > Hi Jun,
> > >
> > > I have updated the KIP to remove `replica.selection.policy` from the
> > > consumer configuration. Thanks for the suggestion.
> > >
> > > Best,
> > > Jason
> > >
> > > On Wed, Mar 27, 2019 at 9:46 AM Jason Gustafson 
> > > wrote:
> > >
> > > > @Jun
> > > >
> > > > Re: 200: It's a fair point that it is useful to minimize the client
> > > > changes that are needed to get a benefit from affinity. I think the
> > > > high-level argument is that this is mostly the concern of operators
> > > > and should be under their control. Since there is a protocol bump
> > > > here, users will have to upgrade clients at a minimum. An alternative
> > > > would be to make "preferred" the default option for
> > > > `replica.selection.policy`. But I agree that the value of the
> > > > configuration becomes less clear in this case. Overall this
> > > > suggestion sounds good to me, but let me see if there is any
> > > > additional feedback before I update the KIP.
> > > >
> > > > Re: 201: Ack.
> > > >
> > > > @Guozhang
> > > >
> > > > I think rack.id is still an easier and more reliable way for many
> > users
> > > > to determine local affinity. This lets us provide the simple
> rack-aware
> > > > implementation which is probably sufficient for a fair number of use
> > > cases
> > > > and wouldn't require users to write any custom code.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > >
> > > > On Wed, Mar 27, 2019 at 9:05 AM Guozhang Wang 
> > > wrote:
> > > >
> > > >> Hello Jun,
> > > >>
> > > >> Regarding 200: if we assume that most client would not bother
> setting
> > > >> rack.id at all and affinity can be determined w/o rack.id via TCP
> > > header,
> > > >> plus rack.id may not be "future-proof" additional information is
> > needed
> > > >> as
> > > >> well, then do we still need to change the protocol of metadata
> request
> > > to
> > > >> add `rack.id`?
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >> On Tue, Mar 26, 2019 at 6:23 PM Jun Rao  wrote:
> > > >>
> > > >> > Hi, Jason,
> > > >> >
> > > >> > Thanks for the KIP. Just a couple of more comments.
> > > >> >
> > > >> > 200. I am wondering if we really need the replica.selection.policy
> > > >> config
> > > >> > in the consumer. A slight variant is that we (1) let the consumer
> > > always
> > > >> > fetch from the PreferredReplica and (2) provide a default
> > > >> implementation of
> > > >> > ReplicaSelector that always returns the leader replica in select()
> > for
> > > >> > backward compatibility. Then, we can get rid of
> > > >> replica.selection.policy in
> > > >> > the consumer. The benefits are that (1) fewer configs, (2)
> affinity
> > > >> > optimization can potentially be turned on with just a broker side
> > > change
> > > >> > (assuming affinity can be determined w/o client rack.id).
> > > >> >
> > > >> > 201. I am wondering if PreferredReplica in the protocol should be
> > > named
> > > >> > PreferredReadReplica since it's intended for reads?
> > > >> >
> > > >> > Jun
> > > >> >
> > > >> > On Mon, Mar 25, 2019 at 9:07 AM Jason Gustafson <
> 

[jira] [Resolved] (KAFKA-10245) Using vulnerable log4j version

2020-07-08 Thread Tom Bentley (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Bentley resolved KAFKA-10245.
-
Resolution: Duplicate

> Using vulnerable log4j version
> --
>
> Key: KAFKA-10245
> URL: https://issues.apache.org/jira/browse/KAFKA-10245
> Project: Kafka
>  Issue Type: Bug
>  Components: core, KafkaConnect
>Affects Versions: 2.5.0
>Reporter: Pavel Kuznetsov
>Priority: Major
>  Labels: security
>
> *Description*
> I checked the kafka_2.12-2.5.0.tgz distribution with WhiteSource and found 
> that the log4j version used in Kafka Connect and the Kafka broker has 
> vulnerabilities:
>  * log4j-1.2.17.jar has 
> [CVE-2019-17571|https://github.com/advisories/GHSA-2qrg-x229-3v8q] and 
> [CVE-2020-9488|https://github.com/advisories/GHSA-vwqq-5vrc-xw9h] 
> vulnerabilities. The way to fix it is to upgrade to 
> org.apache.logging.log4j:log4j-core:2.13.2.
> *To Reproduce*
> Download kafka_2.12-2.5.0.tgz.
> Open the libs folder in it and find log4j-1.2.17.jar.
> Check [CVE-2019-17571|https://github.com/advisories/GHSA-2qrg-x229-3v8q] and 
> [CVE-2020-9488|https://github.com/advisories/GHSA-vwqq-5vrc-xw9h] to see that 
> log4j 1.2.17 is vulnerable.
> *Expected*
>  * log4j is log4j-core 2.13.2 or higher
> *Actual*
>  * log4j is 1.2.17



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-2.3-jdk8 #218

2020-07-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9144; Track timestamp from txn markers to prevent early producer


--
[...truncated 2.99 MB...]
kafka.log.LogCleanerTest > testPartialSegmentClean STARTED

kafka.log.LogCleanerTest > testPartialSegmentClean PASSED

kafka.log.LogCleanerTest > testCommitMarkerRemoval STARTED

kafka.log.LogCleanerTest > testCommitMarkerRemoval PASSED

kafka.log.LogCleanerTest > testCleanSegmentsWithConcurrentSegmentDeletion 
STARTED

kafka.log.LogCleanerTest > testCleanSegmentsWithConcurrentSegmentDeletion PASSED

kafka.log.LogValidatorTest > testRecompressedBatchWithoutRecordsNotAllowed 
STARTED

kafka.log.LogValidatorTest > testRecompressedBatchWithoutRecordsNotAllowed 
PASSED

kafka.log.LogValidatorTest > testCompressedV1 STARTED

kafka.log.LogValidatorTest > testCompressedV1 PASSED

kafka.log.LogValidatorTest > testCompressedV2 STARTED

kafka.log.LogValidatorTest > testCompressedV2 PASSED

kafka.log.LogValidatorTest > testDownConversionOfIdempotentRecordsNotPermitted 
STARTED

kafka.log.LogValidatorTest > testDownConversionOfIdempotentRecordsNotPermitted 
PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV2NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV2NonCompressed PASSED

kafka.log.LogValidatorTest > testAbsoluteOffsetAssignmentCompressed STARTED

kafka.log.LogValidatorTest > testAbsoluteOffsetAssignmentCompressed PASSED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV1 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV1 PASSED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV2 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV2 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV1 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV1 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV2 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV2 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV1ToV2 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV1ToV2 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0Compressed PASSED

kafka.log.LogValidatorTest > testZStdCompressedWithUnavailableIBPVersion STARTED

kafka.log.LogValidatorTest > testZStdCompressedWithUnavailableIBPVersion PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV1ToV2Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV1ToV2Compressed PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1NonCompressed PASSED

kafka.log.LogValidatorTest > 
testDownConversionOfTransactionalRecordsNotPermitted STARTED

kafka.log.LogValidatorTest > 
testDownConversionOfTransactionalRecordsNotPermitted PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1Compressed PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV1 PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV2 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV2 PASSED

kafka.log.LogValidatorTest > testControlRecordsNotAllowedFromClients STARTED

kafka.log.LogValidatorTest > testControlRecordsNotAllowedFromClients PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV1 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV1 PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV2 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV2 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV1NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV1NonCompressed PASSED

kafka.log.LogValidatorTest > testLogAppendTimeNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeNonCompressedV1 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0NonCompressed PASSED

kafka.log.LogValidatorTest > testControlRecordsNotCompressed STARTED

kafka.log.LogValidatorTest > testControlRecordsNotCompressed PASSED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV1 PASSED


Re: [VOTE] KIP-431: Support of printing additional ConsumerRecord fields in DefaultMessageFormatter

2020-07-08 Thread Matthias J. Sax
+1 (binding)

-Matthias

On 7/7/20 7:16 PM, John Roesler wrote:
> Hi Badai,
> 
> Thanks for picking this up. I've reviewed the KIP document and
> the threads you linked. I think we may want to make more 
> improvements in the future to the printing of headers in particular,
> but this KIP seems like a clear benefit already. I think we can
> take it incrementally.
> 
> I'm +1 (binding)
> 
> Thanks,
> -John
> 
> On Tue, Jul 7, 2020, at 09:57, Badai Aqrandista wrote:
>> Hi all
>>
>> After resurrecting the discussion thread [1] for KIP-431 and have not
>> received any further feedback for 2 weeks, I would like to resurrect
>> the voting thread [2] for KIP-431.
>>
>> I have updated KIP-431 wiki page
>> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-431%3A+Support+of+printing+additional+ConsumerRecord+fields+in+DefaultMessageFormatter)
>> to address Ismael's comment on that thread [3].
>>
>> Does anyone else have other comments or objections about this KIP?
>>
>> [1] 
>> https://lists.apache.org/thread.html/raabf3268ed05931b8a048fce0d6cdf6a326aee4b0d89713d6e6998d6%40%3Cdev.kafka.apache.org%3E
>>
>> [2] 
>> https://lists.apache.org/thread.html/41fff34873184625370f9e76b8d9257f7a9e7892ab616afe64b4f67c%40%3Cdev.kafka.apache.org%3E
>>
>> [3] 
>> https://lists.apache.org/thread.html/99e9cbaad4a0a49b96db104de450c9f488d4b2b03a09b991bcbadbc7%40%3Cdev.kafka.apache.org%3E
>>
>> -- 
>> Thanks,
>> Badai
>>





Re: [VOTE] KIP-622 Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext

2020-07-08 Thread Matthias J. Sax
Thanks for the KIP.

+1 (binding)


-Matthias

On 7/7/20 11:48 AM, William Bottrell wrote:
> Hi everyone,
> 
> I'd like to start a vote for adding two new time API's to ProcessorContext.
> 
> Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext
> 
> 
>  Thanks everyone for the initial feedback and thanks for your time.
> 





Re: [DISCUSS] KIP-622 Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext

2020-07-08 Thread Matthias J. Sax
I think we don't need a default implementation for the new methods.

What would be the use-case for implementing the `ProcessorContext`
interface? In contrast to, for example, `KeyValueStore`,
`ProcessorContext` is a use-only interface because it's never passed
into Kafka Streams, but only handed out to the user.


-Matthias
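
Since the thread below asks for example usage, here is a sketch of how a
processor might use the proposed methods. Note that currentSystemTimeMs()
and currentStreamTimeMs() are the names from the KIP and are not yet part
of any released API:

    import org.apache.kafka.streams.processor.AbstractProcessor;

    public class RecordLagTaggingProcessor extends AbstractProcessor<String, String> {
        @Override
        public void process(String key, String value) {
            long wallClockMs = context().currentSystemTimeMs();  // proposed in KIP-622
            long streamTimeMs = context().currentStreamTimeMs(); // proposed in KIP-622
            // Forward the record tagged with how far stream time lags wall-clock
            // time; in a unit test both values would come from the mocked context.
            context().forward(key, value + "@" + (wallClockMs - streamTimeMs));
        }
    }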


On 7/7/20 1:28 PM, William Bottrell wrote:
> Sure, I would appreciate help from Piotr creating an example.
> 
> On Tue, Jul 7, 2020 at 12:03 PM Boyang Chen 
> wrote:
> 
>> Hey John,
>>
>> since ProcessorContext is a public API, I couldn't be sure that people
>> won't try to extend it. Without a default implementation, user code
>> compilation will break.
>>
>> William and Piotr, it seems that we haven't added any example usage of the
>> new API, could we try to address that? It should help with the motivation
>> and follow-up meta comments as John proposed.
>>
>> Boyang
>>
>> On Mon, Jul 6, 2020 at 12:04 PM Matthias J. Sax  wrote:
>>
>>> William,
>>>
>>> thanks for the KIP. LGMT. Feel free to start a vote.
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 7/4/20 10:14 AM, John Roesler wrote:
 Hi Richard,

 It’s good to hear from you!

 Thanks for bringing up the wall-clock suppression feature. IIRC,
>> someone
>>> actually started a KIP discussion for it already, but I don’t think it
>> went
>>> to a vote. I don’t recall any technical impediment, just the lack of
>>> availability to finish it up. Although there is some association, it
>> would
>>> be good to keep the KIPs separate.

 Thanks,
 John

 On Sat, Jul 4, 2020, at 10:05, Richard Yu wrote:
> Hi all,
>
> This reminds me of a previous issue I think that we were discussing.
> @John Roesler  I think you should
>> remember
>>> this one.
>
> A while back, we were talking about having suppress operator emit
> records by wall-clock time instead of stream time.
> If we are adding this, wouldn't that make it more feasible for us to
> implement that feature for suppression?
>
> If I recall correctly, there actually had been quite a bit of user
> demand for such a feature.
> Might be good to include it in this KIP (or maybe use one of the prior
> KIPs for this feature).
>
> Best,
> Richard
>
> On Sat, Jul 4, 2020 at 6:58 AM John Roesler 
>>> wrote:
>> Hi all,
>>
>>  1. Thanks, Boyang, it is nice to see usage examples in KIPs like
>>> this. It helps during the discussion, and it’s also good documentation
>>> later on.
>>
>>  2. Yeah, this is a subtle point. The motivation mentions being able
>>> to control the time during tests, but to be able to make it work, the
>>> processor implementation needs a public method on ProcessorContext to get
>>> the time. Otherwise, processors would have to check the type of the
>> context
>>> and cast, depending on whether they’re running inside a test or not. In
>>> retrospect, if we’d had a usage example, this probably would have been
>>> clear.
>>
>>  3. I don’t think we expect people to have their own implementations
>>> of ProcessorContext. Since all implementations are internal, it’s really
>> an
>>> implementation detail whether we use a default method, abstract methods,
>> or
>>> concrete methods. I can’t think of an implementation that really wants to
>>> just look up the system time. In the production code path, we cache the
>>> time for performance, and in testing, we use a mock time.
>>
>>  Thanks,
>>  John
>>
>>
>>  On Fri, Jul 3, 2020, at 06:41, Piotr Smoliński wrote:
>>  > 1. Makes sense; let me propose something
>>  >
>>  > 2. That's not testing-only. The goal is to use the same API to access
>>  > the time in deployment and testing environments. The major driver is
>>  > System.currentTimeMillis(), which a) cannot be used in tests, b) can
>>  > in specific cases go backwards, and c) is not compatible with the
>>  > punctuator call. The idea is that we could access the clock using a
>>  > uniform API. For completeness we should have the same API for system
>>  > and stream time.
>>  >
>>  > 3. There aren't that many subclasses. Two important ones are
>>  > ProcessorContextImpl and MockProcessorContext (and a third one:
>>  > ForwardingDisableProcessorContext). If a given implementation does
>>  > not support the schedule() call, there is no reason to support clock
>>  > access. The default implementation should just throw
>>  > UnsupportedOperationException, just to prevent compilation errors in
>>  > possible subclasses.
>>  >
>>  > On 2020/07/01 02:24:43, Boyang Chen 
>>> wrote:
>>  > > Thanks Will for the KIP. A couple questions and suggestions:
>>  > >
>>  > > 1. I think for new APIs to make most sense, we should add a
>>> minimal example
>>  > > demonstrating how it could be useful