[jira] [Commented] (KAFKA-7206) Enable batching in FindCoordinator

2019-05-13 Thread Yishun Guan (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839059#comment-16839059
 ] 

Yishun Guan commented on KAFKA-7206:


Hi [~sagarrao], feel free to take a look at this: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-347%3A++Enable+batching+in+FindCoordinatorRequest].
It turns out that backward compatibility is a little tricky here, so this is 
somewhat complex for a minor improvement.

> Enable batching in FindCoordinator
> --
>
> Key: KAFKA-7206
> URL: https://issues.apache.org/jira/browse/KAFKA-7206
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Reporter: Yishun Guan
>Assignee: Yishun Guan
>Priority: Critical
>  Labels: needs-discussion, needs-kip, newbie++
>
> To quote [~guozhang]:
> "The proposal is that we extend FindCoordinatorRequest to have multiple 
> consumer ids: today each FindCoordinatorRequest only contains a single 
> consumer id, so in our scenario we still need to send N requests for N 
> consumer groups. If we can request coordinators in a single request, then 
> the workflow could be simplified to:
>  1. Send a single FindCoordinatorRequest to a broker asking for the 
> coordinators of all consumer groups.
>  1.a) Note that the response may still succeed in finding some coordinators 
> while erroring on others, and we need to handle them at that granularity (see 
> below).
>  2. Then, for the collected coordinators, group the consumer groups by 
> coordinator id and send one request per coordinator destination.
> Note that this change would require the version to be bumped up to 
> FIND_COORDINATOR_REQUEST_V3 for such protocol changes; the RESPONSE 
> version should also be bumped in order to include multiple coordinators."
> A KIP is needed.
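The batched workflow quoted above can be sketched as follows. This is an illustrative stand-in, not real AdminClient code: `findCoordinators` fakes the batched FindCoordinator round trip, and the broker assignment is arbitrary.

```java
import java.util.*;

public class CoordinatorBatching {
    // Hypothetical stand-in for a batched FindCoordinator response:
    // maps each consumer group id to the broker id of its coordinator.
    public static Map<String, Integer> findCoordinators(Collection<String> groupIds) {
        Map<String, Integer> result = new HashMap<>();
        for (String g : groupIds) {
            result.put(g, Math.abs(g.hashCode()) % 3); // pretend there are 3 brokers
        }
        return result;
    }

    // Step 2 of the quoted workflow: group the collected coordinators by
    // broker id, so that one request per coordinator destination can be sent
    // instead of one request per consumer group.
    public static Map<Integer, List<String>> groupByCoordinator(Map<String, Integer> coordinators) {
        Map<Integer, List<String>> byBroker = new TreeMap<>();
        for (Map.Entry<String, Integer> e : coordinators.entrySet()) {
            byBroker.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byBroker;
    }

    public static void main(String[] args) {
        Map<String, Integer> coords = findCoordinators(List.of("group-a", "group-b", "group-c"));
        System.out.println(groupByCoordinator(coords));
    }
}
```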



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8294) Batch StopReplica requests with partition deletion and add test cases

2019-05-13 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8294.

   Resolution: Fixed
Fix Version/s: 2.3.0

> Batch StopReplica requests with partition deletion and add test cases
> -
>
> Key: KAFKA-8294
> URL: https://issues.apache.org/jira/browse/KAFKA-8294
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.3.0
>
>
> One of the tricky aspects we found in KAFKA-8237 is the batching of the 
> StopReplica requests. We should have test cases covering expected behavior so 
> that we do not introduce regressions and we should make the batching 
> consistent whether or not `deletePartitions` is set.





[jira] [Commented] (KAFKA-8294) Batch StopReplica requests with partition deletion and add test cases

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839015#comment-16839015
 ] 

ASF GitHub Bot commented on KAFKA-8294:
---

hachikuji commented on pull request #6642: KAFKA-8294; Batch StopReplica 
requests when possible and improve test coverage
URL: https://github.com/apache/kafka/pull/6642
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Batch StopReplica requests with partition deletion and add test cases
> -
>
> Key: KAFKA-8294
> URL: https://issues.apache.org/jira/browse/KAFKA-8294
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> One of the tricky aspects we found in KAFKA-8237 is the batching of the 
> StopReplica requests. We should have test cases covering expected behavior so 
> that we do not introduce regressions and we should make the batching 
> consistent whether or not `deletePartitions` is set.





[jira] [Created] (KAFKA-8363) Config provider parsing is broken

2019-05-13 Thread Chris Egerton (JIRA)
Chris Egerton created KAFKA-8363:


 Summary: Config provider parsing is broken
 Key: KAFKA-8363
 URL: https://issues.apache.org/jira/browse/KAFKA-8363
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.1.1, 2.2.0, 2.1.0, 2.0.1, 2.0.0
Reporter: Chris Egerton
Assignee: Chris Egerton


The 
[regex|https://github.com/apache/kafka/blob/63e4f67d9ba9e08bdce705b35c5acf32dcd20633/clients/src/main/java/org/apache/kafka/common/config/ConfigTransformer.java#L56]
 used by the {{ConfigTransformer}} class to parse config provider syntax (see 
[KIP-297|https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations])
 is broken and fails when multiple path-less configs are specified. For 
example: {{"${provider:configOne} ${provider:configTwo}"}} would be parsed 
incorrectly as a reference with a path of {{"configOne} $\{provider"}}. 
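This class of regex bug can be reproduced with a simplified pattern (the actual `ConfigTransformer` regex is more involved; the patterns below only demonstrate the failure mode): a greedy `.+` inside the `${...}` matcher swallows the closing brace of the first reference, while a reluctant `.+?` stops at the first `}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigRefParsing {
    // Simplified patterns for "${provider:config}" references.
    static final Pattern GREEDY = Pattern.compile("\\$\\{([^:]+):(.+)\\}");
    static final Pattern RELUCTANT = Pattern.compile("\\$\\{([^:]+):(.+?)\\}");

    // Collect the config name captured by each match of the pattern.
    public static List<String> configs(Pattern p, String value) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(value);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String value = "${provider:configOne} ${provider:configTwo}";
        // Greedy: one single mangled "config" spanning both references.
        System.out.println(configs(GREEDY, value));
        // Reluctant: the two references are parsed separately.
        System.out.println(configs(RELUCTANT, value)); // [configOne, configTwo]
    }
}
```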





[jira] [Commented] (KAFKA-8363) Config provider parsing is broken

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838995#comment-16838995
 ] 

ASF GitHub Bot commented on KAFKA-8363:
---

C0urante commented on pull request #6726: KAFKA-8363: Fix parsing bug for 
config providers
URL: https://github.com/apache/kafka/pull/6726
 
 
   [Jira](https://issues.apache.org/jira/browse/KAFKA-8363)
   
   The regex used to parse config provider syntax can fail to accurately parse 
provided configurations when multiple path-less configs are requested (e.g., 
`${provider:pathOne} ${provider:pathTwo}`). This change fixes that parsing and 
adds a unit test to prevent regression.
   
   This bug is present since the addition of config providers and so should be 
backported through to 2.0, when they were first added.
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Config provider parsing is broken
> -
>
> Key: KAFKA-8363
> URL: https://issues.apache.org/jira/browse/KAFKA-8363
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Major
>
> The 
> [regex|https://github.com/apache/kafka/blob/63e4f67d9ba9e08bdce705b35c5acf32dcd20633/clients/src/main/java/org/apache/kafka/common/config/ConfigTransformer.java#L56]
>  used by the {{ConfigTransformer}} class to parse config provider syntax (see 
> [KIP-297|https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations])
>  is broken and fails when multiple path-less configs are specified. For 
> example: {{"${provider:configOne} ${provider:configTwo}"}} would be parsed 
> incorrectly as a reference with a path of {{"configOne} $\{provider"}}. 





[jira] [Commented] (KAFKA-6455) Improve timestamp propagation at DSL level

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838991#comment-16838991
 ] 

ASF GitHub Bot commented on KAFKA-6455:
---

mjsax commented on pull request #6725: KAFKA-6455: Improve DSL operator 
timestamp semantics
URL: https://github.com/apache/kafka/pull/6725
 
 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Improve timestamp propagation at DSL level
> --
>
> Key: KAFKA-6455
> URL: https://issues.apache.org/jira/browse/KAFKA-6455
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: needs-kip
>
> At DSL level, we inherit the timestamp propagation "contract" from the 
> Processor API. This contract is not optimal at DSL level, and we should 
> define a DSL-level contract that matches the semantics of the corresponding 
> DSL operator.





[jira] [Updated] (KAFKA-8362) LogCleaner gets stuck after partition move between log directories

2019-05-13 Thread Julio Ng (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julio Ng updated KAFKA-8362:

Description: 
When a partition is moved from one directory to another, their checkpoint entry 
in cleaner-offset-checkpoint file is not removed from the source directory.

As a consequence, when we read the last firstDirtyOffset, we might get a stale 
value from the old checkpoint file.

Basically, we need to clean up the entry from the checkpoint file in the source 
directory when the move is completed.

The current issue is that the code in LogCleanerManager:
{noformat}
/**
 * @return the position processed for all logs.
 */
def allCleanerCheckpoints: Map[TopicPartition, Long] = {
  inLock(lock) {
checkpoints.values.flatMap(checkpoint => {
  try {
checkpoint.read()
  } catch {
case e: KafkaStorageException =>
  error(s"Failed to access checkpoint file ${checkpoint.file.getName} 
in dir ${checkpoint.file.getParentFile.getAbsolutePath}", e)
  Map.empty[TopicPartition, Long]
  }
}).toMap
  }
}{noformat}
collapses the offsets when multiple entries exist for the topicPartition
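The collapse in the Scala snippet above can be reproduced in miniature: when two checkpoint files (the stale one in the source directory and the live one in the destination) both contain an entry for the same partition, flattening them into one map keeps only one value, and which one survives depends on iteration order. The Java below is an illustrative stand-in for the Scala `flatMap(...).toMap`; names are invented.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckpointCollapse {
    // Flatten per-directory checkpoint maps into a single map, as the
    // Scala flatMap(...).toMap does; on duplicate keys the later entry wins.
    public static Map<String, Long> allCheckpoints(List<Map<String, Long>> perDir) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> dir : perDir) {
            merged.putAll(dir);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stale entry left behind in the source dir after a partition move,
        // plus the current entry in the destination dir.
        Map<String, Long> sourceDir = Map.of("topic-0", 100L);
        Map<String, Long> destDir = Map.of("topic-0", 250L);

        // Directory iteration order is not guaranteed, so the stale offset
        // can silently shadow the real one.
        System.out.println(allCheckpoints(List.of(destDir, sourceDir))); // {topic-0=100}
    }
}
```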



> LogCleaner gets stuck after partition move between log directories
> --
>
> Key: KAFKA-8362
> URL: https://issues.apache.org/jira/browse/KAFKA-8362
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Julio Ng
>Priority: Major
>
> When a partition is moved from one directory to another, their checkpoint 
> entry in cleaner-offset-checkpoint file is not removed from the source 
> directory.
> As a consequence, when we read the last firstDirtyOffset, we might get a stale 
> value from the old checkpoint file.
> Basically, we need to clean up the entry from the checkpoint file in the source 
> directory when the move is completed.
> The current issue is that the code in LogCleanerManager:
> {noformat}
> /**
>  * @return the position processed for all logs.
>  */
> def allCleanerCheckpoints: Map[TopicPartition, Long] = {
>   inLock(lock) {
> checkpoints.values.flatMap(checkpoint => {
>   try {
> checkpoint.read()
>   } catch {
> case e: KafkaStorageException =>
>   error(s"Failed to access checkpoint file ${checkpoint.file.getName} 
> in dir ${checkpoint.file.getParentFile.getAbsolutePath}", e)
>   Map.empty[TopicPartition, Long]
>   }
> }).toMap
>   }
> }{noformat}
> collapses the offsets when multiple entries exist for the topicPartition





[jira] [Created] (KAFKA-8362) LogCleaner gets stuck after partition move between log directories

2019-05-13 Thread Julio Ng (JIRA)
Julio Ng created KAFKA-8362:
---

 Summary: LogCleaner gets stuck after partition move between log 
directories
 Key: KAFKA-8362
 URL: https://issues.apache.org/jira/browse/KAFKA-8362
 Project: Kafka
  Issue Type: Bug
  Components: log cleaner
Reporter: Julio Ng


When a partition is moved from one directory to another, their checkpoint entry 
in cleaner-offset-checkpoint file is not removed from the source directory.

As a consequence, when we read the last firstDirtyOffset, we might get a stale 
value from the old checkpoint file.

Basically, we need to clean up the entry from the checkpoint file in the source 
directory when the move is completed.

The current issue is that the code in LogCleanerManager:
{noformat}
def allCleanerCheckpoints: Map[TopicPartition, Long] = {
  inLock(lock) {
    checkpoints.values.flatMap(checkpoint => {
      try {
        checkpoint.read()
      } catch {
        case e: KafkaStorageException =>
          error(s"Failed to access checkpoint file ${checkpoint.file.getName} 
in dir ${checkpoint.file.getParentFile.getAbsolutePath}", e)
          Map.empty[TopicPartition, Long]
      }
    }).toMap
  }
}{noformat}
collapses the offsets when multiple entries exist for the topicPartition.





[jira] [Created] (KAFKA-8361) Fix ConsumerPerformanceTest#testNonDetailedHeaderMatchBody to test a real ConsumerPerformance's method

2019-05-13 Thread Kengo Seki (JIRA)
Kengo Seki created KAFKA-8361:
-

 Summary: Fix 
ConsumerPerformanceTest#testNonDetailedHeaderMatchBody to test a real 
ConsumerPerformance's method
 Key: KAFKA-8361
 URL: https://issues.apache.org/jira/browse/KAFKA-8361
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Reporter: Kengo Seki
Assignee: Kengo Seki


{{kafka.tools.ConsumerPerformanceTest#testNonDetailedHeaderMatchBody}} doesn't 
work as a regression test for now, since it tests an anonymous function defined 
in the test method itself.
{code:java}
  @Test
  def testNonDetailedHeaderMatchBody(): Unit = {
testHeaderMatchContent(detailed = false, 2, () => 
println(s"${dateFormat.format(System.currentTimeMillis)}, " +
  s"${dateFormat.format(System.currentTimeMillis)}, 1.0, 1.0, 1, 1.0, 1, 1, 
1.1, 1.1"))
  }
{code}
It should test a real {{ConsumerPerformance}}'s method, just like 
{{testDetailedHeaderMatchBody}}.
{code:java}
  @Test
  def testDetailedHeaderMatchBody(): Unit = {
testHeaderMatchContent(detailed = true, 2,
  () => ConsumerPerformance.printConsumerProgress(1, 1024 * 1024, 0, 1, 0, 
0, 1, dateFormat, 1L))
  }
{code}





[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838878#comment-16838878
 ] 

John Roesler commented on KAFKA-8315:
-

[~the4thamigo_uk], The python tests are something different.

I was just talking about the Java classes that are called like 
"WhateverWhateverIntegrationTest". You should be able to run those right from 
the IDE. If you want to run all the streams integration tests, it's `./gradlew 
clean :streams:test`. It will take 7 minutes-ish.

> Cannot pass Materialized into a join operation - hence cant set retention 
> period independent of grace
> -
>
> Key: KAFKA-8315
> URL: https://issues.apache.org/jira/browse/KAFKA-8315
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Andrew
>Assignee: John Roesler
>Priority: Major
> Attachments: code.java
>
>
> The documentation says to use `Materialized` not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
>  but there is nowhere to pass a `Materialized` instance to the join 
> operation; it seems only the group operation supports one.
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)





[jira] [Commented] (KAFKA-8341) AdminClient should retry coordinator lookup after NOT_COORDINATOR error

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838828#comment-16838828
 ] 

ASF GitHub Bot commented on KAFKA-8341:
---

vikasconfluent commented on pull request #6723: KAFKA-8341. Retry Consumer 
group operation for NOT_COORDINATOR error
URL: https://github.com/apache/kafka/pull/6723
 
 
   An api call for consumer groups is made up of two calls:
   1. Find the consumer group coordinator
   2. Send the request to the node found in step 1
   
   But the coordinator can get moved between step 1 and 2. In that case we
   currently fail. This change fixes that by detecting this error and then
   retrying.
   
   The following APIs are impacted by this behavior:
   1. listConsumerGroupOffsets
   2. deleteConsumerGroups
   3. describeConsumerGroups
   
   Each of these calls results in the AdminClient making multiple calls to the
   backend. As the AdminClient invokes each backend API in a separate event
   loop, the code that detects the error (step 2) needs to restart the whole
   operation, including step 1. This needed a change to capture the "Call"
   object for step 1 in step 2.
   
   This change thus refactors the code to make it easy to retry the whole
   operation. It creates a Context object to capture the API arguments, which
   can then be referenced by each "Call" object. This is just for convenience
   and makes the method signatures simpler, as we only need to pass one object
   instead of multiple API arguments.
   
   The creation of each "Call" object is done in a new method, so we can
   easily resubmit step 1 from step 2.
   
   This change also modifies the corresponding unit tests to cover this
   scenario.
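The retry flow described in the PR can be sketched as below. These are hypothetical stand-ins, not the actual AdminClient code: `CoordinatorLookup` plays the role of step 1 and `GroupRequest` the role of step 2.

```java
public class CoordinatorRetry {
    // Stand-in for the NOT_COORDINATOR error surfaced in step 2.
    public static class NotCoordinatorException extends RuntimeException {}

    public interface CoordinatorLookup { int findCoordinator(String groupId); }
    public interface GroupRequest { String send(int brokerId); }

    // Restart the whole operation (step 1 + step 2) when the coordinator
    // moves between the lookup and the request, bounded by maxRetries.
    public static String callWithRetry(String groupId, CoordinatorLookup lookup,
                                       GroupRequest request, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            int coordinator = lookup.findCoordinator(groupId); // step 1
            try {
                return request.send(coordinator);              // step 2
            } catch (NotCoordinatorException e) {
                // Coordinator moved; fall through and look it up again.
            }
        }
        throw new IllegalStateException("coordinator still moving after retries");
    }

    public static void main(String[] args) {
        int[] moved = {0};
        String result = callWithRetry("group-a",
            g -> moved[0] == 0 ? 1 : 2, // coordinator moves from broker 1 to 2
            b -> {
                if (b == 1) { moved[0] = 1; throw new NotCoordinatorException(); }
                return "offsets from broker " + b;
            }, 3);
        System.out.println(result); // offsets from broker 2
    }
}
```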
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> AdminClient should retry coordinator lookup after NOT_COORDINATOR error
> ---
>
> Key: KAFKA-8341
> URL: https://issues.apache.org/jira/browse/KAFKA-8341
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vikas Singh
>Priority: Major
>
> If a group operation (e.g. DescribeGroup) fails because the coordinator has 
> moved, the AdminClient should lookup the coordinator before retrying the 
> operation. Currently we will either fail or just retry anyway. This is 
> similar in some ways to controller rediscovery after getting NOT_CONTROLLER 
> errors.





[jira] [Created] (KAFKA-8360) Docs do not mention RequestQueueSize JMX metric

2019-05-13 Thread Charles Francis Larrieu Casias (JIRA)
Charles Francis Larrieu Casias created KAFKA-8360:
-

 Summary: Docs do not mention RequestQueueSize JMX metric
 Key: KAFKA-8360
 URL: https://issues.apache.org/jira/browse/KAFKA-8360
 Project: Kafka
  Issue Type: Improvement
  Components: documentation, metrics, network
Reporter: Charles Francis Larrieu Casias


In the [monitoring 
documentation|https://kafka.apache.org/documentation/#monitoring], there is 
no mention of the `kafka.network:type=RequestChannel,name=RequestQueueSize` JMX 
metric. This is an important metric because it can indicate that there are too 
many requests in queue, suggesting either increasing `queued.max.requests` 
(along with perhaps memory) or increasing `num.io.threads`.
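Assuming the broker is started with remote JMX enabled (the host/port in the usage below are placeholders), the gauge can be read over JMX roughly like this; the attribute name `Value` follows the usual Kafka/Yammer gauge convention, but verify against your broker version.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RequestQueueSizeReader {
    public static final String MBEAN =
        "kafka.network:type=RequestChannel,name=RequestQueueSize";

    // Reads the RequestQueueSize gauge over remote JMX. The broker must be
    // started with JMX remoting enabled (e.g. JMX_PORT=9999 in its env).
    public static int readQueueSize(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return (Integer) conn.getAttribute(new ObjectName(MBEAN), "Value");
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            System.out.println(readQueueSize(args[0], Integer.parseInt(args[1])));
        } else {
            // No broker address given; just show which MBean would be queried.
            System.out.println(MBEAN);
        }
    }
}
```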





[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838783#comment-16838783
 ] 

Andrew commented on KAFKA-8315:
---

Thanks [~vvcephei], I am running the unit tests in the kafka project. I confess 
I haven't worked out the integration tests yet, but I saw the python scripts for 
this. I agree the ordering from the RecordQueue/PartitionGroup looks sound, and 
that it is something weird with how data is pushed into the RecordQueues from 
the source topics. [~ableegoldman] any clues on this would be greatly 
appreciated.

 

Thanks again for the responses. Hopefully, we will get to the bottom of this.

> Cannot pass Materialized into a join operation - hence cant set retention 
> period independent of grace
> -
>
> Key: KAFKA-8315
> URL: https://issues.apache.org/jira/browse/KAFKA-8315
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Andrew
>Assignee: John Roesler
>Priority: Major
> Attachments: code.java
>
>
> The documentation says to use `Materialized` not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
>  but there is nowhere to pass a `Materialized` instance to the join 
> operation; it seems only the group operation supports one.
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)





[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread John Roesler (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838720#comment-16838720
 ] 

John Roesler commented on KAFKA-8315:
-

Hi [~the4thamigo_uk],

Unfortunately, the TopologyTestDriver is going to be insufficient for 
exercising the behavior you want, since it processes events synchronously as 
soon as you call `pipeInput`, but the problem you're having appears to be with 
the logic that chooses records polled from Kafka (which only KafkaStreams does).

I'd suggest, as the fastest way to try and nail this down, actually to pull the 
Kafka project down (since we have set up integration tests that actually do use 
the brokers and run a "real" KafkaStreams) and modify one of the join 
integration tests to reproduce your use case.

This still sounds like a bug to me, even though it might not be the one that 
[~ableegoldman] reported.

Regarding the ticket, it'd be better not to split the history of this 
investigation, so I recommend just editing the title and description of the 
ticket, instead of making a new ticket.

Thanks,
-John

> Cannot pass Materialized into a join operation - hence cant set retention 
> period independent of grace
> -
>
> Key: KAFKA-8315
> URL: https://issues.apache.org/jira/browse/KAFKA-8315
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Andrew
>Assignee: John Roesler
>Priority: Major
> Attachments: code.java
>
>
> The documentation says to use `Materialized` not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
>  but there is nowhere to pass a `Materialized` instance to the join 
> operation; it seems only the group operation supports one.
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)





[jira] [Commented] (KAFKA-8325) Remove from the incomplete set failed. This should be impossible

2019-05-13 Thread Ming Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838718#comment-16838718
 ] 

Ming Liu commented on KAFKA-8325:
-

We also ran into this problem after we upgraded to Kafka 2.2.

> Remove from the incomplete set failed. This should be impossible
> 
>
> Key: KAFKA-8325
> URL: https://issues.apache.org/jira/browse/KAFKA-8325
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 2.1.0
>Reporter: Mattia Barbon
>Priority: Major
>
> I got this error when using the Kafka producer. So far it happened twice, 
> with an interval of about 1 week.
> {{ERROR [2019-05-05 08:43:07,505] 
> org.apache.kafka.clients.producer.internals.Sender: [Producer 
> clientId=, transactionalId=] Uncaught error in kafka 
> producer I/O thread:}}
> {{ ! java.lang.IllegalStateException: Remove from the incomplete set failed. 
> This should be impossible.}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:645)}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:717)}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:365)}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:308)}}
> {{ ! at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:233)}}
> {{ ! at java.lang.Thread.run(Thread.java:748)}}





[jira] [Created] (KAFKA-8359) Reconsider default for leader imbalance percentage

2019-05-13 Thread Dhruvil Shah (JIRA)
Dhruvil Shah created KAFKA-8359:
---

 Summary: Reconsider default for leader imbalance percentage
 Key: KAFKA-8359
 URL: https://issues.apache.org/jira/browse/KAFKA-8359
 Project: Kafka
  Issue Type: Improvement
Reporter: Dhruvil Shah


By default, the leader imbalance ratio is 10%. This means that the controller 
won't trigger preferred leader election for a broker unless the number of 
partitions the broker currently leads differs from the number of partitions it 
is the preferred leader of by more than 10%. The problem is that when a broker 
is catching up after a restart, the smallest topics tend to catch up first and 
the largest ones later, so the remaining difference under 10% may not be 
proportional to the broker's load. To keep better balance in the cluster, we 
should consider setting `leader.imbalance.per.broker.percentage=0` by default 
so that the preferred leaders are always elected.
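As a rough illustration of the threshold (the formula below is simplified; see the `leader.imbalance.per.broker.percentage` broker config for the exact semantics): with the default 10%, a broker that is the preferred leader for 100 partitions but currently leads only 91 of them sits at 9% imbalance, so no preferred leader election is triggered.

```java
public class LeaderImbalance {
    // Simplified per-broker imbalance: fraction of partitions for which the
    // broker is the preferred leader but not the current leader.
    public static double imbalance(int preferredLeaderOf, int currentlyLeads) {
        return (preferredLeaderOf - currentlyLeads) / (double) preferredLeaderOf;
    }

    public static void main(String[] args) {
        double ratio = imbalance(100, 91);
        System.out.println(ratio);        // 0.09
        System.out.println(ratio > 0.10); // false: no election at the default 10%
        System.out.println(ratio > 0.0);  // true: election with a 0% threshold
    }
}
```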





[jira] [Commented] (KAFKA-8351) Log cleaner must handle transactions spanning multiple segments

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838672#comment-16838672
 ] 

ASF GitHub Bot commented on KAFKA-8351:
---

hachikuji commented on pull request #6722: KAFKA-8351; Cleaner should handle 
transactions spanning multiple segments
URL: https://github.com/apache/kafka/pull/6722
 
 
   When cleaning transactional data, we need to keep track of which 
transactions still have data associated with them so that we do not remove the 
markers. We had logic to do this, but it was not being carried over when 
beginning cleaning for a new set of segments. This could cause the cleaner to 
incorrectly believe a transaction marker was no longer needed. The fix here 
carries the transactional state between groups of segments to be cleaned.
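A toy model of the bookkeeping described above (illustrative names, not the actual LogCleaner code): the set of transactions that still have data must be carried from one group of segments to the next; if it is reset, a marker sitting at the start of the next group looks safe to remove.

```java
import java.util.*;

public class TxnMarkerRetention {
    // A cleaned "record": either transactional data or a commit/abort
    // marker for a producer id.
    public static final class Entry {
        public final long producerId;
        public final boolean isMarker;
        public Entry(long producerId, boolean isMarker) {
            this.producerId = producerId;
            this.isMarker = isMarker;
        }
    }

    // Returns the producer ids whose markers appear removable after scanning
    // the segment groups, with or without carrying ongoing-transaction state
    // between groups.
    public static Set<Long> removableMarkers(List<List<Entry>> segmentGroups, boolean carryState) {
        Set<Long> ongoing = new HashSet<>();
        Set<Long> removable = new HashSet<>();
        for (List<Entry> group : segmentGroups) {
            if (!carryState) {
                ongoing.clear(); // the bug: state reset between groups
            }
            for (Entry e : group) {
                if (e.isMarker) {
                    if (!ongoing.contains(e.producerId)) {
                        removable.add(e.producerId); // looks safe to delete
                    }
                } else {
                    ongoing.add(e.producerId); // data still around, keep marker
                }
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        // Data for producer 7 in group 1, its marker first thing in group 2.
        List<List<Entry>> groups = List.of(
            List.of(new Entry(7, false)),
            List.of(new Entry(7, true)));
        System.out.println(removableMarkers(groups, false)); // [7]: wrongly removed
        System.out.println(removableMarkers(groups, true));  // []: correctly kept
    }
}
```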
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Log cleaner must handle transactions spanning multiple segments
> ---
>
> Key: KAFKA-8351
> URL: https://issues.apache.org/jira/browse/KAFKA-8351
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
>
> When cleaning transactions, we have to do some bookkeeping to keep track of 
> which transactions still have data left around. As long as there is still 
> data, we cannot remove the transaction marker. The problem is that we do this 
> tracking at the segment level. We do not carry over the ongoing transaction 
> state between segments. So if the first entry in a segment is a marker, we 
> incorrectly clean it. In the worst case, data from a committed transaction 
> could become aborted.





[jira] [Commented] (KAFKA-7321) ensure timely processing of deletion requests in Kafka topic (Time-based log compaction)

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838668#comment-16838668
 ] 

ASF GitHub Bot commented on KAFKA-7321:
---

jjkoshy commented on pull request #6009: KAFKA-7321: Add a Maximum Log 
Compaction Lag (KIP-354)
URL: https://github.com/apache/kafka/pull/6009
 
 
   
 



> ensure timely processing of deletion requests in Kafka topic (Time-based log 
> compaction)
> 
>
> Key: KAFKA-7321
> URL: https://issues.apache.org/jira/browse/KAFKA-7321
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: xiongqi wu
>Assignee: xiongqi wu
>Priority: Major
>
> _Compaction enables Kafka to remove old messages that are flagged for 
> deletion while other messages can be retained for a relatively longer time.  
> Today, a log segment may remain un-compacted for a long time since the 
> eligibility for log compaction is determined based on the compaction ratio 
> (“min.cleanable.dirty.ratio”) and min compaction lag 
> ("min.compaction.lag.ms") settings.  The ability to delete a log message through 
> compaction in a timely manner has become an important requirement in some use 
> cases (e.g., GDPR).  For example, one use case is to delete PII (Personally 
> Identifiable Information) data within 7 days while keeping non-PII data 
> indefinitely in compacted format.  The goal of this change is to provide a 
> time-based compaction policy that ensures the cleanable section is compacted 
> after the specified time interval regardless of dirty ratio and “min 
> compaction lag”.  However, dirty ratio and “min compaction lag” are still 
> honored if the time-based compaction rule is not violated. In other words, if 
> Kafka receives a deletion request on a key (e.g., a key with a null value), the 
> corresponding log segment will be picked up for compaction after the 
> configured time interval to remove the key._
>  
> _This is to track effort in KIP 354:_
> _https://cwiki.apache.org/confluence/display/KAFKA/KIP-354%3A+Time-based+log+compaction+policy_
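The combined policy can be sketched as a simple predicate (a model of the behavior described above, not Kafka's LogCleanerManager code; parameter names are illustrative): the time-based rule forces compaction once the earliest dirty data exceeds the maximum lag, and otherwise the usual dirty-ratio check applies.

```python
def needs_compaction_now(now_ms, earliest_dirty_ts_ms, max_compaction_lag_ms):
    """Time-based rule: once the earliest dirty data is older than the
    maximum compaction lag, the log must be compacted regardless of
    dirty ratio. A non-positive lag disables the rule."""
    return (max_compaction_lag_ms > 0
            and now_ms - earliest_dirty_ts_ms > max_compaction_lag_ms)

def eligible(dirty_ratio, min_cleanable_dirty_ratio,
             now_ms, earliest_dirty_ts_ms, max_compaction_lag_ms):
    """Eligibility combining both rules: the time-based rule overrides the
    dirty-ratio check; otherwise the ratio check is honored as today."""
    if needs_compaction_now(now_ms, earliest_dirty_ts_ms, max_compaction_lag_ms):
        return True
    return dirty_ratio >= min_cleanable_dirty_ratio
```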





[jira] [Resolved] (KAFKA-8335) Log cleaner skips Transactional mark and batch record, causing unlimited growth of __consumer_offsets

2019-05-13 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8335.

   Resolution: Fixed
Fix Version/s: 2.2.1
   2.1.2
   2.0.2

> Log cleaner skips Transactional mark and batch record, causing unlimited 
> growth of __consumer_offsets
> -
>
> Key: KAFKA-8335
> URL: https://issues.apache.org/jira/browse/KAFKA-8335
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Boquan Tang
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.0.2, 2.1.2, 2.2.1
>
> Attachments: seg_april_25.zip, segment.zip
>
>
> My colleague Weichu already sent a mail to the kafka user mailing list 
> regarding this issue, but we think it's worth having a ticket tracking it.
> We have been using Kafka Streams with exactly-once enabled on a Kafka cluster
> for a while.
> Recently we found that the size of the __consumer_offsets partitions grew huge.
> Some partitions went over 30G. This caused Kafka to take quite long to load
> "__consumer_offsets" on startup (it loads the topic in order to
> become group coordinator).
> We dumped the __consumer_offsets segments and found that while normal
> offset commits are nicely compacted, transaction records (COMMIT, etc.) are
> all preserved. It looks like, since these messages don't have a key, the
> LogCleaner is keeping them all:
> --
> $ bin/kafka-run-class.sh kafka.tools.DumpLogSegments --files
> /003484332061.log --key-decoder-class
> kafka.serializer.StringDecoder 2>/dev/null | cat -v | head
> Dumping 003484332061.log
> Starting offset: 3484332061
> offset: 3484332089 position: 549 CreateTime: 1556003706952 isvalid: true
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 1006
> producerEpoch: 2530 sequence: -1 isTransactional: true headerKeys: []
> endTxnMarker: COMMIT coordinatorEpoch: 81
> offset: 3484332090 position: 627 CreateTime: 1556003706952 isvalid: true
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 4005
> producerEpoch: 2520 sequence: -1 isTransactional: true headerKeys: []
> endTxnMarker: COMMIT coordinatorEpoch: 84
> ...
> --
> Streams commits transactions every 100ms (commit.interval.ms=100 when
> exactly-once is enabled), so __consumer_offsets is growing really fast.
> Is this (keeping all transaction markers) by design, or is it a bug in the
> LogCleaner? What would be the way to clean up the topic?
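A back-of-envelope growth estimate, using only numbers visible in the dump above (positions 549 and 627 are consecutive marker batches, and the quoted 100ms commit interval); this assumes the 78-byte spacing holds and none of the markers are cleaned:

```python
# Spacing of the two consecutive COMMIT marker batches in the dump.
marker_batch_bytes = 627 - 549                      # 78 bytes

# One commit marker per producer per commit interval (commit.interval.ms=100).
commits_per_day = 24 * 60 * 60 * 1000 // 100        # 864,000 markers/day

# Uncleaned marker bytes per transactional producer, per partition, per day.
uncompacted_bytes_per_day = commits_per_day * marker_batch_bytes
print(uncompacted_bytes_per_day)                    # roughly 67 MB/day
```

At that rate a handful of producers per partition plausibly explains partitions growing past 30G over weeks.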





[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838565#comment-16838565
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 3:52 PM:


This is the test enhanced to use timestamp extraction, and it works  ;

[https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]

 

So, it would seem that the issue is how the data is read when the data is 
already fully populated in the source topics. Seems like, as we discussed 
previously, it simply reads all the left records first, then the right records. 
How can we throttle the ingestion of the records to avoid this?


was (Author: the4thamigo_uk):
This is the test enhanced to use timestamp extraction, and it works  ;

[https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]

 

So, it would seem that the issue is how the data is read when the data is 
already fully populated in the source topics.

> Cannot pass Materialized into a join operation - hence cant set retention 
> period independent of grace
> -
>
> Key: KAFKA-8315
> URL: https://issues.apache.org/jira/browse/KAFKA-8315
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Andrew
>Assignee: John Roesler
>Priority: Major
> Attachments: code.java
>
>
> The documentation says to use `Materialized` not `JoinWindows.until()` 
> ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]),
>  but there is no where to pass a `Materialized` instance to the join 
> operation, only to the group operation is supported it seems.
>  
> Slack conversation here : 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the 
> grace period, so I think this is more than a documentation fix (see comments 
> below)
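The distinction the reporter is drawing can be sketched as two independent predicates (illustrative only, not the Streams implementation; names are invented): grace controls whether a late record is still *accepted* into a window, while retention controls how long the window's state is *kept* in the store, e.g. for interactive queries.

```python
def accepts_late_record(stream_time_ms, window_end_ms, grace_ms):
    # Grace: a late record is dropped once stream time passes window end + grace.
    return stream_time_ms <= window_end_ms + grace_ms

def window_still_stored(stream_time_ms, window_start_ms, retention_ms):
    # Retention: the store keeps the window until it ages past retention.
    return stream_time_ms < window_start_ms + retention_ms
```

For a window [0, 100) with grace 10ms and retention 1000ms, at stream time 500 the window no longer accepts records but is still stored, which is why a knob that only sets grace (or only retention) is not enough.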





[jira] [Commented] (KAFKA-8335) Log cleaner skips Transactional mark and batch record, causing unlimited growth of __consumer_offsets

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838636#comment-16838636
 ] 

ASF GitHub Bot commented on KAFKA-8335:
---

hachikuji commented on pull request #6715: KAFKA-8335; Clean empty batches when 
sequence numbers are reused
URL: https://github.com/apache/kafka/pull/6715
 
 
   
 



> Log cleaner skips Transactional mark and batch record, causing unlimited 
> growth of __consumer_offsets
> -
>
> Key: KAFKA-8335
> URL: https://issues.apache.org/jira/browse/KAFKA-8335
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Boquan Tang
>Assignee: Jason Gustafson
>Priority: Major
> Attachments: seg_april_25.zip, segment.zip
>
>
> (Issue description quoted in full in the resolution notification above.)





[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838565#comment-16838565
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 3:45 PM:


This is the test enhanced to use timestamp extraction, and it works  ;

[https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]

 

So, it would seem that the issue is how the data is read when the data is 
already fully populated in the source topics.


was (Author: the4thamigo_uk):
This is the test enhanced to use timestamp extraction, and it works  ;

[https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]

 

So, it would seem that the issue is how the data is read when the data already 
exists in the topics.






[jira] [Assigned] (KAFKA-8347) Choose next record to process by timestamp

2019-05-13 Thread John Roesler (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler reassigned KAFKA-8347:
---

Assignee: Sophie Blee-Goldman

> Choose next record to process by timestamp
> --
>
> Key: KAFKA-8347
> URL: https://issues.apache.org/jira/browse/KAFKA-8347
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> Currently PartitionGroup will determine the next record to process by 
> choosing the partition with the lowest stream time. However if a partition 
> contains out of order data its stream time may be significantly larger than 
> the timestamp of the next record. The next record should instead be chosen as 
> the record with the lowest timestamp across all partitions, regardless of 
> which partition it comes from or what its partition time is.
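The proposed selection rule can be sketched like this (an illustrative model, not the PartitionGroup implementation; the queue representation is invented): look at the head record of every non-empty partition queue and pick the one with the smallest timestamp, rather than always draining the partition with the lowest stream time.

```python
def next_record(queues):
    """queues: dict mapping partition name -> list of (timestamp, value)
    records in arrival order. Returns (partition, record) for the record
    with the smallest head timestamp across all partitions, or None."""
    candidates = [(q[0][0], p) for p, q in queues.items() if q]
    if not candidates:
        return None
    _, partition = min(candidates)          # smallest head timestamp wins
    return partition, queues[partition].pop(0)
```

With out-of-order data ("p0" holding timestamps [5, 1]), the partition-level stream time of "p0" is 5, yet its *next* record may still be older than another partition's head, which is why head-record timestamps are the better comparison key.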





[jira] [Commented] (KAFKA-8336) Enable dynamic update of client-side SSL factory in brokers

2019-05-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838605#comment-16838605
 ] 

ASF GitHub Bot commented on KAFKA-8336:
---

rajinisivaram commented on pull request #6721: KAFKA-8336; Enable dynamic 
reconfiguration of broker's client-side certs
URL: https://github.com/apache/kafka/pull/6721
 
 
   Enable reconfiguration of SSL keystores and truststores in client-side 
channel builders used by brokers for controller, transaction coordinator and 
replica fetchers. This enables brokers using TLS mutual authentication for 
inter-broker listener to use short-lived certs that may be updated before 
expiry without restarting brokers.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 



> Enable dynamic update of client-side SSL factory in brokers
> ---
>
> Key: KAFKA-8336
> URL: https://issues.apache.org/jira/browse/KAFKA-8336
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.2.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.3.0
>
>
> We currently support dynamic update of server-side keystores. This allows 
> expired certs to be updated on brokers without a rolling restart. When mutual 
> authentication is enabled for inter-broker communication 
> (ssl.client.auth=required), we don't currently dynamically update client-side 
> keystores for the controller or transaction coordinator. So a broker restart (or 
> controller change) is required for a cert update in this case. Since 
> short-lived SSL certs are a common use case, we should enable client-side cert 
> updates for all client connections initiated by the broker to ensure that SSL 
> certificate expiry can be handled with dynamic config updates on brokers for 
> all configurations.
>  





[jira] [Updated] (KAFKA-8357) OOM on HPUX

2019-05-13 Thread Shamil Sabirov (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shamil Sabirov updated KAFKA-8357:
--
Description: 
we have troubles similar to KAFKA-5962

issue resolved by updating docs. for linux

but i have no idea how we can fix this for HPUX environment

any ideas?

  was:
we have trubles similar to KAFKA-5962

issue resolved by updating docs. for linux

but i have no idea how we can fix this for HPUX environment

any ideas?


> OOM on HPUX
> ---
>
> Key: KAFKA-8357
> URL: https://issues.apache.org/jira/browse/KAFKA-8357
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0
> Environment: HP-UX B.11.31 U ia64
>Reporter: Shamil Sabirov
>Priority: Major
> Attachments: server.log.2019-05-10-11
>
>
> we have troubles similar to KAFKA-5962
> issue resolved by updating docs. for linux
> but i have no idea how we can fix this for HPUX environment
> any ideas?





[jira] [Created] (KAFKA-8358) KafkaConsumer.endOffsets should be able to also return end offsets while not ignoring control records

2019-05-13 Thread Natan Silnitsky (JIRA)
Natan Silnitsky created KAFKA-8358:
--

 Summary: KafkaConsumer.endOffsets should be able to also return 
end offsets while not ignoring control records
 Key: KAFKA-8358
 URL: https://issues.apache.org/jira/browse/KAFKA-8358
 Project: Kafka
  Issue Type: Improvement
Reporter: Natan Silnitsky


We have a use case where we have a wrapper on top of {{kafkaConsumer}} for 
compact logs.
In order to know that a user can get "new" values for a key in the compacted log, 
on init or on rebalance, we need to block until all "old" values have been read.

We wanted to use {{KafkaConsumer.endOffsets}} to help us find out where the 
"old" values end.
Once all "old" values arrive from {{KafkaConsumer.poll}}, we can release the 
blocking on getting new values.

But it seems that [control 
records|https://github.com/apache/kafka/blob/c09e25fac2aaea61af892ae3e5273679a4bdbc7d/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L128]
 are not returned by {{KafkaConsumer.poll}}, but are taken into account by 
{{KafkaConsumer.endOffsets}}.

So the feature request is for {{KafkaConsumer.endOffsets}} to take a flag to 
ignore control records, the same way that {{KafkaConsumer.poll}} ignores them.



(From a quick review of the code, it seems that 
{{LeaderEpochFileCache}}.[assign|https://github.com/apache/kafka/blob/c09e25fac2aaea61af892ae3e5273679a4bdbc7d/core/src/main/scala/kafka/server/epoch/LeaderEpochFileCache.scala#L51]
 could be given the isControl flag from 
[batch.isControlBatch|https://github.com/apache/kafka/blob/c09e25fac2aaea61af892ae3e5273679a4bdbc7d/clients/src/main/java/org/apache/kafka/common/record/RecordBatch.java#L239],
but I may be wrong in my understanding there...)
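The pitfall can be sketched as follows (hypothetical wrapper logic, not Kafka client code; `n_trailing_markers` is invented bookkeeping for the sketch): a wrapper that waits until the last polled offset reaches the end offset stalls forever when the offsets just below the end offset belong to control batches.

```python
def naive_caught_up(last_polled_offset, end_offset):
    """Blocks forever when the records just below end_offset are control
    batches: poll() never returns them, so last_polled_offset stalls
    short of end_offset - 1."""
    return last_polled_offset + 1 >= end_offset

def caught_up_ignoring_markers(last_polled_offset, end_offset, n_trailing_markers):
    """What the requested flag would enable: compare against the offset of
    the last *data* record only (n_trailing_markers stands in for the
    control-record count the client would have to learn somehow)."""
    return last_polled_offset + 1 >= end_offset - n_trailing_markers
```

For example, with end offset 10 where offsets 8 and 9 are transaction markers, the last data record a consumer ever sees is at offset 7, so the naive check never becomes true.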

CC:
[~berman7] [~berman]





[jira] [Created] (KAFKA-8357) OOM on HPUX

2019-05-13 Thread Shamil Sabirov (JIRA)
Shamil Sabirov created KAFKA-8357:
-

 Summary: OOM on HPUX
 Key: KAFKA-8357
 URL: https://issues.apache.org/jira/browse/KAFKA-8357
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.2.0
 Environment: HP-UX B.11.31 U ia64
Reporter: Shamil Sabirov
 Attachments: server.log.2019-05-10-11

we have troubles similar to KAFKA-5962

issue resolved by updating docs. for linux

but i have no idea how we can fix this for HPUX environment

any ideas?





[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838565#comment-16838565
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 2:17 PM:


This is the test enhanced to use timestamp extraction, and it works  ;

[https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b]

 

So, it would seem that the issue is how the data is read when the data already 
exists in the topics.


was (Author: the4thamigo_uk):
This is the test enhanced to use timestamp extraction, and it works  ;

https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b






[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838565#comment-16838565
 ] 

Andrew commented on KAFKA-8315:
---

This is the test enhanced to use timestamp extraction, and it works  ;

https://github.com/the4thamigo-uk/kafka/commit/5121851491f2fd0471d8f3c49940175e23a26f1b






[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838550#comment-16838550
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 1:50 PM:


This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.|https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794]

 

Differences with join-example are :
1) It is using TopologyTestDriver, which means data is not pre-populated in 
topics.
2) Im not using timestamp extractors


was (Author: the4thamigo_uk):
This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.|https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794]






[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838550#comment-16838550
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 1:48 PM:


This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.|https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794]


was (Author: the4thamigo_uk):
This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.]






[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838550#comment-16838550
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 1:48 PM:


This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.|https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794]


was (Author: the4thamigo_uk):
This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.|https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794]






[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838550#comment-16838550
 ] 

Andrew commented on KAFKA-8315:
---

This test appears to work ok 
[https://github.com/the4thamigo-uk/kafka/commit/560432e7daae217a2255161787dd55ca56845794.]






[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838471#comment-16838471
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 12:52 PM:
-

[~vvcephei] [~ableegoldman] I was just looking at the unit tests for 
PartitionGroup and noticed that the comment refers to timestamps, but the 
ConsumerGroup constructor that is used passes the value in the offset parameter 
:

[https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/PartitionGroupTest.java#L92]

Is this correct?

 

Update: I see now that this is the MockTimestampExtractor that uses the offset 
as the timestamp... 

[https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/test/MockTimestampExtractor.java]

 


was (Author: the4thamigo_uk):
[~vvcephei] [~ableegoldman] I was just looking at the unit tests for 
PartitionGroup and noticed that the comment refers to timestamps, but the 
ConsumerGroup constructor that is used passes the value in the offset parameter 
:

[https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/PartitionGroupTest.java#L92]

Is this correct?






[jira] [Commented] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838471#comment-16838471
 ] 

Andrew commented on KAFKA-8315:
---

[~vvcephei] [~ableegoldman] I was just looking at the unit tests for 
PartitionGroup and noticed that the comment refers to timestamps, but the 
ConsumerGroup constructor that is used passes the value via the offset 
parameter:

[https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/processor/internals/PartitionGroupTest.java#L92]

Is this correct?






[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

2019-05-13 Thread Andrew (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838012#comment-16838012
 ] 

Andrew edited comment on KAFKA-8315 at 5/13/19 7:10 AM:


[~ableegoldman] Right, I see what you mean: this loop only goes until the head 
is found 
([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordQueue.java#L158]). 
We do have out-of-order data in our real data streams; however, it looks like 
you are right that it shouldn't affect my {{join-example}} demo, which 
reproduces the issue with only ordered data.

Any further ideas on why the {{join-example}} doesn't work? If it is a 
different bug, shall we open a new ticket, as this current one is not really 
relevant anymore?
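
To make the head-selection point concrete, here is a rough, self-contained sketch. The `RecordQueue` class and `drainInTimestampOrder` method are hypothetical stand-ins, not the real RecordQueue/PartitionGroup implementation: each queue exposes its head timestamp, and the group always consumes from the queue whose head is earliest, which is why only the head record of each queue matters to that loop:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

// Rough stand-in for the PartitionGroup/RecordQueue interaction (hypothetical
// classes, not the Kafka implementation); records are reduced to timestamps.
public class Main {
    static class RecordQueue {
        final Queue<Long> timestamps = new ArrayDeque<>();
        long headTimestamp() {
            return timestamps.isEmpty() ? Long.MAX_VALUE : timestamps.peek();
        }
    }

    // Always consume from the queue whose head record is earliest.
    static List<Long> drainInTimestampOrder(List<RecordQueue> queues) {
        PriorityQueue<RecordQueue> group =
                new PriorityQueue<>(Comparator.comparingLong(RecordQueue::headTimestamp));
        group.addAll(queues);
        List<Long> order = new ArrayList<>();
        while (!group.isEmpty()) {
            RecordQueue next = group.poll();                 // earliest head wins
            if (next.headTimestamp() == Long.MAX_VALUE) {
                break;                                       // all queues drained
            }
            order.add(next.timestamps.poll());               // consume the head record
            group.add(next);                                 // re-insert with its new head
        }
        return order;
    }

    public static void main(String[] args) {
        RecordQueue q1 = new RecordQueue();
        q1.timestamps.addAll(List.of(5L, 7L));
        RecordQueue q2 = new RecordQueue();
        q2.timestamps.addAll(List.of(3L, 9L));
        System.out.println(drainInTimestampOrder(List.of(q1, q2))); // prints [3, 5, 7, 9]
    }
}
```

With fully ordered input per partition, this merge already emits records in timestamp order, which is consistent with the observation that out-of-order data should not be needed to reproduce the {{join-example}} issue.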


was (Author: the4thamigo_uk):
[~ableegoldman] Right, I see what you mean: this loop only goes until the head 
is found 
([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordQueue.java#L158]). 
We do have out-of-order data in our real data streams; however, it looks like 
you are right that it shouldn't affect my {{join-example}} demo, which 
reproduces the issue with only ordered data. Any further ideas on why the 
{{join-example}} doesn't work?






[jira] [Assigned] (KAFKA-8354) Replace SyncGroup request/response with automated protocol

2019-05-13 Thread Boyang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen reassigned KAFKA-8354:
--

Assignee: Boyang Chen

> Replace SyncGroup request/response with automated protocol
> --
>
> Key: KAFKA-8354
> URL: https://issues.apache.org/jira/browse/KAFKA-8354
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)