[jira] [Commented] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897788#comment-16897788
 ] 

ASF GitHub Bot commented on KAFKA-8465:
---

lordcheng10 commented on pull request #6866: [KAFKA-8465]replication  strategy 
for the topic dimension
URL: https://github.com/apache/kafka/pull/6866
 
 
   When a partition's replicas are assigned to a broker, which of the broker's 
disks should those replicas be placed on? The original strategy allocates 
according to the total number of partitions per disk, but this can put too many 
partitions of the same topic on a single disk, which may cause disk hotspot 
problems.
   To solve this problem, we propose an improved strategy: first ensure that 
each topic's partitions are evenly spread across the broker's disks. If a topic 
has the same number of partitions on two disks, break the tie by the total 
number of partitions on each disk, and select the disk with the fewest 
partitions to store the current replica.
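   A minimal sketch of the selection rule described above (not the actual patch; 
it assumes the current partition-to-topic layout of each log directory is known):
{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class LogDirSelector {
    /**
     * Pick a log directory for a new replica of `topic`: fewest partitions of that
     * topic first, ties broken by the directory's total partition count.
     * Each map value is the list of topic names of the partitions already on that dir.
     */
    public static String selectLogDir(String topic, Map<String, List<String>> partitionsPerDir) {
        return partitionsPerDir.entrySet().stream()
                .min(Comparator
                        .<Map.Entry<String, List<String>>>comparingLong(
                                e -> e.getValue().stream().filter(topic::equals).count())
                        .thenComparingInt(e -> e.getValue().size()))
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no log directories configured"));
    }
}
{code}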
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make sure that the copy of the same topic is evenly distributed across a 
> broker's disk.
> ---
>
> Key: KAFKA-8465
> URL: https://issues.apache.org/jira/browse/KAFKA-8465
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1, 2.2.0, 2.2.1
>Reporter: ChenLin
>Priority: Major
> Attachments: image-2019-07-30-13-40-12-711.png, 
> image-2019-07-30-13-40-49-878.png, 
> replication_strategy_for_the_topic_dimension.patch
>
>
> When a partition's replicas are assigned to a broker, which of the broker's 
> disks should those replicas be placed on? The original strategy allocates 
> according to the total number of partitions per disk. This results in an 
> uneven disk allocation in the topic dimension.
>  To solve this problem, we propose an improved strategy: first ensure that 
> each topic's partitions are evenly spread across the broker's disks. If a 
> topic has the same number of partitions on two disks, break the tie by the 
> total number of partitions on each disk, and select the disk with the fewest 
> partitions to store the current replica.
> !image-2019-07-30-13-40-12-711.png!
> !image-2019-07-30-13-40-49-878.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897787#comment-16897787
 ] 

ASF GitHub Bot commented on KAFKA-8465:
---

lordcheng10 commented on pull request #6866: [KAFKA-8465]replication  strategy 
for the topic dimension
URL: https://github.com/apache/kafka/pull/6866
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make sure that the copy of the same topic is evenly distributed across a 
> broker's disk.
> ---
>
> Key: KAFKA-8465
> URL: https://issues.apache.org/jira/browse/KAFKA-8465
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1, 2.2.0, 2.2.1
>Reporter: ChenLin
>Priority: Major
> Attachments: image-2019-07-30-13-40-12-711.png, 
> image-2019-07-30-13-40-49-878.png, 
> replication_strategy_for_the_topic_dimension.patch
>
>
> When a partition's replicas are assigned to a broker, which of the broker's 
> disks should those replicas be placed on? The original strategy allocates 
> according to the total number of partitions per disk. This results in an 
> uneven disk allocation in the topic dimension.
>  To solve this problem, we propose an improved strategy: first ensure that 
> each topic's partitions are evenly spread across the broker's disks. If a 
> topic has the same number of partitions on two disks, break the tie by the 
> total number of partitions on each disk, and select the disk with the fewest 
> partitions to store the current replica.
> !image-2019-07-30-13-40-12-711.png!
> !image-2019-07-30-13-40-49-878.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897786#comment-16897786
 ] 

ASF GitHub Bot commented on KAFKA-8465:
---

lordcheng10 commented on pull request #6866: [KAFKA-8465]replication  strategy 
for the topic dimension
URL: https://github.com/apache/kafka/pull/6866
 
 
   When a partition's replicas are assigned to a broker, which of the broker's 
disks should those replicas be placed on? The original strategy allocates 
according to the total number of partitions per disk, but this can put too many 
partitions of the same topic on a single disk, which may cause disk hotspot 
problems.
   To solve this problem, we propose an improved strategy: first ensure that 
each topic's partitions are evenly spread across the broker's disks. If a topic 
has the same number of partitions on two disks, break the tie by the total 
number of partitions on each disk, and select the disk with the fewest 
partitions to store the current replica.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make sure that the copy of the same topic is evenly distributed across a 
> broker's disk.
> ---
>
> Key: KAFKA-8465
> URL: https://issues.apache.org/jira/browse/KAFKA-8465
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1, 2.2.0, 2.2.1
>Reporter: ChenLin
>Priority: Major
> Attachments: image-2019-07-30-13-40-12-711.png, 
> image-2019-07-30-13-40-49-878.png, 
> replication_strategy_for_the_topic_dimension.patch
>
>
> When a partition's replicas are assigned to a broker, which of the broker's 
> disks should those replicas be placed on? The original strategy allocates 
> according to the total number of partitions per disk. This results in an 
> uneven disk allocation in the topic dimension.
>  To solve this problem, we propose an improved strategy: first ensure that 
> each topic's partitions are evenly spread across the broker's disks. If a 
> topic has the same number of partitions on two disks, break the tie by the 
> total number of partitions on each disk, and select the disk with the fewest 
> partitions to store the current replica.
> !image-2019-07-30-13-40-12-711.png!
> !image-2019-07-30-13-40-49-878.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8729) Augment ProduceResponse error messaging for specific culprit records

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897561#comment-16897561
 ] 

ASF GitHub Bot commented on KAFKA-8729:
---

tuvtran commented on pull request #7142: KAFKA-8729, pt 1: Add 4 new metrics to 
keep track of various types of invalid record rejections
URL: https://github.com/apache/kafka/pull/7142
 
 
   Right now we only have the very generic `FailedProduceRequestsPerSec` and 
`FailedFetchRequestsPerSec` metrics, which are marked whenever a record fails 
on the broker side. To improve the debugging UX, I added 4 new metrics in 
`BrokerTopicStats` to track the various scenarios in which an 
`InvalidRecordException` is thrown because `LogValidator` fails to validate a 
record:
   -- `NoKeyCompactedTopicRecordsPerSec`: counts records rejected because a 
compacted topic received a record with no key
   -- `InvalidMagicNumberRecordsPerSec`: counts records rejected for an invalid 
magic number
   -- `InvalidMessageCrcRecordsPerSec`: counts records rejected for CRC 
corruption
   -- `NonIncreasingOffsetRecordsPerSec`: counts records rejected for 
non-increasing offsets
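   A minimal, framework-free sketch of the idea (not the actual `BrokerTopicStats` 
implementation): one counter per rejection reason, bumped wherever the 
corresponding validation check fails.
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class InvalidRecordStats {
    private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Bump the counter for one rejection reason, e.g. "NoKeyCompactedTopicRecordsPerSec". */
    public void record(String reason) {
        counters.computeIfAbsent(reason, r -> new LongAdder()).increment();
    }

    /** Current total for a reason; 0 if nothing has been rejected for it yet. */
    public long count(String reason) {
        LongAdder adder = counters.get(reason);
        return adder == null ? 0L : adder.sum();
    }
}
{code}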
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Augment ProduceResponse error messaging for specific culprit records
> 
>
> Key: KAFKA-8729
> URL: https://issues.apache.org/jira/browse/KAFKA-8729
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core
>Reporter: Guozhang Wang
>Assignee: Tu Tran
>Priority: Major
>
> 1. We should replace the misleading CORRUPT_RECORD error code with a new 
> INVALID_RECORD.
> 2. We should augment the ProduceResponse with a customizable error message 
> and indicators of the culprit records.
> 3. We should change the client-side handling logic of non-retriable 
> INVALID_RECORD to re-batch the records.
> For details, see: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup

2019-07-31 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897546#comment-16897546
 ] 

Bill Bejeck edited comment on KAFKA-8731 at 7/31/19 9:33 PM:
-

merged to trunk and cherry-picked to 2.3


was (Author: bbejeck):
cherry-picked to 2.3

> InMemorySessionStore throws NullPointerException on startup
> ---
>
> Key: KAFKA-8731
> URL: https://issues.apache.org/jira/browse/KAFKA-8731
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Jonathan Gordon
>Assignee: Sophie Blee-Goldman
>Priority: Blocker
> Fix For: 2.4.0, 2.3.1
>
>
> I receive a NullPointerException on startup when trying to use the new 
> InMemorySessionStore via Stores.inMemorySessionStore(...) using the DSL.
> Here's the stack trace:
> {{ERROR [2019-07-29 21:56:52,246] 
> org.apache.kafka.streams.processor.internals.StreamThread: stream-thread 
> [trace_indexer-c8439020-12af-4db2-ad56-3e58cd56540f-StreamThread-1] 
> Encountered the following error during processing:}}
> {{! java.lang.NullPointerException: null}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.remove(InMemorySessionStore.java:123)}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.put(InMemorySessionStore.java:115)}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.lambda$init$0(InMemorySessionStore.java:93)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$1(StateRestoreCallbackAdapter.java:47)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:317)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:867)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)}}
>  
> Here's the Slack thread:
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1564438647169600]
>  
> Here's a current PR aimed at fixing the issue:
> [https://github.com/apache/kafka/pull/7132]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8729) Augment ProduceResponse error messaging for specific culprit records

2019-07-31 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-8729:
-
Issue Type: Improvement  (was: Bug)

> Augment ProduceResponse error messaging for specific culprit records
> 
>
> Key: KAFKA-8729
> URL: https://issues.apache.org/jira/browse/KAFKA-8729
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, core
>Reporter: Guozhang Wang
>Assignee: Tu Tran
>Priority: Major
>
> 1. We should replace the misleading CORRUPT_RECORD error code with a new 
> INVALID_RECORD.
> 2. We should augment the ProduceResponse with a customizable error message 
> and indicators of the culprit records.
> 3. We should change the client-side handling logic of non-retriable 
> INVALID_RECORD to re-batch the records.
> For details, see: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897533#comment-16897533
 ] 

ASF GitHub Bot commented on KAFKA-8731:
---

bbejeck commented on pull request #7132: KAFKA-8731: InMemorySessionStore 
throws NullPointerException on startup
URL: https://github.com/apache/kafka/pull/7132
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> InMemorySessionStore throws NullPointerException on startup
> ---
>
> Key: KAFKA-8731
> URL: https://issues.apache.org/jira/browse/KAFKA-8731
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Jonathan Gordon
>Assignee: Sophie Blee-Goldman
>Priority: Blocker
> Fix For: 2.4.0, 2.3.1
>
>
> I receive a NullPointerException on startup when trying to use the new 
> InMemorySessionStore via Stores.inMemorySessionStore(...) using the DSL.
> Here's the stack trace:
> {{ERROR [2019-07-29 21:56:52,246] 
> org.apache.kafka.streams.processor.internals.StreamThread: stream-thread 
> [trace_indexer-c8439020-12af-4db2-ad56-3e58cd56540f-StreamThread-1] 
> Encountered the following error during processing:}}
> {{! java.lang.NullPointerException: null}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.remove(InMemorySessionStore.java:123)}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.put(InMemorySessionStore.java:115)}}
> {{! at 
> org.apache.kafka.streams.state.internals.InMemorySessionStore.lambda$init$0(InMemorySessionStore.java:93)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$1(StateRestoreCallbackAdapter.java:47)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:317)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:867)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)}}
> {{! at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)}}
>  
> Here's the Slack thread:
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1564438647169600]
>  
> Here's a current PR aimed at fixing the issue:
> [https://github.com/apache/kafka/pull/7132]
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8704) Add PartitionAssignor adapter for backwards compatibility

2019-07-31 Thread Guozhang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-8704.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add PartitionAssignor adapter for backwards compatibility
> -
>
> Key: KAFKA-8704
> URL: https://issues.apache.org/jira/browse/KAFKA-8704
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
> Fix For: 2.4.0
>
>
> As part of KIP-429, we are deprecating the old 
> consumer.internal.PartitionAssignor in favor of a [new 
> consumer.PartitionAssignor|https://issues.apache.org/jira/browse/KAFKA-8703] 
> interface  that is part of the public API.
>  
> Although the old PartitionAssignor was technically part of the internal 
> package, some users may have implemented it and this change will break source 
> compatibility for them as they would need to modify their class to implement 
> the new interface. The number of users affected may be small, but nonetheless 
> we would like to add an adapter to maintain compatibility for these users.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897521#comment-16897521
 ] 

ASF GitHub Bot commented on KAFKA-8179:
---

guozhangwang commented on pull request #7110:  KAFKA-8179: 
PartitionAssignorAdapter
URL: https://github.com/apache/kafka/pull/7110
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Incremental Rebalance Protocol for Kafka Consumer
> -
>
> Key: KAFKA-8179
> URL: https://issues.apache.org/jira/browse/KAFKA-8179
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Recently the Kafka community has been promoting cooperative rebalancing to 
> mitigate the pain points of the stop-the-world rebalancing protocol. This 
> ticket initiates that idea in the Kafka consumer client, which will benefit 
> heavily stateful consumers such as Kafka Streams applications.
> In short, the scope of this ticket is to reduce unnecessary rebalance 
> latency due to heavy partition migration, i.e. partitions being revoked and 
> re-assigned. This makes the built-in consumer assignors (range, round-robin, 
> etc.) aware of previously assigned partitions and sticky on a best-effort 
> basis.
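A minimal sketch of how a consumer opts in to the incremental (cooperative) protocol, 
assuming Kafka clients 2.4+ where the `CooperativeStickyAssignor` produced by this 
work is available; bootstrap servers, group id and topic name are placeholders.
{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.CooperativeStickyAssignor;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CooperativeConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Only revoked partitions are reassigned instead of stopping the whole group.
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                  CooperativeStickyAssignor.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            // poll loop elided
        }
    }
}
{code}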



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897497#comment-16897497
 ] 

Bruno Cadonna commented on KAFKA-8677:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6624/testReport/junit/kafka.api/SaslGssapiSslEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/

> Flakey test 
> GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> 
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
>  Issue Type: Bug
>  Components: core, security, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>  
> *18:43:39* kafka.api.GroupEndToEndAuthorizationTest > 
> testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED*18:44:00* 
> kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
>  failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout*18:44:00*
>  *18:44:00* kafka.api.GroupEndToEndAuthorizationTest > 
> testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED*18:44:00* 
> org.scalatest.exceptions.TestFailedException: Consumed 0 records before 
> timeout instead of the expected 1 records



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897500#comment-16897500
 ] 

Bruno Cadonna commented on KAFKA-8555:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/672/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector
>  failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout*02:03:21*
>  *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > 
> testSourceConnector FAILED*02:03:21* 
> org.apache.kafka.connect.errors.DataException: Insufficient records committed 
> by connector simple-conn in 15000 millis. Records expected=2000, 
> actual=1013*02:03:21* at 
> org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)*02:03:21*
>  at 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)*02:03:21*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897496#comment-16897496
 ] 

Bruno Cadonna commented on KAFKA-8555:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6624/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector
>  failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout*02:03:21*
>  *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > 
> testSourceConnector FAILED*02:03:21* 
> org.apache.kafka.connect.errors.DataException: Insufficient records committed 
> by connector simple-conn in 15000 millis. Records expected=2000, 
> actual=1013*02:03:21* at 
> org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)*02:03:21*
>  at 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)*02:03:21*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8260) Flaky test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-07-31 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897416#comment-16897416
 ] 

Bill Bejeck commented on KAFKA-8260:


Failed again on [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/671/]

 
{noformat}
Error Message
org.scalatest.exceptions.TestFailedException: The remaining consumers in the 
group could not fetch the expected records
Stacktrace
org.scalatest.exceptions.TestFailedException: The remaining consumers in the 
group could not fetch the expected records
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
at org.scalatest.Assertions.fail(Assertions.scala:1091)
at org.scalatest.Assertions.fail$(Assertions.scala:1087)
at org.scalatest.Assertions$.fail(Assertions.scala:1389)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:822)
at 
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:330)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.Delegating

[jira] [Commented] (KAFKA-8541) Flakey test LeaderElectionCommandTest#testTopicPartition

2019-07-31 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897410#comment-16897410
 ] 

Bill Bejeck commented on KAFKA-8541:


Failed again on [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/671/]

 
{noformat}
Error Message
kafka.common.AdminCommandFailedException: Timeout waiting for election results
Stacktrace
kafka.common.AdminCommandFailedException: Timeout waiting for election results
at 
kafka.admin.LeaderElectionCommand$.electLeaders(LeaderElectionCommand.scala:133)
at 
kafka.admin.LeaderElectionCommand$.run(LeaderElectionCommand.scala:88)
at 
kafka.admin.LeaderElectionCommand$.main(LeaderElectionCommand.scala:41)
at 
kafka.admin.LeaderElectionCommandTest.$anonfun$testTopicPartition$1(LeaderElectionCommandTest.scala:127)
at 
kafka.admin.LeaderElectionCommandTest.$anonfun$testTopicPartition$1$adapted(LeaderElectionCommandTest.scala:105)
at kafka.utils.TestUtils$.resource(TestUtils.scala:1588)
at 
kafka.admin.LeaderElectionCommandTest.testTopicPartition(LeaderElectionCommandTest.scala:105)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.

[jira] [Commented] (KAFKA-8672) RebalanceSourceConnectorsIntegrationTest#testReconfigConnector

2019-07-31 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897374#comment-16897374
 ] 

Matthias J. Sax commented on KAFKA-8672:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6606/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testReconfigConnector/]

> RebalanceSourceConnectorsIntegrationTest#testReconfigConnector
> --
>
> Key: KAFKA-8672
> URL: https://issues.apache.org/jira/browse/KAFKA-8672
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.4.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6281/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testReconfigConnector/]
> {quote}java.lang.RuntimeException: Could not find enough records. found 33, 
> expected 100 at 
> org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.consume(EmbeddedKafkaCluster.java:306)
>  at 
> org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testReconfigConnector(RebalanceSourceConnectorsIntegrationTest.java:180){quote}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8661) Flaky Test RebalanceSourceConnectorsIntegrationTest#testStartTwoConnectors

2019-07-31 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897373#comment-16897373
 ] 

Matthias J. Sax commented on KAFKA-8661:


[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6606/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/]

and

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23793/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/]

> Flaky Test RebalanceSourceConnectorsIntegrationTest#testStartTwoConnectors
> --
>
> Key: KAFKA-8661
> URL: https://issues.apache.org/jira/browse/KAFKA-8661
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.4.0, 2.3.1
>
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/224/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/]
> {quote}java.lang.AssertionError: Condition not met within timeout 3. 
> Connector tasks did not start in time. at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376) at 
> org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:353) at 
> org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testStartTwoConnectors(RebalanceSourceConnectorsIntegrationTest.java:120){quote}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.

2019-07-31 Thread ChenLin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenLin updated KAFKA-8465:
---
Summary: Make sure that the copy of the same topic is evenly distributed 
across a broker's disk.  (was: Copying strategy for the topic dimension)

> Make sure that the copy of the same topic is evenly distributed across a 
> broker's disk.
> ---
>
> Key: KAFKA-8465
> URL: https://issues.apache.org/jira/browse/KAFKA-8465
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.0.1, 2.2.0, 2.2.1
>Reporter: ChenLin
>Priority: Major
> Attachments: image-2019-07-30-13-40-12-711.png, 
> image-2019-07-30-13-40-49-878.png, 
> replication_strategy_for_the_topic_dimension.patch
>
>
> When a partition's replicas are assigned to a broker, which of the broker's 
> disks should those replicas be placed on? The original strategy allocates 
> according to the total number of partitions per disk. This results in an 
> uneven disk allocation in the topic dimension.
>  To solve this problem, we propose an improved strategy: first ensure that 
> each topic's partitions are evenly spread across the broker's disks. If a 
> topic has the same number of partitions on two disks, break the tie by the 
> total number of partitions on each disk, and select the disk with the fewest 
> partitions to store the current replica.
> !image-2019-07-30-13-40-12-711.png!
> !image-2019-07-30-13-40-49-878.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8688) Upgrade system tests fail due to data loss with older message format

2019-07-31 Thread Rajini Sivaram (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-8688.
---
   Resolution: Fixed
 Reviewer: Ismael Juma
Fix Version/s: 2.4.0

> Upgrade system tests fail due to data loss with older message format
> 
>
> Key: KAFKA-8688
> URL: https://issues.apache.org/jira/browse/KAFKA-8688
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.4.0
>
>
> System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, 
> to_message_format_version=0.9.0.1, compression_types=.lz4
> {code:java}
> 3 acked message did not make it to the Consumer. They are: [33906, 33900, 
> 33903]. The first 3 missing messages were validated to ensure they are in 
> Kafka's data files. 3 were missing. This suggests data loss. Here are some of 
> the messages not found in the data files: [33906, 33900, 33903]
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 132, in run
>     data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 189, in run_test
>     return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py",
>  line 428, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py",
>  line 136, in test_upgrade
>     self.run_produce_consume_validate(core_test_action=lambda: 
> self.perform_upgrade(from_kafka_version,
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 112, in run_produce_consume_validate
>     self.validate()
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 135, in validate
>     assert succeeded, error_msg
> AssertionError: 3 acked message did not make it to the Consumer. They are: 
> [33906, 33900, 33903]. The first 3 missing messages were validated to ensure 
> they are in Kafka's data files. 3 were missing. This suggests data loss. Here 
> are some of the messages not found in the data files: [33906, 33900, 33903]
> {code}
> Logs show:
>  # Broker 1 is leader of partition
>  # Broker 2 successfully fetches from offset 10947 and processes request
>  # Broker 2 sends fetch request to broker 1 for offset 10950
>  # Broker 1 sets its HW to 10950 and acknowledges produce requests up to the HW
>  # Broker 2 is elected leader
>  # Broker 2 truncates to its local HW of 10947 - 3 messages are lost
> This data loss is a known issue that was fixed under KIP-101. But since it 
> can still happen with older message formats, we should update the upgrade 
> tests to cope with some data loss.
>   



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8688) Upgrade system tests fail due to data loss with older message format

2019-07-31 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897274#comment-16897274
 ] 

Rajini Sivaram commented on KAFKA-8688:
---

PR: https://github.com/apache/kafka/pull/7102

> Upgrade system tests fail due to data loss with older message format
> 
>
> Key: KAFKA-8688
> URL: https://issues.apache.org/jira/browse/KAFKA-8688
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
>
> System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, 
> to_message_format_version=0.9.0.1, compression_types=.lz4
> {code:java}
> 3 acked message did not make it to the Consumer. They are: [33906, 33900, 
> 33903]. The first 3 missing messages were validated to ensure they are in 
> Kafka's data files. 3 were missing. This suggests data loss. Here are some of 
> the messages not found in the data files: [33906, 33900, 33903]
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 132, in run
>     data = self.run_test()
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py",
>  line 189, in run_test
>     return self.test_context.function(self.test)
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py",
>  line 428, in wrapper
>     return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py",
>  line 136, in test_upgrade
>     self.run_produce_consume_validate(core_test_action=lambda: 
> self.perform_upgrade(from_kafka_version,
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 112, in run_produce_consume_validate
>     self.validate()
>   File 
> "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 135, in validate
>     assert succeeded, error_msg
> AssertionError: 3 acked message did not make it to the Consumer. They are: 
> [33906, 33900, 33903]. The first 3 missing messages were validated to ensure 
> they are in Kafka's data files. 3 were missing. This suggests data loss. Here 
> are some of the messages not found in the data files: [33906, 33900, 33903]
> {code}
> Logs show:
>  # Broker 1 is leader of partition
>  # Broker 2 successfully fetches from offset 10947 and processes request
>  # Broker 2 sends fetch request to broker 1 for offset 10950
>  # Broker 1 sets its HW to 10950 and acknowledges produce requests up to the HW
>  # Broker 2 is elected leader
>  # Broker 2 truncates to its local HW of 10947 - 3 messages are lost
> This data loss is a known issue that was fixed under KIP-101. But since it 
> can still happen with older message formats, we should update the upgrade 
> tests to cope with some data loss.
>   



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8735) BrokerMetadataCheckPoint should check metadata.properties existence itself

2019-07-31 Thread Qinghui Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897153#comment-16897153
 ] 

Qinghui Xu edited comment on KAFKA-8735 at 7/31/19 1:03 PM:


Errors during the tests:
{code:java}
2019-07-30 15:36:31 ERROR BrokerMetadataCheckpoint:74 - Failed to read 
meta.properties file under dir 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
 due to 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
2019-07-30 15:36:31 ERROR KafkaServer:76 - Fail to read meta.properties under 
log directory 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507
java.nio.file.NoSuchFileException: 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at 
java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:574)
at 
kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:63)
at kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:62)
at 
kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:668)
at 
kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:666)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:666)
at kafka.server.KafkaServer.startup(KafkaServer.scala:209)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
...
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java

[jira] [Commented] (KAFKA-8735) BrokerMetadataCheckPoint should check metadata.properties existence itself

2019-07-31 Thread Qinghui Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897153#comment-16897153
 ] 

Qinghui Xu commented on KAFKA-8735:
---

Errors during the tests:
{code:java}
2019-07-30 15:36:31 ERROR BrokerMetadataCheckpoint:74 - Failed to read 
meta.properties file under dir 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
 due to 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
2019-07-30 15:36:31 ERROR KafkaServer:76 - Fail to read meta.properties under 
log directory 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507
java.nio.file.NoSuchFileException: 
/var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at 
java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:574)
at 
kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:63)
at kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:62)
at 
kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:668)
at 
kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:666)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:666)
at kafka.server.KafkaServer.startup(KafkaServer.scala:209)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
...
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.Reflection

[jira] [Commented] (KAFKA-8589) Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897081#comment-16897081
 ] 

Bruno Cadonna commented on KAFKA-8589:
--

https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23807/testReport/junit/kafka.admin/ResetConsumerGroupOffsetTest/testResetOffsetsExistingTopic/

> Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic
> --
>
> Key: KAFKA-8589
> URL: https://issues.apache.org/jira/browse/KAFKA-8589
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.4.0
>Reporter: Boyang Chen
>Priority: Major
>  Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5724/consoleFull]
> *20:25:15* 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic 
> failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic.test.stdout*20:25:15*
>  *20:25:15* kafka.admin.ResetConsumerGroupOffsetTest > 
> testResetOffsetsExistingTopic FAILED*20:25:15* 
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.*20:25:15* at 
> org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)*20:25:15*
>  at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)*20:25:15*
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$resetOffsets$1(ConsumerGroupCommand.scala:379)*20:25:15*
>  at 
> scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)*20:25:15*
>  at 
> scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)*20:25:15*
>  at scala.collection.Iterator.foreach(Iterator.scala:941)*20:25:15*   
>   at scala.collection.Iterator.foreach$(Iterator.scala:941)*20:25:15* 
> at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1429)*20:25:15*  
>at scala.collection.IterableLike.foreach(IterableLike.scala:74)*20:25:15*  
>at 
> scala.collection.IterableLike.foreach$(IterableLike.scala:73)*20:25:15*   
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)*20:25:15*   
>   at 
> scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)*20:25:15*
>  at 
> scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)*20:25:15*
>  at 
> scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)*20:25:15*
>  at 
> kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:377)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.resetOffsets(ResetConsumerGroupOffsetTest.scala:507)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.resetAndAssertOffsets(ResetConsumerGroupOffsetTest.scala:477)*20:25:15*
>  at 
> kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic(ResetConsumerGroupOffsetTest.scala:123)*20:25:15*
>  *20:25:15* Caused by:*20:25:15* 
> org.apache.kafka.common.errors.CoordinatorNotAvailableException: The 
> coordinator is not available.*20*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897079#comment-16897079
 ] 

Bruno Cadonna commented on KAFKA-8555:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/667/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/

> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> 
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, 
> log-job23215.txt, log-job6046.txt
>
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector
>  failed, log available in 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout*02:03:21*
>  *02:03:21* 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > 
> testSourceConnector FAILED*02:03:21* 
> org.apache.kafka.connect.errors.DataException: Insufficient records committed 
> by connector simple-conn in 15000 millis. Records expected=2000, 
> actual=1013*02:03:21* at 
> org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)*02:03:21*
>  at 
> org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)*02:03:21*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Mickael Maison (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897047#comment-16897047
 ] 

Mickael Maison edited comment on KAFKA-8600 at 7/31/19 11:19 AM:
-

I already have PRs opened for Renew and Expire, so feel free to review them if 
you have time:
 [https://github.com/apache/kafka/pull/7038]
[https://github.com/apache/kafka/pull/7098]

I've not started looking at this one, so grab it if you want.


was (Author: mimaison):
I already have PRs opened for Renew and Expire, so feel free to review them if 
you have time:
[https://github.com/apache/kafka/pull/7038]
[https://github.com/apache/kafka/pull/7098
]
I've not started looking at this one, so grab it if you want.

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Viktor Somogyi-Vass (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897058#comment-16897058
 ] 

Viktor Somogyi-Vass commented on KAFKA-8600:


Thanks, I'll make some time for the reviews too!

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Viktor Somogyi-Vass (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass reassigned KAFKA-8600:
--

Assignee: Viktor Somogyi-Vass  (was: Mickael Maison)

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Mickael Maison (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897047#comment-16897047
 ] 

Mickael Maison commented on KAFKA-8600:
---

I already have PRs opened for Renew and Expire, so feel free to review them if 
you have time:
[https://github.com/apache/kafka/pull/7038]
[https://github.com/apache/kafka/pull/7098]
I've not started looking at this one, so grab it if you want.

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KAFKA-8722) Data crc check repair

2019-07-31 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8722.

   Resolution: Fixed
Fix Version/s: (was: 0.10.2.2)
   0.10.2.3

> Data crc check repair
> -
>
> Key: KAFKA-8722
> URL: https://issues.apache.org/jira/browse/KAFKA-8722
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.10.2.2
>Reporter: ChenLin
>Priority: Major
> Fix For: 0.10.2.3
>
> Attachments: image-2019-07-27-14-50-08-128.png, 
> image-2019-07-27-14-50-58-300.png, image-2019-07-27-14-56-25-610.png, 
> image-2019-07-27-14-57-06-687.png, image-2019-07-27-15-05-12-565.png, 
> image-2019-07-27-15-06-07-123.png, image-2019-07-27-15-10-21-709.png, 
> image-2019-07-27-15-18-22-716.png, image-2019-07-30-11-39-01-605.png
>
>
> In our production environment, while consuming topic data from Kafka in an 
> application, we hit the following error:
> org.apache.kafka.common.KafkaException: Record for partition 
> rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is 
> corrupt (stored crc = 3580880396, computed crc = 1701403171)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046)
>  at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88)
>  at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>  at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>  at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>  at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> At this point we used the kafka.tools.DumpLogSegments tool to parse the disk 
> log file and found that there was indeed dirty data:
> !image-2019-07-27-14-57-06-687.png!
> By reading the code, we found that in some cases Kafka writes data to disk 
> without verifying it, so we fixed that.
>  Specifically, when record.offset is not equal to the offset we are 
> expecting, Kafka sets the variable inPlaceAssignment to false, and when 
> inPlaceAssignment is false the data is not verified:
> !image-2019-07-27-14-50-58-300.png!
> !image-2019-07-27-14-50-08-128.png!
> Our repairs are as follows:
> !image-2019-07-30-11-39-01-605.png!
> We ran a comparative test: by modifying the client-side producer code, we 
> generated some dirty data. With the original Kafka version the dirty data was 
> written to disk normally and the error only surfaced at consume time; with 
> our repaired version the data is verified at write time, so this producer 
> write failed:
> !image-2019-07-27-15-05-12-565.png!
> At this time, when the client consumes, an error will be reported:
> !image-2019-07-27-15-06-07-123.png!
> When the Kafka server is replaced with the repaired version, the dirty data 
> is verified at write time and the producer fails to write it this time:
> !image-2019-07-27-15-10-21-709.png!
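
For illustration only, here is a minimal, self-contained sketch of the kind of 
check being proposed (the actual patch is in the attached screenshots; 
RawRecord and CrcCheckSketch are hypothetical names invented for this example, 
not Kafka internals): recompute the CRC over a record's payload and reject the 
record at write time when it does not match the stored value.

{code:java}
import java.util.zip.CRC32

// Hypothetical stand-in for a record whose CRC was computed by the producer.
case class RawRecord(storedCrc: Long, payload: Array[Byte])

object CrcCheckSketch {

  // Recompute the checksum over the payload and compare with the stored value.
  def isValid(record: RawRecord): Boolean = {
    val crc = new CRC32
    crc.update(record.payload, 0, record.payload.length)
    crc.getValue == record.storedCrc
  }

  def main(args: Array[String]): Unit = {
    val bytes = "hello".getBytes("UTF-8")
    val crc = new CRC32
    crc.update(bytes, 0, bytes.length)
    val good = RawRecord(crc.getValue, bytes)
    // Same stored CRC, different payload: simulates the dirty data above.
    val corrupt = good.copy(payload = "hell0".getBytes("UTF-8"))

    Seq(good, corrupt).foreach { r =>
      if (isValid(r)) println("record accepted")
      else println("record rejected at write time: CRC mismatch")
    }
  }
}
{code}

The point of the fix described above is that this kind of validation also runs 
when inPlaceAssignment is false, so corruption is rejected at produce time 
instead of only being detected by consumers.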



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8722) Data crc check repair

2019-07-31 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896978#comment-16896978
 ] 

ASF GitHub Bot commented on KAFKA-8722:
---

hachikuji commented on pull request #7124: KAFKA-8722: Data crc check repair
URL: https://github.com/apache/kafka/pull/7124
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Data crc check repair
> -
>
> Key: KAFKA-8722
> URL: https://issues.apache.org/jira/browse/KAFKA-8722
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.10.2.2
>Reporter: ChenLin
>Priority: Major
> Fix For: 0.10.2.2
>
> Attachments: image-2019-07-27-14-50-08-128.png, 
> image-2019-07-27-14-50-58-300.png, image-2019-07-27-14-56-25-610.png, 
> image-2019-07-27-14-57-06-687.png, image-2019-07-27-15-05-12-565.png, 
> image-2019-07-27-15-06-07-123.png, image-2019-07-27-15-10-21-709.png, 
> image-2019-07-27-15-18-22-716.png, image-2019-07-30-11-39-01-605.png
>
>
> In our production environment, while consuming topic data from Kafka in an 
> application, we hit the following error:
> org.apache.kafka.common.KafkaException: Record for partition 
> rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is 
> corrupt (stored crc = 3580880396, computed crc = 1701403171)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788)
>  at 
> org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188)
>  at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046)
>  at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88)
>  at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
>  at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
>  at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
>  at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> At this point we used the kafka.tools.DumpLogSegments tool to parse the disk 
> log file and found that there was indeed dirty data:
> !image-2019-07-27-14-57-06-687.png!
> By reading the code, we found that in some cases Kafka writes data to disk 
> without verifying it, so we fixed that.
>  Specifically, when record.offset is not equal to the offset we are 
> expecting, Kafka sets the variable inPlaceAssignment to false, and when 
> inPlaceAssignment is false the data is not verified:
> !image-2019-07-27-14-50-58-300.png!
> !image-2019-07-27-14-50-08-128.png!
> Our repairs are as follows:
> !image-2019-07-30-11-39-01-605.png!
> We ran a comparative test: by modifying the client-side producer code, we 
> generated some dirty data. With the original Kafka version the dirty data was 
> written to disk normally and the error only surfaced at consume time; with 
> our repaired version the data is verified at write time, so this producer 
> write failed:
> !image-2019-07-27-15-05-12-565.png!
> At this time, when the client consumes, an error will be reported:
> !image-2019-07-27-15-06-07-123.png!
> When the Kafka server is replaced with the repaired version, the dirty data 
> is verified at write time and the producer fails to write it this time:
> !image-2019-07-27-15-10-21-709.png!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Viktor Somogyi-Vass (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896966#comment-16896966
 ] 

Viktor Somogyi-Vass edited comment on KAFKA-8600 at 7/31/19 9:20 AM:
-

Hey [~mimaison] are you working on this? I probably would like to get this done 
as I'm working on this part anyway as part of KIP-373. Would you be fine if I 
implemented this?
(In fact I could get done the other two DT protocols as well :) )


was (Author: viktorsomogyi):
Hey [~mimaison] are you working on this? I probably would like to get this done 
as I'm working on this part anyway as part of KIP-373. Would you be fine if I 
implemented this?

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8738) Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests sent

2019-07-31 Thread Alexandre Dupriez (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896969#comment-16896969
 ] 

Alexandre Dupriez commented on KAFKA-8738:
--

Interesting. Do you have a self-contained use case to reproduce?

> Cleaning thread blocked  when more than one ALTER_REPLICA_LOG_DIRS requests 
> sent
> 
>
> Key: KAFKA-8738
> URL: https://issues.apache.org/jira/browse/KAFKA-8738
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.1.1
>Reporter: dingsainan
>Priority: Major
>
> Hi,
>   
>  I am seeing a situation where the log cleaner does not work for the affected 
> topic-partition after using the kafka-reassign-partitions.sh tool on v2.1.1 
> more than once in quick succession.
>   
>  My operation:
>  first we submit one task to migrate a replica between log dirs on the same 
> broker; while that task is still in progress, we submit a new task for the 
> same topic-partition.
>  
> {code:java}
> // the first task:
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
>   "partition": 1,
>   "replicas": [6,15],
>   "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}]
> }
> //the second task
> {"partitions":
> [{"topic": "lancer_ops_billions_all_log_json_billions",
>   "partition": 1,
>   "replicas": [6,15],
>   "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}]
> }
>  
> {code}
>  
>  My analysis:
>  Kafka executes abortAndPauseCleaning() when a task is submitted; shortly 
> afterwards another task is submitted for the same topic-partition, so the 
> cleaner state becomes {color:#ff}LogCleaningPaused(2){color}. When the 
> second task completes, cleaning for this topic-partition is resumed only 
> once. In my case the first task was killed directly and no resumeCleaning() 
> was executed for it, so after the second task completes the state is still 
> {color:#ff}LogCleaningPaused(1){color}, which blocks the cleaner thread for 
> the topic-partition.
>   
>  _That is all of my analysis; please confirm._
>   
>  _Thanks_
>  _Nora_
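
To make the counting argument above concrete, here is a small, self-contained 
sketch (not Kafka's actual LogCleanerManager code) of a pause counter in which 
every abortAndPauseCleaning() must be matched by a resumeCleaning(); if the 
first task never resumes, the count stays at 1 and cleaning remains blocked.

{code:java}
object PausedCleaningSketch {

  // Simplified stand-in for the per-partition "LogCleaningPaused(n)" state.
  private var pauseCount: Int = 0

  def abortAndPauseCleaning(): Unit = {
    pauseCount += 1
    println(s"pause requested, state = LogCleaningPaused($pauseCount)")
  }

  def resumeCleaning(): Unit = {
    if (pauseCount > 0) pauseCount -= 1
    if (pauseCount == 0) println("cleaning resumed")
    else println(s"still paused, state = LogCleaningPaused($pauseCount)")
  }

  def main(args: Array[String]): Unit = {
    abortAndPauseCleaning() // first ALTER_REPLICA_LOG_DIRS task
    abortAndPauseCleaning() // second task for the same partition, count = 2
    // The first task is killed without a matching resume; only the second
    // task resumes once, so the partition stays at LogCleaningPaused(1).
    resumeCleaning()
  }
}
{code}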



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (KAFKA-8733) Offline partitions occur when leader's disk is slow in reads while responding to follower fetch requests.

2019-07-31 Thread Satish Duggana (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satish Duggana updated KAFKA-8733:
--
Description: 
We have seen the offline-partitions issue multiple times on some of the hosts 
in our clusters. After going through the broker logs and the hosts' disk stats, 
it looks like the issue occurs whenever read/write operations on that disk take 
too long. In a particular case where the read time is more than 
replica.lag.time.max.ms, follower replicas fall out of sync because their 
earlier fetch requests are stuck reading the local log and their fetch status 
is not yet updated, as shown in the `ReplicaManager` code below. If reading 
data from the log takes longer than replica.lag.time.max.ms, all the replicas 
fall out of sync and the partition becomes offline if min.insync.replicas > 1 
and unclean leader election is disabled.

 
{code:java}
def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
  val result = readFromLocalLog( // this call took more than 
`replica.lag.time.max.ms`
  replicaId = replicaId,
  fetchOnlyFromLeader = fetchOnlyFromLeader,
  readOnlyCommitted = fetchOnlyCommitted,
  fetchMaxBytes = fetchMaxBytes,
  hardMaxBytesLimit = hardMaxBytesLimit,
  readPartitionInfo = fetchInfos,
  quota = quota,
  isolationLevel = isolationLevel)
  if (isFromFollower) updateFollowerLogReadResults(replicaId, result). // fetch 
time gets updated here, but mayBeShrinkIsr should have been already called and 
the replica is removed from isr
 else result
 }

val logReadResults = readFromLog()
{code}
Attached the graphs of disk weighted io time stats when this issue occurred.

I will raise a 
[KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+offline+partitions+in+the+edgcase+scenario+of+follower+fetch+requests+not+processed+in+time]
 describing options on how to handle this scenario.
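
As a rough illustration of the timing described above (not the actual 
ReplicaManager/Partition code; Follower and IsrShrinkSketch are invented for 
this example), a follower whose last-caught-up timestamp is never refreshed 
because its fetch request is stuck in a slow local read will be judged out of 
sync once the lag exceeds replica.lag.time.max.ms:

{code:java}
object IsrShrinkSketch {

  // Hypothetical follower state: when it last caught up to the leader's log end.
  case class Follower(id: Int, lastCaughtUpTimeMs: Long)

  val replicaLagTimeMaxMs: Long = 10000L // replica.lag.time.max.ms

  // A follower is considered out of sync when its caught-up timestamp is stale.
  def isOutOfSync(follower: Follower, nowMs: Long): Boolean =
    nowMs - follower.lastCaughtUpTimeMs > replicaLagTimeMaxMs

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis()
    // The follower's fetch has been stuck for 15s in a slow disk read, so its
    // fetch status (and lastCaughtUpTimeMs) was never updated in the meantime.
    val follower = Follower(id = 2, lastCaughtUpTimeMs = now - 15000L)
    // An ISR-shrink style check that runs before the stuck read returns:
    println(s"follower ${follower.id} out of sync: ${isOutOfSync(follower, now)}")
  }
}
{code}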

 

  was:
We found offline partitions issue multiple times on some of the hosts in our 
clusters. After going through the broker logs and hosts’s disk stats, it looks 
like this issue occurs whenever the read/write operations take more time on 
that disk. In a particular case where read time is more than the 
replica.lag.time.max.ms, follower replicas will be out of sync as their earlier 
fetch requests are stuck while reading the local log and their fetch status is 
not yet updated as mentioned in the below code of `ReplicaManager`. If there is 
an issue in reading the data from the log for a duration more than 
replica.lag.time.max.ms then all the replicas will be out of sync and partition 
becomes offline if min.isr.replicas > 1 and unclean.leader.election is false.

 
{code:java}
def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
  val result = readFromLocalLog( // this call took more than 
`replica.lag.time.max.ms`
  replicaId = replicaId,
  fetchOnlyFromLeader = fetchOnlyFromLeader,
  readOnlyCommitted = fetchOnlyCommitted,
  fetchMaxBytes = fetchMaxBytes,
  hardMaxBytesLimit = hardMaxBytesLimit,
  readPartitionInfo = fetchInfos,
  quota = quota,
  isolationLevel = isolationLevel)
  if (isFromFollower) updateFollowerLogReadResults(replicaId, result). // fetch 
time gets updated here, but mayBeShrinkIsr should have been already called and 
the replica is removed from sir
 else result
 }

val logReadResults = readFromLog()
{code}
Attached the graphs of disk weighted io time stats when this issue occurred.

I will raise a 
[KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+offline+partitions+in+the+edgcase+scenario+of+follower+fetch+requests+not+processed+in+time]
 describing options on how to handle this scenario.

 


> Offline partitions occur when leader's disk is slow in reads while responding 
> to follower fetch requests.
> -
>
> Key: KAFKA-8733
> URL: https://issues.apache.org/jira/browse/KAFKA-8733
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.2, 2.4.0
>Reporter: Satish Duggana
>Assignee: Satish Duggana
>Priority: Critical
> Attachments: weighted-io-time-2.png, wio-time.png
>
>
> We found offline partitions issue multiple times on some of the hosts in our 
> clusters. After going through the broker logs and hosts’s disk stats, it 
> looks like this issue occurs whenever the read/write operations take more 
> time on that disk. In a particular case where read time is more than the 
> replica.lag.time.max.ms, follower replicas will be out of sync as their 
> earlier fetch requests are stuck while reading the local log and their fetch 
> status is not yet updated as mentioned in the below code of `ReplicaManager`. 
> If there is an issue in reading the data from the log for a duration m

[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol

2019-07-31 Thread Viktor Somogyi-Vass (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896966#comment-16896966
 ] 

Viktor Somogyi-Vass commented on KAFKA-8600:


Hey [~mimaison] are you working on this? I probably would like to get this done 
as I'm working on this part anyway as part of KIP-373. Would you be fine if I 
implemented this?

> Replace DescribeDelegationToken request/response with automated protocol
> 
>
> Key: KAFKA-8600
> URL: https://issues.apache.org/jira/browse/KAFKA-8600
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Mickael Maison
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (KAFKA-8739) rejoining broker fails to sanity check existing log segments

2019-07-31 Thread sanjiv marathe (JIRA)
sanjiv marathe created KAFKA-8739:
-

 Summary: rejoining broker fails to sanity check existing log 
segments
 Key: KAFKA-8739
 URL: https://issues.apache.org/jira/browse/KAFKA-8739
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 2.3.0
Reporter: sanjiv marathe


Kafka claims it can be used as storage, but the following scenario proves 
otherwise.
 # Consider a topic with a single partition and replication factor 2 on two 
brokers, say A and B, with A as the leader.
 # Broker B fails due to sector errors. The sysadmin fixes the issue and brings 
it up again after a few minutes; a few log segments are lost or corrupted.
 # Broker B catches up on the missed messages by fetching them from the leader 
A, but does not notice the problem with its earlier log segments.
 # Broker A fails and B becomes the leader.
 # A new consumer requests messages from the beginning. Broker B fails to 
deliver the messages belonging to the corrupted log segments.

Suggested solution

Immediately after coming up, a broker should run a sanity check, e.g. a CRC 
check of its log segments. Any corrupted or lost segments should be re-fetched 
from the leader.
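
A rough sketch of the suggested check (purely illustrative; RecordStub, 
LogSegmentStub and StartupSanityCheckSketch are invented for this example and 
do not reflect Kafka's on-disk format): walk the log segments on startup, 
recompute each record's CRC, and collect the segments that would need to be 
re-fetched from the leader.

{code:java}
import java.util.zip.CRC32

object StartupSanityCheckSketch {

  // Invented, simplified stand-ins for a log segment and its records.
  case class RecordStub(storedCrc: Long, payload: Array[Byte])
  case class LogSegmentStub(baseOffset: Long, records: Seq[RecordStub])

  private def crcOf(bytes: Array[Byte]): Long = {
    val crc = new CRC32
    crc.update(bytes, 0, bytes.length)
    crc.getValue
  }

  // A segment is flagged if any record's recomputed CRC differs from the stored one.
  def corruptedSegments(segments: Seq[LogSegmentStub]): Seq[Long] =
    segments.collect {
      case s if s.records.exists(r => crcOf(r.payload) != r.storedCrc) => s.baseOffset
    }

  def main(args: Array[String]): Unit = {
    val ok  = LogSegmentStub(0L,
      Seq(RecordStub(crcOf("a".getBytes("UTF-8")), "a".getBytes("UTF-8"))))
    val bad = LogSegmentStub(100L,
      Seq(RecordStub(crcOf("b".getBytes("UTF-8")), "x".getBytes("UTF-8"))))
    // Segments flagged here would be truncated and re-fetched from the leader.
    println(s"segments to re-fetch: ${corruptedSegments(Seq(ok, bad)).mkString(", ")}")
  }
}
{code}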



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896845#comment-16896845
 ] 

Bruno Cadonna commented on KAFKA-8041:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/654/testReport/junit/kafka.server/LogDirFailureTest/testIOExceptionDuringLogRoll/

> Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
> -
>
> Key: KAFKA-8041
> URL: https://issues.apache.org/jira/browse/KAFKA-8041
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.0.1, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.4.0
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests]
> {quote}java.lang.AssertionError: Expected some messages
> at kafka.utils.TestUtils$.fail(TestUtils.scala:357)
> at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189)
> at 
> kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote}
> STDOUT
> {quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, 
> leaderId=0, fetcherId=0] Error for partition topic-6 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-0 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-10 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-4 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-8 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition topic-2 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 
> in dir 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216
>  (kafka.server.LogDirFailureChannel:76)
> java.io.FileNotFoundException: 
> /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index
>  (Not a directory)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.(RandomAccessFile.java:243)
> at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115)
> at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184)
> at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
> at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184)
> at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
> at kafka.log.Log.$anonfun$roll$8(Log.scala:1520)
> at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520)
> at scala.Option.foreach(Option.scala:257)
> at kafka.log.Log.$anonfun$roll$2(Log.scala:1520)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1881)
> at kafka.log.Log.roll(Log.scala:1484)
> at 
> kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154)
> at 
> kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
>

[jira] [Commented] (KAFKA-8460) Flaky Test PlaintextConsumerTest#testLowMaxFetchSizeForRequestAndPartition

2019-07-31 Thread Bruno Cadonna (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896842#comment-16896842
 ] 

Bruno Cadonna commented on KAFKA-8460:
--

https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/654/testReport/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/

> Flaky Test  PlaintextConsumerTest#testLowMaxFetchSizeForRequestAndPartition
> ---
>
> Key: KAFKA-8460
> URL: https://issues.apache.org/jira/browse/KAFKA-8460
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5168/consoleFull] 
>  *16:17:04* kafka.api.PlaintextConsumerTest > 
> testLowMaxFetchSizeForRequestAndPartition FAILED*16:17:04* 
> org.scalatest.exceptions.TestFailedException: Timed out before consuming 
> expected 2700 records. The number consumed was 1980.*16:17:04* at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)*16:17:04*
>  at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)*16:17:04*
>  at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)*16:17:04*
>  at org.scalatest.Assertions.fail(Assertions.scala:1091)*16:17:04* at 
> org.scalatest.Assertions.fail$(Assertions.scala:1087)*16:17:04* at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389)*16:17:04* at 
> kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:789)*16:17:04* at 
> kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:765)*16:17:04* at 
> kafka.api.AbstractConsumerTest.consumeRecords(AbstractConsumerTest.scala:156)*16:17:04*
>  at 
> kafka.api.PlaintextConsumerTest.testLowMaxFetchSizeForRequestAndPartition(PlaintextConsumerTest.scala:801)*16:17:04*



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)