[jira] [Commented] (KAFKA-5358) Consumer perf tool should count rebalance time separately
[ https://issues.apache.org/jira/browse/KAFKA-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137766#comment-16137766 ]

ASF GitHub Bot commented on KAFKA-5358:
---------------------------------------

Github user huxihx closed the pull request at:

    https://github.com/apache/kafka/pull/3188

> Consumer perf tool should count rebalance time separately
> ---------------------------------------------------------
>
>          Key: KAFKA-5358
>          URL: https://issues.apache.org/jira/browse/KAFKA-5358
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Jason Gustafson
>     Assignee: huxihx
>
> It would be helpful to measure rebalance time separately in the performance
> tool so that throughput between different versions can be compared more
> easily in spite of improvements such as
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance.
> At the moment, running the perf tool on 0.11.0 or trunk for a short amount
> of time will present a severely skewed picture since the overall time will be
> dominated by the join group delay.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5358) Consumer perf tool should count rebalance time separately
[ https://issues.apache.org/jira/browse/KAFKA-5358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137764#comment-16137764 ]

ASF GitHub Bot commented on KAFKA-5358:
---------------------------------------

GitHub user huxihx opened a pull request:

    https://github.com/apache/kafka/pull/3723

    KAFKA-5358: Consumer perf tool should count rebalance time.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huxihx/kafka KAFKA-5358

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3723

----
commit 934f24e32479453ce437fef27a796c9e2b1b2514
Author: huxihx
Date:   2017-08-23T02:07:18Z

    As per Jason's comments, refined type from to a naive

----

> Consumer perf tool should count rebalance time separately
> ---------------------------------------------------------
>
>          Key: KAFKA-5358
>          URL: https://issues.apache.org/jira/browse/KAFKA-5358
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Jason Gustafson
>     Assignee: huxihx
>
> It would be helpful to measure rebalance time separately in the performance
> tool so that throughput between different versions can be compared more
> easily in spite of improvements such as
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance.
> At the moment, running the perf tool on 0.11.0 or trunk for a short amount
> of time will present a severely skewed picture since the overall time will be
> dominated by the join group delay.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
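[Editor's illustration] The idea behind KAFKA-5358 can be sketched without any Kafka client code at all: accumulate the time spent inside rebalances separately, then report throughput over the fetch time only. The following is a minimal, self-contained Java sketch; the {{RebalanceTimer}} class and its method names are hypothetical, not the actual ConsumerPerformance implementation.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper mirroring what the perf tool needs: accumulate time
// spent rebalancing (e.g. from a ConsumerRebalanceListener) so throughput
// can be reported over the time actually spent fetching.
public class RebalanceTimer {
    private long rebalanceNanos = 0;   // total time spent rebalancing
    private long rebalanceStart = -1;  // start of the in-flight rebalance, or -1

    // Would be called from ConsumerRebalanceListener.onPartitionsRevoked(...)
    public void rebalanceStarted(long nowNanos) { rebalanceStart = nowNanos; }

    // Would be called from ConsumerRebalanceListener.onPartitionsAssigned(...)
    public void rebalanceFinished(long nowNanos) {
        if (rebalanceStart >= 0) {
            rebalanceNanos += nowNanos - rebalanceStart;
            rebalanceStart = -1;
        }
    }

    // Throughput over the time spent fetching only, in MB/sec.
    public double fetchMbPerSec(long bytes, long totalNanos) {
        long fetchNanos = totalNanos - rebalanceNanos;
        return (bytes / (1024.0 * 1024.0)) / (fetchNanos / 1e9);
    }

    public long rebalanceMillis() {
        return TimeUnit.NANOSECONDS.toMillis(rebalanceNanos);
    }
}
```

With a 3-second join-group delay inside a 4-second run, reporting over fetch time removes the skew the ticket describes: 10 MiB consumed yields 10 MB/sec instead of 2.5 MB/sec.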
[jira] [Commented] (KAFKA-5731) Connect WorkerSinkTask out of order offset commit can lead to inconsistent state
[ https://issues.apache.org/jira/browse/KAFKA-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137736#comment-16137736 ]

ASF GitHub Bot commented on KAFKA-5731:
---------------------------------------

Github user rhauch closed the pull request at:

    https://github.com/apache/kafka/pull/3717

> Connect WorkerSinkTask out of order offset commit can lead to inconsistent
> state
> --------------------------------------------------------------------------
>
>          Key: KAFKA-5731
>          URL: https://issues.apache.org/jira/browse/KAFKA-5731
>      Project: Kafka
>   Issue Type: Bug
>   Components: KafkaConnect
>     Reporter: Jason Gustafson
>     Assignee: Randall Hauch
>      Fix For: 0.10.2.2, 0.11.0.1
>
> In Connect's WorkerSinkTask, we do sequence number validation to ensure that
> offset commits are handled in the right order
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L199).
> Unfortunately, for asynchronous commits, the {{lastCommittedOffsets}} field
> is overridden regardless of this sequence check as long as the response had
> no error
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L284):
> {code:java}
> OffsetCommitCallback cb = new OffsetCommitCallback() {
>     @Override
>     public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception error) {
>         if (error == null) {
>             lastCommittedOffsets = offsets;
>         }
>         onCommitCompleted(error, seqno);
>     }
> };
> {code}
> Hence if we get an out of order commit, then the internal state will be
> inconsistent. To fix this, we should only override {{lastCommittedOffsets}}
> after sequence validation as part of the {{onCommitCompleted(...)}} method.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
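[Editor's illustration] The fix the ticket proposes can be shown independently of Connect with plain collections: offsets from an async commit callback are installed only after the sequence-number check, so a late-arriving older callback can no longer clobber newer state. The {{CommitTracker}} class and its method names below are hypothetical stand-ins for WorkerSinkTask's bookkeeping, not the actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WorkerSinkTask's commit bookkeeping. The essence
// of the KAFKA-5731 fix: validate seqno BEFORE touching lastCommittedOffsets,
// so only the most recently submitted commit may install its offsets.
public class CommitTracker {
    private int commitSeqno = 0;  // sequence number of the latest commit request
    private Map<String, Long> lastCommittedOffsets = new HashMap<>();

    // Hand out a new sequence number when a commit request is submitted.
    public int startCommit() { return ++commitSeqno; }

    // Completion callback: install offsets only for the latest request.
    public void onCommitCompleted(Exception error, int seqno, Map<String, Long> offsets) {
        if (error == null && seqno == commitSeqno) {
            lastCommittedOffsets = offsets;
        }
        // A stale seqno (an out-of-order callback) is ignored entirely.
    }

    public Map<String, Long> lastCommitted() { return lastCommittedOffsets; }
}
```

If the consumer reorders the callbacks, the newer offsets survive and the stale callback is a no-op, which is exactly the invariant the unguarded assignment in the quoted snippet violated.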
[jira] [Commented] (KAFKA-5603) Streams should not abort transaction when closing zombie task
[ https://issues.apache.org/jira/browse/KAFKA-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137706#comment-16137706 ]

ASF GitHub Bot commented on KAFKA-5603:
---------------------------------------

GitHub user mjsax opened a pull request:

    https://github.com/apache/kafka/pull/3722

    KAFKA-5603: Don't abort TX for zombie tasks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mjsax/kafka kafka-5603-dont-abort-tx-for-zombie-tasks-01101

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3722.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3722

----
commit 72cff12f2274eb16f44fde85dcc7d5ff30614a3f
Author: Matthias J. Sax
Date:   2017-08-23T01:06:32Z

    KAFKA-5603: Don't abort TX for zombie tasks

----

> Streams should not abort transaction when closing zombie task
> -------------------------------------------------------------
>
>               Key: KAFKA-5603
>               URL: https://issues.apache.org/jira/browse/KAFKA-5603
>           Project: Kafka
>        Issue Type: Bug
>        Components: streams
> Affects Versions: 0.11.0.0
>          Reporter: Matthias J. Sax
>          Assignee: Matthias J. Sax
>          Priority: Critical
>           Fix For: 0.11.0.1
>
> The contract of the transactional producer API is to not call any
> transactional method after a {{ProducerFenced}} exception was thrown.
> Streams however, does an unconditional call within {{StreamTask#close()}} to
> {{abortTransaction()}} in case of unclean shutdown. We need to distinguish
> between a {{ProducerFenced}} and other unclean shutdown cases.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
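[Editor's illustration] The distinction the ticket asks for can be sketched as a small guard: on unclean shutdown, abort the transaction only if the producer has NOT been fenced, since the transactional API forbids further calls after a fenced exception. All names below ({{TaskCloser}}, the nested {{TxProducer}} interface, the local {{ProducerFencedException}} class) are hypothetical stand-ins, not Streams internals.

```java
// Hypothetical sketch of the guard KAFKA-5603 asks for. A fenced producer
// (a "zombie task") must not call abortTransaction(); any other unclean
// shutdown still should.
public class TaskCloser {
    // Local stand-in for org.apache.kafka.common.errors.ProducerFencedException.
    public static class ProducerFencedException extends RuntimeException {}

    // Minimal stand-in for the transactional producer surface we need.
    public interface TxProducer {
        void abortTransaction();
    }

    private boolean fenced = false;

    // Record whether the failure that triggered the unclean shutdown was a fence.
    public void recordError(Exception e) {
        if (e instanceof ProducerFencedException) fenced = true;
    }

    // Returns true if abortTransaction() was actually invoked.
    public boolean closeDirty(TxProducer producer) {
        if (fenced) {
            return false;  // zombie task: any further transactional call is illegal
        }
        producer.abortTransaction();
        return true;
    }
}
```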
[jira] [Commented] (KAFKA-5768) Upgrade ducktape version to 0.7.0, and use new kill_java_processes
[ https://issues.apache.org/jira/browse/KAFKA-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137689#comment-16137689 ]

Colin P. McCabe commented on KAFKA-5768:
----------------------------------------

https://github.com/apache/kafka/pull/3721

> Upgrade ducktape version to 0.7.0, and use new kill_java_processes
> ------------------------------------------------------------------
>
>          Key: KAFKA-5768
>          URL: https://issues.apache.org/jira/browse/KAFKA-5768
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Colin P. McCabe
>     Assignee: Colin P. McCabe
>
> Upgrade the ducktape version to 0.7.0. Use the new {{kill_java_processes}}
> function in ducktape to kill only the processes that are part of a service
> when starting or stopping a service, rather than killing all java processes
> (in some cases)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-3705) Support non-key joining in KTable
[ https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137686#comment-16137686 ]

ASF GitHub Bot commented on KAFKA-3705:
---------------------------------------

GitHub user guozhangwang opened a pull request:

    https://github.com/apache/kafka/pull/3720

    [DO NOT MERGE] KAFKA-3705: non-key joins

    This is just for reviewing the diff easily to see how it is done by @jfilipiak.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Kaiserchen/kafka KAFKA3705

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3720

----
commit 3da2b8f787a5d30dee2de71cf0f125ab3e57d89b
Author: jfilipiak
Date:   2017-06-30T09:00:39Z

    onetomany join signature to show on mailing list

commit cc9c6f4a68170fb829adb46a6de40ec0fc75716f
Author: jfilipiak
Date:   2017-07-12T14:49:43Z

    stores

commit 807e90aac82d7659310ce92066ac1df6e339068a
Author: jfilipiak
Date:   2017-07-26T06:06:58Z

    just throw in most of the processors, wont build

commit 1a6ff7b01ad35dd7eedf4c69aa534043ab1a8eb8
Author: jfilipiak
Date:   2017-08-18T10:07:34Z

    random clean up

commit ffe9b9496afbdad73bfcb9c014b6045b8ca95e79
Author: jfilipiak
Date:   2017-08-19T19:22:02Z

    clean up as much as possible

----

> Support non-key joining in KTable
> ---------------------------------
>
>          Key: KAFKA-3705
>          URL: https://issues.apache.org/jira/browse/KAFKA-3705
>      Project: Kafka
>   Issue Type: Bug
>   Components: streams
>     Reporter: Guozhang Wang
>       Labels: api
>
> Today in Kafka Streams DSL, KTable joins are only based on keys. If users
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but
> with a "foreign key" {{a}}, and assuming they are read from two topics which
> are partitioned on {{a}} and {{b}} respectively, they need to do the
> following pattern:
> {code}
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' is partitioned on "a"
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already
> partitioned on {{a}}, users still need to do the pre-aggregation in order to
> bring the two joining streams onto the same key. This is a drawback in
> programmability and we should fix it.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
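[Editor's illustration] The groupBy-then-join workaround quoted in the ticket can be mimicked with plain collections to make the two steps concrete: re-key table B on its foreign-key field {{a}}, aggregate the B-side keys per value of {{a}}, then do an ordinary same-key join. The class and method names below are illustrative, not the Streams DSL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-collections sketch of the workaround described in KAFKA-3705.
// tableA: a -> valueA; tableB: b -> a (the foreign key).
public class ForeignKeyJoin {
    // Step 1: "tableB.groupBy(/* select on field a */).agg(...)" —
    // re-key on the foreign key and collect all B-side keys per a.
    public static Map<String, List<String>> repartitionByA(Map<String, String> tableB) {
        return tableB.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    // Step 2: an ordinary same-key join of tableA against the re-keyed table.
    public static Map<String, String> join(Map<String, String> tableA,
                                           Map<String, List<String>> tableBPrime) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : tableA.entrySet()) {
            List<String> bs = tableBPrime.get(e.getKey());
            if (bs != null) {
                List<String> sorted = new ArrayList<>(bs);
                Collections.sort(sorted);  // deterministic joiner output
                out.put(e.getKey(), e.getValue() + "+" + sorted);
            }
        }
        return out;
    }
}
```

A native non-key join would fold step 1 into the join itself, which is exactly the pre-aggregation the ticket wants to remove.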
[jira] [Updated] (KAFKA-2590) KIP-28: Kafka Streams Checklist
[ https://issues.apache.org/jira/browse/KAFKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-2590:
---------------------------------
    Fix Version/s: 0.11.0.0

> KIP-28: Kafka Streams Checklist
> -------------------------------
>
>          Key: KAFKA-2590
>          URL: https://issues.apache.org/jira/browse/KAFKA-2590
>      Project: Kafka
>   Issue Type: New Feature
>   Components: streams
>     Reporter: Guozhang Wang
>     Assignee: Guozhang Wang
>      Fix For: 0.11.0.0
>
> This is an umbrella story for the processor client and Kafka Streams feature
> implementation.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (KAFKA-4125) Provide low-level Processor API meta data in DSL layer
[ https://issues.apache.org/jira/browse/KAFKA-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-4125:
---------------------------------
    Issue Type: New Feature  (was: Sub-task)
        Parent:     (was: KAFKA-2590)

> Provide low-level Processor API meta data in DSL layer
> ------------------------------------------------------
>
>          Key: KAFKA-4125
>          URL: https://issues.apache.org/jira/browse/KAFKA-4125
>      Project: Kafka
>   Issue Type: New Feature
>   Components: streams
>     Reporter: Matthias J. Sax
>     Assignee: Jeyhun Karimov
>     Priority: Minor
>       Labels: kip
>      Fix For: 1.0.0
>
> For the Processor API, users can get meta data like record offset, timestamp,
> etc. via the provided {{Context}} object. It might be useful to allow users to
> access this information in the DSL layer, too.
> The idea would be to do it "the Flink way", i.e., by providing
> RichFunctions; {{mapValues()}} for example.
> It takes a {{ValueMapper}} that only has the method
> {noformat}
> V2 apply(V1 value);
> {noformat}
> Thus, you cannot get any meta data within apply (it's completely "blind").
> We would add two more interfaces: {{RichFunction}} with a method
> {{open(Context context)}} and
> {noformat}
> RichValueMapper<V1, V2> extends ValueMapper<V1, V2>, RichFunction
> {noformat}
> This way, the user can choose to implement Rich- or Standard-function and
> we do not need to change existing APIs. Both can be handed into
> {{KStream.mapValues()}} for example. Internally, we check if a Rich
> function is provided, and if yes, hand in the {{Context}} object once, to
> make it available to the user who can now access it within {{apply()}} -- of
> course, the user must set a member variable in {{open()}} to hold the
> reference to the Context object.
> KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
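[Editor's illustration] The interfaces proposed in the ticket above can be sketched end to end; the signatures below follow the description, not a released Kafka API, and the {{Context}} class and {{applyMapper}} helper are hypothetical.

```java
// Sketch of the RichFunction design from KAFKA-4125: a RichValueMapper is a
// ValueMapper that additionally receives the processing Context via open().
public class RichFunctions {
    // Hypothetical minimal Context carrying record meta data.
    public static final class Context {
        public final long offset;
        public final long timestamp;
        public Context(long offset, long timestamp) {
            this.offset = offset;
            this.timestamp = timestamp;
        }
    }

    public interface ValueMapper<V1, V2> { V2 apply(V1 value); }

    public interface RichFunction { void open(Context context); }

    public interface RichValueMapper<V1, V2> extends ValueMapper<V1, V2>, RichFunction {}

    // Internally the DSL would check whether a Rich function was handed in,
    // and if so call open(context) once before apply() — plain mappers are
    // invoked unchanged, so existing APIs need no modification.
    public static <V1, V2> V2 applyMapper(ValueMapper<V1, V2> mapper, Context ctx, V1 value) {
        if (mapper instanceof RichFunction) {
            ((RichFunction) mapper).open(ctx);
        }
        return mapper.apply(value);
    }
}
```

As the ticket notes, the user's {{open()}} implementation must stash the Context in a member variable so {{apply()}} can read it.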
[jira] [Updated] (KAFKA-5731) Connect WorkerSinkTask out of order offset commit can lead to inconsistent state
[ https://issues.apache.org/jira/browse/KAFKA-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ewen Cheslack-Postava updated KAFKA-5731:
-----------------------------------------
    Fix Version/s: 0.10.2.2

> Connect WorkerSinkTask out of order offset commit can lead to inconsistent
> state
> --------------------------------------------------------------------------
>
>          Key: KAFKA-5731
>          URL: https://issues.apache.org/jira/browse/KAFKA-5731
>      Project: Kafka
>   Issue Type: Bug
>   Components: KafkaConnect
>     Reporter: Jason Gustafson
>     Assignee: Randall Hauch
>      Fix For: 0.10.2.2, 0.11.0.1
>
> In Connect's WorkerSinkTask, we do sequence number validation to ensure that
> offset commits are handled in the right order
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L199).
> Unfortunately, for asynchronous commits, the {{lastCommittedOffsets}} field
> is overridden regardless of this sequence check as long as the response had
> no error
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L284):
> {code:java}
> OffsetCommitCallback cb = new OffsetCommitCallback() {
>     @Override
>     public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception error) {
>         if (error == null) {
>             lastCommittedOffsets = offsets;
>         }
>         onCommitCompleted(error, seqno);
>     }
> };
> {code}
> Hence if we get an out of order commit, then the internal state will be
> inconsistent. To fix this, we should only override {{lastCommittedOffsets}}
> after sequence validation as part of the {{onCommitCompleted(...)}} method.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (KAFKA-2590) KIP-28: Kafka Streams Checklist
[ https://issues.apache.org/jira/browse/KAFKA-2590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang resolved KAFKA-2590.
----------------------------------
    Resolution: Fixed
      Assignee: Guozhang Wang

> KIP-28: Kafka Streams Checklist
> -------------------------------
>
>          Key: KAFKA-2590
>          URL: https://issues.apache.org/jira/browse/KAFKA-2590
>      Project: Kafka
>   Issue Type: New Feature
>   Components: streams
>     Reporter: Guozhang Wang
>     Assignee: Guozhang Wang
>
> This is an umbrella story for the processor client and Kafka Streams feature
> implementation.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (KAFKA-3429) Remove Serdes needed for repartitioning in KTable stateful operations
[ https://issues.apache.org/jira/browse/KAFKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3429:
---------------------------------
    Issue Type: Improvement  (was: Sub-task)
        Parent:     (was: KAFKA-2590)

> Remove Serdes needed for repartitioning in KTable stateful operations
> ---------------------------------------------------------------------
>
>          Key: KAFKA-3429
>          URL: https://issues.apache.org/jira/browse/KAFKA-3429
>      Project: Kafka
>   Issue Type: Improvement
>   Components: streams
>     Reporter: Guozhang Wang
>       Labels: api
>
> Currently in KTable aggregate operations where a repartition is possibly
> needed since the aggregation key may not be the same as the original primary
> key, we require the users to provide serdes (default to configured ones) for
> read / write to the internally created re-partition topic. However, these are
> not necessary since for all KTable instances either generated from the topics
> directly:
> {code}table = builder.table(...){code}
> or from aggregation operations:
> {code}table = stream.aggregate(...){code}
> there are already serdes provided for materializing the data, and hence the
> same serde can be re-used when the resulting KTable is involved in future
> aggregation operations. For example:
> {code}
> table1 = stream.aggregateByKey(serde);
> table2 = table1.aggregate(aggregator, selector, originalSerde, aggregateSerde);
> {code}
> We would not need to require users to specify the "originalSerde" in
> table1.aggregate since it could always reuse the "serde" from
> stream.aggregateByKey, which is used to materialize the table1 object.
> In order to get rid of it, implementation-wise we need to carry the serde
> information along with the KTableImpl instance in order to re-use it in a
> future operation that requires repartitioning.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (KAFKA-4113) Allow KTable bootstrap
[ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-4113:
---------------------------------
    Issue Type: New Feature  (was: Sub-task)
        Parent:     (was: KAFKA-2590)

> Allow KTable bootstrap
> ----------------------
>
>          Key: KAFKA-4113
>          URL: https://issues.apache.org/jira/browse/KAFKA-4113
>      Project: Kafka
>   Issue Type: New Feature
>   Components: streams
>     Reporter: Matthias J. Sax
>     Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the
> data. Only after this topic got read completely and the KTable is ready should
> the application start processing. This would indicate that on startup,
> the current partition sizes must be fetched and stored, and after the KTable got
> populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if no update to a KTable is done for a certain time,
> we consider it as populated
> 3) a timestamp cut off point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table without
> reading any other topics until see one record with timestamp 1000.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
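[Editor's illustration] The main idea in the ticket above ("read up to the end offsets captured at startup, then start processing") can be sketched with plain collections; the {{BootstrapGate}} class and its method names are hypothetical, not a proposed API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of offset-based KTable bootstrapping: capture per-partition end
// offsets once at startup, and consider the table bootstrapped only when
// every partition has been consumed up to its captured end offset.
public class BootstrapGate {
    private final Map<Integer, Long> endOffsets;              // partition -> end offset at startup
    private final Map<Integer, Long> position = new HashMap<>(); // partition -> next offset to read

    public BootstrapGate(Map<Integer, Long> endOffsetsAtStartup) {
        this.endOffsets = endOffsetsAtStartup;
    }

    // Call after consuming a record from the KTable's source topic.
    public void recordConsumed(int partition, long offset) {
        position.put(partition, offset + 1);
    }

    // True once every partition has reached its startup end offset;
    // only then should actual stream processing begin.
    public boolean bootstrapped() {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            if (position.getOrDefault(e.getKey(), 0L) < e.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```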
[jira] [Updated] (KAFKA-3429) Remove Serdes needed for repartitioning in KTable stateful operations
[ https://issues.apache.org/jira/browse/KAFKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3429:
---------------------------------
    Labels: api  (was: api newbie++)

> Remove Serdes needed for repartitioning in KTable stateful operations
> ---------------------------------------------------------------------
>
>          Key: KAFKA-3429
>          URL: https://issues.apache.org/jira/browse/KAFKA-3429
>      Project: Kafka
>   Issue Type: Sub-task
>   Components: streams
>     Reporter: Guozhang Wang
>       Labels: api
>
> Currently in KTable aggregate operations where a repartition is possibly
> needed since the aggregation key may not be the same as the original primary
> key, we require the users to provide serdes (default to configured ones) for
> read / write to the internally created re-partition topic. However, these are
> not necessary since for all KTable instances either generated from the topics
> directly:
> {code}table = builder.table(...){code}
> or from aggregation operations:
> {code}table = stream.aggregate(...){code}
> there are already serdes provided for materializing the data, and hence the
> same serde can be re-used when the resulting KTable is involved in future
> aggregation operations. For example:
> {code}
> table1 = stream.aggregateByKey(serde);
> table2 = table1.aggregate(aggregator, selector, originalSerde, aggregateSerde);
> {code}
> We would not need to require users to specify the "originalSerde" in
> table1.aggregate since it could always reuse the "serde" from
> stream.aggregateByKey, which is used to materialize the table1 object.
> In order to get rid of it, implementation-wise we need to carry the serde
> information along with the KTableImpl instance in order to re-use it in a
> future operation that requires repartitioning.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (KAFKA-5768) Upgrade ducktape version to 0.7.0, and use new kill_java_processes
[ https://issues.apache.org/jira/browse/KAFKA-5768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin P. McCabe updated KAFKA-5768:
-----------------------------------
    Description:
Upgrade the ducktape version to 0.7.0. Use the new {{kill_java_processes}}
function in ducktape to kill only the processes that are part of a service
when starting or stopping a service, rather than killing all java processes
(in some cases)

  (was: Upgrade the ducktape version to 0.7.0. Use the new `kill_java_processes`
function in ducktape to kill only the processes that are part of a service
when starting or stopping a service, rather than killing all java processes
(in some cases))

> Upgrade ducktape version to 0.7.0, and use new kill_java_processes
> ------------------------------------------------------------------
>
>          Key: KAFKA-5768
>          URL: https://issues.apache.org/jira/browse/KAFKA-5768
>      Project: Kafka
>   Issue Type: Improvement
>     Reporter: Colin P. McCabe
>     Assignee: Colin P. McCabe
>
> Upgrade the ducktape version to 0.7.0. Use the new {{kill_java_processes}}
> function in ducktape to kill only the processes that are part of a service
> when starting or stopping a service, rather than killing all java processes
> (in some cases)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (KAFKA-5768) Upgrade ducktape version to 0.7.0, and use new kill_java_processes
Colin P. McCabe created KAFKA-5768:
-----------------------------------

       Summary: Upgrade ducktape version to 0.7.0, and use new kill_java_processes
           Key: KAFKA-5768
           URL: https://issues.apache.org/jira/browse/KAFKA-5768
       Project: Kafka
    Issue Type: Improvement
      Reporter: Colin P. McCabe
      Assignee: Colin P. McCabe

Upgrade the ducktape version to 0.7.0. Use the new `kill_java_processes`
function in ducktape to kill only the processes that are part of a service
when starting or stopping a service, rather than killing all java processes
(in some cases)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5603) Streams should not abort transaction when closing zombie task
[ https://issues.apache.org/jira/browse/KAFKA-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137647#comment-16137647 ]

ASF GitHub Bot commented on KAFKA-5603:
---------------------------------------

GitHub user mjsax opened a pull request:

    https://github.com/apache/kafka/pull/3719

    KAFKA-5603: Don't abort TX for zombie tasks

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mjsax/kafka kafka-5603-dont-abort-tx-for-zombie-tasks-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3719

----
commit d1814d2e657dea4fb37b98305cc0f960119a123f
Author: Matthias J. Sax
Date:   2017-08-23T00:09:52Z

    KAFKA-5603: Don't abort TX for zombie tasks

----

> Streams should not abort transaction when closing zombie task
> -------------------------------------------------------------
>
>               Key: KAFKA-5603
>               URL: https://issues.apache.org/jira/browse/KAFKA-5603
>           Project: Kafka
>        Issue Type: Bug
>        Components: streams
> Affects Versions: 0.11.0.0
>          Reporter: Matthias J. Sax
>          Assignee: Matthias J. Sax
>          Priority: Critical
>           Fix For: 0.11.0.1
>
> The contract of the transactional producer API is to not call any
> transactional method after a {{ProducerFenced}} exception was thrown.
> Streams however, does an unconditional call within {{StreamTask#close()}} to
> {{abortTransaction()}} in case of unclean shutdown. We need to distinguish
> between a {{ProducerFenced}} and other unclean shutdown cases.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5767) Kafka server should halt if IBP < 1.0.0 and there is log directory failure
[ https://issues.apache.org/jira/browse/KAFKA-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137619#comment-16137619 ]

ASF GitHub Bot commented on KAFKA-5767:
---------------------------------------

GitHub user lindong28 opened a pull request:

    https://github.com/apache/kafka/pull/3718

    KAFKA-5767; Kafka server should halt if IBP < 1.0.0 and there is log directory failure

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lindong28/kafka KAFKA-5767

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3718

----
commit e3ca5fa9b5d78b987652bb2d1b7600e2df992109
Author: Dong Lin
Date:   2017-08-22T23:40:17Z

    KAFKA-5767; Kafka server should halt if IBP < 1.0.0 and there is log directory failure

----

> Kafka server should halt if IBP < 1.0.0 and there is log directory failure
> --------------------------------------------------------------------------
>
>          Key: KAFKA-5767
>          URL: https://issues.apache.org/jira/browse/KAFKA-5767
>      Project: Kafka
>   Issue Type: Bug
>     Reporter: Dong Lin
>     Assignee: Dong Lin

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5731) Connect WorkerSinkTask out of order offset commit can lead to inconsistent state
[ https://issues.apache.org/jira/browse/KAFKA-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137610#comment-16137610 ]

Randall Hauch commented on KAFKA-5731:
--------------------------------------

[~hachikuji] or [~ewencp]: I've reopened the issue so this can be backported to the {{0.10.2}} branch, and added [this pull request|https://github.com/apache/kafka/pull/3717] for that branch. Thanks!

> Connect WorkerSinkTask out of order offset commit can lead to inconsistent
> state
> --------------------------------------------------------------------------
>
>          Key: KAFKA-5731
>          URL: https://issues.apache.org/jira/browse/KAFKA-5731
>      Project: Kafka
>   Issue Type: Bug
>   Components: KafkaConnect
>     Reporter: Jason Gustafson
>     Assignee: Randall Hauch
>      Fix For: 0.11.0.1
>
> In Connect's WorkerSinkTask, we do sequence number validation to ensure that
> offset commits are handled in the right order
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L199).
> Unfortunately, for asynchronous commits, the {{lastCommittedOffsets}} field
> is overridden regardless of this sequence check as long as the response had
> no error
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L284):
> {code:java}
> OffsetCommitCallback cb = new OffsetCommitCallback() {
>     @Override
>     public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception error) {
>         if (error == null) {
>             lastCommittedOffsets = offsets;
>         }
>         onCommitCompleted(error, seqno);
>     }
> };
> {code}
> Hence if we get an out of order commit, then the internal state will be
> inconsistent. To fix this, we should only override {{lastCommittedOffsets}}
> after sequence validation as part of the {{onCommitCompleted(...)}} method.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (KAFKA-5731) Connect WorkerSinkTask out of order offset commit can lead to inconsistent state
[ https://issues.apache.org/jira/browse/KAFKA-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137605#comment-16137605 ]

ASF GitHub Bot commented on KAFKA-5731:
---------------------------------------

GitHub user rhauch opened a pull request:

    https://github.com/apache/kafka/pull/3717

    KAFKA-5731 Corrected how the sink task worker updates the last committed offsets (0.10.2)

    Prior to this change, it was possible for the synchronous consumer commit
    request to be handled before previously-submitted asynchronous commit
    requests. If that happened, the out-of-order handlers improperly set the
    last committed offsets, which then became inconsistent with the offsets the
    connector task is working with. This change ensures that the last committed
    offsets are updated only for the most recent commit request, even if the
    consumer reorders the calls to the callbacks.

    This change also backports the fix for KAFKA-4942, which was minimal that
    caused the new tests to fail.

    **This is for the `0.10.2` branch; see #3662 for the equivalent and
    already-approved PR for `trunk` and #3672 for the equivalent and
    already-approved PR for the `0.11.0` branch.**

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rhauch/kafka kafka-5731-0.10.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3717

----
commit 4e39118ae302980e6d8b75cb09534172d99d5b7a
Author: Randall Hauch
Date:   2017-08-12T00:42:06Z

    KAFKA-5731 Corrected how the sink task worker updates the last committed offsets

    Prior to this change, it was possible for the synchronous consumer commit
    request to be handled before previously-submitted asynchronous commit
    requests. If that happened, the out-of-order handlers improperly set the
    last committed offsets, which then became inconsistent with the offsets the
    connector task is working with. This change ensures that the last committed
    offsets are updated only for the most recent commit request, even if the
    consumer reorders the calls to the callbacks.

commit 6f74ef8e89f1193a5deb66870022591d51eb6580
Author: Randall Hauch
Date:   2017-08-14T19:11:08Z

    KAFKA-5731 Corrected mock consumer behavior during rebalance

    Corrects the test case added in the previous commit to properly revoke the
    existing partition assignments before adding new partition assigments.

commit 3c2531b1abdaf3cdaac3781a45597a616652ff1c
Author: Randall Hauch
Date:   2017-08-14T19:11:45Z

    KAFKA-5731 Added expected call that was missing in another test

commit 05567b1677e7f5a39ca0f20d86773c872193da0b
Author: Randall Hauch
Date:   2017-08-14T22:24:35Z

    KAFKA-5731 Improved log messages related to offset commits

    # Conflicts:
    #       connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit 60eba0ce024f3a211f33bf20bc04febbebb7d1c4
Author: Randall Hauch
Date:   2017-08-15T14:47:05Z

    KAFKA-5731 More cleanup of log messages related to offset commits

commit ff123bfb910742e3a5c320fff6b23ff645ef62a2
Author: Randall Hauch
Date:   2017-08-15T16:21:52Z

    KAFKA-5731 More improvements to the log messages in WorkerSinkTask

    # Conflicts:
    #       connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java

commit d5f1b29a4cb41c139094cbed9b78cf51594f861c
Author: Randall Hauch
Date:   2017-08-15T16:31:28Z

    KAFKA-5731 Removed unnecessary log message

commit f2b02cda83876b7d55f331cf89d9a306ab2b467f
Author: Randall Hauch
Date:   2017-08-15T17:54:16Z

    KAFKA-5731 Additional tweaks to debug and trace log messages to ensure clarity and usefulness

commit fa427b7557b93a600e0007e1a4adfb4aa38f526b
Author: Randall Hauch
Date:   2017-08-15T19:30:09Z

    KAFKA-5731 Use the correct value in trace messages

commit 957f0acac6154fda6b522e89c00df3dcb299
Author: Randall Hauch
Date:   2017-08-22T23:29:52Z

    KAFKA-4942 Fix commitTimeoutMs being set before the commit actually started

    Backported the fix for this issue, which was fixed in 0.11.0.0

----

> Connect WorkerSinkTask out of order offset commit can lead to inconsistent
> state
> --------------------------------------------------------------------------
>
>          Key: KAFKA-5731
>          URL: https://issues.apache.org/jira/browse/KAFKA-5731
>      Project: Kafka
>   Issue Type: Bug
>   Components: KafkaConnect
>     Reporter: Jason Gustafson
>     Assignee: Randall Hauch
>      Fix For:
[jira] [Created] (KAFKA-5767) Kafka server should halt if IBP < 1.0.0 and there is log directory failure
Dong Lin created KAFKA-5767:
----------------------------

       Summary: Kafka server should halt if IBP < 1.0.0 and there is log directory failure
           Key: KAFKA-5767
           URL: https://issues.apache.org/jira/browse/KAFKA-5767
       Project: Kafka
    Issue Type: Bug
      Reporter: Dong Lin
      Assignee: Dong Lin

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Assigned] (KAFKA-5156) Options for handling exceptions in streams
[ https://issues.apache.org/jira/browse/KAFKA-5156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax reassigned KAFKA-5156: -- Assignee: Matthias J. Sax (was: Eno Thereska) > Options for handling exceptions in streams > -- > > Key: KAFKA-5156 > URL: https://issues.apache.org/jira/browse/KAFKA-5156 > Project: Kafka > Issue Type: Task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Eno Thereska >Assignee: Matthias J. Sax > Labels: user-experience > Fix For: 1.0.0 > > > This is a task around options for handling exceptions in streams. It focuses > around options for dealing with corrupt data (keep going, stop streams, log, > retry, etc). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-3473) Add controller channel manager request queue time metric.
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137320#comment-16137320 ] Ismael Juma commented on KAFKA-3473: [~omkreddy], we added metrics for the controller channel queue size, but not a queue time metric. > Add controller channel manager request queue time metric. > - > > Key: KAFKA-3473 > URL: https://issues.apache.org/jira/browse/KAFKA-3473 > Project: Kafka > Issue Type: Improvement > Components: controller >Affects Versions: 0.10.0.0 >Reporter: Jiangjie Qin >Assignee: Dong Lin > > Currently the controller appends requests to brokers into the controller > channel manager queue during state transitions, i.e. state transitions are > propagated asynchronously. We need to track the request queue time on the > controller side to see how long state propagation is delayed after a state > transition finishes on the controller. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-3473) Add controller channel manager request queue time metric.
[ https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137281#comment-16137281 ] Manikumar commented on KAFKA-3473: -- @ijuma Is this covered in KAFKA-5135/KIP-143? > Add controller channel manager request queue time metric. > - > > Key: KAFKA-3473 > URL: https://issues.apache.org/jira/browse/KAFKA-3473 > Project: Kafka > Issue Type: Improvement > Components: controller >Affects Versions: 0.10.0.0 >Reporter: Jiangjie Qin >Assignee: Dong Lin > > Currently the controller appends requests to brokers into the controller > channel manager queue during state transitions, i.e. state transitions are > propagated asynchronously. We need to track the request queue time on the > controller side to see how long state propagation is delayed after a state > transition finishes on the controller. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-3653) expose the queue size in ControllerChannelManager
[ https://issues.apache.org/jira/browse/KAFKA-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-3653. -- Resolution: Fixed Fixed in KAFKA-5135/KIP-143 > expose the queue size in ControllerChannelManager > - > > Key: KAFKA-3653 > URL: https://issues.apache.org/jira/browse/KAFKA-3653 > Project: Kafka > Issue Type: Improvement >Reporter: Jun Rao >Assignee: Gwen Shapira > > Currently, ControllerChannelManager maintains a queue per broker. If the > queue fills up, metadata propagation to the broker is delayed. It would be > useful to expose a metric on the size on the queue for monitoring. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-3800) java client can`t poll msg
[ https://issues.apache.org/jira/browse/KAFKA-3800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-3800. -- Resolution: Cannot Reproduce Please reopen if the issue still exists. > java client can`t poll msg > -- > > Key: KAFKA-3800 > URL: https://issues.apache.org/jira/browse/KAFKA-3800 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.9.0.1 > Environment: java8, win7 64 >Reporter: frank >Assignee: Neha Narkhede > > I use a mixed-case topic name (e.g. Test_4), and after poll() the messages are > null. Why? All-lowercase topic names are OK. I tried Node.js, and > kafka-console-consumer.bat works fine. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-3927) kafka broker config docs issue
[ https://issues.apache.org/jira/browse/KAFKA-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-3927. -- Resolution: Later Yes, these changes were done in KAFKA-615. Please reopen if the issue still exists. > kafka broker config docs issue > -- > > Key: KAFKA-3927 > URL: https://issues.apache.org/jira/browse/KAFKA-3927 > Project: Kafka > Issue Type: Bug > Components: website >Affects Versions: 0.10.0.0 >Reporter: Shawn Guo >Priority: Minor > > https://kafka.apache.org/documentation.html#brokerconfigs > log.flush.interval.messages > default value is "9223372036854775807" > log.flush.interval.ms > default value is null > log.flush.scheduler.interval.ms > default value is "9223372036854775807" > etc. Obviously these default values are incorrect. How do these docs get > generated? It looks confusing. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5749) Refactor SessionStore hierarchy
[ https://issues.apache.org/jira/browse/KAFKA-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137226#comment-16137226 ] Andrey Dyachkov edited comment on KAFKA-5749 at 8/22/17 7:13 PM: - [~miguno] It would be very nice if someone could guide me through at this point, because I do not understand how the process works here. To explain: I have already made a couple of contributions ([KAFKA-4643|https://issues.apache.org/jira/browse/KAFKA-4643], [KAFKA-4657|https://issues.apache.org/jira/browse/KAFKA-4657]). I decided I could move further with it and took [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], created a [pr|https://github.com/apache/kafka/pull/3671], and asked committers to help me with it, but I have not gotten any response yet (7 days). I even wrote an email to the dev list asking to be added to the contributors list and for help with [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], but no response. I think it is related to the priority of the tickets, which I do not know, and maybe the number of people supporting them. That is why I ended up commenting on tickets which are subsets of improvements being developed by committers. I am ready to spend my time learning and contributing to client-side code (clients, admin, streams, etc.), since my main tech stack is around Java. Sorry for writing this here; I hope you can help me with that. was (Author: andrey.dyach...@gmail.com): [~miguno] It would be very nice if someone will guide me through at that point, becasue I do not understand how it works here. I can explain more: I have already made a couple contributions ([KAFKA-4643|https://issues.apache.org/jira/browse/KAFKA-4643], [KAFKA-4657|https://issues.apache.org/jira/browse/KAFKA-4657]). 
I had decided I could move futher with it and took [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], created [pr|https://github.com/apache/kafka/pull/3671] and asked commiters to help me with that, but I have not got any response yet(7 days). I even wrote email to dev list asking adding me to contributors list and helping with [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], but no reponse. I think it is related to the priority of the tickets, which I do not know and maybe number of people supporting it. That is why I ended up commenting on the tickets which are subsets of improvments, which is being developed by commiters. I am ready to spent my time on learning and contributing to client side code(clients, admin, streams etc.), since my main tech stack is around Java. Sorry for writting it here, I hope you guys can help me with that. > Refactor SessionStore hierarchy > --- > > Key: KAFKA-5749 > URL: https://issues.apache.org/jira/browse/KAFKA-5749 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 1.0.0 >Reporter: Damian Guy >Assignee: Damian Guy > Fix For: 1.0.0 > > > In order to support bytes store we need to create a MeteredSessionStore and > ChangeloggingSessionStore. We then need to refactor the current SessionStore > implementations to use this. All inner stores should be of type <Bytes, byte[]> -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5749) Refactor SessionStore hierarchy
[ https://issues.apache.org/jira/browse/KAFKA-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137226#comment-16137226 ] Andrey Dyachkov commented on KAFKA-5749: [~miguno] It would be very nice if someone could guide me through at this point, because I do not understand how the process works here. To explain: I have already made a couple of contributions ([KAFKA-4643|https://issues.apache.org/jira/browse/KAFKA-4643], [KAFKA-4657|https://issues.apache.org/jira/browse/KAFKA-4657]). I decided I could move further with it and took [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], created a [pr|https://github.com/apache/kafka/pull/3671], and asked committers to help me with it, but I have not gotten any response yet (7 days). I even wrote an email to the dev list asking to be added to the contributors list and for help with [KAFKA-5723|https://issues.apache.org/jira/browse/KAFKA-5723], but no response. I think it is related to the priority of the tickets, which I do not know, and maybe the number of people supporting them. That is why I ended up commenting on tickets which are subsets of improvements being developed by committers. I am ready to spend my time learning and contributing to client-side code (clients, admin, streams, etc.), since my main tech stack is around Java. Sorry for writing this here; I hope you can help me with that. > Refactor SessionStore hierarchy > --- > > Key: KAFKA-5749 > URL: https://issues.apache.org/jira/browse/KAFKA-5749 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 1.0.0 >Reporter: Damian Guy >Assignee: Damian Guy > Fix For: 1.0.0 > > > In order to support bytes store we need to create a MeteredSessionStore and > ChangeloggingSessionStore. We then need to refactor the current SessionStore > implementations to use this. All inner stores should be of type <Bytes, byte[]> -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5732) Kafka 0.11 Consumer.Poll() hangs for consumer.subscribe()
[ https://issues.apache.org/jira/browse/KAFKA-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137222#comment-16137222 ] Ramkumar commented on KAFKA-5732: - I found that if the broker points to the old data log directory (which Kafka 0.8 was using), this issue happens. If I point to a new directory, it appears to work fine. Any pointers on why it does not work when referring to the Kafka 0.8 data log folders, and how to make them compatible? I had set up the message format and protocol as below:
inter.broker.protocol.version=0.11.0
log.message.format.version=0.11.0
> Kafka 0.11 Consumer.Poll() hangs for consumer.subscribe() > -- > > Key: KAFKA-5732 > URL: https://issues.apache.org/jira/browse/KAFKA-5732 > Project: Kafka > Issue Type: Bug > Components: consumer >Affects Versions: 0.11.0.0 > Environment: Linux >Reporter: Ramkumar > Attachments: dumptest5 > > Hi, > I upgraded my 3-node Kafka cluster from 0.8 to 0.11 brokers. I am trying to > test the new consumer APIs. > Below is the code extract. The consumer.poll() call hangs (thread dump > attached) when using consumer.subscribe(). The poll returns values if I use > the consumer.seek() methods instead. Please let me know what I am doing > incorrectly. I have advertised.host and listeners updated okay in > server.properties. Thread dump attached. 
> Properties props1 = new Properties();
> props1.put("bootstrap.servers", "localhost:9092");
> props1.put("group.id", "test3");
> props1.put("enable.auto.commit", "false");
> props1.put("auto_offset_reset", "earliest");
> props1.put("request.timeout.ms", 3);
> props1.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> props1.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
> String TestTopic = "T3";
> KafkaConsumer<String, String> consumer1 = new KafkaConsumer<>(props1);
> consumer1.subscribe(Arrays.asList(TestTopic));
> int j = 0;
> while (j < 10) {
>     j++;
>     ConsumerRecords<String, String> records1 = consumer1.poll(100);
>     for (ConsumerRecord<String, String> record1 : records1) {
>         System.out.printf("offset = %d, key = %s, value = %s", record1.offset(), record1.key(), record1.value());
>         String t = record1.value();
>         out.write(t.getBytes());
>     }
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5766) Very high CPU-load of consumer when broker is down
Sebastian Bernauer created KAFKA-5766: - Summary: Very high CPU-load of consumer when broker is down Key: KAFKA-5766 URL: https://issues.apache.org/jira/browse/KAFKA-5766 Project: Kafka Issue Type: Bug Components: consumer Reporter: Sebastian Bernauer Hi, I have a single broker instance at localhost. I set up a consumer with the following code:
{code:java}
Map<String, Object> configs = new HashMap<>();
configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
configs.put(ConsumerConfig.GROUP_ID_CONFIG, "gh399");
configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(configs);
consumer.assign(Collections.singletonList(new TopicPartition("foo", 0)));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(1000);
    System.out.println(records.count());
}
{code}
This all works fine until I shut down the broker. If I do so, it causes 100% CPU load in my application. After starting the broker again, the usage decreases back to a normal level. It would be very nice if you could help me! Thanks, Sebastian Spring-Kafka: 2.0.0.M3 Kafka: 0.10.2.0 JDK: 1.8.0_121 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
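The busy spin Sebastian describes can also be mitigated on the client side while the broker is down. Below is a minimal, hypothetical sketch in plain Java (this is NOT Kafka's internal fix; the class and method names are invented for illustration) of backing off exponentially when consecutive polls return no records:

```java
// Hypothetical client-side mitigation sketch: back off exponentially when
// consecutive polls return no records, so a dead broker does not turn the
// poll loop into a busy spin. Names are illustrative, not from Kafka.
public class PollBackoff {
    private final long baseMs;
    private final long maxMs;
    private int emptyPolls = 0;

    public PollBackoff(long baseMs, long maxMs) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
    }

    /** Call after each poll; returns how long to sleep before the next one. */
    public long nextDelayMs(int recordCount) {
        if (recordCount > 0) {
            emptyPolls = 0; // records are flowing again: no delay
            return 0L;
        }
        emptyPolls++;
        // Double the delay per empty poll, capping the shift to avoid overflow.
        long delay = baseMs << Math.min(emptyPolls - 1, 10);
        return Math.min(delay, maxMs);
    }
}
```

The caller would `Thread.sleep(nextDelayMs(records.count()))` after each `poll`; the delay resets to zero as soon as records arrive again.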
[jira] [Resolved] (KAFKA-4823) Creating Kafka Producer on application running on Java older version
[ https://issues.apache.org/jira/browse/KAFKA-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-4823. -- Resolution: Won't Fix Kafka dropped support for Java 1.6 as of the 0.9 release. You can try the REST Proxy or other-language client libraries. Please reopen if you think otherwise. > Creating Kafka Producer on application running on Java older version > > > Key: KAFKA-4823 > URL: https://issues.apache.org/jira/browse/KAFKA-4823 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.1.1 > Environment: Application running on Java 1.6 >Reporter: live2code > > I have an application running on Java 1.6 which cannot be upgraded. This > application needs interfaces to post messages to a remote Kafka server as a > producer and to receive messages as a consumer. The code runs fine from my > local environment, which is on Java 1.7, but the same producer and consumer > fail when executed within the application, with the error: > Caused by: java.lang.UnsupportedClassVersionError: JVMCFRE003 bad major > version; class=org/apache/kafka/clients/producer/KafkaProducer, offset=6 > Is there some way I can still do it? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5765) Move merge() from StreamsBuilder to KStream
[ https://issues.apache.org/jira/browse/KAFKA-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-5765: - Fix Version/s: 1.0.0 > Move merge() from StreamsBuilder to KStream > --- > > Key: KAFKA-5765 > URL: https://issues.apache.org/jira/browse/KAFKA-5765 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > Labels: needs-kip > Fix For: 1.0.0 > > > Merging multiple {{KStream}} is done via {{StreamsBuilder#merge()}} (formerly > {{KStreamBuilder#merge()}}). This is quite unnatural and should be done via > {{KStream#merge()}}. > As {{StreamsBuilder}} is not released yet, this is not a backward > incompatible change (and KStreamBuilder is already deprecated). We still need > a KIP as we add a new method to a public {{KStreams}} API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5765) Move merge() from StreamsBuilder to KStream
[ https://issues.apache.org/jira/browse/KAFKA-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137146#comment-16137146 ] Guozhang Wang commented on KAFKA-5765: -- Sounds reasonable to me. > Move merge() from StreamsBuilder to KStream > --- > > Key: KAFKA-5765 > URL: https://issues.apache.org/jira/browse/KAFKA-5765 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > Labels: needs-kip > Fix For: 1.0.0 > > > Merging multiple {{KStream}} is done via {{StreamsBuilder#merge()}} (formerly > {{KStreamBuilder#merge()}}). This is quite unnatural and should be done via > {{KStream#merge()}}. > As {{StreamsBuilder}} is not released yet, this is not a backward > incompatible change (and KStreamBuilder is already deprecated). We still need > a KIP as we add a new method to a public {{KStreams}} API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5152) Kafka Streams keeps restoring state after shutdown is initiated during startup
[ https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137135#comment-16137135 ] ASF GitHub Bot commented on KAFKA-5152: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/3675 > Kafka Streams keeps restoring state after shutdown is initiated during startup > -- > > Key: KAFKA-5152 > URL: https://issues.apache.org/jira/browse/KAFKA-5152 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.2.1 >Reporter: Xavier Léauté >Assignee: Damian Guy >Priority: Blocker > Fix For: 0.11.0.1, 1.0.0 > > > If streams shutdown is initiated during state restore (e.g. an uncaught > exception is thrown) streams will not shut down until all stores are first > finished restoring. > As restore progresses, stream threads appear to be taken out of service as > part of the shutdown sequence, causing rebalancing of tasks. This compounds > the problem by slowing down the restore process even further, since the > remaining threads now have to also restore the reassigned tasks before they > can shut down. > A more severe issue is that if there is a new rebalance triggered during the > end of the waitingSync phase (e.g. due to a new member joining the group, or > some members timed out the SyncGroup response), then some consumer clients of > the group may already proceed with the {{onPartitionsAssigned}} and blocked > on trying to grab the file dir lock not yet released from other clients, > while the other clients holding the lock are consistently re-sending > {{JoinGroup}} requests while the rebalance cannot be completed because the > clients blocked on the file dir lock will not be kicked out of the group as > its heartbeat thread has been consistently sending HBRequest. Hence this is a > deadlock caused by not releasing the file dir locks in task suspension. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5696) SourceConnector does not commit offset on rebalance
[ https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137100#comment-16137100 ] Oleg Kuznetsov edited comment on KAFKA-5696 at 8/22/17 5:56 PM: [~rhauch] Is it possible to wait for all the tasks to finish their current loop (i.e. let the producer finish writing records and commit their offsets) before rebalancing? was (Author: olkuznsmith): [~rhauch] Is it possible to add waiting all the task to finish their current loop (i.e. let producer to finish writing records, commit their offsets) before rebalancing? > SourceConnector does not commit offset on rebalance > --- > > Key: KAFKA-5696 > URL: https://issues.apache.org/jira/browse/KAFKA-5696 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Oleg Kuznetsov >Priority: Critical > Labels: newbie > > I'm running a SourceConnector that reads files from storage and puts data in > Kafka. In case of reconfiguration, I want offsets to be flushed. > Say a file is completely processed, but the source records are not yet committed, > and in case of reconfiguration their offsets might be missing in the store. > Is it possible to force committing offsets on reconfiguration? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5696) SourceConnector does not commit offset on rebalance
[ https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137100#comment-16137100 ] Oleg Kuznetsov commented on KAFKA-5696: --- [~rhauch] Is it possible to wait for all the tasks to finish their current loop (i.e. let the producer finish writing records and commit their offsets) before rebalancing? > SourceConnector does not commit offset on rebalance > --- > > Key: KAFKA-5696 > URL: https://issues.apache.org/jira/browse/KAFKA-5696 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Oleg Kuznetsov >Priority: Critical > Labels: newbie > > I'm running a SourceConnector that reads files from storage and puts data in > Kafka. In case of reconfiguration, I want offsets to be flushed. > Say a file is completely processed, but the source records are not yet committed, > and in case of reconfiguration their offsets might be missing in the store. > Is it possible to force committing offsets on reconfiguration? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5547) Return topic authorization failed if no topic describe access
[ https://issues.apache.org/jira/browse/KAFKA-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar reassigned KAFKA-5547: Assignee: Manikumar > Return topic authorization failed if no topic describe access > - > > Key: KAFKA-5547 > URL: https://issues.apache.org/jira/browse/KAFKA-5547 > Project: Kafka > Issue Type: Improvement >Reporter: Jason Gustafson >Assignee: Manikumar > Labels: security, usability > Fix For: 1.0.0 > > > We previously made a change to several of the request APIs to return > UNKNOWN_TOPIC_OR_PARTITION if the principal does not have Describe access to > the topic. The thought was to avoid leaking information about which topics > exist. The problem with this is that a client which sees this error will just > keep retrying because it is usually treated as retriable. It seems, however, > that we could return TOPIC_AUTHORIZATION_FAILED instead and still avoid > leaking information as long as we ensure that the Describe authorization > check comes before the topic existence check. This would avoid the ambiguity > on the client. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5603) Streams should not abort transaction when closing zombie task
[ https://issues.apache.org/jira/browse/KAFKA-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated KAFKA-5603: --- Fix Version/s: (was: 0.11.0.2) 0.11.0.1 > Streams should not abort transaction when closing zombie task > - > > Key: KAFKA-5603 > URL: https://issues.apache.org/jira/browse/KAFKA-5603 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax >Priority: Critical > Fix For: 0.11.0.1 > > > The contract of the transactional producer API is to not call any > transactional method after a {{ProducerFenced}} exception was thrown. > Streams however, does an unconditional call within {{StreamTask#close()}} to > {{abortTransaction()}} in case of unclean shutdown. We need to distinguish > between a {{ProducerFenced}} and other unclean shutdown cases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5749) Refactor SessionStore hierarchy
[ https://issues.apache.org/jira/browse/KAFKA-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137085#comment-16137085 ] Michael Noll commented on KAFKA-5749: - Thanks for wanting to contribute, Andrey! Perhaps you could work on some of the other open JIRAs? (Feel free to reach out to us if you don't know which would be good starting points.) > Refactor SessionStore hierarchy > --- > > Key: KAFKA-5749 > URL: https://issues.apache.org/jira/browse/KAFKA-5749 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 1.0.0 >Reporter: Damian Guy >Assignee: Damian Guy > Fix For: 1.0.0 > > > In order to support bytes store we need to create a MeteredSessionStore and > ChangeloggingSessionStore. We then need to refactor the current SessionStore > implementations to use this. All inner stores should be of type <Bytes, byte[]> -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5401) Kafka Over TLS Error - Failed to send SSL Close message - Broken Pipe
[ https://issues.apache.org/jira/browse/KAFKA-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-5401. -- Resolution: Duplicate > Kafka Over TLS Error - Failed to send SSL Close message - Broken Pipe > - > > Key: KAFKA-5401 > URL: https://issues.apache.org/jira/browse/KAFKA-5401 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 0.10.1.1 > Environment: SLES 11, Kafka over TLS >Reporter: PaVan > Labels: security > > SLES 11 > WARN Failed to send SSL Close message > (org.apache.kafka.common.network.SslTransportLayer) > java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > org.apache.kafka.common.network.SslTransportLayer.flush(SslTransportLayer.java:194) > at > org.apache.kafka.common.network.SslTransportLayer.close(SslTransportLayer.java:148) > at > org.apache.kafka.common.network.KafkaChannel.close(KafkaChannel.java:45) > at org.apache.kafka.common.network.Selector.close(Selector.java:442) > at org.apache.kafka.common.network.Selector.poll(Selector.java:310) > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at > org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5765) Move merge() from StreamsBuilder to KStream
[ https://issues.apache.org/jira/browse/KAFKA-5765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137081#comment-16137081 ] Xavier Léauté commented on KAFKA-5765: -- I have a small request when it comes to merge(). The current varargs form generates a lot of compiler warnings that need to be suppressed using {{@SuppressWarnings("unchecked")}}. Given that the typical merge use case involves only a handful of streams, I think it would be useful to provide a couple of overloads that take a fixed number of arguments, similar to what Guava does in [ImmutableList.of(...)|https://google.github.io/guava/releases/21.0/api/docs/com/google/common/collect/ImmutableList.html#of-E-E-E-E-E-E-E-E-E-E-E-] > Move merge() from StreamsBuilder to KStream > --- > > Key: KAFKA-5765 > URL: https://issues.apache.org/jira/browse/KAFKA-5765 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.11.0.0 >Reporter: Matthias J. Sax >Assignee: Matthias J. Sax > Labels: needs-kip > > Merging multiple {{KStream}} is done via {{StreamsBuilder#merge()}} (formerly > {{KStreamBuilder#merge()}}). This is quite unnatural and should be done via > {{KStream#merge()}}. > As {{StreamsBuilder}} is not released yet, this is not a backward > incompatible change (and KStreamBuilder is already deprecated). We still need > a KIP as we add a new method to a public {{KStreams}} API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
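To illustrate the overload pattern Xavier is proposing, here is a small self-contained sketch (a hypothetical `MergeableStream` class, NOT the actual KStream API): fixed-arity overloads sidestep the unchecked generic-varargs warning because no generic array is created at the call site, while a collection-taking method covers the general case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the fixed-arity overload pattern (this is NOT
// the real KStream API). Calling merge(a) or merge(a, b) creates no generic
// array, so the compiler emits no "unchecked" warning at the call site.
public class MergeableStream<T> {
    private final List<T> elements;

    public MergeableStream(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    public List<T> elements() {
        return elements;
    }

    // Fixed-arity overloads for the common cases.
    public MergeableStream<T> merge(MergeableStream<T> other) {
        return mergeAll(Arrays.asList(this, other));
    }

    public MergeableStream<T> merge(MergeableStream<T> s1, MergeableStream<T> s2) {
        return mergeAll(Arrays.asList(this, s1, s2));
    }

    // General case: a collection parameter avoids generic varargs entirely,
    // since callers can build a List<MergeableStream<T>> without warnings.
    public static <T> MergeableStream<T> mergeAll(List<MergeableStream<T>> streams) {
        List<T> merged = new ArrayList<>();
        for (MergeableStream<T> s : streams) {
            merged.addAll(s.elements);
        }
        return new MergeableStream<>(merged);
    }
}
```

This mirrors Guava's `ImmutableList.of(...)` family, which provides many fixed-arity overloads before falling back to a varargs form.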
[jira] [Created] (KAFKA-5765) Move merge() from StreamsBuilder to KStream
Matthias J. Sax created KAFKA-5765: -- Summary: Move merge() from StreamsBuilder to KStream Key: KAFKA-5765 URL: https://issues.apache.org/jira/browse/KAFKA-5765 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.11.0.0 Reporter: Matthias J. Sax Assignee: Matthias J. Sax Merging multiple {{KStream}} is done via {{StreamsBuilder#merge()}} (formerly {{KStreamBuilder#merge()}}). This is quite unnatural and should be done via {{KStream#merge()}}. As {{StreamsBuilder}} is not released yet, this is not a backward incompatible change (and KStreamBuilder is already deprecated). We still need a KIP as we add a new method to a public {{KStreams}} API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136984#comment-16136984 ] David van Geest commented on KAFKA-5758: Awesome, Option 3 makes sense to me. Sorry for not communicating my ideas very clearly there. > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. > In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. 
> at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
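Suggestion 2 above can be sketched in a few lines. This is a hypothetical, simplified model (the class, enum, and method names are invented; real fetch handling lives in Kafka's ReplicaManager and is far more involved): each requested partition gets its own status, so one unrecognized replica no longer fails the entire request.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-partition error handling in a fetch: instead of
// failing the whole request with NotAssignedReplicaException, only the
// offending partition is marked errored and the rest are served normally.
public class PartialFetch {
    public enum Status { OK, NOT_ASSIGNED_REPLICA }

    /**
     * @param requested  topic-partitions the follower asked for
     * @param recognized topic-partitions for which the fetching broker is
     *                   still a recognized replica on the leader
     */
    public static Map<String, Status> fetch(Set<String> requested, Set<String> recognized) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (String tp : requested) {
            // A single bad partition no longer poisons the entire response.
            result.put(tp, recognized.contains(tp) ? Status.OK : Status.NOT_ASSIGNED_REPLICA);
        }
        return result;
    }
}
```

With this shape, broker A's other followed topics in step 4 would keep replicating even while X-Y's reassignment is in flight.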
[jira] [Assigned] (KAFKA-5762) Refactor AdminClient to use LogContext
[ https://issues.apache.org/jira/browse/KAFKA-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash reassigned KAFKA-5762: --- Assignee: Kamal Chandraprakash > Refactor AdminClient to use LogContext > -- > > Key: KAFKA-5762 > URL: https://issues.apache.org/jira/browse/KAFKA-5762 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Kamal Chandraprakash > > We added a LogContext object which automatically adds a log prefix to every > message written by loggers constructed from it (much like the Logging mixin > available in the server code). We use this in the consumer to ensure that > messages always contain the consumer group and client ids, which is very > helpful when multiple consumers are run on the same instance. We should do > something similar for the AdminClient. We should always include the client id. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
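The idea can be sketched as follows — a hypothetical Python analogue, not Kafka's actual Java `LogContext` class: a factory whose loggers all prepend one fixed prefix (here, the client id) to every message.

```python
import logging

class _PrefixAdapter(logging.LoggerAdapter):
    """Prepends a fixed context prefix to every log message."""
    def process(self, msg, kwargs):
        return f"{self.extra['prefix']}{msg}", kwargs

class LogContext:
    """Factory whose loggers all share one prefix, e.g. the client id."""
    def __init__(self, prefix):
        self._prefix = prefix

    def logger(self, name):
        return _PrefixAdapter(logging.getLogger(name), {"prefix": self._prefix})

# Every message from loggers built off this context is tagged with the
# client id, so interleaved logs from multiple clients stay attributable:
ctx = LogContext("[AdminClient clientId=admin-1] ")
log = ctx.logger("kafka.admin")
```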
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136959#comment-16136959 ] Ismael Juma commented on KAFKA-5758: Option 3 is a normal pattern for the Kafka protocol (since APIs often involve batches) and we don't consider it a partial result. If that's what you meant when you described option 1, then great.
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136980#comment-16136980 ] Ismael Juma commented on KAFKA-5758: Yes.
[jira] [Commented] (KAFKA-5696) SourceConnector does not commit offset on rebalance
[ https://issues.apache.org/jira/browse/KAFKA-5696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136976#comment-16136976 ] Randall Hauch commented on KAFKA-5696: -- [~olkuznsmith], adding, updating, or removing a connector configuration will cause the Kafka Connect worker cluster to perform a _rebalance_ that stops, redistributes, and restarts all of the remaining connectors. > SourceConnector does not commit offset on rebalance > --- > > Key: KAFKA-5696 > URL: https://issues.apache.org/jira/browse/KAFKA-5696 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Oleg Kuznetsov >Priority: Critical > Labels: newbie > > I'm running a SourceConnector that reads files from storage and puts data in > Kafka. In case of reconfiguration, I want offsets to be flushed. > Say a file is completely processed, but the source records are not yet committed; > in case of reconfiguration their offsets might be missing from the store. > Is it possible to force committing offsets on reconfiguration? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
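In outline, what the reporter is asking for looks like this. It is a toy sketch, not Connect's real API — `offset_store`, `record_produced`, and `stop_for_rebalance` are all invented names:

```python
class SourceTaskRunner:
    """Toy model of a source task: offsets produced since the last flush
    are committed when the task is stopped for a rebalance, so a
    restarted task does not reprocess an already-finished file."""
    def __init__(self, offset_store):
        self.offset_store = offset_store   # durable source-offset storage
        self.pending = {}                  # source partition -> last offset

    def record_produced(self, source_partition, offset):
        self.pending[source_partition] = offset

    def stop_for_rebalance(self):
        # Flush outstanding offsets before the task is torn down, so a
        # reconfiguration never loses track of completed input.
        self.offset_store.update(self.pending)
        self.pending.clear()
```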
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136974#comment-16136974 ] David van Geest commented on KAFKA-5758: Ah OK. So Option 3 still returns data for the other partitions then? If so, that's what I meant.
[jira] [Updated] (KAFKA-5748) Fix console producer to set timestamp and partition
[ https://issues.apache.org/jira/browse/KAFKA-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5748: --- Fix Version/s: (was: 0.11.0.1) 1.0.0 > Fix console producer to set timestamp and partition > --- > > Key: KAFKA-5748 > URL: https://issues.apache.org/jira/browse/KAFKA-5748 > Project: Kafka > Issue Type: Improvement >Reporter: Ran Ma > Fix For: 1.0.0 > > > https://github.com/apache/kafka/pull/3689 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136933#comment-16136933 ] David van Geest commented on KAFKA-5758: [~ijuma], thanks for the response! I'm not sure I understand the distinction between my option 1 and your option 3. In both, we're talking about returning partial results (along with an error of sorts for the partition that is no longer being followed) in the response to `FetchRequest`, right?
[jira] [Updated] (KAFKA-5465) FetchResponse v0 does not return any messages when max_bytes smaller than v2 message set
[ https://issues.apache.org/jira/browse/KAFKA-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5465: --- Fix Version/s: (was: 0.11.0.1) > FetchResponse v0 does not return any messages when max_bytes smaller than v2 > message set > - > > Key: KAFKA-5465 > URL: https://issues.apache.org/jira/browse/KAFKA-5465 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.11.0.0 >Reporter: Dana Powers >Assignee: Jason Gustafson >Priority: Blocker > > In prior releases, when consuming uncompressed messages, FetchResponse v0 > will return a message if it is smaller than the max_bytes sent in the > FetchRequest. In 0.11.0.0 RC0, when messages are stored as v2 internally, the > response will be empty unless the full MessageSet is smaller than max_bytes. > In some configurations, this may cause old consumers to get stuck on large > messages where previously they were able to make progress one message at a > time. > For example, when I produce 10 5KB messages using ProduceRequest v0 and then > attempt FetchRequest v0 with partition max bytes = 6KB (larger than a single > message but smaller than all 10 messages together), I get an empty message > set from 0.11.0.0. Previous brokers would have returned a single message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
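The difference in semantics can be illustrated with a small simulation (Python, invented function names and sizes; this is not the broker's actual down-conversion code):

```python
def fetch_old_per_message(message_sizes, max_bytes):
    """Pre-0.11 semantics: accumulate individual messages until the next
    one would exceed max_bytes, so at least one message is returned
    whenever a single message fits on its own."""
    taken, used = [], 0
    for size in message_sizes:
        if used + size > max_bytes:
            break
        taken.append(size)
        used += size
    return taken

def fetch_v2_whole_batch(message_sizes, max_bytes):
    """Regression described above: a v2 message set is all-or-nothing,
    so nothing is returned unless the entire set fits."""
    return list(message_sizes) if sum(message_sizes) <= max_bytes else []

# The report's scenario: 10 messages of 5 KB, partition max bytes = 6 KB.
sizes = [5 * 1024] * 10
old = fetch_old_per_message(sizes, 6 * 1024)   # one 5 KB message
new = fetch_v2_whole_batch(sizes, 6 * 1024)    # empty: whole set > 6 KB
```

The old path makes progress one message at a time; the new path returns nothing, which is exactly the stuck-consumer scenario in the report.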
[jira] [Updated] (KAFKA-5681) jarAll does not build all scala versions anymore.
[ https://issues.apache.org/jira/browse/KAFKA-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5681: --- Fix Version/s: (was: 0.11.0.1) > jarAll does not build all scala versions anymore. > - > > Key: KAFKA-5681 > URL: https://issues.apache.org/jira/browse/KAFKA-5681 > Project: Kafka > Issue Type: Bug > Components: build >Affects Versions: 0.11.0.0 >Reporter: Jiangjie Qin >Assignee: Jiangjie Qin > > ./gradlew jarAll no longer builds jars for all scala versions. We should use > {{availableScalaVersions}} instead of {{defaultScalaVersions}} when building. We > should probably consider backporting the fix to 0.11.0.0. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5764) KafkaShortnamer should allow for case insensitive matches
Ryan P created KAFKA-5764: - Summary: KafkaShortnamer should allow for case insensitive matches Key: KAFKA-5764 URL: https://issues.apache.org/jira/browse/KAFKA-5764 Project: Kafka Issue Type: Improvement Components: security Affects Versions: 0.11.0.0 Reporter: Ryan P Currently it does not appear that the KafkaShortnamer allows for case-insensitive search-and-replace rules. It would be good to match the functionality provided by HDFS, as operators are familiar with it. This also makes it easier to port auth_to_local rules from your existing HDFS configurations to your new Kafka configuration. HWX auth_to_local guide for reference: https://community.hortonworks.com/articles/14463/auth-to-local-rules-syntax.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
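A hypothetical sketch of the requested behaviour, using plain Python regexes rather than Kafka's actual auth_to_local rule syntax (`short_name` and its parameters are invented for illustration):

```python
import re

def short_name(principal, pattern, replacement, ignore_case=False):
    """Map a Kerberos principal to a short name. With ignore_case=True
    the rule matches regardless of realm/user casing, mirroring the
    case-insensitive matching requested for KafkaShortnamer."""
    flags = re.IGNORECASE if ignore_case else 0
    if re.fullmatch(pattern, principal, flags):
        return re.sub(pattern, replacement, principal, flags=flags)
    return None  # rule does not apply to this principal
```

A rule written against a lowercase realm would then still match a principal carrying the uppercase `EXAMPLE.COM` realm.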
[jira] [Commented] (KAFKA-5752) Delete topic and re-create topic immediately will delete the new topic's timeindex
[ https://issues.apache.org/jira/browse/KAFKA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136846#comment-16136846 ] ASF GitHub Bot commented on KAFKA-5752: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/3700 > Delete topic and re-create topic immediately will delete the new topic's > timeindex > - > > Key: KAFKA-5752 > URL: https://issues.apache.org/jira/browse/KAFKA-5752 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0, 0.11.0.0 >Reporter: Pengwei >Assignee: Manikumar >Priority: Critical > Labels: reliability > Fix For: 0.11.0.1, 1.0.0 > > > When we delete a topic and immediately re-create a topic with the same name, we find > that after the async topic deletion finishes, it removes the newly created > topic's time index. > This is because LogManager's asyncDelete changes the log and offset index > file pointers to the renamed files but misses the time index, which causes this issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
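The bug and fix can be pictured with a toy segment model (Python; the field and method names here are illustrative — the real code is `LogManager.asyncDelete` in Scala):

```python
class Segment:
    """Toy log segment tracking the three files Kafka keeps per segment."""
    SUFFIXES = (".log", ".index", ".timeindex")

    def __init__(self, base):
        self.files = {s: base + s for s in self.SUFFIXES}

    def rename_for_async_delete(self, deleted_base):
        # The fix: repoint ALL files, including the .timeindex that the
        # original asyncDelete missed. Leaving the old pointer in place
        # meant the deferred delete later removed the time index of a
        # topic re-created under the same name.
        self.files = {s: deleted_base + s for s in self.SUFFIXES}
```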
[jira] [Resolved] (KAFKA-4856) Calling KafkaProducer.close() from multiple threads may cause spurious error
[ https://issues.apache.org/jira/browse/KAFKA-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma resolved KAFKA-4856. Resolution: Fixed Fix Version/s: 1.0.0 0.11.0.1 > Calling KafkaProducer.close() from multiple threads may cause spurious error > > > Key: KAFKA-4856 > URL: https://issues.apache.org/jira/browse/KAFKA-4856 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.0, 0.10.0.0, 0.10.2.0 >Reporter: Xavier Léauté >Assignee: Manikumar >Priority: Minor > Fix For: 0.11.0.1, 1.0.0 > > > Calling {{KafkaProducer.close()}} from multiple threads simultaneously may > cause the following harmless error message to be logged. There appears to be > a race-condition in {{AppInfoParser.unregisterAppInfo}} that we don't guard > against. > {noformat} > WARN Error unregistering AppInfo mbean > (org.apache.kafka.common.utils.AppInfoParser:71) > javax.management.InstanceNotFoundException: > kafka.producer:type=app-info,id= > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) > at > org.apache.kafka.common.utils.AppInfoParser.unregisterAppInfo(AppInfoParser.java:69) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:735) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:686) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:665) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
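One way to guard such a race — sketched here in Python, whereas the real fix is in Java's AppInfoParser — is an atomic once-only flag, so concurrent close() calls unregister the mbean exactly once and latecomers return quietly instead of hitting InstanceNotFoundException:

```python
import threading

class AppInfo:
    """Toy mbean holder: unregister() is safe to call from many threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._registered = True
        self.unregister_count = 0   # stands in for the real JMX unregister

    def unregister(self):
        with self._lock:
            if not self._registered:
                return False        # another thread already unregistered
            self._registered = False
            self.unregister_count += 1
            return True

# Eight racing threads; exactly one performs the unregistration.
info = AppInfo()
threads = [threading.Thread(target=info.unregister) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```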
[jira] [Commented] (KAFKA-4856) Calling KafkaProducer.close() from multiple threads may cause spurious error
[ https://issues.apache.org/jira/browse/KAFKA-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136837#comment-16136837 ] ASF GitHub Bot commented on KAFKA-4856: --- Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/3696
[jira] [Comment Edited] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
[ https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136769#comment-16136769 ] Arpan edited comment on KAFKA-5153 at 8/22/17 1:10 PM: --- Hi [~arthurk] - Not sure yet what the solution is; we are also stuck, and it is quite strange as well. You may also want to have a look at https://issues.apache.org/jira/browse/KAFKA-2729 once. This looks to be similar to the problem we are facing. Regards, Arpan Khagram +91 8308993200 was (Author: arpan.khagram0...@gmail.com): Hi [~arthurk] - Not sure yet what is the solution and we are also stuck and it is quite strange as well. You may also want to have a look at https://issues.apache.org/jira/browse/KAFKA-2729 once. This looks to be similar to the problem we are facing. Regards, Arpan Khagram +91 8308993200 > KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting > --- > > Key: KAFKA-5153 > URL: https://issues.apache.org/jira/browse/KAFKA-5153 > Project: Kafka > Issue Type: Bug >Affects Versions: 0.10.2.0, 0.11.0.0 > Environment: RHEL 6 > Java Version 1.8.0_91-b14 >Reporter: Arpan >Priority: Critical > Attachments: server_1_72server.log, server_2_73_server.log, > server_3_74Server.log, server.properties, ThreadDump_1493564142.dump, > ThreadDump_1493564177.dump, ThreadDump_1493564249.dump > > > Hi Team, > I was earlier referring to issue KAFKA-4477 because the problem I am facing > is similar. I tried to search for the same reference in the release docs but > did not find anything in 0.10.1.1 or 0.10.2.0. I am currently using > 2.11_0.10.2.0. > I have a 3-node cluster for Kafka, and a ZK cluster as well, on the same set > of servers in cluster mode. We have around 240 GB of data transferred > through Kafka every day. What we are observing is servers disconnecting from the > cluster and the ISR shrinking, which starts impacting service. > I have also observed the file descriptor count increasing a bit; in normal > circumstances we have not observed an FD count above 500, but when the issue > started we were observing it in the range of 650-700 on all 3 servers. > Attaching thread dumps of all 3 servers from when we started facing the issue > recently. > The issue vanishes once you bounce the nodes, and the setup does not run more > than 5 days without this issue. Attaching server logs as well. > Kindly let me know if you need any additional information. Attaching > server.properties as well for one of the servers (it's similar on all 3 > servers). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
[ https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136769#comment-16136769 ] Arpan commented on KAFKA-5153: -- Hi [~arthurk] - Not sure yet what is the solution and we are also stuck and it is quite strange as well. You may also want to have a look at https://issues.apache.org/jira/browse/KAFKA-2729 once. This looks to be similar to the problem we are facing. Regards, Arpan Khagram +91 8308993200
[jira] [Assigned] (KAFKA-2254) The shell script should be optimized, even kafka-run-class.sh has a syntax error.
[ https://issues.apache.org/jira/browse/KAFKA-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar reassigned KAFKA-2254: Assignee: Manikumar > The shell script should be optimized, even kafka-run-class.sh has a syntax > error. > -- > > Key: KAFKA-2254 > URL: https://issues.apache.org/jira/browse/KAFKA-2254 > Project: Kafka > Issue Type: Bug > Components: build >Affects Versions: 0.8.2.1 > Environment: linux >Reporter: Bo Wang >Assignee: Manikumar > Labels: client-script, kafka-run-class.sh, shell-script > Attachments: kafka-shell-script.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > kafka-run-class.sh line 128 has a syntax error (missing a space): > 127-loggc) > 128 if [ -z "$KAFKA_GC_LOG_OPTS"] ; then > 129 GC_LOG_ENABLED="true" > 130 fi > Using ShellCheck to check the shell scripts, the results show some > errors, warnings, and notes: > https://github.com/koalaman/shellcheck/wiki/SC2068 > https://github.com/koalaman/shellcheck/wiki/Sc2046 > https://github.com/koalaman/shellcheck/wiki/Sc2086 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
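For reference, a corrected version of the quoted fragment — POSIX `test` requires a space before the closing bracket, so `"$VAR"]` is a syntax error while `"$VAR" ]` is fine (this is a sketch of the fixed lines only, not the full script):

```shell
#!/bin/sh
# Fixed version of the kafka-run-class.sh fragment quoted above.
KAFKA_GC_LOG_OPTS=""
if [ -z "$KAFKA_GC_LOG_OPTS" ] ; then   # note the space before ']'
  GC_LOG_ENABLED="true"
fi
echo "$GC_LOG_ENABLED"
```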
[jira] [Assigned] (KAFKA-4856) Calling KafkaProducer.close() from multiple threads may cause spurious error
[ https://issues.apache.org/jira/browse/KAFKA-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar reassigned KAFKA-4856: Assignee: Manikumar > Calling KafkaProducer.close() from multiple threads may cause spurious error > > > Key: KAFKA-4856 > URL: https://issues.apache.org/jira/browse/KAFKA-4856 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.0, 0.10.0.0, 0.10.2.0 >Reporter: Xavier Léauté >Assignee: Manikumar >Priority: Minor > > Calling {{KafkaProducer.close()}} from multiple threads simultaneously may > cause the following harmless error message to be logged. There appears to be > a race-condition in {{AppInfoParser.unregisterAppInfo}} that we don't guard > against. > {noformat} > WARN Error unregistering AppInfo mbean > (org.apache.kafka.common.utils.AppInfoParser:71) > javax.management.InstanceNotFoundException: > kafka.producer:type=app-info,id= > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) > at > org.apache.kafka.common.utils.AppInfoParser.unregisterAppInfo(AppInfoParser.java:69) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:735) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:686) > at > org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:665) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
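The check-then-act gap described in this report can be closed by also treating {{InstanceNotFoundException}} as a benign outcome of losing the race. This is a minimal standalone sketch of that guard, not Kafka's actual {{AppInfoParser}} code:

```java
import javax.management.InstanceNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import java.lang.management.ManagementFactory;

public class SafeUnregister {
    // Unregister an MBean tolerantly: another thread calling close() may have
    // unregistered it between the isRegistered() check and unregisterMBean().
    public static boolean unregister(MBeanServer server, ObjectName name) {
        try {
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
                return true;
            }
        } catch (InstanceNotFoundException e) {
            // Lost the race to a concurrent close() -- harmless, nothing to do.
        } catch (JMException e) {
            // Any other JMX failure would be logged as a WARN, as in the report.
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("test:type=app-info,id=demo");
        // Unregistering a name that was never registered no longer throws.
        System.out.println(unregister(server, name));
    }
}
```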
[jira] [Created] (KAFKA-5763) Refactor NetworkClient to use LogContext
Ismael Juma created KAFKA-5763: -- Summary: Refactor NetworkClient to use LogContext Key: KAFKA-5763 URL: https://issues.apache.org/jira/browse/KAFKA-5763 Project: Kafka Issue Type: Improvement Reporter: Ismael Juma We added a LogContext object which automatically adds a log prefix to every message written by loggers constructed from it (much like the Logging mixin available in the server code). We use this in the consumer to ensure that messages always contain the consumer group and client ids, which is very helpful when multiple consumers are run on the same instance. We should do something similar for the NetworkClient. We should always include the client id. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
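Kafka's actual LogContext constructs SLF4J loggers; this standalone sketch only illustrates the prefixing behaviour the issue describes, using a hypothetical PrefixedLogger class rather than the real API:

```java
// Every message goes through one wrapper that prepends a fixed context
// prefix (e.g. the client id), so log lines from different clients on the
// same instance can be told apart.
public class PrefixedLogger {
    private final String prefix;

    public PrefixedLogger(String prefix) {
        this.prefix = prefix;
    }

    public String format(String message) {
        return prefix + message;
    }

    public void info(String message) {
        System.out.println(format(message));
    }

    public static void main(String[] args) {
        PrefixedLogger log = new PrefixedLogger("[NetworkClient clientId=c1] ");
        log.info("Connection to node 2 established");
    }
}
```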
[jira] [Created] (KAFKA-5762) Refactor AdminClient to use LogContext
Ismael Juma created KAFKA-5762: -- Summary: Refactor AdminClient to use LogContext Key: KAFKA-5762 URL: https://issues.apache.org/jira/browse/KAFKA-5762 Project: Kafka Issue Type: Improvement Reporter: Ismael Juma We added a LogContext object which automatically adds a log prefix to every message written by loggers constructed from it (much like the Logging mixin available in the server code). We use this in the consumer to ensure that messages always contain the consumer group and client ids, which is very helpful when multiple consumers are run on the same instance. We should do something similar for the AdminClient. We should always include the client id. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136708#comment-16136708 ] Ismael Juma commented on KAFKA-5758: [~junrao], what do you think we should do in this case? Option 1 doesn't seem right. Option 2 could work although it's a bit wasteful. An additional option: 3. Return an error for that partition (`NOT_FOLLOWER` would be appropriate, but we don't have that, so we'd have to reuse an existing error) > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. 
> In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. > at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
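The per-partition handling discussed in this thread (fail only the affected partition, possibly with a reused error code, instead of failing the whole fetch) can be sketched as a toy handler. The names and types here are hypothetical, not Kafka's server-side code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PartialFetch {
    enum Error { NONE, NOT_ASSIGNED_REPLICA }

    // Handle each requested partition independently: a partition for which
    // the fetcher is not an assigned replica yields a per-partition error,
    // while the remaining partitions are still served normally.
    static Map<String, Error> handleFetch(List<String> requestedPartitions,
                                          Set<String> assignedPartitions) {
        Map<String, Error> results = new LinkedHashMap<>();
        for (String tp : requestedPartitions) {
            results.put(tp, assignedPartitions.contains(tp)
                    ? Error.NONE
                    : Error.NOT_ASSIGNED_REPLICA);
        }
        return results;
    }

    public static void main(String[] args) {
        // "reassigned-5" stands in for the partition the fetcher just lost.
        Map<String, Error> r = handleFetch(
                Arrays.asList("topicA-0", "reassigned-5"),
                new HashSet<>(Arrays.asList("topicA-0")));
        System.out.println(r);
    }
}
```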
[jira] [Updated] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5758: --- Fix Version/s: (was: 0.11.0.1) > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. > In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. 
> at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136702#comment-16136702 ] Ismael Juma edited comment on KAFKA-5758 at 8/22/17 12:28 PM: -- [~dwvangeest], looks like a good find. We should not be failing the whole fetch request, we should only fail the relevant partition. was (Author: ijuma): [~dwvangeest], looks like a good fine. We should not be failing the whole fetch request, we should only fail the relevant partition. > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 0.11.0.1, 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. 
> In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. > at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136702#comment-16136702 ] Ismael Juma commented on KAFKA-5758: [~dwvangeest], looks like a good fine. We should not be failing the whole fetch request, we should only fail the relevant partition. > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 0.11.0.1, 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. 
> In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. > at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5758: --- Fix Version/s: 1.0.0 0.11.0.1 > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 0.11.0.1, 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. > In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. 
> at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5758) Reassigning a topic's partitions can adversely impact other topics
[ https://issues.apache.org/jira/browse/KAFKA-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5758: --- Labels: reliability (was: ) > Reassigning a topic's partitions can adversely impact other topics > -- > > Key: KAFKA-5758 > URL: https://issues.apache.org/jira/browse/KAFKA-5758 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 0.10.1.1 >Reporter: David van Geest > Labels: reliability > Fix For: 0.11.0.1, 1.0.0 > > > We've noticed that reassigning a topic's partitions seems to adversely impact > other topics. Specifically, followers for other topics fall out of the ISR. > While I'm not 100% sure about why this happens, the scenario seems to be as > follows: > 1. Reassignment is manually triggered on topic-partition X-Y, and broker A > (which used to be a follower for X-Y) is no longer a follower. > 2. Broker A makes `FetchRequest` including topic-partition X-Y to broker B, > just after the reassignment. > 3. Broker B can fulfill the `FetchRequest`, but while trying to do so it > tries to record the position of "follower" A. This fails, because broker A is > no longer a follower for X-Y (see exception below). > 4. The entire `FetchRequest` request fails, and broker A's other followed > topics start falling behind. > 5. Depending on the length of the reassignment, this sequence repeats. > In step 3, we see exceptions like: > {noformat} > Error when handling request Name: FetchRequest; Version: 3; CorrelationId: > 46781859; ClientId: ReplicaFetcherThread-0-1001; ReplicaId: 1006; MaxWait: > 500 ms; MinBytes: 1 bytes; MaxBytes:10485760 bytes; RequestInfo: > > kafka.common.NotAssignedReplicaException: Leader 1001 failed to record > follower 1006's position -1 since the replica is not recognized to be one of > the assigned replicas 1001,1004,1005 for partition [topic_being_reassigned,5]. 
> at kafka.cluster.Partition.updateReplicaLogReadResult(Partition.scala:249) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:923) > at > kafka.server.ReplicaManager$$anonfun$updateFollowerLogReadResults$2.apply(ReplicaManager.scala:920) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > kafka.server.ReplicaManager.updateFollowerLogReadResults(ReplicaManager.scala:920) > at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:481) > at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534) > at kafka.server.KafkaApis.handle(KafkaApis.scala:79) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Does my assessment make sense? If so, this behaviour seems problematic. A few > changes that might improve matters (assuming I'm on the right track): > 1. `FetchRequest` should be able to return partial results > 2. The broker fulfilling the `FetchRequest` could ignore the > `NotAssignedReplicaException`, and return results without recording the > not-any-longer-follower position. > This behaviour was experienced with 0.10.1.1, although looking at the > changelogs and the code in question, I don't see any reason why it would have > changed in later versions. > Am very interested to have some discussion on this. Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5720) In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with org.apache.kafka.common.errors.TimeoutException
[ https://issues.apache.org/jira/browse/KAFKA-5720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5720: --- Fix Version/s: 1.0.0 > In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with > org.apache.kafka.common.errors.TimeoutException > --- > > Key: KAFKA-5720 > URL: https://issues.apache.org/jira/browse/KAFKA-5720 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Priority: Minor > Fix For: 1.0.0 > > > In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with > org.apache.kafka.common.errors.TimeoutException. > {code} > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. > at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:213) > at > kafka.api.AdminClientIntegrationTest.testCallInFlightTimeouts(AdminClientIntegrationTest.scala:399) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout. > {code} > It's unclear whether this was an environment error or test bug. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5342) Distinguish abortable failures in transactional producer
[ https://issues.apache.org/jira/browse/KAFKA-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136686#comment-16136686 ] Ismael Juma commented on KAFKA-5342: [~hachikuji], do you think you can submit the PR today? > Distinguish abortable failures in transactional producer > > > Key: KAFKA-5342 > URL: https://issues.apache.org/jira/browse/KAFKA-5342 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Reporter: Jason Gustafson >Assignee: Jason Gustafson > Fix For: 0.11.0.1 > > > The transactional producer distinguishes two classes of user-visible errors: > 1. Abortable errors: these are errors which are fatal to the ongoing > transaction, but which can be successfully aborted. Essentially any error in > which the producer can still expect to successfully send EndTxn to the > transaction coordinator is abortable. > 2. Fatal errors: any error which is not abortable is fatal. For example, a > transactionalId authorization error is fatal because it would also prevent > the TC from receiving the EndTxn request. > At the moment, it's not clear how the user would know how they should handle > a given failure. One option is to add an exception type to indicate which > errors are abortable (e.g. AbortableKafkaException). Then any other exception > could be considered fatal. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
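The proposal in this issue can be sketched as a small exception hierarchy. `AbortableKafkaException` is the name floated in the description; the base class and the handler are illustrative, not Kafka's actual client API:

```java
public class TxnErrors {
    static class KafkaTxnException extends RuntimeException {
        KafkaTxnException(String m) { super(m); }
    }

    // Marker type for errors after which EndTxn can still succeed.
    static class AbortableKafkaException extends KafkaTxnException {
        AbortableKafkaException(String m) { super(m); }
    }

    // An abortable error lets the caller abort and retry the transaction;
    // any other transactional error is fatal and the producer must be closed.
    static String handle(KafkaTxnException e) {
        return (e instanceof AbortableKafkaException) ? "abortTransaction" : "close";
    }

    public static void main(String[] args) {
        System.out.println(handle(new AbortableKafkaException("coordinator moved")));
        System.out.println(handle(new KafkaTxnException("transactionalId not authorized")));
    }
}
```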
[jira] [Commented] (KAFKA-5749) Refactor SessionStore hierarchy
[ https://issues.apache.org/jira/browse/KAFKA-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136644#comment-16136644 ] Damian Guy commented on KAFKA-5749: --- Sorry [~adyachkov], these tasks I've assigned to myself as they are part of the KIP I'm working on. In most cases I already have a plan and/or have done part of the task. > Refactor SessionStore hierarchy > --- > > Key: KAFKA-5749 > URL: https://issues.apache.org/jira/browse/KAFKA-5749 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 1.0.0 >Reporter: Damian Guy >Assignee: Damian Guy > Fix For: 1.0.0 > > > In order to support bytes stores we need to create a MeteredSessionStore and > ChangeloggingSessionStore. We then need to refactor the current SessionStore > implementations to use this. All inner stores should be of type <Bytes, > byte[]> -- This message was sent by Atlassian JIRA (v6.4.14#64029)
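The wrapper layering this refactor calls for can be sketched as nested stores over a raw-bytes inner store. The class names are illustrative stand-ins for the MeteredSessionStore/ChangeloggingSessionStore mentioned above, and a put counter stands in for the metering/changelogging concern:

```java
import java.util.HashMap;
import java.util.Map;

public class LayeredStore {
    // Inner stores work purely on raw bytes, per the issue description.
    interface BytesStore {
        void put(String key, byte[] value);
        byte[] get(String key);
    }

    static class InMemoryBytesStore implements BytesStore {
        private final Map<String, byte[]> map = new HashMap<>();
        public void put(String k, byte[] v) { map.put(k, v); }
        public byte[] get(String k) { return map.get(k); }
    }

    // Outer layer adds a cross-cutting concern (here a put counter, standing
    // in for metrics or changelog writes) and delegates to any inner store.
    static class MeteredStore implements BytesStore {
        private final BytesStore inner;
        int puts = 0;
        MeteredStore(BytesStore inner) { this.inner = inner; }
        public void put(String k, byte[] v) { puts++; inner.put(k, v); }
        public byte[] get(String k) { return inner.get(k); }
    }

    public static void main(String[] args) {
        MeteredStore store = new MeteredStore(new InMemoryBytesStore());
        store.put("session", new byte[]{1, 2});
        System.out.println(store.puts + " " + store.get("session").length);
    }
}
```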
[jira] [Assigned] (KAFKA-5686) Documentation inconsistency on the "Compression"
[ https://issues.apache.org/jira/browse/KAFKA-5686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar reassigned KAFKA-5686: Assignee: Manikumar > Documentation inconsistency on the "Compression" > > > Key: KAFKA-5686 > URL: https://issues.apache.org/jira/browse/KAFKA-5686 > Project: Kafka > Issue Type: Bug > Components: documentation >Affects Versions: 0.11.0.0 >Reporter: Seweryn Habdank-Wojewodzki >Assignee: Manikumar >Priority: Minor > > At the page: > https://kafka.apache.org/documentation/ > There is a sentence: > {{Kafka supports GZIP, Snappy and LZ4 compression protocols. More details on > compression can be found here.}} > Especially link under the word *here* is describing very old compression > settings, which is false in case of version 0.11.x.y. > JAVA API: > Also it would be nice to clearly state if *compression.type* uses only case > sensitive String as a value or if it is recommended to use e.g. > {{CompressionType.GZIP.name}} for JAVA API. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5751) Kafka cannot start; corrupted index file(s)
[ https://issues.apache.org/jira/browse/KAFKA-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar resolved KAFKA-5751. -- Resolution: Duplicate > Kafka cannot start; corrupted index file(s) > --- > > Key: KAFKA-5751 > URL: https://issues.apache.org/jira/browse/KAFKA-5751 > Project: Kafka > Issue Type: Bug > Components: log >Affects Versions: 0.11.0.0 > Environment: Linux (RedHat 7) >Reporter: Martin M >Priority: Critical > > A system was running Kafka 0.11.0 and some applications that produce and > consume events. > During the runtime, a power outage was experienced. Upon restart, Kafka did > not recover. > Logs show repeatedly the messages below: > *server.log* > {noformat} > [2017-08-15 15:02:26,374] FATAL [Kafka Server 1001], Fatal error during > KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) > org.apache.kafka.common.protocol.types.SchemaException: Error reading field > 'version': java.nio.BufferUnderflowException > at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:75) > at > kafka.log.ProducerStateManager$.readSnapshot(ProducerStateManager.scala:289) > at > kafka.log.ProducerStateManager.loadFromSnapshot(ProducerStateManager.scala:440) > at > kafka.log.ProducerStateManager.truncateAndReload(ProducerStateManager.scala:499) > at kafka.log.Log.kafka$log$Log$$recoverSegment(Log.scala:327) > at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:314) > at kafka.log.Log$$anonfun$loadSegmentFiles$3.apply(Log.scala:272) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) > at kafka.log.Log.loadSegmentFiles(Log.scala:272) > at kafka.log.Log.loadSegments(Log.scala:376) > at kafka.log.Log.<init>(Log.scala:179) > 
at kafka.log.Log$.apply(Log.scala:1580) > at > kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$5$$anonfun$apply$12$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:172) > at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:57) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) > {noformat} > *kafkaServer.out* > {noformat} > [2017-08-15 16:03:50,927] WARN Found a corrupted index file due to > requirement failed: Corrupt index found, index file > (/opt/nsp/os/kafka/data/session-manager.revoke_token_topic-7/.index) > has non-zero size but the last offset is 0 which is no larger than the base > offset 0.}. deleting > /opt/nsp/os/kafka/data/session-manager.revoke_token_topic-7/.timeindex, > > /opt/nsp/os/kafka/data/session-manager.revoke_token_topic-7/.index, > and > /opt/nsp/os/kafka/data/session-manager.revoke_token_topic-7/.txnindex > and rebuilding index... (kafka.log.Log) > [2017-08-15 16:03:50,931] INFO [Kafka Server 1001], shutting down > (kafka.server.KafkaServer) > [2017-08-15 16:03:50,932] INFO Recovering unflushed segment 0 in log > session-manager.revoke_token_topic-7. (kafka.log.Log) > [2017-08-15 16:03:50,935] INFO Terminate ZkClient event thread. 
> (org.I0Itec.zkclient.ZkEventThread) > [2017-08-15 16:03:50,936] INFO Loading producer state from offset 0 for > partition session-manager.revoke_token_topic-7 with message format version 2 > (kafka.log.Log) > [2017-08-15 16:03:50,937] INFO Completed load of log > session-manager.revoke_token_topic-7 with 1 log segments, log start offset 0 > and log end offset 0 in 10 ms (kafka.log.Log) > [2017-08-15 16:03:50,938] INFO Session: 0x1000f772d26063b closed > (org.apache.zookeeper.ZooKeeper) > [2017-08-15 16:03:50,938] INFO EventThread shut down for session: > 0x1000f772d26063b (org.apache.zookeeper.ClientCnxn) > {noformat} > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5714) Allow whitespaces in the principal name
[ https://issues.apache.org/jira/browse/KAFKA-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manikumar reassigned KAFKA-5714:
--
Assignee: (was: Manikumar)

> Allow whitespaces in the principal name
> ---
>
> Key: KAFKA-5714
> URL: https://issues.apache.org/jira/browse/KAFKA-5714
> Project: Kafka
> Issue Type: Improvement
> Components: security
> Affects Versions: 0.10.2.1
> Reporter: Alla Tumarkin
>
> Request:
> Improve parser behavior to allow whitespaces in the principal name in the config file, as in:
> {code}
> super.users=User:CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown
> {code}
> Background:
> The current implementation requires that there are no whitespaces after commas, i.e.
> {code}
> super.users=User:CN=Unknown,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> {code}
> Note: having a semicolon at the end doesn't help, i.e. this does not work either:
> {code}
> super.users=User:CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown;
> {code}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
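A fix along the lines the issue requests could normalize each component of the distinguished name before comparison. The sketch below is illustrative only (the class and method names are hypothetical, not Kafka's actual ACL parsing code): it trims whitespace around commas and drops a trailing semicolon, so that all three forms above compare equal.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not Kafka's actual parser: normalize a distinguished
// name so that "CN=Unknown, OU=Unknown" and "CN=Unknown,OU=Unknown" compare equal.
class PrincipalNames {
    static String normalize(String dn) {
        // Drop a trailing semicolon if present.
        String trimmed = dn.endsWith(";") ? dn.substring(0, dn.length() - 1) : dn;
        // Split on commas and trim surrounding whitespace from each component.
        return Arrays.stream(trimmed.split(","))
                     .map(String::trim)
                     .collect(Collectors.joining(","));
    }
}
```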
[jira] [Commented] (KAFKA-5689) Refactor WindowStore hierarchy so that Metered Store is the outermost store
[ https://issues.apache.org/jira/browse/KAFKA-5689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136607#comment-16136607 ] ASF GitHub Bot commented on KAFKA-5689:
---
Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/3692

> Refactor WindowStore hierarchy so that Metered Store is the outermost store
>
> Key: KAFKA-5689
> URL: https://issues.apache.org/jira/browse/KAFKA-5689
> Project: Kafka
> Issue Type: Sub-task
> Components: streams
> Reporter: Damian Guy
> Assignee: Damian Guy
> Fix For: 1.0.0
>
> MeteredWindowStore is currently not the outermost store. Further, it needs to have the inner store so as to allow easy pluggability of custom storage engines.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5503) Idempotent producer ignores shutdown while fetching ProducerId
[ https://issues.apache.org/jira/browse/KAFKA-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5503: --- Fix Version/s: (was: 0.11.0.2, 1.0.0) 0.11.0.2 1.0.0 > Idempotent producer ignores shutdown while fetching ProducerId > -- > > Key: KAFKA-5503 > URL: https://issues.apache.org/jira/browse/KAFKA-5503 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Affects Versions: 0.11.0.0 >Reporter: Jason Gustafson >Assignee: Evgeny Veretennikov > Fix For: 1.0.0, 0.11.0.2 > > > When using the idempotent producer, we initially block the sender thread > while we attempt to get the ProducerId. During this time, a concurrent call > to close() will be ignored. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5503) Idempotent producer ignores shutdown while fetching ProducerId
[ https://issues.apache.org/jira/browse/KAFKA-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-5503: --- Fix Version/s: (was: 0.11.0.1) 0.11.0.2, 1.0.0 > Idempotent producer ignores shutdown while fetching ProducerId > -- > > Key: KAFKA-5503 > URL: https://issues.apache.org/jira/browse/KAFKA-5503 > Project: Kafka > Issue Type: Sub-task > Components: clients, core, producer >Affects Versions: 0.11.0.0 >Reporter: Jason Gustafson >Assignee: Evgeny Veretennikov > Fix For: 0.11.0.2, 1.0.0 > > > When using the idempotent producer, we initially block the sender thread > while we attempt to get the ProducerId. During this time, a concurrent call > to close() will be ignored. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-5752) Delete topic and re-create topic immediate will delete the new topic's timeindex
[ https://issues.apache.org/jira/browse/KAFKA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma reassigned KAFKA-5752:
--
Assignee: Manikumar

> Delete topic and re-create topic immediately will delete the new topic's timeindex
> -
>
> Key: KAFKA-5752
> URL: https://issues.apache.org/jira/browse/KAFKA-5752
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.10.2.0, 0.11.0.0
> Reporter: Pengwei
> Assignee: Manikumar
> Priority: Critical
> Labels: reliability
> Fix For: 0.11.0.1, 1.0.0
>
> When we delete a topic and immediately re-create a topic with the same name, we find that once the async delete of the old topic finishes, it removes the newly created topic's time index.
> This is because LogManager's asyncDelete changes the log's and index's file pointers to the renamed log and index files, but misses the time index, which causes this issue.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
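The fix described above amounts to ensuring that every file belonging to a segment is repointed together when the segment is renamed for async deletion. The sketch below is purely illustrative (it is not Kafka's LogManager code; names are hypothetical): it enumerates the segment files that must move as a unit, including the time index the bug left behind.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not Kafka's actual code: when a segment is renamed for
// async deletion, the log, offset index, and time index must all be repointed.
// The reported bug was that the time index was skipped, so the later delete
// removed the re-created topic's live time index file instead.
class SegmentFiles {
    static List<String> filesToRename(String baseOffset) {
        return Arrays.asList(
            baseOffset + ".log",
            baseOffset + ".index",
            baseOffset + ".timeindex");  // the file the bug forgot to repoint
    }
}
```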
[jira] [Created] (KAFKA-5761) Serializer API should support ByteBuffer
Bhaskar Gollapudi created KAFKA-5761:
-
Summary: Serializer API should support ByteBuffer
Key: KAFKA-5761
URL: https://issues.apache.org/jira/browse/KAFKA-5761
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 0.11.0.0
Reporter: Bhaskar Gollapudi

Consider the Serializer: its main method is

byte[] serialize(String topic, T data);

Producer applications create an implementation that takes an instance (of T) and converts it to a byte[]. This byte array is allocated anew for each message and then handed over to the Kafka producer internals, which write the bytes to a buffer/network socket. When the next message arrives, the serializer should try to reuse the existing byte[] for the new message instead of creating a new one. This requires two things:

1. The hand-off of the bytes to the buffer/socket and the reuse of the byte[] must happen on the same thread.
2. There must be a way to mark the end of the valid bytes in the byte[].

The first is reasonably simple to understand. If it does not hold, and without other necessary synchronization, the byte[] gets corrupted, and so does the message written to the buffer/socket. However, this requirement is easy to meet for a producer application, because it controls the threads on which the serializer is invoked.

The second is where the problem lies with the current API. It does not allow a variable number of bytes to be read from a container; it is limited by the byte[]'s length. This forces the producer to either:

1. create a new byte[] for a message that is bigger than the previous one, or
2. decide on a maximum size and use padding.

Both are cumbersome and error prone, and may waste network bandwidth. Instead, a Serializer with this method:

ByteBuffer serialize(String topic, T data);

would let clients implement a reusable bytes container and avoid allocations for each message.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
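A rough sketch of what the proposed API would enable, with hypothetical names (this interface is not part of Kafka's API; the actual change would need a KIP): a serializer that keeps one growable ByteBuffer and uses flip() so the buffer's limit marks the end of the valid bytes, avoiding both per-message allocation and padding. As the issue notes, this is only safe if serialize() is always invoked from a single thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical interface shape sketched from the issue's proposal.
interface ByteBufferSerializer<T> {
    ByteBuffer serialize(String topic, T data);
}

// A String serializer that reuses one growable buffer across messages.
// NOT thread-safe by design: the issue requires serialize() and the
// hand-off to the buffer/socket to happen on the same thread.
class ReusableStringSerializer implements ByteBufferSerializer<String> {
    private ByteBuffer buffer = ByteBuffer.allocate(64);

    @Override
    public ByteBuffer serialize(String topic, String data) {
        byte[] encoded = data.getBytes(StandardCharsets.UTF_8);
        if (buffer.capacity() < encoded.length) {
            // Grow only when a message exceeds the current capacity.
            buffer = ByteBuffer.allocate(Math.max(encoded.length, buffer.capacity() * 2));
        }
        buffer.clear();
        buffer.put(encoded);
        buffer.flip();  // limit now marks the end of valid bytes -- no padding needed
        return buffer;
    }
}
```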
[jira] [Updated] (KAFKA-3866) KerberosLogin refresh time bug and other improvements
[ https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-3866: --- Fix Version/s: 1.0.0 > KerberosLogin refresh time bug and other improvements > - > > Key: KAFKA-3866 > URL: https://issues.apache.org/jira/browse/KAFKA-3866 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.10.0.0 >Reporter: Ismael Juma >Assignee: Ismael Juma > Fix For: 1.0.0, 0.11.0.2 > > > ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is > also present in our KerberosLogin class. While looking at the code, I found a > number of things that could be improved. More details in the PR. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-3866) KerberosLogin refresh time bug and other improvements
[ https://issues.apache.org/jira/browse/KAFKA-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismael Juma updated KAFKA-3866: --- Fix Version/s: (was: 0.11.0.1) 0.11.0.2 > KerberosLogin refresh time bug and other improvements > - > > Key: KAFKA-3866 > URL: https://issues.apache.org/jira/browse/KAFKA-3866 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.10.0.0 >Reporter: Ismael Juma >Assignee: Ismael Juma > Fix For: 1.0.0, 0.11.0.2 > > > ZOOKEEPER-2295 describes a bug in the Kerberos refresh time logic that is > also present in our KerberosLogin class. While looking at the code, I found a > number of things that could be improved. More details in the PR. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-4741) Memory leak in RecordAccumulator.append
[ https://issues.apache.org/jira/browse/KAFKA-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136549#comment-16136549 ] Julius Žaromskis commented on KAFKA-4741:
-
I'm having this problem: https://stackoverflow.com/questions/45813477/kafka-off-heap-memory-leak
My question is this: would it cause leaking on the producer or on the server?

> Memory leak in RecordAccumulator.append
> ---
>
> Key: KAFKA-4741
> URL: https://issues.apache.org/jira/browse/KAFKA-4741
> Project: Kafka
> Issue Type: Bug
> Components: clients
> Reporter: Satish Duggana
> Assignee: Satish Duggana
> Fix For: 0.11.0.0
>
> RecordAccumulator creates a `ByteBuffer` from the free memory pool. This buffer should be deallocated when the invocation encounters or throws an exception.
> I added todo comment lines in the code below for the cases where that buffer should be deallocated.
> {code:title=RecordProducer.java|borderStyle=solid}
> ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
> synchronized (dq) {
>     // Need to check if producer is closed again after grabbing the dequeue lock.
>     if (closed)
>         // todo buffer should be cleared.
>         throw new IllegalStateException("Cannot send after the producer is closed.");
>     // todo buffer should be cleared up when tryAppend throws an Exception
>     RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
>     if (appendResult != null) {
>         // Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
>         free.deallocate(buffer);
>         return appendResult;
>     }
> {code}
> I will raise a PR for the same soon.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
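The todo comments above translate to deallocating the buffer on the failure paths. The sketch below is a simplified stand-in, not the actual Kafka fix (in the real code the buffer must be kept when it is successfully handed to a new batch); it only illustrates the invariant that a pool allocation must be matched by a deallocation on every exceptional exit, or the pool leaks.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Simplified stand-in for the producer's free-memory pool (illustrative only).
class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    int outstanding = 0;  // buffers currently checked out; should return to 0

    ByteBuffer allocate(int size) {
        outstanding++;
        ByteBuffer b = free.poll();
        return (b != null && b.capacity() >= size) ? b : ByteBuffer.allocate(size);
    }

    void deallocate(ByteBuffer buffer) {
        outstanding--;
        free.offer(buffer);
    }
}

class Appender {
    // In this sketch the buffer is never handed off to a batch, so a finally
    // block suffices; the real code must skip deallocation when a batch keeps it.
    static void append(BufferPool pool, boolean closed) {
        ByteBuffer buffer = pool.allocate(1024);
        try {
            if (closed)
                throw new IllegalStateException("Cannot send after the producer is closed.");
            // ... tryAppend(...) could throw here as well ...
        } finally {
            pool.deallocate(buffer);  // runs on the exception paths too
        }
    }
}
```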