[jira] [Updated] (KAFKA-9210) kafka stream loss data
[ https://issues.apache.org/jira/browse/KAFKA-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

panpan.liu updated KAFKA-9210:
------------------------------
    Description:

kafka broker: 2.0.1
kafka stream client: 2.1.0

# Two applications run at the same time.
# After some days, I stopped one application (in k8s).
# The following log occurred; I checked the data and found the value was less than what I expected.

{noformat}
Partitions [flash-app-xmc-worker-share-store-minute-repartition-1] for changelog flash-app-xmc-worker-share-store-minute-changelog-1
2019-11-19 05:50:49.816|WARN |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|101|stream-thread [flash-client-xmc-StreamThread-3] Restoring StreamTasks failed. Deleting StreamTasks stores to recreate from scratch.
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {flash-app-xmc-worker-share-store-minute-changelog-1=6128684}
	at org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:987)
2019-11-19 05:50:49.817|INFO |flash-client-xmc-StreamThread-3|o.a.k.s.p.i.StoreChangelogReader|105|stream-thread [flash-client-xmc-StreamThread-3] Reinitializing StreamTask TaskId: 10_1
ProcessorTopology:
  KSTREAM-SOURCE-70: topics: [flash-app-xmc-worker-share-store-minute-repartition] children: [KSTREAM-AGGREGATE-67]
  KSTREAM-AGGREGATE-67: states: [worker-share-store-minute] children: [KTABLE-TOSTREAM-71]
  KTABLE-TOSTREAM-71: children: [KSTREAM-SINK-72]
  KSTREAM-SINK-72: topic: StaticTopicNameExtractor(xmc-worker-share-minute)
{noformat}

The same WARN/INFO restore-failure cycle repeats at 05:50:49.842 and 05:50:49.905 for the same changelog partition.
[jira] [Updated] (KAFKA-9210) kafka stream loss data
[ https://issues.apache.org/jira/browse/KAFKA-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

panpan.liu updated KAFKA-9210:
------------------------------
    Attachment: screenshot-1.png

> kafka stream loss data
> ----------------------
>
>                 Key: KAFKA-9210
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9210
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.1
>            Reporter: panpan.liu
>            Priority: Major
>         Attachments: app.log, screenshot-1.png
>
> (description elided; identical to the issue description above)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (KAFKA-9210) kafka stream loss data
[ https://issues.apache.org/jira/browse/KAFKA-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

panpan.liu updated KAFKA-9210:
------------------------------
    Attachment: app.log

> kafka stream loss data
> ----------------------
>
>                 Key: KAFKA-9210
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9210
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.0.1
>            Reporter: panpan.liu
>            Priority: Major
>         Attachments: app.log
>
> (description elided; identical to the issue description above)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (KAFKA-9210) kafka stream loss data
[ https://issues.apache.org/jira/browse/KAFKA-9210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

panpan.liu updated KAFKA-9210:
------------------------------
    Description: (identical to the issue description above, with the log moved into a {noformat} block)
[jira] [Created] (KAFKA-9210) kafka stream loss data
panpan.liu created KAFKA-9210:
---------------------------------

             Summary: kafka stream loss data
                 Key: KAFKA-9210
                 URL: https://issues.apache.org/jira/browse/KAFKA-9210
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 2.0.1
            Reporter: panpan.liu

(original report body elided; identical to the restore-failure log in the issue description above)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977155#comment-16977155 ]

sats commented on KAFKA-9205:
-----------------------------

[~vahid] Can you please create the KIP? (Sorry, I am a newbie and not aware of the process.) I can work on the code part.

> Add an option to enforce rack-aware partition reassignment
> ----------------------------------------------------------
>
>                 Key: KAFKA-9205
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9205
>             Project: Kafka
>          Issue Type: Improvement
>          Components: admin, tools
>            Reporter: Vahid Hashemian
>            Priority: Minor
>
> One regularly used healing operation on Kafka clusters is replica
> reassignment for topic partitions. For example, when there is a skew in the
> inbound/outbound traffic of a broker, replica reassignment can be used to
> move some leaders/followers off the broker; or if there is a skew in the
> disk usage of brokers, replica reassignment can move some partitions to
> other brokers that have more disk space available.
> In Kafka clusters that span multiple data centers (or availability zones),
> high availability is a priority, in the sense that when a data center goes
> offline the cluster should be able to resume normal operation by
> guaranteeing partition replicas in all data centers.
> This guarantee is currently the responsibility of the on-call engineer who
> performs the reassignment, or of the tool that automatically generates the
> reassignment plan for improving cluster health (e.g. by considering the
> rack configuration of each broker in the cluster). The former is quite
> error-prone, and the latter leads to duplicate code in all such admin tools
> (which are not error-free either). Not all use cases can make use of the
> default assignment strategy used by the --generate option, and the current
> rack-aware enforcement applies only to that option.
> It would be great for the built-in replica assignment API and tool provided
> by Kafka to support a rack-aware verification option for the --execute
> scenario that would simply return an error when [some] brokers in any
> replica set share a common rack.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
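The verification the ticket asks for (fail --execute when any replica set places two brokers on the same rack) can be sketched independently of Kafka's code. This is a hypothetical illustration; the function name and data shapes are mine, not Kafka's API:

```python
# Hypothetical sketch of the proposed check: reject a reassignment plan
# when any replica set places two brokers on the same rack.

def find_rack_violations(plan, broker_racks):
    """plan: dict mapping partition name -> list of broker ids (replica set).
    broker_racks: dict mapping broker id -> rack id.
    Returns the partitions whose replica set has brokers sharing a rack."""
    violations = []
    for partition, replicas in plan.items():
        racks = [broker_racks[b] for b in replicas]
        if len(set(racks)) < len(racks):  # a rack appears twice in the set
            violations.append(partition)
    return violations

plan = {"mytopic-0": [1, 2], "mytopic-1": [1, 3]}
racks = {1: "rack-a", 2: "rack-b", 3: "rack-a"}
print(find_rack_violations(plan, racks))  # prints ['mytopic-1']
```

A tool could run such a check on the reassignment plan before submitting it and return an error instead of executing, which is the --execute-time guard proposed here.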
[jira] [Commented] (KAFKA-7356) Add gradle task for dependency listing
[ https://issues.apache.org/jira/browse/KAFKA-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977133#comment-16977133 ]

ASF GitHub Bot commented on KAFKA-7356:
---------------------------------------

omkreddy commented on pull request #5589: KAFKA-7356 Added allDeps task to generate complete dependency report.
URL: https://github.com/apache/kafka/pull/5589

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add gradle task for dependency listing
> --------------------------------------
>
>                 Key: KAFKA-7356
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7356
>             Project: Kafka
>          Issue Type: New Feature
>          Components: build, packaging
>            Reporter: Andy LoPresto
>            Priority: Minor
>
> I needed to examine the dependency list to confirm/deny use of a specific
> dependency. Running {{gradle -q dependencies}} in the root directory only
> lists the {{rat}} dependencies. Adding a custom section to *build.gradle*
> allows for a complete listing of the dependencies from the command line.
> {code}
> subprojects {
>     task allDeps(type: DependencyReportTask) {}
> }
> {code}
> To invoke: {{gradle allDeps}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Assigned] (KAFKA-9157) logcleaner could generate empty segment files after cleaning
[ https://issues.apache.org/jira/browse/KAFKA-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huxihx reassigned KAFKA-9157:
-----------------------------

    Assignee: huxihx

> logcleaner could generate empty segment files after cleaning
> ------------------------------------------------------------
>
>                 Key: KAFKA-9157
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9157
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.3.0
>            Reporter: Jun Rao
>            Assignee: huxihx
>            Priority: Major
>
> Currently, the log cleaner can only combine segments within a 2-billion
> offset range. If all records in that range are deleted, an empty segment
> could be generated. It would be useful to avoid generating such empty
> segments.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (KAFKA-9209) Avoid sending unnecessary offset updates from consumer after KIP-211
Michael Bingham created KAFKA-9209:
--------------------------------------

             Summary: Avoid sending unnecessary offset updates from consumer after KIP-211
                 Key: KAFKA-9209
                 URL: https://issues.apache.org/jira/browse/KAFKA-9209
             Project: Kafka
          Issue Type: Improvement
          Components: consumer
    Affects Versions: 2.3.0
            Reporter: Michael Bingham

With KIP-211 ([https://cwiki.apache.org/confluence/display/KAFKA/KIP-211%3A+Revise+Expiration+Semantics+of+Consumer+Group+Offsets]), offsets no longer expire as long as the consumer group is active. If the consumer has {{enable.auto.commit=true}} and no new events are arriving on the subscribed partition(s), the consumer still sends (unchanged) offsets to the group coordinator just to keep them from expiring. This is no longer necessary, and an optimization could be implemented to send offsets with auto-commit only when there are actual updates to make (i.e., when new events have been processed). This would require detecting whether the broker supports the new expiration semantics from KIP-211, and applying the optimization only when it does.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
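The proposed optimization amounts to diffing the consumer's current positions against what was last committed and skipping the commit request when nothing has advanced. A minimal, Kafka-agnostic sketch; the function and variable names are illustrative, not the consumer's internals:

```python
# Sketch of the proposed auto-commit optimization (illustrative names,
# not Kafka's implementation): track the last committed offset per
# partition and commit only what has advanced.

def offsets_to_commit(current_positions, last_committed):
    """Return only the partition offsets that changed since the last commit.

    Both arguments map partition -> offset. An empty result means the
    auto-commit can be skipped entirely; under KIP-211 semantics that is
    safe, because offsets of an active group no longer expire.
    """
    return {
        tp: off
        for tp, off in current_positions.items()
        if last_committed.get(tp) != off
    }

positions = {"mytopic-0": 42, "mytopic-1": 7}
committed = {"mytopic-0": 42, "mytopic-1": 5}
print(offsets_to_commit(positions, committed))  # prints {'mytopic-1': 7}
```

As the ticket notes, a real implementation would first have to confirm the broker supports KIP-211 before suppressing the keep-alive commits; against an older broker the unchanged offsets must still be sent.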
[jira] [Commented] (KAFKA-8981) Add bandwidth limits to DegradedNetworkFault
[ https://issues.apache.org/jira/browse/KAFKA-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977051#comment-16977051 ]

ASF GitHub Bot commented on KAFKA-8981:
---------------------------------------

mumrah commented on pull request #7446: KAFKA-8981 Add rate limiting to NetworkDegradeSpec
URL: https://github.com/apache/kafka/pull/7446

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Add bandwidth limits to DegradedNetworkFault
> --------------------------------------------
>
>                 Key: KAFKA-8981
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8981
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: David Arthur
>            Priority: Minor
>
> We are currently only using Traffic Control (tc) to introduce packet latency.
> It also has the ability to limit the egress bandwidth. It would be nice to
> include this as an option when running system tests that need to simulate
> long/slow links.
>
> {{tc}} references
> * [https://netbeez.net/blog/how-to-use-the-linux-traffic-control/]
> * [https://www.badunetworks.com/traffic-shaping-with-tc/]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
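For reference, tc can cap egress bandwidth with the token bucket filter (tbf). A sketch, assuming a device named eth0 and placeholder rate values (requires root; not the system-test code itself):

```shell
# Cap egress on eth0 to 1 Mbit/s with the token bucket filter (tbf);
# eth0 and all rate/burst/latency values are placeholders.
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms

# netem can combine the existing latency injection with a rate limit
# on recent kernels:
#   tc qdisc add dev eth0 root netem delay 100ms rate 1mbit

# Remove the shaping when the test is done.
tc qdisc del dev eth0 root
```

Either qdisc would let the degraded-network fault simulate a slow link rather than only a high-latency one.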
[jira] [Commented] (KAFKA-9188) Flaky Test SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads
[ https://issues.apache.org/jira/browse/KAFKA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16977025#comment-16977025 ]

Matthias J. Sax commented on KAFKA-9188:
----------------------------------------

[https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/3444/testReport/junit/kafka.api/SslAdminIntegrationTest/testSynchronousAuthorizerAclUpdatesBlockRequestThreads/]

> Flaky Test SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-9188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9188
>             Project: Kafka
>          Issue Type: Test
>          Components: core
>            Reporter: Bill Bejeck
>            Priority: Major
>              Labels: flaky-test
>
> Failed in
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9373/testReport/junit/kafka.api/SslAdminClientIntegrationTest/testSynchronousAuthorizerAclUpdatesBlockRequestThreads/]
>
> {noformat}
> Error Message
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.
>
> Stacktrace
> java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout.
> at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260) > at > kafka.api.SslAdminClientIntegrationTest.$anonfun$testSynchronousAuthorizerAclUpdatesBlockRequestThreads$1(SslAdminClientIntegrationTest.scala:201) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > kafka.api.SslAdminClientIntegrationTest.testSynchronousAuthorizerAclUpdatesBlockRequestThreads(SslAdminClientIntegrationTest.scala:201) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to > timeout. > Standard Output[2019-11-14 15:13:51,489] ERROR [ReplicaFetcher replicaId=1, > leaderId=2, fetcherId=0] Error for partition mytopic1-1 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:13:51,490] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition mytopic1-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-14 15:14:04,686] ERROR [KafkaApi-2] Error when handling request: > clientId=adminclient-644, correlationId=4, api=CREATE_ACLS, version=1, > body={creations=[{resource_type=2,resource_name=foobar,resource_pattern_type=3,principal=User:ANONYMOUS,host=*,operation=3,permission_type=3},{resource_type=5,resource_name=transactional_id,resource_pattern_type=3,principal=User:ANONYMOUS,host=*,operation=4,permission_type=3}]} > (kafka.server.KafkaApis:76) > org.apache.kafka.common.errors.ClusterAuthorizationException: Request > Request(processor=1, connectionId=127.0.0.1:41993-127.0.0.1:34770-0, >
[jira] [Comment Edited] (KAFKA-9207) Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976957#comment-16976957 ] Xue Liu edited comment on KAFKA-9207 at 11/18/19 11:38 PM: --- We have some further discovery: when creating that thread, the follower had a connection error to the leader. See attachment error-connection.jpg. I feel we can at least add a retry for these temporary network errors and also add logging to catch this error. was (Author: xuel1): We have some further discovery: when creating that thread, the follower had a connection error to the leader. See attachment error-connection.jpg > Replica Out of Sync as creating ReplicaFetcher thread failed with connection > to leader > -- > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 2.3.0 >Reporter: Xue Liu >Priority: Major > Attachments: Capture.PNG, error-connection.jpg > > > We sometimes see that a replica for a partition is out of sync. When the issue > happens, it seems that we have simply lost that replica (it would never catch up) > unless we restart that broker. > It appears that the ReplicaFetcher thread for that partition is dead and the broker > will not restart that thread. We didn't see any exception in the server or > controller logs. > The screen capture is taken from the broker that has that replica. The leader is > 2017. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
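The retry the comment above suggests could look like the following sketch: a bounded retry with linear backoff around the (hypothetical) fetcher-thread creation, so that a transient connection error to the leader does not permanently kill replication for the partition. All class and method names here are illustrative assumptions, not Kafka's actual ReplicaFetcherManager code.

```java
import java.util.concurrent.Callable;

// Illustrative sketch only: bounded retry with linear backoff around an
// action that may fail transiently (e.g. creating a fetcher connection).
public class FetcherRetrySketch {
    // Retries `action` up to maxAttempts times, sleeping a little longer
    // after each failure; rethrows the last error if every attempt fails.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // a real implementation would also log the cause here
                Thread.sleep(backoffMs * attempt);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a leader connection that fails twice, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("connection to leader refused");
            return "fetcher-started";
        }, 5, 1L);
        System.out.println(calls[0] + " " + result); // prints: 3 fetcher-started
    }
}
```

A production version would additionally cap total retry time and surface a metric when the fetcher still cannot start, so the dead-thread condition the ticket describes becomes visible.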
[jira] [Resolved] (KAFKA-8571) Not complete delayed produce requests when processing StopReplicaRequest causing high produce latency for acks=all
[ https://issues.apache.org/jira/browse/KAFKA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-8571. Resolution: Fixed > Not complete delayed produce requests when processing StopReplicaRequest > causing high produce latency for acks=all > -- > > Key: KAFKA-8571 > URL: https://issues.apache.org/jira/browse/KAFKA-8571 > Project: Kafka > Issue Type: Bug >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Currently a broker will only attempt to complete delayed requests upon > high watermark changes and on receiving a LeaderAndIsrRequest. When a broker > receives a StopReplicaRequest, it will not try to complete delayed operations, > including delayed produce for acks=all, which can cause the producer to > time out even though the producer could have talked to the new > leader sooner if a NotLeaderForPartition error had been sent. > This can happen during partition reassignment when the controller is trying to > kick the previous leader out of the replica set. In this case, the controller > will only send a StopReplicaRequest (not a LeaderAndIsrRequest) to the previous > leader in the replica set shrink phase. Here is an example: > {noformat} > During reassignment of the replica set of partition A from [B1, B2] to [B2, B3]: > t0: Controller expands the replica set to [B1, B2, B3] > t1: B1 receives produce request PR on partition A with acks=all and timeout > T. B1 puts PR into the DelayedProducePurgatory with timeout T. > t2: Controller elects B2 as the new leader and shrinks the replica set to > [B2, B3]. LeaderAndIsrRequests are sent to B2 and B3. StopReplicaRequest is > sent to B1. > t3: B1 receives StopReplicaRequest but doesn't try to complete PR.
> If PR cannot be fulfilled by t3, and t1 + T > t3, PR will eventually time > out in the purgatory and the producer will eventually time out the produce > request.{noformat} > Since it is possible for the leader to receive only a StopReplicaRequest > (without receiving any LeaderAndIsrRequest) to leave the replica set, a fix > for this issue is to also try to complete delayed operations when processing > StopReplicaRequest. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
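The fix described above can be sketched as a toy model: when the StopReplica handler removes a partition, it force-completes any delayed produce operations parked in the purgatory for that partition with a NOT_LEADER result, so the producer retries against the new leader instead of waiting out its full request timeout. Class and method names below are illustrative assumptions, not Kafka's actual ReplicaManager/purgatory internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the fix: force-complete delayed produce ops on StopReplica.
class DelayedProduce {
    final String partition;
    String result = "PENDING";
    DelayedProduce(String partition) { this.partition = partition; }
}

class ProducePurgatory {
    private final List<DelayedProduce> pending = new ArrayList<>();

    void add(DelayedProduce op) { pending.add(op); }

    // Called from the (hypothetical) StopReplica handler; returns the number
    // of operations that were force-completed with a NOT_LEADER error.
    int completeForPartition(String partition) {
        int completed = 0;
        for (DelayedProduce op : pending) {
            if (op.partition.equals(partition) && op.result.equals("PENDING")) {
                op.result = "NOT_LEADER_FOR_PARTITION";
                completed++;
            }
        }
        return completed;
    }
}

public class StopReplicaSketch {
    public static void main(String[] args) {
        ProducePurgatory purgatory = new ProducePurgatory();
        DelayedProduce pr = new DelayedProduce("A-0"); // t1: acks=all produce parked
        purgatory.add(pr);
        // t3: StopReplicaRequest arrives for partition A-0; complete instead of ignoring.
        int n = purgatory.completeForPartition("A-0");
        System.out.println(n + " " + pr.result); // prints: 1 NOT_LEADER_FOR_PARTITION
    }
}
```

In the timeline from the ticket, this corresponds to B1 answering PR with NotLeaderForPartition at t3 rather than letting it expire at t1 + T.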
[jira] [Resolved] (KAFKA-9198) StopReplica handler should complete pending purgatory operations
[ https://issues.apache.org/jira/browse/KAFKA-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-9198. Fix Version/s: 2.4.0 Resolution: Fixed The PR has been merged. I meant to change the title to reference KAFKA-8571, but forgot about it. So instead I'll just resolve this one as fixed and mark the other as a dup. Sorry for the noise. > StopReplica handler should complete pending purgatory operations > > > Key: KAFKA-9198 > URL: https://issues.apache.org/jira/browse/KAFKA-9198 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > Fix For: 2.4.0 > > > When a reassignment completes, the current leader may need to be shutdown > with a StopReplica request. It may still have fetch/produce requests in > purgatory when this happens. We do not have logic currently to force > completion of these requests which means they are doomed to eventually > timeout. This is mostly an issue for produce requests which use the default > request timeout of 30s. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (KAFKA-9198) StopReplica handler should complete pending purgatory operations
[ https://issues.apache.org/jira/browse/KAFKA-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson reopened KAFKA-9198: > StopReplica handler should complete pending purgatory operations > > > Key: KAFKA-9198 > URL: https://issues.apache.org/jira/browse/KAFKA-9198 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > > When a reassignment completes, the current leader may need to be shutdown > with a StopReplica request. It may still have fetch/produce requests in > purgatory when this happens. We do not have logic currently to force > completion of these requests which means they are doomed to eventually > timeout. This is mostly an issue for produce requests which use the default > request timeout of 30s. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8571) Not complete delayed produce requests when processing StopReplicaRequest causing high produce latency for acks=all
[ https://issues.apache.org/jira/browse/KAFKA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976965#comment-16976965 ] ASF GitHub Bot commented on KAFKA-8571: --- hachikuji commented on pull request #7069: KAFKA-8571: Clean up purgatory when leader replica is kicked out of replica list. URL: https://github.com/apache/kafka/pull/7069 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Not complete delayed produce requests when processing StopReplicaRequest > causing high produce latency for acks=all > -- > > Key: KAFKA-8571 > URL: https://issues.apache.org/jira/browse/KAFKA-8571 > Project: Kafka > Issue Type: Bug >Reporter: Zhanxiang (Patrick) Huang >Assignee: Zhanxiang (Patrick) Huang >Priority: Major > > Currently a broker will only attempt to complete delayed requests upon > high watermark changes and on receiving a LeaderAndIsrRequest. When a broker > receives a StopReplicaRequest, it will not try to complete delayed operations, > including delayed produce for acks=all, which can cause the producer to > time out even though the producer could have talked to the new > leader sooner if a NotLeaderForPartition error had been sent. > This can happen during partition reassignment when the controller is trying to > kick the previous leader out of the replica set. In this case, the controller > will only send a StopReplicaRequest (not a LeaderAndIsrRequest) to the previous > leader in the replica set shrink phase. Here is an example: > {noformat} > During reassignment of the replica set of partition A from [B1, B2] to [B2, B3]: > t0: Controller expands the replica set to [B1, B2, B3] > t1: B1 receives produce request PR on partition A with acks=all and timeout > T. B1 puts PR into the DelayedProducePurgatory with timeout T.
> t2: Controller elects B2 as the new leader and shrinks the replica set to > [B2, B3]. LeaderAndIsrRequests are sent to B2 and B3. StopReplicaRequest is > sent to B1. > t3: B1 receives StopReplicaRequest but doesn't try to complete PR. > If PR cannot be fulfilled by t3, and t1 + T > t3, PR will eventually time > out in the purgatory and the producer will eventually time out the produce > request.{noformat} > Since it is possible for the leader to receive only a StopReplicaRequest > (without receiving any LeaderAndIsrRequest) to leave the replica set, a fix > for this issue is to also try to complete delayed operations when processing > StopReplicaRequest. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9198) StopReplica handler should complete pending purgatory operations
[ https://issues.apache.org/jira/browse/KAFKA-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976964#comment-16976964 ] ASF GitHub Bot commented on KAFKA-9198: --- hachikuji commented on pull request #7701: KAFKA-9198; Complete purgatory operations on receiving StopReplica URL: https://github.com/apache/kafka/pull/7701 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > StopReplica handler should complete pending purgatory operations > > > Key: KAFKA-9198 > URL: https://issues.apache.org/jira/browse/KAFKA-9198 > Project: Kafka > Issue Type: Bug >Reporter: Jason Gustafson >Assignee: Jason Gustafson >Priority: Major > > When a reassignment completes, the current leader may need to be shutdown > with a StopReplica request. It may still have fetch/produce requests in > purgatory when this happens. We do not have logic currently to force > completion of these requests which means they are doomed to eventually > timeout. This is mostly an issue for produce requests which use the default > request timeout of 30s. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9207) Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976957#comment-16976957 ] Xue Liu edited comment on KAFKA-9207 at 11/18/19 10:50 PM: --- We have some further discovery: When creating that thread, the follower had connection error to the leader. See attachment error-connection.jpg was (Author: xuel1): We have some further discovery: When creating that thread, the follower had connection error to the leader. See attachment error-connection.jpg > Replica Out of Sync as creating ReplicaFetcher thread failed with connection > to leader > -- > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 2.3.0 >Reporter: Xue Liu >Priority: Major > Attachments: Capture.PNG, error-connection.jpg > > > We sometimes see a replica for a partition is out of sync. When the issue > happens, it seems that we just lost that replica (would never catch-up), > unless we restart that broker. > It appears that ReplicaFetcher thread for that partition is dead and broker > will not restart that thread. We didn't see any exception in server or > controller logs. > The screen capture is taken from the broker that has that replica. Leader is > 2017. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9207) Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xue Liu updated KAFKA-9207: --- Attachment: error-connection.jpg > Replica Out of Sync as creating ReplicaFetcher thread failed with connection > to leader > -- > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 2.3.0 >Reporter: Xue Liu >Priority: Major > Attachments: Capture.PNG, error-connection.jpg > > > We sometimes see a replica for a partition is out of sync. When the issue > happens, it seems that we just lost that replica (would never catch-up), > unless we restart that broker. > It appears that ReplicaFetcher thread for that partition is dead and broker > will not restart that thread. We didn't see any exception in server or > controller logs. > The screen capture is taken from the broker that has that replica. Leader is > 2017. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9207) Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976957#comment-16976957 ] Xue Liu commented on KAFKA-9207: We have some further discovery: when creating that thread, the follower had a connection error to the leader. See attachment error-connection.jpg > Replica Out of Sync as creating ReplicaFetcher thread failed with connection > to leader > -- > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 2.3.0 >Reporter: Xue Liu >Priority: Major > Attachments: Capture.PNG > > > We sometimes see that a replica for a partition is out of sync. When the issue > happens, it seems that we have simply lost that replica (it would never catch up) > unless we restart that broker. > It appears that the ReplicaFetcher thread for that partition is dead and the broker > will not restart that thread. We didn't see any exception in the server or > controller logs. > The screen capture is taken from the broker that has that replica. The leader is > 2017. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9207) Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xue Liu updated KAFKA-9207: --- Summary: Replica Out of Sync as creating ReplicaFetcher thread failed with connection to leader (was: Replica Out of Sync as ReplicaFetcher thread is dead) > Replica Out of Sync as creating ReplicaFetcher thread failed with connection > to leader > -- > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication >Affects Versions: 2.3.0 >Reporter: Xue Liu >Priority: Major > Attachments: Capture.PNG > > > We sometimes see a replica for a partition is out of sync. When the issue > happens, it seems that we just lost that replica (would never catch-up), > unless we restart that broker. > It appears that ReplicaFetcher thread for that partition is dead and broker > will not restart that thread. We didn't see any exception in server or > controller logs. > The screen capture is taken from the broker that has that replica. Leader is > 2017. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976895#comment-16976895 ] Vahid Hashemian commented on KAFKA-9205: This will still likely require a KIP since the default behavior could change. > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools >Reporter: Vahid Hashemian >Priority: Minor > > One regularly used healing operation on Kafka clusters is replica > reassignment for topic partitions. For example, when there is a skew in the > inbound/outbound traffic of a broker, replica reassignment can be used to move > some leaders/followers from the broker; or if there is a skew in the disk usage > of brokers, replica reassignment can move some partitions to other brokers > that have more disk space available. > In Kafka clusters that span multiple data centers (or availability > zones), high availability is a priority, in the sense that when a data center > goes offline the cluster should be able to resume normal operation by > guaranteeing partition replicas in all data centers. > This guarantee is currently the responsibility of the on-call engineer who > performs the reassignment or of the tool that automatically generates the > reassignment plan for improving the cluster health (e.g. by considering the > rack configuration value of each broker in the cluster). The former is quite > error-prone, and the latter would lead to duplicate code in all such admin > tools (which are not error-free either). Not all use cases can make use of the > default assignment strategy that is used by the --generate option, and the current > rack-aware enforcement applies to this option only.
> It would be great for the built-in replica assignment API and tool provided > by Kafka to support a rack-aware verification option for the --execute scenario > that would simply return an error when [some] brokers in any replica set > share a common rack. -- This message was sent by Atlassian Jira (v8.3.4#803005)
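The verification the ticket asks for boils down to a small check: given a proposed replica set and each broker's configured rack, flag the assignment when two replicas land on the same rack. A hedged sketch under assumed names (this is not the kafka-reassign-partitions tool's actual code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of a rack-awareness check for --execute plans.
public class RackAwareCheck {
    static boolean violatesRackAwareness(List<Integer> replicaSet, Map<Integer, String> brokerRacks) {
        Set<String> racksSeen = new HashSet<>();
        for (int brokerId : replicaSet) {
            // add() returns false when the rack was already seen: a violation.
            if (!racksSeen.add(brokerRacks.get(brokerId))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, String> racks = Map.of(1, "rack-a", 2, "rack-b", 3, "rack-a");
        System.out.println(violatesRackAwareness(List.of(1, 2), racks)); // prints: false
        System.out.println(violatesRackAwareness(List.of(1, 3), racks)); // prints: true
    }
}
```

Run against every partition in a reassignment JSON before --execute, this would reject plans like [1, 3] above, where both replicas sit in rack-a.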
[jira] [Commented] (KAFKA-8843) Zookeeper migration tool support for TLS
[ https://issues.apache.org/jira/browse/KAFKA-8843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976893#comment-16976893 ] Kelly Schoenhofen commented on KAFKA-8843: -- Question: does ZK 3.5.6 allow for SSL (TLS, but let's say SSL to keep in line with the documentation) from Kafka? Not SASL_SSL, just plain SSL. Is that what this Jira is for? I have quorum TLS working in ZK 3.5.6 and I added a TLS-secured listener, but as of yet I can't quite get Kafka to connect to it: {{[2019-11-18 15:03:11,545] INFO Opening socket connection to server xxx/x.x.x.x:2182. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)}} is the closest I have come. I didn't want to do SASL_SSL; I just want to secure the traffic between Kafka and ZooKeeper using TLS 1.2 and a specific class of cipher, like TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, and enforce the CN name on each side to match the other side's cert & trusted cert stores (like how ZooKeeper Quorum TLS works). > Zookeeper migration tool support for TLS > > > Key: KAFKA-8843 > URL: https://issues.apache.org/jira/browse/KAFKA-8843 > Project: Kafka > Issue Type: Bug >Reporter: Pere Urbon-Bayes >Assignee: Pere Urbon-Bayes >Priority: Minor > > Currently the zookeeper-migration tool works based on SASL authentication, which > means only digest and Kerberos authentication are supported. > > With the introduction of ZK 3.5, TLS is added, including a new X509 > authentication provider. > > To support this great feature and utilise the TLS principals, the > zookeeper-migration-tool script should support the X509 authentication as > well. > > In my newbie view, this should mean adding a new parameter to allow other > ways of authentication around > [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ZkSecurityMigrator.scala#L65] > > If I understand the process correctly, this will require a KIP, right? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
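As a point of reference for the question above: ZooKeeper 3.5 configures client-to-server TLS (no SASL involved) through JVM system properties on the Netty client socket. The sketch below shows how those properties could be passed to a JVM client such as Kafka; the property names follow the ZooKeeper 3.5 admin guide, but whether the ZooKeeper client bundled with a given Kafka release actually honours them is an assumption to verify before relying on this.

```shell
# Hedged sketch: ZooKeeper 3.5 client-side TLS via system properties.
# Paths and passwords are placeholders; verify the property names against
# the ZooKeeper admin guide for your version.
export KAFKA_OPTS="\
 -Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
 -Dzookeeper.client.secure=true \
 -Dzookeeper.ssl.keyStore.location=/path/to/client.keystore.jks \
 -Dzookeeper.ssl.keyStore.password=changeit \
 -Dzookeeper.ssl.trustStore.location=/path/to/client.truststore.jks \
 -Dzookeeper.ssl.trustStore.password=changeit"
```

Note that the client must also be pointed at the TLS-secured listener port (2182 in the log snippet above) rather than the plaintext port.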
[jira] [Updated] (KAFKA-8326) Add Serde&lt;List&lt;Inner&gt;&gt; support
[ https://issues.apache.org/jira/browse/KAFKA-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniyar Yeralin updated KAFKA-8326: --- Description: _This ticket proposes adding new {color:#4c9aff}ListSerializer{color} and {color:#4c9aff}ListDeserializer{color} classes as well as support for the new classes into the Serdes class. This will allow using List Serde of type_ {color:#4c9aff}_Serde&lt;List&lt;Inner&gt;&gt;_{color} _directly from Consumers, Producers and Streams._ _{color:#4c9aff}Serde&lt;List&lt;Inner&gt;&gt;{color} serialization and deserialization will be done through repeatedly calling a serializer/deserializer for each entry provided by the passed generic {color:#4c9aff}Inner{color}'s Serde. For example, if you want to create a List of Strings serde, then the serializer/deserializer of StringSerde will be used to serialize/deserialize each entry in {color:#4c9aff}List&lt;String&gt;{color}._ I believe there are many use cases where List Serde could be used. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api] For instance, aggregate grouped (by key) values together in a list to do other subsequent operations on the collection. KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization] was: _This ticket proposes adding new {color:#4c9aff}ListSerializer{color} and {color:#4c9aff}ListDeserializer{color} classes as well as support for the new classes into the Serdes class. This will allow using List Serde of type_ {color:#4c9aff}_, T>_{color} _directly from Consumers, Producers and Streams._ _{color:#4c9aff}List&lt;T&gt;{color} serialization and deserialization will be done through repeatedly calling a serializer/deserializer for each entry provided by the passed generic {color:#4c9aff}T{color}'s Serde.
For example, if you want to create a List of Strings serde, then the serializer/deserializer of StringSerde will be used to serialize/deserialize each entry in {color:#4c9aff}List&lt;String&gt;{color}._ I believe there are many use cases where List Serde could be used. Ex. [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api] For instance, aggregate grouped (by key) values together in a list to do other subsequent operations on the collection. KIP Link: [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization] > Add Serde&lt;List&lt;Inner&gt;&gt; support > -- > > Key: KAFKA-8326 > URL: https://issues.apache.org/jira/browse/KAFKA-8326 > Project: Kafka > Issue Type: Improvement > Components: clients, streams >Reporter: Daniyar Yeralin >Assignee: Daniyar Yeralin >Priority: Minor > Labels: kip > > _This ticket proposes adding new {color:#4c9aff}ListSerializer{color} and > {color:#4c9aff}ListDeserializer{color} classes as well as support for the new > classes into the Serdes class. This will allow using List Serde of type_ > {color:#4c9aff}_Serde&lt;List&lt;Inner&gt;&gt;_{color} _directly from Consumers, > Producers and Streams._ > _{color:#4c9aff}Serde&lt;List&lt;Inner&gt;&gt;{color} serialization and deserialization > will be done through repeatedly calling a serializer/deserializer for each > entry provided by the passed generic {color:#4c9aff}Inner{color}'s Serde. For > example, if you want to create a List of Strings serde, then the > serializer/deserializer of StringSerde will be used to serialize/deserialize > each entry in {color:#4c9aff}List&lt;String&gt;{color}._ > I believe there are many use cases where List Serde could be used. Ex.
> [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], > > [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api] > For instance, aggregate grouped (by key) values together in a list to do > other subsequent operations on the collection. > KIP Link: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization] -- This message was sent by Atlassian Jira (v8.3.4#803005)
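The wire format the ticket describes (delegating to an inner serializer per entry) can be sketched as a count-prefixed, length-prefixed encoding. This is only an illustration of the idea with a hard-coded UTF-8 "inner serializer" for strings; KIP-466's real ListSerializer/ListDeserializer are generic over any inner Serde and have different class names and signatures.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of a List serde: element count, then one length-prefixed entry
// per element, each produced/consumed by an "inner" serializer.
public class ListSerdeSketch {
    static byte[] serialize(List<String> list) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(list.size()).array());
        for (String s : list) {
            byte[] inner = s.getBytes(StandardCharsets.UTF_8); // inner serializer
            out.write(ByteBuffer.allocate(4).putInt(inner.length).array());
            out.write(inner);
        }
        return out.toByteArray();
    }

    static List<String> deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int n = buf.getInt();
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] inner = new byte[buf.getInt()];
            buf.get(inner);
            list.add(new String(inner, StandardCharsets.UTF_8)); // inner deserializer
        }
        return list;
    }

    public static void main(String[] args) throws Exception {
        List<String> roundTrip = deserialize(serialize(List.of("a", "bc")));
        System.out.println(roundTrip); // prints: [a, bc]
    }
}
```

The length prefixes matter because the inner serializer's output is variable-sized; without them the deserializer could not find entry boundaries.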
[jira] [Updated] (KAFKA-8326) Add Serde&lt;List&lt;Inner&gt;&gt; support
[ https://issues.apache.org/jira/browse/KAFKA-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniyar Yeralin updated KAFKA-8326: --- Summary: Add Serde&lt;List&lt;Inner&gt;&gt; support (was: Add List Serde) > Add Serde&lt;List&lt;Inner&gt;&gt; support > -- > > Key: KAFKA-8326 > URL: https://issues.apache.org/jira/browse/KAFKA-8326 > Project: Kafka > Issue Type: Improvement > Components: clients, streams >Reporter: Daniyar Yeralin >Assignee: Daniyar Yeralin >Priority: Minor > Labels: kip > > _This ticket proposes adding new {color:#4c9aff}ListSerializer{color} and > {color:#4c9aff}ListDeserializer{color} classes as well as support for the new > classes into the Serdes class. This will allow using List Serde of type_ > {color:#4c9aff}_, T>_{color} _directly from Consumers, > Producers and Streams._ > _{color:#4c9aff}List&lt;T&gt;{color} serialization and deserialization will be done > through repeatedly calling a serializer/deserializer for each entry provided > by the passed generic {color:#4c9aff}T{color}'s Serde. For example, if you want > to create a List of Strings serde, then the serializer/deserializer of StringSerde > will be used to serialize/deserialize each entry in > {color:#4c9aff}List&lt;String&gt;{color}._ > I believe there are many use cases where List Serde could be used. Ex. > [https://stackoverflow.com/questions/41427174/aggregate-java-objects-in-a-list-with-kafka-streams-dsl-windows], > [https://stackoverflow.com/questions/46365884/issue-with-arraylist-serde-in-kafka-streams-api] > For instance, aggregate grouped (by key) values together in a list to do > other subsequent operations on the collection. > KIP Link: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vahid Hashemian updated KAFKA-9205: --- Description: One regularly used healing operation on Kafka clusters is replica reassignment for topic partitions. For example, when there is a skew in the inbound/outbound traffic of a broker, replica reassignment can be used to move some leaders/followers from the broker; or if there is a skew in the disk usage of brokers, replica reassignment can move some partitions to other brokers that have more disk space available. In Kafka clusters that span multiple data centers (or availability zones), high availability is a priority, in the sense that when a data center goes offline the cluster should be able to resume normal operation by guaranteeing partition replicas in all data centers. This guarantee is currently the responsibility of the on-call engineer who performs the reassignment or of the tool that automatically generates the reassignment plan for improving the cluster health (e.g. by considering the rack configuration value of each broker in the cluster). The former is quite error-prone, and the latter would lead to duplicate code in all such admin tools (which are not error-free either). Not all use cases can make use of the default assignment strategy that is used by the --generate option, and the current rack-aware enforcement applies to this option only. It would be great for the built-in replica assignment API and tool provided by Kafka to support a rack-aware verification option for the --execute scenario that would simply return an error when [some] brokers in any replica set share a common rack. was: One regularly used healing operation on Kafka clusters is replica reassignment for topic partitions.
For example, when there is a skew in the inbound/outbound traffic of a broker, replica reassignment can be used to move some leaders/followers from the broker; or if there is a skew in the disk usage of brokers, replica reassignment can move some partitions to other brokers that have more disk space available. In Kafka clusters that span multiple data centers (or availability zones), high availability is a priority, in the sense that when a data center goes offline the cluster should be able to resume normal operation by guaranteeing partition replicas in all data centers. This guarantee is currently the responsibility of the on-call engineer who performs the reassignment or of the tool that automatically generates the reassignment plan for improving the cluster health (e.g. by considering the rack configuration value of each broker in the cluster). The former is quite error-prone, and the latter would lead to duplicate code in all such admin tools (which are not error-free either). It would be great for the built-in replica assignment API and tool provided by Kafka to support a rack-aware verification option that would simply return an error when [some] brokers in any replica set share a common rack. > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools >Reporter: Vahid Hashemian >Priority: Minor > Labels: needs-kip > > One regularly used healing operation on Kafka clusters is replica > reassignment for topic partitions. For example, when there is a skew in the > inbound/outbound traffic of a broker, replica reassignment can be used to move > some leaders/followers from the broker; or if there is a skew in the disk usage > of brokers, replica reassignment can move some partitions to other brokers > that have more disk space available.
> In Kafka clusters that span multiple data centers (or availability > zones), high availability is a priority, in the sense that when a data center > goes offline the cluster should be able to resume normal operation by > guaranteeing partition replicas in all data centers. > This guarantee is currently the responsibility of the on-call engineer who > performs the reassignment or of the tool that automatically generates the > reassignment plan for improving the cluster health (e.g. by considering the > rack configuration value of each broker in the cluster). The former is quite > error-prone, and the latter would lead to duplicate code in all such admin > tools (which are not error-free either). Not all use cases can make use of the > default assignment strategy that is used by the --generate option, and the current > rack-aware enforcement applies to this option only. > It would be great for the built-in replica assignment API and tool provided > by Kafka to support a rack-aware verification option for the --execute scenario
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976888#comment-16976888 ] Vahid Hashemian commented on KAFKA-9205: Thanks [~sbellapu] for the pointer. KIP-36 and the current implementation enforce rack-aware assignment when generating an assignment (using the --generate option). If a custom reassignment algorithm is used to generate the assignment, or if the reassignment is generated manually on an ad-hoc basis, the tool does not enforce rack awareness when run with the --execute option. It would be great if enforcement could be implemented in the --execute scenario too. I have updated the description accordingly. > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools > Reporter: Vahid Hashemian > Priority: Minor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vahid Hashemian updated KAFKA-9205: --- Labels: (was: needs-kip) > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools > Reporter: Vahid Hashemian > Priority: Minor
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
[ https://issues.apache.org/jira/browse/KAFKA-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976855#comment-16976855 ] sats commented on KAFKA-9205: - [~vahid] Do you have a new KIP, or can this be an extension to [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]? Please let me know so that I can take a shot at the implementation. > Add an option to enforce rack-aware partition reassignment > -- > > Key: KAFKA-9205 > URL: https://issues.apache.org/jira/browse/KAFKA-9205 > Project: Kafka > Issue Type: Improvement > Components: admin, tools > Reporter: Vahid Hashemian > Priority: Minor > Labels: needs-kip -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9208) Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
[ https://issues.apache.org/jira/browse/KAFKA-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976843#comment-16976843 ] Sophie Blee-Goldman commented on KAFKA-9208: Results cleaned up already but I saw this fail again (on a different 2.4-targeted PR, also Java 8) > Flaky Test SslAdminClientIntegrationTest.testCreatePartitions > - > > Key: KAFKA-9208 > URL: https://issues.apache.org/jira/browse/KAFKA-9208 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 2.4.0 >Reporter: Sophie Blee-Goldman >Priority: Major > Labels: flaky-test > > Java 8 build failed on 2.4-targeted PR > h3. Stacktrace > java.lang.AssertionError: validateOnly expected:<3> but was:<1> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:625) > at > kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:599) > at scala.collection.immutable.List.foreach(List.scala:392) at > kafka.api.AdminClientIntegrationTest.testCreatePartitions(AdminClientIntegrationTest.scala:599) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9207) Replica Out of Sync as ReplicaFetcher thread is dead
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xue Liu updated KAFKA-9207: --- Description: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead and the broker will not restart that thread. We didn't see any exception in the server or controller logs. The screen capture is taken from the broker that has that replica. The leader is 2017. was: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead and the broker will not restart that thread. We didn't see any exception in the server or controller logs. > Replica Out of Sync as ReplicaFetcher thread is dead > > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication > Affects Versions: 2.3.0 > Reporter: Xue Liu > Priority: Major > Attachments: Capture.PNG -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9208) Flaky Test SslAdminClientIntegrationTest.testCreatePartitions
Sophie Blee-Goldman created KAFKA-9208: -- Summary: Flaky Test SslAdminClientIntegrationTest.testCreatePartitions Key: KAFKA-9208 URL: https://issues.apache.org/jira/browse/KAFKA-9208 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 2.4.0 Reporter: Sophie Blee-Goldman Java 8 build failed on 2.4-targeted PR h3. Stacktrace java.lang.AssertionError: validateOnly expected:<3> but was:<1> at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.failNotEquals(Assert.java:835) at org.junit.Assert.assertEquals(Assert.java:647) at kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:625) at kafka.api.AdminClientIntegrationTest$$anonfun$testCreatePartitions$6.apply(AdminClientIntegrationTest.scala:599) at scala.collection.immutable.List.foreach(List.scala:392) at kafka.api.AdminClientIntegrationTest.testCreatePartitions(AdminClientIntegrationTest.scala:599) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9207) Replica Out of Sync as ReplicaFetcher thread is dead
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xue Liu updated KAFKA-9207: --- Description: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead and the broker will not restart that thread. We didn't see any exception in the server or controller logs. was: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead. We didn't see any exception in the server or controller logs. > Replica Out of Sync as ReplicaFetcher thread is dead > > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication > Affects Versions: 2.3.0 > Reporter: Xue Liu > Priority: Major > Attachments: Capture.PNG -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9207) Replica Out of Sync as ReplicaFetcher thread is dead
Xue Liu created KAFKA-9207: -- Summary: Replica Out of Sync as ReplicaFetcher thread is dead Key: KAFKA-9207 URL: https://issues.apache.org/jira/browse/KAFKA-9207 Project: Kafka Issue Type: Bug Components: replication Affects Versions: 2.3.0 Reporter: Xue Liu Attachments: Capture.PNG We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9207) Replica Out of Sync as ReplicaFetcher thread is dead
[ https://issues.apache.org/jira/browse/KAFKA-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xue Liu updated KAFKA-9207: --- Description: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead. We didn't see any exception in the server or controller logs. was: We sometimes see that a replica for a partition is out of sync. When the issue happens, it seems that we have simply lost that replica (it would never catch up) unless we restart that broker. It appears that the ReplicaFetcher thread for that partition is dead. > Replica Out of Sync as ReplicaFetcher thread is dead > > > Key: KAFKA-9207 > URL: https://issues.apache.org/jira/browse/KAFKA-9207 > Project: Kafka > Issue Type: Bug > Components: replication > Affects Versions: 2.3.0 > Reporter: Xue Liu > Priority: Major > Attachments: Capture.PNG -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9206) Consumer should handle `CORRUPT_MESSAGE` error code in fetch response
Jason Gustafson created KAFKA-9206: -- Summary: Consumer should handle `CORRUPT_MESSAGE` error code in fetch response Key: KAFKA-9206 URL: https://issues.apache.org/jira/browse/KAFKA-9206 Project: Kafka Issue Type: Bug Reporter: Jason Gustafson This error code is possible, for example, when the broker scans the log to find the fetch offset after the index lookup. Currently this results in a slightly obscure message such as the following: {code:java} java.lang.IllegalStateException: Unexpected error code 2 while fetching data{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
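The obscure message comes from surfacing the raw fetch-response error code directly. As an illustration of the improvement being requested (not Kafka's actual patch), translating the code to its protocol name before raising the error makes the failure self-describing; the class name here is hypothetical, and the code-to-name entries are a small excerpt of the Kafka protocol's error-code table:

```java
// Hypothetical sketch: name a raw fetch-response error code before
// surfacing it, instead of "Unexpected error code 2 while fetching data".
public class FetchErrorCodes {
    // Excerpt of the Kafka protocol error-code registry:
    // 0 = NONE, 1 = OFFSET_OUT_OF_RANGE, 2 = CORRUPT_MESSAGE.
    public static String describe(short errorCode) {
        switch (errorCode) {
            case 0:  return "NONE";
            case 1:  return "OFFSET_OUT_OF_RANGE";
            case 2:  return "CORRUPT_MESSAGE";
            default: return "UNKNOWN(" + errorCode + ")";
        }
    }

    public static void main(String[] args) {
        // A self-describing variant of the message quoted above.
        System.out.println("Fetch failed: " + describe((short) 2));
    }
}
```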
[jira] [Commented] (KAFKA-9157) logcleaner could generate empty segment files after cleaning
[ https://issues.apache.org/jira/browse/KAFKA-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976745#comment-16976745 ] sats commented on KAFKA-9157: - [~huxi_2b] Sure, please go ahead. > logcleaner could generate empty segment files after cleaning > > > Key: KAFKA-9157 > URL: https://issues.apache.org/jira/browse/KAFKA-9157 > Project: Kafka > Issue Type: Improvement > Components: core > Affects Versions: 2.3.0 > Reporter: Jun Rao > Priority: Major > > Currently, the log cleaner can only combine segments within a 2-billion offset range. If all records in that range are deleted, an empty segment could be generated. It would be useful to avoid generating such empty segments. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9205) Add an option to enforce rack-aware partition reassignment
Vahid Hashemian created KAFKA-9205: -- Summary: Add an option to enforce rack-aware partition reassignment Key: KAFKA-9205 URL: https://issues.apache.org/jira/browse/KAFKA-9205 Project: Kafka Issue Type: Improvement Components: admin, tools Reporter: Vahid Hashemian One regularly used healing operation on Kafka clusters is replica reassignment for topic partitions. For example, when there is a skew in the inbound/outbound traffic of a broker, replica reassignment can be used to move some leaders/followers off that broker; or if there is a skew in the disk usage of brokers, replica reassignment can move some partitions to other brokers that have more disk space available. In Kafka clusters that span multiple data centers (or availability zones), high availability is a priority, in the sense that when a data center goes offline the cluster should be able to resume normal operation by guaranteeing partition replicas in all data centers. This guarantee is currently the responsibility of the on-call engineer who performs the reassignment, or of the tool that automatically generates the reassignment plan for improving cluster health (e.g. by considering the rack configuration value of each broker in the cluster). The former is quite error-prone, and the latter would lead to duplicate code in all such admin tools (which are not error-free either). It would be great for the built-in replica assignment API and tool provided by Kafka to support a rack-aware verification option that would simply return an error when [some] brokers in any replica set share a common rack. -- This message was sent by Atlassian Jira (v8.3.4#803005)
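The verification the report asks for is simple to sketch. Assuming (hypothetically) a reassignment plan represented as a partition-to-replica-list map and a broker-to-rack map (as configured via each broker's rack value), the check flags any replica set that places two brokers on the same rack; the class and method names below are illustrative, not part of any Kafka API:

```java
import java.util.*;

public class RackAwareCheck {
    // Returns the partitions whose replica set places two brokers on the
    // same rack (a missing/null rack is treated as a shared "unknown" rack).
    public static List<String> violations(Map<String, List<Integer>> plan,
                                          Map<Integer, String> brokerRack) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> e : plan.entrySet()) {
            Set<String> seen = new HashSet<>();
            for (int broker : e.getValue()) {
                if (!seen.add(brokerRack.get(broker))) {
                    bad.add(e.getKey()); // duplicate rack in this replica set
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<Integer, String> racks = Map.of(1, "rack-a", 2, "rack-a", 3, "rack-b");
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        plan.put("topic-0", List.of(1, 3)); // distinct racks: OK
        plan.put("topic-1", List.of(1, 2)); // both on rack-a: violation
        System.out.println(violations(plan, racks)); // prints [topic-1]
    }
}
```

A --execute-time verification along these lines would reject the plan (or warn) when the returned list is non-empty, which is exactly the error behavior the description proposes.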
[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976701#comment-16976701 ] Ismael Juma commented on KAFKA-9203: We upgraded lz4-java from 1.5.0 to 1.6.0 in Kafka 2.3. I wonder if that could be the reason: [https://github.com/lz4/lz4-java/blob/master/CHANGES.md] > kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1 > --- > > Key: KAFKA-9203 > URL: https://issues.apache.org/jira/browse/KAFKA-9203 > Project: Kafka > Issue Type: Bug > Components: compression, consumer > Affects Versions: 2.3.0, 2.3.1 > Reporter: David Watzke > Priority: Major > > I run a Kafka 2.1.1 cluster. > When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 2.2.0, I immediately started getting the following exceptions in a loop when consuming a topic with LZ4-compressed messages: > {noformat} > 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred while polling and processing messages: org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption. > org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
> at > org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473) > > at > org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332) > > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645) > > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606) > > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263) > > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) > at > com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180) > > at > com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19) > > at > resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) > > at > scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) > at scala.util.control.Exception$Catch.apply(Exception.scala:228) > at scala.util.control.Exception$Catch.either(Exception.scala:252) > at > resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) > at > resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) > at > resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) > at > resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) > at > resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25) > > at > resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25) > > at > resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50) > > at > 
resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) > > at > resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) > > at > resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) > at > com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18) > > at > resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) > > at > scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) > at scala.util.control.Exception$Catch.apply(Exception.scala:228) > at scala.util.control.Exception$Catch.either(Exception.scala:252) > at > resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) > at > resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) > at > resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) >
[jira] [Commented] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976693#comment-16976693 ] Ismael Juma commented on KAFKA-9203: Thanks for the report. We have many tests and workloads running with lz4 and this is the first time I am seeing this issue, so it's more subtle than "lz4 broken with kafka 2.3". Do you know which client was used to compress these messages? Was it the Java producer 2.2? > kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1 > --- > > Key: KAFKA-9203 > URL: https://issues.apache.org/jira/browse/KAFKA-9203 > Project: Kafka > Issue Type: Bug > Components: compression, consumer > Affects Versions: 2.3.0, 2.3.1 > Reporter: David Watzke > Priority: Major > > I run a Kafka 2.1.1 cluster. > When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 2.2.0, I immediately started getting the following exceptions in a loop when consuming a topic with LZ4-compressed messages: > {noformat} > 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred while polling and processing messages: org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption. > org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
[jira] [Commented] (KAFKA-9180) Broker won't start with empty log dir
[ https://issues.apache.org/jira/browse/KAFKA-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976691#comment-16976691 ] ASF GitHub Bot commented on KAFKA-9180: --- ijuma commented on pull request #7700: KAFKA-9180: Introduce BrokerMetadataCheckpointTest URL: https://github.com/apache/kafka/pull/7700 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Broker won't start with empty log dir > - > > Key: KAFKA-9180 > URL: https://issues.apache.org/jira/browse/KAFKA-9180 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.4.0 >Reporter: Magnus Edenhill >Assignee: Ismael Juma >Priority: Blocker > > On kafka trunk at commit 1675115ec193acf4c7d44e68a57272edfec0b455: > > Attempting to start the broker with an existing but empty log dir yields the > following error and terminates the process: > {code:java} > [2019-11-13 10:42:16,922] ERROR Failed to read meta.properties file under dir > /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties > due to > /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties > (No such file or directory) > (kafka.server.BrokerMetadataCheckpoint)[2019-11-13 10:42:16,924] ERROR Fail > to read meta.properties under log directory > /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs > (kafka.server.KafkaServer)java.io.FileNotFoundException: > /Users/magnus/src/librdkafka/tests/tmp/LibrdkafkaTestCluster/73638126/KafkaBrokerApp/2/logs/meta.properties > (No such file or directory)at java.io.FileInputStream.open0(Native > Method)at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.(FileInputStream.java:138)at > 
java.io.FileInputStream.(FileInputStream.java:93)at > org.apache.kafka.common.utils.Utils.loadProps(Utils.java:512)at > kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:73) > at > kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:72) >at > kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1(KafkaServer.scala:704) > at > kafka.server.KafkaServer.$anonfun$getBrokerMetadataAndOfflineDirs$1$adapted(KafkaServer.scala:702) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at > kafka.server.KafkaServer.getBrokerMetadataAndOfflineDirs(KafkaServer.scala:702) > at kafka.server.KafkaServer.startup(KafkaServer.scala:214)at > kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44) > at kafka.Kafka$.main(Kafka.scala:84)at > kafka.Kafka.main(Kafka.scala) {code} > > > Changing the catch to FileNotFoundException fixes the issue, here: > [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerMetadataCheckpoint.scala#L84] > > > This is a regression from 2.3.x. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
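The fix suggested in the report, catching FileNotFoundException instead of letting it propagate out of the read, can be sketched as follows. This illustrates the idea only; it is not the actual BrokerMetadataCheckpoint patch, and the class and method names are hypothetical:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class MetaPropertiesReader {
    // Sketch of the suggested behavior: a missing meta.properties in an
    // otherwise valid (empty) log dir means "no checkpoint written yet",
    // so return null instead of failing broker startup.
    public static Properties readMetaProperties(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (FileNotFoundException e) {
            return null; // empty log dir: caller generates fresh metadata
        }
    }
}
```

Note that try-with-resources only closes the stream if construction succeeded, so catching the FileNotFoundException thrown by the FileInputStream constructor here is safe.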
[jira] [Commented] (KAFKA-9170) KafkaStreams constructor fails to read configuration from Properties object created with default values
[ https://issues.apache.org/jira/browse/KAFKA-9170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976669#comment-16976669 ] Oleg Muravskiy commented on KAFKA-9170: --- OK, I will need some time to figure out the scope of changes needed and whether it qualifies for a KIP, but I do not have any spare time right now, so please bear with me. > KafkaStreams constructor fails to read configuration from Properties object > created with default values > --- > > Key: KAFKA-9170 > URL: https://issues.apache.org/jira/browse/KAFKA-9170 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.3.0 >Reporter: Oleg Muravskiy >Priority: Major > > When the *Properties* object passed into the *KafkaStreams* constructor is > created like > > {code:java} > new Properties(defaultProperties){code} > > KafkaStreams fails to read properties properly, which in my case results in > an error: > > {noformat} > org.apache.kafka.common.config.ConfigException: Missing required > configuration "bootstrap.servers" which has no default > value.org.apache.kafka.common.config.ConfigException: Missing required > configuration "bootstrap.servers" which has no default value. 
at > org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:476) at > org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:466) at > org.apache.kafka.common.config.AbstractConfig.&lt;init&gt;(AbstractConfig.java:108) > at > org.apache.kafka.common.config.AbstractConfig.&lt;init&gt;(AbstractConfig.java:142) > at org.apache.kafka.streams.StreamsConfig.&lt;init&gt;(StreamsConfig.java:844) at > org.apache.kafka.streams.StreamsConfig.&lt;init&gt;(StreamsConfig.java:839) at > org.apache.kafka.streams.KafkaStreams.&lt;init&gt;(KafkaStreams.java:544) > {noformat} > This is due to the fact that the constructor that receives the *Properties* > class: > > {code:java} > public KafkaStreams(final Topology topology, > final Properties props) { > this(topology.internalTopologyBuilder, new StreamsConfig(props), new > DefaultKafkaClientSupplier()); > {code} > passes *props* into *StreamsConfig*, which ignores the *Properties* > interface, and only uses the *Map* interface: > > {code:java} > public StreamsConfig(final Map props) { > this(props, true); > } > {code} > (Note that if you do > {{props.getProperty(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG)}}, it returns the > correct value). -- This message was sent by Atlassian Jira (v8.3.4#803005)
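The root cause is reproducible with the JDK alone: defaults passed to `new Properties(defaults)` live in a separate chain that only `Properties.getProperty` consults, and are invisible through the inherited `Hashtable`/`Map` view, which is all that StreamsConfig reads. A quick stdlib demonstration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesDefaultsDemo {
    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("bootstrap.servers", "localhost:9092");

        // Defaults live in a separate chain that only getProperty() consults.
        Properties props = new Properties(defaults);

        System.out.println(props.getProperty("bootstrap.servers")); // localhost:9092
        System.out.println(props.get("bootstrap.servers"));         // null (Hashtable/Map view)

        // Copying through the Map interface, as StreamsConfig effectively
        // does, silently drops every key that exists only in the defaults.
        Map<Object, Object> asMap = new HashMap<>(props);
        System.out.println(asMap.containsKey("bootstrap.servers")); // false
    }
}
```

A client-side workaround is to flatten the defaults before handing the object to KafkaStreams, e.g. `for (String n : src.stringPropertyNames()) flat.setProperty(n, src.getProperty(n));`, since `stringPropertyNames()` does traverse the default chain.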
[jira] [Commented] (KAFKA-9184) Redundant task creation after worker fails to join a specific group generation
[ https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976650#comment-16976650 ] Timur Rubeko commented on KAFKA-9184: - Please, let me know should you need additional information. > Redundant task creation after worker fails to join a specific group generation > -- > > Key: KAFKA-9184 > URL: https://issues.apache.org/jira/browse/KAFKA-9184 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3.2 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Blocker > Fix For: 2.3.2 > > > First reported here: > https://stackoverflow.com/questions/58631092/kafka-connect-assigns-same-task-to-multiple-workers > There seems to be an issue with task reassignment when a worker rejoins after > an unsuccessful join request. The worker seems to be outside the group for a > generation but when it joins again the same task is running in more than one > worker -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9184) Redundant task creation after worker fails to join a specific group generation
[ https://issues.apache.org/jira/browse/KAFKA-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976648#comment-16976648 ] Timur Rubeko edited comment on KAFKA-9184 at 11/18/19 3:54 PM: --- Hello. SO question author here. Following is an example of a sequence of events that typically leads to the redundant task creation. Set-up: three workers and three connectors. Relevant logs: *Worker A*: {code:java} [2019-11-03 11:07:26,912] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 639 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[mqtt-source], taskIds=[mqtt-source-0], revokedConnectorIds=[], revokedTaskIds=[], delay=30} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 11:12:06,192] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 640 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[hdfs-sink, another-hdfs-sink, mqtt-source], taskIds=[hdfs-sink-0, another-hdfs-sink-0, mqtt-source-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 11:12:09,247] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 641 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[hdfs-sink, another-hdfs-sink, mqtt-source], taskIds=[hdfs-sink-0, another-hdfs-sink-0, mqtt-source-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 12:49:05,632] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at 
generation 642 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[hdfs-sink, another-hdfs-sink, mqtt-source], taskIds=[hdfs-sink-0, another-hdfs-sink-0, mqtt-source-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) {code} *Worker B*: {code:java} [2019-11-03 11:07:26,911] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 639 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[another-hdfs-sink], taskIds=[another-hdfs-sink-0], revokedConnectorIds=[], revokedTaskIds=[], delay=30} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 11:12:06,041] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Attempt to heartbeat failed for since member id connect-1-bf534716-be2f-4cb3-9f26-521023c6b504 is not valid. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:947) [2019-11-03 11:12:09,251] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 641 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[another-hdfs-sink], taskIds=[another-hdfs-sink-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 12:49:03,150] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Attempt to heartbeat failed for since member id connect-1-c930bdb9-eedf-4313-95e0-4a6927836094 is not valid. 
(org.apache.kafka.clients.consumer.internals.AbstractCoordinator:947) [2019-11-03 12:49:05,632] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 642 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[another-hdfs-sink], taskIds=[another-hdfs-sink-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) {code} *Worker C*: {code:java} [2019-11-03 11:07:26,911] INFO [Worker clientId=connect-1, groupId=ingest-sources-cluster] Joined group at generation 639 and got assignment: Assignment{error=0, leader='connect-1-7d69fcf2-1025-418b-9091-4054b984d18f', leaderUrl='http://10.16.0.18:8083/', offset=250, connectorIds=[hdfs-sink], taskIds=[hdfs-sink-0], revokedConnectorIds=[], revokedTaskIds=[], delay=30} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1397) [2019-11-03 11:12:06,006] INFO
[jira] [Updated] (KAFKA-9204) ReplaceField transformation fails when encountering tombstone event
[ https://issues.apache.org/jira/browse/KAFKA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Georgios Kalogiros updated KAFKA-9204: -- Fix Version/s: (was: 2.3.0) > ReplaceField transformation fails when encountering tombstone event > --- > > Key: KAFKA-9204 > URL: https://issues.apache.org/jira/browse/KAFKA-9204 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Reporter: Georgios Kalogiros >Priority: Major > > When applying the {{ReplaceField}} transformation to a tombstone event, an > exception is raised: > > {code:java} > org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error > handler > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104) > at > org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:506) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192) > at > org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177) > at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
org.apache.kafka.connect.errors.DataException: Only Map objects > supported in absence of schema for [field replacement], found: null > at > org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) > at > org.apache.kafka.connect.transforms.ReplaceField.applySchemaless(ReplaceField.java:134) > at > org.apache.kafka.connect.transforms.ReplaceField.apply(ReplaceField.java:127) > at > org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162) > ... 14 more > {code} > There was a similar bug for the InsertField transformation that got merged in > recently: > https://issues.apache.org/jira/browse/KAFKA-8523 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9204) ReplaceField transformation fails when encountering tombstone event
[ https://issues.apache.org/jira/browse/KAFKA-9204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Georgios Kalogiros updated KAFKA-9204: -- Affects Version/s: 2.3.0 > ReplaceField transformation fails when encountering tombstone event > --- > > Key: KAFKA-9204 > URL: https://issues.apache.org/jira/browse/KAFKA-9204 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3.0 >Reporter: Georgios Kalogiros >Priority: Major > > When applying the {{ReplaceField}} transformation to a tombstone event, an > exception is raised: > > {code:java} > org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error > handler > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104) > at > org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:506) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224) > at > org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192) > at > org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177) > at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) 
> Caused by: org.apache.kafka.connect.errors.DataException: Only Map objects > supported in absence of schema for [field replacement], found: null > at > org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) > at > org.apache.kafka.connect.transforms.ReplaceField.applySchemaless(ReplaceField.java:134) > at > org.apache.kafka.connect.transforms.ReplaceField.apply(ReplaceField.java:127) > at > org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128) > at > org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162) > ... 14 more > {code} > There was a similar bug for the InsertField transformation that got merged in > recently: > https://issues.apache.org/jira/browse/KAFKA-8523 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9204) ReplaceField transformation fails when encountering tombstone event
Georgios Kalogiros created KAFKA-9204: - Summary: ReplaceField transformation fails when encountering tombstone event Key: KAFKA-9204 URL: https://issues.apache.org/jira/browse/KAFKA-9204 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Georgios Kalogiros Fix For: 2.3.0 When applying the {{ReplaceField}} transformation to a tombstone event, an exception is raised: {code:java} org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178) at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104) at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:50) at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:506) at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464) at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320) at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224) at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192) at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177) at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.kafka.connect.errors.DataException: Only Map objects supported in absence of schema for [field replacement], found: null at org.apache.kafka.connect.transforms.util.Requirements.requireMap(Requirements.java:38) at 
org.apache.kafka.connect.transforms.ReplaceField.applySchemaless(ReplaceField.java:134) at org.apache.kafka.connect.transforms.ReplaceField.apply(ReplaceField.java:127) at org.apache.kafka.connect.runtime.TransformationChain.lambda$apply$0(TransformationChain.java:50) at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128) at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162) ... 14 more {code} There was a similar bug for the InsertField transformation that got merged in recently: https://issues.apache.org/jira/browse/KAFKA-8523 -- This message was sent by Atlassian Jira (v8.3.4#803005)
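The fix that KAFKA-8523 applied to InsertField suggests the shape of the patch here: short-circuit on a null value before calling `requireMap`. A stdlib-only sketch of that guard for the schemaless path (illustrative only; `dropFields` is a made-up helper, not the actual ReplaceField code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TombstoneGuardDemo {

    // Mimics a schemaless "remove these fields" transform on a Map value,
    // but passes tombstones (null values) through untouched instead of
    // insisting on a Map and throwing a DataException.
    public static Map<String, Object> dropFields(Map<String, Object> value, Set<String> exclude) {
        if (value == null) {
            return null; // tombstone: propagate as-is so deletes keep flowing downstream
        }
        Map<String, Object> updated = new HashMap<>(value);
        updated.keySet().removeAll(exclude);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<>();
        record.put("id", 42);
        record.put("secret", "x");
        System.out.println(dropFields(record, Collections.singleton("secret"))); // {id=42}
        System.out.println(dropFields(null, Collections.singleton("secret")));   // null
    }
}
```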
[jira] [Comment Edited] (KAFKA-9124) KIP-497: ISR changes should be propagated via Kafka protocol
[ https://issues.apache.org/jira/browse/KAFKA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976446#comment-16976446 ] Viktor Somogyi-Vass edited comment on KAFKA-9124 at 11/18/19 11:06 AM: --- [~cmccabe] sure. Do you think we should mark this or the other jira as a duplicate? was (Author: viktorsomogyi): [~cmccabe] sure. Do you think we should mark this as a duplicate? > KIP-497: ISR changes should be propagated via Kafka protocol > > > Key: KAFKA-9124 > URL: https://issues.apache.org/jira/browse/KAFKA-9124 > Project: Kafka > Issue Type: Sub-task >Reporter: Viktor Somogyi-Vass >Assignee: Colin McCabe >Priority: Major > > Currently {{Partition.expandIsr}} and {{Partition.shrinkIsr}} update > ZooKeeper, which the controller listens to; that is how it notices the > ISR changes and sends out metadata requests. > Instead, the brokers should use Kafka protocol messages to send out > ISR change notifications. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9125) GroupMetadataManager and TransactionStateManager should query the controller instead of zkClient
[ https://issues.apache.org/jira/browse/KAFKA-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viktor Somogyi-Vass updated KAFKA-9125: --- Summary: GroupMetadataManager and TransactionStateManager should query the controller instead of zkClient (was: GroupMetadataManager and TransactionStateManager should use metadata cache instead of zkClient) > GroupMetadataManager and TransactionStateManager should query the controller > instead of zkClient > > > Key: KAFKA-9125 > URL: https://issues.apache.org/jira/browse/KAFKA-9125 > Project: Kafka > Issue Type: Sub-task >Reporter: Viktor Somogyi-Vass >Assignee: Viktor Somogyi-Vass >Priority: Major > > Both classes query their respective topic's partition count via the zkClient. > This, however, could instead be served by the broker's local metadata cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
[ https://issues.apache.org/jira/browse/KAFKA-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Watzke updated KAFKA-9203: Description: I run a kafka cluster 2.1.1. When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 2.2.0, I immediately started getting the following exceptions in a loop when consuming a topic with LZ4-compressed messages: {noformat} 2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred while polling and processing messages: org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption. org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption. at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473) at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332) at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645) at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606) at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225) at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201) at com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180) at com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149) at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28) at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19) at 
resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) at scala.util.control.Exception$Catch.apply(Exception.scala:228) at scala.util.control.Exception$Catch.either(Exception.scala:252) at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) at resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25) at resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25) at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50) at resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) at resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19) at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18) at resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88) at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252) at scala.util.control.Exception$Catch.apply(Exception.scala:228) at scala.util.control.Exception$Catch.either(Exception.scala:252) at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88) at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26) at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26) at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50) at resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25) at 
resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25) at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50) at resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53) at resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50) at com.avast.filerep.saver.RequestSaver$.$anonfun$main$4(RequestSaver.scala:18) at
[jira] [Commented] (KAFKA-9124) KIP-497: ISR changes should be propagated via Kafka protocol
[ https://issues.apache.org/jira/browse/KAFKA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976446#comment-16976446 ]
Viktor Somogyi-Vass commented on KAFKA-9124:
--------------------------------------------
[~cmccabe] sure. Do you think we should mark this as a duplicate?
> KIP-497: ISR changes should be propagated via Kafka protocol
> ------------------------------------------------------------
>
>                 Key: KAFKA-9124
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9124
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Viktor Somogyi-Vass
>            Assignee: Colin McCabe
>            Priority: Major
>
> Currently {{Partition.expandIsr}} and {{Partition.shrinkIsr}} update
> ZooKeeper, which the controller listens to; that is how it notices ISR
> changes and sends out metadata requests.
> Instead, the brokers should use Kafka protocol messages to send out ISR
> change notifications.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (KAFKA-9203) kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
David Watzke created KAFKA-9203:
--------------------------------
             Summary: kafka-client 2.3.1 fails to consume lz4 compressed topic in kafka 2.1.1
                 Key: KAFKA-9203
                 URL: https://issues.apache.org/jira/browse/KAFKA-9203
             Project: Kafka
          Issue Type: Bug
          Components: compression, consumer
    Affects Versions: 2.3.0, 2.3.1
            Reporter: David Watzke

I run a Kafka 2.1.1 cluster. When I upgraded a client app to use kafka-client 2.3.0 (or 2.3.1) instead of 2.2.0, I immediately started getting the following exceptions in a loop:
{noformat}
2019-11-18 11:59:16.888 ERROR [afka-consumer-thread] com.avast.utils2.kafka.consumer.ConsumerThread: Unexpected exception occurred while polling and processing messages: org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
org.apache.kafka.common.KafkaException: Received exception when fetching the next record from FILEREP_LONER_REQUEST-4. If needed, please seek past the record to continue consumption.
	at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1473)
	at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1600(Fetcher.java:1332)
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:645)
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:606)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1263)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1225)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1201)
	at com.avast.utils2.kafka.consumer.ConsumerThread.pollAndProcessMessagesFromKafka(ConsumerThread.java:180)
	at com.avast.utils2.kafka.consumer.ConsumerThread.run(ConsumerThread.java:149)
	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8(RequestSaver.scala:28)
	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$8$adapted(RequestSaver.scala:19)
	at resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
	at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252)
	at scala.util.control.Exception$Catch.apply(Exception.scala:228)
	at scala.util.control.Exception$Catch.either(Exception.scala:252)
	at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
	at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
	at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
	at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
	at resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
	at resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
	at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
	at resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
	at resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53)
	at resource.AbstractManagedResource.foreach(AbstractManagedResource.scala:50)
	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6(RequestSaver.scala:19)
	at com.avast.filerep.saver.RequestSaver$.$anonfun$main$6$adapted(RequestSaver.scala:18)
	at resource.AbstractManagedResource.$anonfun$acquireFor$1(AbstractManagedResource.scala:88)
	at scala.util.control.Exception$Catch.$anonfun$either$1(Exception.scala:252)
	at scala.util.control.Exception$Catch.apply(Exception.scala:228)
	at scala.util.control.Exception$Catch.either(Exception.scala:252)
	at resource.AbstractManagedResource.acquireFor(AbstractManagedResource.scala:88)
	at resource.ManagedResourceOperations.apply(ManagedResourceOperations.scala:26)
	at resource.ManagedResourceOperations.apply$(ManagedResourceOperations.scala:26)
	at resource.AbstractManagedResource.apply(AbstractManagedResource.scala:50)
	at resource.ManagedResourceOperations.acquireAndGet(ManagedResourceOperations.scala:25)
	at resource.ManagedResourceOperations.acquireAndGet$(ManagedResourceOperations.scala:25)
	at resource.AbstractManagedResource.acquireAndGet(AbstractManagedResource.scala:50)
	at resource.ManagedResourceOperations.foreach(ManagedResourceOperations.scala:53)
	at resource.ManagedResourceOperations.foreach$(ManagedResourceOperations.scala:53) at
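The exception message advises to "seek past the record to continue consumption". The sketch below illustrates that recovery pattern; `SimpleConsumer` is a hypothetical stand-in whose `poll`/`seek`/`position` methods are simplified analogues of the real KafkaConsumer API, used here so the skipping logic is visible in isolation. Note that seeking past the offset drops the undecodable record.

```java
import java.util.ArrayList;
import java.util.List;

public class SeekPastDemo {
    static class CorruptRecordException extends RuntimeException {}

    // Hypothetical stand-in for KafkaConsumer: poll() decodes the record at
    // the current offset and throws if it cannot be decoded; seek() moves
    // the fetch position, mirroring KafkaConsumer#seek.
    static class SimpleConsumer {
        private final List<String> log;  // raw records; null marks an undecodable one
        private long position = 0L;      // next offset to fetch

        SimpleConsumer(List<String> log) { this.log = log; }

        boolean hasNext() { return position < log.size(); }

        String poll() {
            String raw = log.get((int) position);
            if (raw == null) throw new CorruptRecordException();
            position++;
            return raw;
        }

        long position() { return position; }
        void seek(long offset) { position = offset; }
    }

    // Consume the whole log, skipping any record that fails to decode,
    // as the exception message suggests.
    static List<String> consumeSkippingBad(SimpleConsumer consumer) {
        List<String> out = new ArrayList<>();
        while (consumer.hasNext()) {
            try {
                out.add(consumer.poll());
            } catch (CorruptRecordException e) {
                consumer.seek(consumer.position() + 1); // skip the bad offset
            }
        }
        return out;
    }
}
```

In the real client the failing partition and offset can be read from the fetch position before calling seek; whether skipping is acceptable depends on whether losing that record is tolerable.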
[jira] [Assigned] (KAFKA-9167) Implement a broker to controller request channel
[ https://issues.apache.org/jira/browse/KAFKA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viktor Somogyi-Vass reassigned KAFKA-9167:
------------------------------------------
    Assignee: Viktor Somogyi-Vass  (was: Dhruvil Shah)
> Implement a broker to controller request channel
> ------------------------------------------------
>
>                 Key: KAFKA-9167
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9167
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: controller, core
>            Reporter: Viktor Somogyi-Vass
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major
>
> In some cases, we will need to create a new API to replace an operation that
> was formerly done via ZooKeeper. One example of this is that when the leader
> of a partition wants to modify the in-sync replica set, it currently modifies
> ZooKeeper directly. In the post-ZK world, the leader will make an RPC to the
> active controller instead.
[jira] [Commented] (KAFKA-9167) Implement a broker to controller request channel
[ https://issues.apache.org/jira/browse/KAFKA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976440#comment-16976440 ]
Viktor Somogyi-Vass commented on KAFKA-9167:
--------------------------------------------
OK, let's catch up about your ideas too; I don't want to leave them out.
> Implement a broker to controller request channel
> ------------------------------------------------
>
>                 Key: KAFKA-9167
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9167
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: controller, core
>            Reporter: Viktor Somogyi-Vass
>            Assignee: Dhruvil Shah
>            Priority: Major
>
> In some cases, we will need to create a new API to replace an operation that
> was formerly done via ZooKeeper. One example of this is that when the leader
> of a partition wants to modify the in-sync replica set, it currently modifies
> ZooKeeper directly. In the post-ZK world, the leader will make an RPC to the
> active controller instead.
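The change these tickets describe, replacing the leader's direct ZooKeeper write with a request over a broker-to-controller channel, can be sketched as below. All names (`AlterIsrRequest`, `Controller.handleAlterIsr`) are illustrative stand-ins, not the eventual Kafka API; the leader-epoch check models one benefit of routing the change through the controller, namely that it can reject requests from stale leaders.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IsrRpcSketch {
    // Hypothetical RPC payload: the partition leader proposes a new ISR.
    static class AlterIsrRequest {
        final String topicPartition;
        final int leaderEpoch;
        final Set<Integer> newIsr;

        AlterIsrRequest(String topicPartition, int leaderEpoch, Set<Integer> newIsr) {
            this.topicPartition = topicPartition;
            this.leaderEpoch = leaderEpoch;
            this.newIsr = newIsr;
        }
    }

    // Stand-in for the active controller, which owns ISR state instead of
    // having leaders write it to ZooKeeper themselves.
    static class Controller {
        private final Map<String, Integer> leaderEpochs = new HashMap<>();
        private final Map<String, Set<Integer>> isrState = new HashMap<>();

        Controller register(String tp, int epoch, Set<Integer> isr) {
            leaderEpochs.put(tp, epoch);
            isrState.put(tp, new HashSet<>(isr));
            return this;
        }

        // Validate the sender's leader epoch before applying the
        // shrink/expand, so a zombie leader cannot clobber newer state.
        boolean handleAlterIsr(AlterIsrRequest req) {
            Integer known = leaderEpochs.get(req.topicPartition);
            if (known == null || req.leaderEpoch < known) return false; // stale leader
            isrState.put(req.topicPartition, new HashSet<>(req.newIsr));
            return true;
        }

        Set<Integer> isrOf(String tp) { return isrState.get(tp); }
    }
}
```

The controller can then propagate the accepted change via metadata requests, exactly as it does today after observing a ZooKeeper update.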
[jira] [Assigned] (KAFKA-8958) Fix Kafka Streams JavaDocs with regard to used Serdes
[ https://issues.apache.org/jira/browse/KAFKA-8958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bibin sebastian reassigned KAFKA-8958:
--------------------------------------
    Assignee: (was: bibin sebastian)
> Fix Kafka Streams JavaDocs with regard to used Serdes
> -----------------------------------------------------
>
>                 Key: KAFKA-8958
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8958
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: beginner, newbie
>
> In older releases, Kafka Streams applied operator-specific overwrites of
> Serdes as in-place overwrites. In newer releases, Kafka Streams tries to
> re-use Serdes more "aggressively" by pushing serde information downstream if
> the key and/or value did not change.
> However, we never updated the JavaDocs accordingly. For example, the
> `KStream#through(String topic)` JavaDocs say:
> {code:java}
> Materialize this stream to a topic and creates a new {@code KStream} from the
> topic using default serializers, deserializers, and producer's {@link
> DefaultPartitioner}.
> {code}
> The JavaDocs don't take into account that Serdes might have been set further
> upstream, in which case the defaults from the config would not be used.
> `KStream#through()` is just one example. We should address this in the
> JavaDocs of all operators (i.e., KStream, KGroupedStream,
> TimeWindowedKStream, SessionWindowedKStream, KTable, and KGroupedTable).
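The serde re-use behavior the ticket describes can be sketched as a resolution walk over the topology: an operator without an explicit Serde inherits the nearest upstream one, and the config-level default applies only when no operator on the path set any. This is why "using default serializers, deserializers" in the quoted JavaDoc is misleading. The classes below are simplified stand-ins for illustration, not the real Streams internals.

```java
public class SerdePropagationSketch {
    // Stand-in for the default.key.serde / default.value.serde config values.
    static final String CONFIG_DEFAULT = "config-default-serde";

    // Minimal model of a topology node: it may carry an explicit serde
    // (set via Consumed/Produced/Grouped in the real API) and points at
    // its upstream parent.
    static class StreamNode {
        final String explicitSerde;   // serde set on this operator, or null
        final StreamNode upstream;    // parent in the topology, or null

        StreamNode(String explicitSerde, StreamNode upstream) {
            this.explicitSerde = explicitSerde;
            this.upstream = upstream;
        }

        // Walk upstream for the nearest explicit serde; fall back to the
        // config default only if no operator on the path set one.
        String resolveSerde() {
            for (StreamNode n = this; n != null; n = n.upstream) {
                if (n.explicitSerde != null) return n.explicitSerde;
            }
            return CONFIG_DEFAULT;
        }
    }
}
```

Under this model, a `through()`-style operator with no serde of its own resolves to whatever the source operator declared, which is the behavior the corrected JavaDocs would need to document.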