[jira] [Commented] (KAFKA-6273) Allow names for windowing joins
[ https://issues.apache.org/jira/browse/KAFKA-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547129#comment-16547129 ] Daniel Wojda commented on KAFKA-6273: - [~matthiasmargush], what is the status of this KIP? I'm happy to take ownership of this change and work on KIP if you are not interested anymore. > Allow names for windowing joins > --- > > Key: KAFKA-6273 > URL: https://issues.apache.org/jira/browse/KAFKA-6273 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 1.0.0 >Reporter: Matthias >Priority: Major > Labels: needs-kip > > Currently, the internal name of a windowing join is generated at runtime > using a counter to ensure uniqueness within a running topology. > If a topology is changed and redeployed, different names can be generated for > the same window. This can result in lost windowing state without intervention > modifying offsets to re-consume the input topics. > https://kafka.apache.org/10/javadoc/org/apache/kafka/streams/kstream/Joined.html > The proposed change would be to add an optional configuration parameter > "joinName" to Joined. If provided, this would be used by KStreamImpl to > generate internal names when building a join. If not provided the existing > name generation would be used. (Since this is an opt-in, optional parameter > there would be no impact on existing code.) > Let me know if this should be a KIP instead. The change would be something > along [these lines|https://github.com/FundingCircle/kafka/pull/3/files]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6720) Inconsistent Kafka Streams behaviour when topic does not exist
[ https://issues.apache.org/jira/browse/KAFKA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418090#comment-16418090 ] Daniel Wojda commented on KAFKA-6720: - My mistake [~johnma]. That makes sense. Thank you for explanation! > Inconsistent Kafka Streams behaviour when topic does not exist > -- > > Key: KAFKA-6720 > URL: https://issues.apache.org/jira/browse/KAFKA-6720 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.1 >Reporter: Daniel Wojda >Priority: Minor > > When Kafka Streams starts it reads metadata about topics used in topology > and it's partitions. If topology of that stream contains stateful operation > like #join, and a topic does not exist > [TopologyBuilderException|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L719] > will be thrown. > In case of streams with simple topology with stateless operations only, like > #mapValue, and topic does not exist, Kafka Streams does not throw any > exception, just logs a warning: > ["log.warn("No partitions found for topic {}", > topic);"|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L435] > > I believe the behaviour of Kafka Streams in both cases should be the same, > and it should throw TopologyBuilderException. > I am more than happy to prepare a Pull Request if it is a valid issue. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KAFKA-6437) Streams does not warn about missing input topics, but hangs
[ https://issues.apache.org/jira/browse/KAFKA-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418073#comment-16418073 ] Daniel Wojda commented on KAFKA-6437: - I would like to add my comment as a user of Kafka Streams and author of KAFKA-6720. Important missing information here is that if you start Kafka Streams application without input topics created, it'll log a warning and stays in this "idle" state until you create that topic(s) *AND* a rebalancing happens. If you check the status of stream it will be "RUNNING". What is more, please correct me if I'm wrong, checking consumer lag will not help, because lag will be 0 (number of messages in non-existing topic is 0). As [~mjsax] already mentioned "it's well documented that you need to create all input topics before you start your application", so in my opinion "stopping the world and failing" is a better option than starting a "zombie" application. I understand that Kafka Streams has many users, other developers can have a different opinion than me, but in that case I'd suggest introducing a new config. "fail-on-missing-topic"? WDYT? > Streams does not warn about missing input topics, but hangs > --- > > Key: KAFKA-6437 > URL: https://issues.apache.org/jira/browse/KAFKA-6437 > Project: Kafka > Issue Type: Improvement > Components: streams >Affects Versions: 1.0.0 > Environment: Single client on single node broker >Reporter: Chris Schwarzfischer >Assignee: Mariam John >Priority: Minor > Labels: newbie > > *Case* > Streams application with two input topics being used for a left join. > When the left side topic is missing upon starting the streams application, it > hangs "in the middle" of the topology (at …9, see below). Only parts of > the intermediate topics are created (up to …9) > When the missing input topic is created, the streams application resumes > processing. > {noformat} > Topology: > StreamsTask taskId: 2_0 > ProcessorTopology: > KSTREAM-SOURCE-11: > topics: > [mystreams_app-KTABLE-AGGREGATE-STATE-STORE-09-repartition] > children: [KTABLE-AGGREGATE-12] > KTABLE-AGGREGATE-12: > states: > [KTABLE-AGGREGATE-STATE-STORE-09] > children: [KTABLE-TOSTREAM-20] > KTABLE-TOSTREAM-20: > children: [KSTREAM-SINK-21] > KSTREAM-SINK-21: > topic: data_udr_month_customer_aggregration > KSTREAM-SOURCE-17: > topics: > [mystreams_app-KSTREAM-MAP-14-repartition] > children: [KSTREAM-LEFTJOIN-18] > KSTREAM-LEFTJOIN-18: > states: > [KTABLE-AGGREGATE-STATE-STORE-09] > children: [KSTREAM-SINK-19] > KSTREAM-SINK-19: > topic: data_UDR_joined > Partitions [mystreams_app-KSTREAM-MAP-14-repartition-0, > mystreams_app-KTABLE-AGGREGATE-STATE-STORE-09-repartition-0] > {noformat} > *Why this matters* > The applications does quite a lot of preprocessing before joining with the > missing input topic. This preprocessing won't happen without the topic, > creating a huge backlog of data. > *Fix* > Issue an `warn` or `error` level message at start to inform about the missing > topic and it's consequences. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6720) Inconsistent Kafka Streams behaviour when topic does not exist
Daniel Wojda created KAFKA-6720: --- Summary: Inconsistent Kafka Streams behaviour when topic does not exist Key: KAFKA-6720 URL: https://issues.apache.org/jira/browse/KAFKA-6720 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 1.0.1 Reporter: Daniel Wojda When Kafka Streams starts it reads metadata about topics used in topology and it's partitions. If topology of that stream contains stateful operation like #join, and a topic does not exist [TopologyBuilderException|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L719] will be thrown. In case of streams with simple topology with stateless operations only, like #mapValue, and topic does not exist, Kafka Streams does not throw any exception, just logs a warning: ["log.warn("No partitions found for topic {}", topic);"|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L435] I believe the behaviour of Kafka Streams in both cases should be the same, and it should throw TopologyBuilderException. I am more than happy to prepare a Pull Request if it is a valid issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6176) numDroppedMessages metric should not be incremented when no data are lost
Daniel Wojda created KAFKA-6176: --- Summary: numDroppedMessages metric should not be incremented when no data are lost Key: KAFKA-6176 URL: https://issues.apache.org/jira/browse/KAFKA-6176 Project: Kafka Issue Type: Bug Components: metrics Affects Versions: 1.0.0 Reporter: Daniel Wojda Priority: Minor Mirror Maker can be configured to not lose the data when producing failed. However when Mirror Maker is configured correctly and no messages are lost, *numDroppedMessages* metric is increased in case of producer failure. It is misleading for people who monitor Mirror Maker. Could we increase that metric when messages are really dropped? Or at least change the name to "numProducerFailuers"? -- This message was sent by Atlassian JIRA (v6.4.14#64029)