[jira] [Commented] (KAFKA-6273) Allow names for windowing joins

2018-07-17 Thread Daniel Wojda (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547129#comment-16547129
 ] 

Daniel Wojda commented on KAFKA-6273:
-

[~matthiasmargush], what is the status of this KIP? I'm happy to take ownership 
of this change and work on KIP if you are not interested anymore.

> Allow names for windowing joins
> ---
>
> Key: KAFKA-6273
> URL: https://issues.apache.org/jira/browse/KAFKA-6273
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
>Reporter: Matthias
>Priority: Major
>  Labels: needs-kip
>
> Currently, the internal name of a windowing join is generated at runtime 
> using a counter to ensure uniqueness within a running topology. 
> If a topology is changed and redeployed, different names can be generated for 
> the same window. This can result in lost windowing state without intervention 
> modifying offsets to re-consume the input topics.
> https://kafka.apache.org/10/javadoc/org/apache/kafka/streams/kstream/Joined.html
> The proposed change would be to add an optional configuration parameter 
> "joinName" to Joined. If provided, this would be used by KStreamImpl to 
> generate internal names when building a join. If not provided the existing 
> name generation would be used. (Since this is an opt-in, optional parameter 
> there would be no impact on existing code.)
> Let me know if this should be a KIP instead. The change would be something 
> along [these lines|https://github.com/FundingCircle/kafka/pull/3/files].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6720) Inconsistent Kafka Streams behaviour when topic does not exist

2018-03-28 Thread Daniel Wojda (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418090#comment-16418090
 ] 

Daniel Wojda commented on KAFKA-6720:
-

My mistake [~johnma]. That makes sense. Thank you for explanation!

> Inconsistent Kafka Streams behaviour when topic does not exist
> --
>
> Key: KAFKA-6720
> URL: https://issues.apache.org/jira/browse/KAFKA-6720
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 1.0.1
>Reporter: Daniel Wojda
>Priority: Minor
>
> When Kafka Streams starts it reads metadata about topics used in topology
>  and it's partitions. If topology of that stream contains stateful operation 
> like #join, and a topic does not exist 
> [TopologyBuilderException|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L719]
>  will be thrown.
> In case of streams with simple topology with stateless operations only, like 
> #mapValue, and topic does not exist, Kafka Streams does not throw any 
> exception, just logs a warning:
>  ["log.warn("No partitions found for topic {}", 
> topic);"|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L435]
>  
> I believe the behaviour of Kafka Streams in both cases should be the same, 
> and it should throw TopologyBuilderException.
> I am more than happy to prepare a Pull Request if it is a valid issue.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6437) Streams does not warn about missing input topics, but hangs

2018-03-28 Thread Daniel Wojda (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418073#comment-16418073
 ] 

Daniel Wojda commented on KAFKA-6437:
-

I would like to add my comment as a user of Kafka Streams and author of 
KAFKA-6720.
Important missing information here is that if you start Kafka Streams 
application without input topics created, it'll log a warning and stays in this 
"idle" state until you create that topic(s) *AND* a rebalancing happens. If you 
check the status of stream it will be "RUNNING". What is more, please correct 
me if I'm wrong, checking consumer lag will not help, because lag will be 0 
(number of messages in non-existing topic is 0). 

As [~mjsax] already mentioned "it's well documented that you need to create all 
input topics before you start your application", so in my opinion "stopping the 
world and failing" is a better option than starting a "zombie" application. 
I understand that Kafka Streams has many users, other developers can have a 
different opinion than me, but in that case I'd suggest introducing a new 
config. "fail-on-missing-topic"? WDYT?

> Streams does not warn about missing input topics, but hangs
> ---
>
> Key: KAFKA-6437
> URL: https://issues.apache.org/jira/browse/KAFKA-6437
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 1.0.0
> Environment: Single client on single node broker
>Reporter: Chris Schwarzfischer
>Assignee: Mariam John
>Priority: Minor
>  Labels: newbie
>
> *Case*
> Streams application with two input topics being used for a left join.
> When the left side topic is missing upon starting the streams application, it 
> hangs "in the middle" of the topology (at …9, see below). Only parts of 
> the intermediate topics are created (up to …9)
> When the missing input topic is created, the streams application resumes 
> processing.
> {noformat}
> Topology:
> StreamsTask taskId: 2_0
>   ProcessorTopology:
>   KSTREAM-SOURCE-11:
>   topics: 
> [mystreams_app-KTABLE-AGGREGATE-STATE-STORE-09-repartition]
>   children:   [KTABLE-AGGREGATE-12]
>   KTABLE-AGGREGATE-12:
>   states: 
> [KTABLE-AGGREGATE-STATE-STORE-09]
>   children:   [KTABLE-TOSTREAM-20]
>   KTABLE-TOSTREAM-20:
>   children:   [KSTREAM-SINK-21]
>   KSTREAM-SINK-21:
>   topic:  data_udr_month_customer_aggregration
>   KSTREAM-SOURCE-17:
>   topics: 
> [mystreams_app-KSTREAM-MAP-14-repartition]
>   children:   [KSTREAM-LEFTJOIN-18]
>   KSTREAM-LEFTJOIN-18:
>   states: 
> [KTABLE-AGGREGATE-STATE-STORE-09]
>   children:   [KSTREAM-SINK-19]
>   KSTREAM-SINK-19:
>   topic:  data_UDR_joined
> Partitions [mystreams_app-KSTREAM-MAP-14-repartition-0, 
> mystreams_app-KTABLE-AGGREGATE-STATE-STORE-09-repartition-0]
> {noformat}
> *Why this matters*
> The applications does quite a lot of preprocessing before joining with the 
> missing input topic. This preprocessing won't happen without the topic, 
> creating a huge backlog of data.
> *Fix*
> Issue an `warn` or `error` level message at start to inform about the missing 
> topic and it's consequences.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6720) Inconsistent Kafka Streams behaviour when topic does not exist

2018-03-27 Thread Daniel Wojda (JIRA)
Daniel Wojda created KAFKA-6720:
---

 Summary: Inconsistent Kafka Streams behaviour when topic does not 
exist
 Key: KAFKA-6720
 URL: https://issues.apache.org/jira/browse/KAFKA-6720
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.0.1
Reporter: Daniel Wojda


When Kafka Streams starts it reads metadata about topics used in topology
 and it's partitions. If topology of that stream contains stateful operation 
like #join, and a topic does not exist 
[TopologyBuilderException|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L719]
 will be thrown.

In case of streams with simple topology with stateless operations only, like 
#mapValue, and topic does not exist, Kafka Streams does not throw any 
exception, just logs a warning:
 ["log.warn("No partitions found for topic {}", 
topic);"|https://github.com/apache/kafka/blob/5bdfbd13524da667289cacb774bb92df36a253f2/streams/src/main/java/org/apache/kafka/streams/processor/internals/StreamPartitionAssignor.java#L435]
 

I believe the behaviour of Kafka Streams in both cases should be the same, and 
it should throw TopologyBuilderException.

I am more than happy to prepare a Pull Request if it is a valid issue.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6176) numDroppedMessages metric should not be incremented when no data are lost

2017-11-06 Thread Daniel Wojda (JIRA)
Daniel Wojda created KAFKA-6176:
---

 Summary: numDroppedMessages metric should not be incremented when 
no data are lost
 Key: KAFKA-6176
 URL: https://issues.apache.org/jira/browse/KAFKA-6176
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Affects Versions: 1.0.0
Reporter: Daniel Wojda
Priority: Minor


Mirror Maker can be configured to not lose the data when producing failed. 

However when Mirror Maker is configured correctly and no messages are lost, 
*numDroppedMessages* metric is increased in case of producer failure. It is 
misleading for people who monitor Mirror Maker. 

Could we increase that metric when messages are really dropped? Or at least 
change the name to "numProducerFailuers"?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)