[jira] [Commented] (KAFKA-13326) Add multi-cluster support to Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424298#comment-17424298 ]

Matthias J. Sax commented on KAFKA-13326:

Neither mirror-maker2 nor replicator supports EOS semantics, while Kafka
Streams does. EOS is based on Kafka's transactions, which don't work
cross-cluster. Both mirror-maker2 and replicator are "simple" copy-data
tools, while Kafka Streams, which does processing, is much more complicated.

For example, Kafka Streams might create additional topics to repartition
data for processing, or create changelog topics for stateful processing. If
there are two clusters, it's unclear in which cluster those topics should be
created. Also, a single consumer and producer would no longer be enough:
multiple clients would be required to read from and write to both clusters.
There are also complications with consumer group management if the
application connects to two clusters, as it's unclear which of the two is
really in charge (split-brain problem).

While it's not impossible to build, it's rather complex.

> Add multi-cluster support to Kafka Streams
> ------------------------------------------
>
>                 Key: KAFKA-13326
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13326
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guangyuan Wang
>            Priority: Major
>              Labels: needs-kip
>
> Dear Kafka Team,
> According to the link,
> https://kafka.apache.org/28/documentation/streams/developer-guide/config-streams.html#bootstrap-servers.
> Kafka Streams applications can only communicate with a single Kafka cluster
> specified by this config value. Future versions of Kafka Streams will support
> connecting to different Kafka clusters for reading input streams and writing
> output streams.
> Which version will this feature be added in the Kafka stream? This is really
> a very good feature.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-13326) Add multi-cluster support to Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422484#comment-17422484 ]

Guangyuan Wang commented on KAFKA-13326:

[~bchen225242] Thank you very much for the information provided. I am a
little curious whether mirror-maker2 or other replicators face the same
challenges. Why can replicators work well cross-cluster, but Kafka Streams
can't?
[jira] [Commented] (KAFKA-13326) Add multi-cluster support to Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421952#comment-17421952 ]

Matthias J. Sax commented on KAFKA-13326:

PR to update the web-page: https://github.com/apache/kafka-site/pull/376
[jira] [Commented] (KAFKA-13326) Add multi-cluster support to Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421107#comment-17421107 ]

Boyang Chen commented on KAFKA-13326:

[~wangguangyuan] Thanks for your interest. There are some known challenges
to supporting cross-cluster processing:
# Exactly-once semantics would break, because Kafka cannot do cross-cluster
transactions.
# Topic tracking would need to be augmented with multi-cluster support in
mind, which is a significant amount of work.
# The failure scenarios become more complicated: previously everything
happened within one broker cluster, but with multiple clusters any
topic-to-cluster mapping could fail and leave the application in a state
that is hard to recover from.

Considering the numerous replicators already on the market, supporting this
natively in Kafka Streams is not a top priority.

[~guozhang] [~mjsax] could give more insights here.
[jira] [Commented] (KAFKA-13326) Add multi-cluster support to Kafka Streams
[ https://issues.apache.org/jira/browse/KAFKA-13326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421100#comment-17421100 ]

Guangyuan Wang commented on KAFKA-13326:

[~mjsax] Thank you for your answer. Cross-cluster data processing is very
important for my product. I'd like to use Kafka Streams to filter messages
while moving them from one cluster to another, which would reduce the number
of messages that need to be copied. Without cross-cluster data processing, I
can only use mirror-maker2, which causes a large number of messages to be
copied from one cluster to another.

Could I know the reason why there is no plan at the moment to support
cross-cluster data processing? I would be glad to contribute this feature if
you think it is a good one.
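
Editor's sketch (not part of the original thread): the point raised in this thread, that a Kafka client talks to exactly one cluster through its bootstrap.servers setting, means a filter between two clusters built from plain clients would need two separate configurations, one consumer for the source cluster and one producer for the target cluster. The Java below only builds those two configurations with java.util.Properties. The class name, method names, host names, and group id are hypothetical; the config keys are standard Kafka client settings. It is a minimal illustration under those assumptions, not a recommended implementation. Because offsets are committed on the source cluster and records are written to the target cluster outside any shared transaction, such a bridge would typically give at-least-once rather than exactly-once delivery.

```java
import java.util.Properties;

// Hypothetical helper: builds one client configuration per cluster.
public class TwoClusterConfigs {

    // Consumer settings for the cluster the bridge reads from.
    // Group membership and committed offsets live only in this cluster.
    public static Properties sourceConsumerConfig(String sourceBootstrap, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", sourceBootstrap);
        p.setProperty("group.id", groupId);
        // Commit manually, after the producer has acknowledged the writes.
        p.setProperty("enable.auto.commit", "false");
        p.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        p.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        return p;
    }

    // Producer settings for the cluster the bridge writes to.
    public static Properties targetProducerConfig(String targetBootstrap) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", targetBootstrap);
        p.setProperty("acks", "all");
        p.setProperty("enable.idempotence", "true");
        p.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return p;
    }

    public static void main(String[] args) {
        // Placeholder addresses; replace with real broker lists.
        Properties consumer = sourceConsumerConfig("source-cluster:9092", "filter-bridge");
        Properties producer = targetProducerConfig("target-cluster:9092");
        System.out.println("consumer -> " + consumer.getProperty("bootstrap.servers"));
        System.out.println("producer -> " + producer.getProperty("bootstrap.servers"));
    }
}
```

These Properties objects could be passed to the KafkaConsumer and KafkaProducer constructors from the kafka-clients library; the poll, filter, send, and commit loop is left out here.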