[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304143#comment-15304143 ]

Cody Koeninger commented on SPARK-12177:

This issue already does link to Mark's PR. In fact, this issue probably shouldn't link to it unless he updates it, because that PR doesn't handle some pretty important things.

I went ahead and merged my 0.10 branch into my linked PR. I think it's fairly obvious that releasing something just for 0.9 is not useful at this point.

For anyone asking for this to be merged: have you actually tested any of these PRs to see if they meet your needs?

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --------------------------------------------------
>
>                 Key: SPARK-12177
>                 URL: https://issues.apache.org/jira/browse/SPARK-12177
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.6.0
>            Reporter: Nikita Tarasenko
>              Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is not compatible with the old one. So, I added the new consumer API. I made separate classes in package org.apache.spark.streaming.kafka.v09 with the changed API. I didn't remove the old classes, for backward compatibility: users will not need to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303874#comment-15303874 ]

Grzegorz Rożniecki commented on SPARK-12177:

Any ETA on this? Being forced to work with old (0.8.X) Kafka APIs is not pleasant... Plus, this issue should link to this [GitHub PR|https://github.com/apache/spark/pull/10953].
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300227#comment-15300227 ]

Andrey Borisov commented on SPARK-12177:

Kafka 0.10.0.0 has been released, so the pull request can probably be merged now.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274910#comment-15274910 ]

Mark Grover commented on SPARK-12177:

I spent some time earlier today looking at the latest Kafka 0.10 RC. Thanks, Cody - looks like you poked at it too.

As Cody found out, the new Kafka consumer API is in flux. I found and filed a PR (KAFKA-3669) for an incompatibility in KafkaConfig, which is now fixed in Kafka 0.10. But there's more - KAFKA-3633 is of note - which discusses fixing a compatibility break between 0.9 and 0.10 and remains unresolved.

So, at this point, I can say that there's no point in committing the PR associated with this JIRA (https://github.com/apache/spark/pull/11863) until at least Kafka 0.10.0 is released and we have a good sense that Kafka 0.11.0 is not going to break compatibility with 0.10.0. Otherwise, we have to bear the burden of adding and maintaining the complexity to build against multiple versions of Kafka, something the Storm folks are already suffering from, having their KafkaSpout now using the new Kafka consumer API from 0.9.

Given the timing of the Kafka 0.10.0 release and the Spark 2 release, this JIRA likely wouldn't get resolved in 2.0.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274294#comment-15274294 ]

Cody Koeninger commented on SPARK-12177:

I have a branch working with RC3 of Kafka 0.10; tests are at least passing. https://github.com/koeninger/spark-1/tree/kafka-0.10

The changes aren't huge, but I still don't see a reason to publish an artifact for 0.9 only to have it immediately be outdated.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268696#comment-15268696 ]

Mark Grover commented on SPARK-12177:

bq. 1. Rename existing Kafka connector to include 0.8 version in the Maven project name.

That sounds good to me.

bq. 2. Don't support 0.9 client for now (otherwise we will need to support 3 clients very soon). We should revisit the 0.10 support in 2.1, and most likely support that.

This is a little more involved. I agree with Cody that the new Kafka consumer API implementation in Spark doesn't really have a benefit right now, since we can't use the security features, which are gated by delegation token support in Kafka (KAFKA-1696). However, delegation tokens aren't even going to make it into Kafka 0.10, so I see little point in us not committing the new Kafka consumer API implementation to Spark _because of that_.

Also, I think

bq. 2. Kafka 0.9 changes the client API.

can be better expressed as

bq. Kafka 0.9 introduces a new client API.

There are two axes here - one is the Kafka version (0.8/0.9) and the other is the consumer API version (old/new). Both Kafka 0.8 and 0.9 support the old API without any modifications (for the most part), and the existing Kafka module in Spark will continue to work with Kafka 0.8 and 0.9 (and with Kafka 0.10, I'd imagine - I have been working with the Kafka community to report issues like KAFKA-3563, which break old API compatibility), because the existing Spark module is based on the old API, which is meant to be compatible across all those versions. As far as the new API goes, it may change in incompatible ways between 0.9 and 0.10, so we may need a new sub-module for a 0.10-based API implementation after all.

The point I am trying to make is that there's nothing we'd gain by waiting for Kafka 0.10 to come out. It's not any better than Kafka 0.9 in terms of support for security features. I suppose the only thing you could save on is not having an additional subproject, if the new consumer API from Kafka 0.10 broke compatibility significantly compared to 0.9's new consumer API (smaller incompatibilities can be dealt with via reflection and such). I haven't played with the RC yet, so I don't know if even that is the case.

So, really, we should either be gating this on security features (like KAFKA-1696) going into Kafka, which won't happen until at least Kafka 0.11, or put this in right now. I can definitely look at the latest Kafka 0.10 and see how likely it is that we are going to need a new module (probably by the end of this week), if that'd help our decision.
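The "smaller incompatibilities can be dealt with via reflection" approach might look like the sketch below. It is demonstrated on java.lang.String rather than any real Kafka class; ReflectionBridge and its method names are invented for illustration, the pattern being "try the newer method name, fall back to the older one."

```java
import java.lang.reflect.Method;

// Hypothetical sketch of bridging a small API incompatibility via reflection.
// Demonstrated on java.lang.String rather than a real Kafka class; the class
// and method names here are invented for illustration.
public class ReflectionBridge {
    // Try the newer zero-arg method name first, fall back to the older one.
    static Object invokeCompat(Object target, String newer, String older) throws Exception {
        Method m;
        try {
            m = target.getClass().getMethod(newer);
        } catch (NoSuchMethodException e) {
            m = target.getClass().getMethod(older);
        }
        return m.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        // String.strip exists on JDK 11+, trim everywhere; the bridge picks
        // whichever is present. Both return "hello" here.
        System.out.println(invokeCompat("  hello  ", "strip", "trim"));
    }
}
```

The same shape would apply to a Kafka client method whose name or signature moved between versions, at the cost of losing compile-time checking for that call.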
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267461#comment-15267461 ]

Cody Koeninger commented on SPARK-12177:

The only real selling point of 0.9 for Spark is the security mechanisms, but those weren't fully baked as far as I can tell. There might be some performance wins from prefetching, but I haven't noticed a significant difference in my limited testing.

0.10 is on RC2 currently, and should hopefully work with 0.9 brokers as well, so it makes sense to skip the 0.9 client IMHO.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267419#comment-15267419 ]

Sean Owen commented on SPARK-12177:

Is the idea here that the 0.8 client covers 0.8/0.9 brokers, but a new, incompatible client is needed for 0.10 in any event? What is the value of the 0.9 client - whatever it adds effectively doesn't matter for Spark?

It sounds good to get ready, at least, to accommodate another client. CC [~mgrover]
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267347#comment-15267347 ]

Reynold Xin commented on SPARK-12177:

Talked more with Cody offline. Summary of the discussion:

Background information:
1. The Kafka 0.8 client works with Kafka 0.8 and 0.9 brokers.
2. Kafka 0.9 changes the client API.
3. Kafka 0.10, not yet released, will change the client API again.

Recommended steps:
1. Rename the existing Kafka connector to include the 0.8 version in the Maven project name.
2. Don't support the 0.9 client for now (otherwise we will need to support 3 clients very soon). We should revisit 0.10 support in 2.1, and most likely support that.
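From an application's point of view, step 1 (carrying the Kafka client version in the Maven project name) might look like the fragment below; the exact coordinates and version are assumptions for illustration, not published artifacts at the time of this discussion.

```xml
<!-- Hypothetical application-side dependency after the rename: the Kafka
     client version becomes part of the connector's artifact name, so a
     second, incompatible client can later ship as a sibling artifact
     (e.g. spark-streaming-kafka-0-10_2.11) without breaking this one. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
  <version>2.0.0</version>
</dependency>
```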
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267313#comment-15267313 ]

Cody Koeninger commented on SPARK-12177:

Yeah. My desired approach is in my PR: namely, leave external/kafka as-is and create a new external/kafka-beta (name change TBD) subproject.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267273#comment-15267273 ]

Sean Owen commented on SPARK-12177:

I'm not keeping score here, but I suspect it's a question of what you, Mark G et al want to do. Is the solution two Kafka modules?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267158#comment-15267158 ]

Cody Koeninger commented on SPARK-12177:

Given Reynold's message about a code freeze for 2.0 in the coming weeks, can we get some direction from a committer as to whether this is still being considered for 2.0?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259536#comment-15259536 ]

Cody Koeninger commented on SPARK-12177:

Well, given that we have working implementations of the RDD and DStream for 0.9 (really 0.10 at this point), and structured streaming is still in design...
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259530#comment-15259530 ]

Mario Briggs commented on SPARK-12177:

What's the thinking on this one with respect to Structured Streaming? Is the thinking that Kafka 0.9 will be supported with the older DStream API (KafkaUtils.createDirectStream) AND the newer structured streaming way of doing things? Or will Kafka 0.9 only be supported with the new structured streaming way?

I am going to assume only [~tdas] or [~rxin] have a good idea on Structured Streaming (sorry [~mgrover], [~c...@koeninger.org], if I have insulted you :-) ), so I'd appreciate it if they could chime in. For my part, I am assuming that 0.9 will be supported with the older DStream API as well.

[~mgrover], how is it going with merging Cody's changes and making a new subproject?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232885#comment-15232885 ]

Mark Grover commented on SPARK-12177:

Thanks Cody. I agree about the separate subproject and I will review the code in your PR. Thank you!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232828#comment-15232828 ]

Cody Koeninger commented on SPARK-12177:

Ok, since SPARK-13877 has been rejected and we're keeping the Kafka dstreams in Spark, I'd like to get this moving again.

I've done some basic throughput testing on my PR, using kafka-producer-perf-test.sh to generate load, and after some tweaks the performance is comparable to the existing direct stream. I've made sure my existing transactional / idempotent examples work with the new consumer. I don't yet have a compelling need to move any of my production jobs to the new consumer, but it's at the point that I'd feel comfortable with other people testing it out.

Given the issues I've seen with 0.9 / 0.10 (e.g. KAFKA-3135), I'm 100% sure that we want this in a totally separate subproject from the existing dstream, which should be left at 0.8.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204421#comment-15204421 ]

Cody Koeninger commented on SPARK-12177:

I made a PR with my changes for discussion's sake. https://github.com/apache/spark/pull/11863
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204418#comment-15204418 ]

Apache Spark commented on SPARK-12177:

User 'koeninger' has created a pull request for this issue: https://github.com/apache/spark/pull/11863
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203662#comment-15203662 ]

Eugene Miretsky commented on SPARK-12177:

1) MessageAndMetaData: I was looking at KafkaRDD instead of NewKafkaRDD by mistake - my bad.

2) Decoder/Serializer: DirectKafkaInputDStreamBase has U <: Decoder[K]: ClassTag, T <: Decoder[V]: ClassTag type parameters.
- NewDirectKafkaInputDStream ignores the Decoder TypeTags. (The old DirectKafkaInputDStream passes them to KafkaRDDIterator, which then creates valueDecoder & keyDecoder, which are then passed to MessageAndMetaData.)
- kafka.serializer.Decoder is part of the old Scala Kafka consumer; the new Kafka Java consumer uses org.apache.kafka.common.serialization.Deserializer.

Just to make sure we are on the same page: I'm looking at PR 10953. I may also have missed something.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203027#comment-15203027 ]

Cody Koeninger commented on SPARK-12177:

Unless I'm misunderstanding your point, those changes are all in my fork already. Keeping a message handler for MessageAndMetadata doesn't make sense. Backwards compatibility with the existing direct stream isn't really workable.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203018#comment-15203018 ]

Eugene Miretsky commented on SPARK-12177:

The new Kafka Java consumer uses Deserializer instead of Decoder. The difference is not too big (extra type safety, and Deserializer::deserialize accepts a topic and a byte payload, while Decoder::fromBytes accepts only a byte payload), but it would still be nice to align with the new Kafka consumer.

Would it make sense to replace Decoder with Deserializer in the new DirectStream? This would require getting rid of MessageAndMetadata, and hence breaking backwards compatibility with the existing DirectStream, but I guess it will have to be done at some point.
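To make the comparison concrete, here is a sketch using cut-down stand-ins for the two interfaces (the real ones are kafka.serializer.Decoder and org.apache.kafka.common.serialization.Deserializer; these simplified versions and the StringCodecs holder are for illustration only):

```java
import java.nio.charset.StandardCharsets;

// Cut-down stand-ins for the two serialization interfaces under discussion.
public class StringCodecs {
    interface Decoder<T> {                        // old Scala-consumer style
        T fromBytes(byte[] bytes);                // payload only
    }

    interface Deserializer<T> {                   // new Java-consumer style
        T deserialize(String topic, byte[] data); // topic + payload
    }

    static final Decoder<String> OLD_STYLE =
        bytes -> new String(bytes, StandardCharsets.UTF_8);

    // The extra topic argument lets a deserializer vary behavior per topic,
    // which fromBytes cannot express.
    static final Deserializer<String> NEW_STYLE =
        (topic, data) -> "[" + topic + "] " + new String(data, StandardCharsets.UTF_8);
}
```

The signature change is exactly why a Decoder-shaped type parameter can't simply be reused for the new consumer: any per-topic behavior has nowhere to go in fromBytes.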
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197536#comment-15197536 ] Cody Koeninger commented on SPARK-12177: My fork is working at a very basic level for caching consumers, preferred locations, dynamic topicpartitions, and being able to commit offsets into kafka. Taking a configured consumer also should allow some degree of control over offset generation policy, just by wrapping a consumer to return different values for assignment() or position(). Unit tests are passing but it would need a lot more manual testing; I'm sure there are lots of rough edges. Given the discussion in SPARK-13877 about moving kafka integration to a separate repo, I'm going to hold off on any more work until that's decided.
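The wrapping idea mentioned above, controlling offset generation policy by returning different values for assignment() or position(), might look roughly like the following. DriverConsumer is a hypothetical minimal slice of the consumer surface the driver reads, not the real KafkaConsumer interface:

```java
import java.util.Set;

public class OffsetPolicyWrapper {
    // Hypothetical minimal consumer surface; the real KafkaConsumer has many more methods.
    interface DriverConsumer {
        Set<String> assignment();   // which topic-partitions to read
        long position(String tp);   // next offset to consume for a partition
    }

    // Wrap a configured consumer and cap the reported position, e.g. to bound
    // batch sizes without touching the underlying consumer's state.
    static DriverConsumer cappedAt(DriverConsumer inner, long maxOffset) {
        return new DriverConsumer() {
            public Set<String> assignment() { return inner.assignment(); }
            public long position(String tp) {
                return Math.min(inner.position(tp), maxOffset);
            }
        };
    }

    // Test helper: a consumer that always reports the same position.
    static DriverConsumer fixed(long pos) {
        return new DriverConsumer() {
            public Set<String> assignment() { return Set.of("topic-0"); }
            public long position(String tp) { return pos; }
        };
    }

    public static void main(String[] args) {
        DriverConsumer wrapped = cappedAt(fixed(500L), 100L);
        System.out.println(wrapped.position("topic-0")); // capped to 100
    }
}
```

Because the stream takes a configured consumer rather than raw parameters, any such policy can live entirely in user code.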
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190103#comment-15190103 ] Cody Koeninger commented on SPARK-12177: There are a lot of things I'm really not happy with so far. Given the variety of ways topic subscription works, I think we're better off with a user-provided zero-arg function to create a configured consumer for use on the driver, rather than taking kafka parameters and trying to wrap a consumer as I'm doing currently. People are going to need some kind of access for committing offsets to kafka, but I think that can be handled by submitting finished offset ranges to a threadsafe queue that gets drained in compute(). So anyway, for the two or three people actually looking at my repo, expect changes ;)
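The commit-queue idea could be sketched as follows. OffsetRange and the method names here are hypothetical stand-ins for illustration, not Spark's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: user code enqueues finished offset ranges from anywhere; the driver
// drains the queue once per batch (e.g. inside compute()) and commits the result.
public class CommitQueueSketch {
    static final class OffsetRange {
        final String topic; final int partition;
        final long fromOffset, untilOffset;
        OffsetRange(String topic, int partition, long from, long until) {
            this.topic = topic; this.partition = partition;
            this.fromOffset = from; this.untilOffset = until;
        }
    }

    private final ConcurrentLinkedQueue<OffsetRange> queue = new ConcurrentLinkedQueue<>();

    // Called once a range is fully processed; threadsafe, non-blocking.
    public void commitAsync(OffsetRange range) { queue.add(range); }

    // Called on the driver each batch: drain everything accumulated so far.
    public List<OffsetRange> drain() {
        List<OffsetRange> done = new ArrayList<>();
        OffsetRange r;
        while ((r = queue.poll()) != null) done.add(r);
        return done;
    }

    public static void main(String[] args) {
        CommitQueueSketch c = new CommitQueueSketch();
        c.commitAsync(new OffsetRange("t", 0, 0L, 100L));
        c.commitAsync(new OffsetRange("t", 1, 0L, 50L));
        System.out.println(c.drain().size()); // both ranges drained
        System.out.println(c.drain().size()); // queue now empty
    }
}
```

The appeal of this design is that the actual kafka commit happens in one place on the driver, at batch boundaries, instead of racing with executor-side processing.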
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189410#comment-15189410 ] Praveen Devarao commented on SPARK-12177: - >>It's also probably better for Spark users if they don't blindly cache or collect consumer records, because its a lot of wasted space (e.g. topic name).<< I agree. Thanks, Praveen
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189325#comment-15189325 ] Cody Koeninger commented on SPARK-12177: Clearly K and V are serializable somehow, because they were in a byte array in a Kafka message. So it's not really far fetched to expect a thin wrapper around K and V to also be serializable, and it would be convenient from a Spark perspective. That being said, I agree with you that the Kafka project isn't likely to go for it. It's also probably better for Spark users if they don't blindly cache or collect consumer records, because it's a lot of wasted space (e.g. the topic name). But people are going to be surprised the first time they try it and it doesn't work, so I wanted to mention it.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188761#comment-15188761 ] Praveen Devarao commented on SPARK-12177: - Hi Cody, The last time, when I got the TopicPartition and OffsetAndMetadata classes made serializable, the argument was that these classes are used by end-users and are metadata classes which would be needed for checkpoint purposes. As for ConsumerRecord, this class is meant to hold the actual data and would usually not be needed for checkpoint purposes... if we need the data we can always go to the respective offset in the respective topic and partition. Also, the ConsumerRecord class has members which are of generic type (K and V), so serialization really depends on what type of object is flowed in by the user and whether that is serializable. Given this, from a Kafka perspective I am not sure how we can justify marking this class as serializable. Thanks, Praveen
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187847#comment-15187847 ] Cody Koeninger commented on SPARK-12177: Anybody want to volunteer to fight the good fight of convincing the Kafka project to make ConsumerRecord serializable? Given that the TopicPartition change took over a month, I don't have high hopes before 0.10 is released. I don't think it's a dealbreaker for us if the KafkaRDD is an RDD[ConsumerRecord], it just means users are going to have to map() over it or whatever. Attempts to collect it directly if it isn't serializable are going to fail.
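A minimal illustration of why a bare collect() would fail and why map()-ing first works. FakeRecord is a stand-in for ConsumerRecord, and plain Java serialization stands in for Spark's serializer:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class CollectAfterMap {
    // Stand-in for ConsumerRecord: deliberately NOT Serializable.
    static class FakeRecord {
        final String key, value;
        FakeRecord(String k, String v) { key = k; value = v; }
    }

    // Java-serialize an object; return null if the object graph is not serializable.
    static byte[] javaSerializeOrNull(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            return bos.toByteArray();
        } catch (IOException e) {     // NotSerializableException lands here
            return null;
        }
    }

    public static void main(String[] args) {
        FakeRecord rec = new FakeRecord("k", "v");
        // What a bare collect() over RDD[ConsumerRecord] amounts to: fails.
        System.out.println(javaSerializeOrNull(rec) == null
            ? "collect would fail: record not serializable"
            : "unexpected");
        // rdd.map(r -> new String[]{ r.key, r.value }).collect(): keep only
        // the fields you need, in a serializable form.
        System.out.println(javaSerializeOrNull(new String[]{rec.key, rec.value}) != null
            ? "mapped form serializes fine"
            : "unexpected");
    }
}
```

The map() also drops the per-record topic name and metadata, which addresses the wasted-space concern raised earlier in the thread.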
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187613#comment-15187613 ] Cody Koeninger commented on SPARK-12177: If anyone wants to take a look at the stuff I'm hacking on, it's at https://github.com/koeninger/spark-1/tree/kafka-0.9/external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka usage e.g. https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/BasicStream.scala Caching and preferred locations are sort of working but could clearly be improved. Dynamic topic subscriptions are, uh, interesting to say the least.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185461#comment-15185461 ] Mark Grover commented on SPARK-12177: - For a) I think it's a larger discussion that is relevant to more than just kafka - it'd be good for Spark to have a policy on how far back it wants to support various versions and how that changes for major vs. minor releases of Spark. For b) there is this PR: https://github.com/apache/spark/pull/10953 and Cody is working on the LRU caching like he said, and here's the relevant email thread on the Spark dev list: http://apache-spark-developers-list.1001551.n3.nabble.com/Upgrading-to-Kafka-0-9-x-td16466.html If you'd like to review the PR, that'd be appreciated. Thanks!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184387#comment-15184387 ] Cody Koeninger commented on SPARK-12177: I've been hacking on a simple LRU cache for consumers, and preferred locations to take advantage of it. Will update here if it works out. There are some things about the new consumer that make it awkward for this purpose, mentioned on the Kafka dev list but with no real response.
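The LRU idea can be sketched with java.util.LinkedHashMap's access-order mode; a placeholder string stands in for a real cached KafkaConsumer, and the capacity and key scheme are illustrative assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Keep at most maxEntries "consumers" alive per executor, keyed by
// topic-partition, evicting the least recently used one.
public class ConsumerLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public ConsumerLruCache(int maxEntries) {
        super(16, 0.75f, true);   // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // A real cache would also close() the evicted consumer here.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        ConsumerLruCache<String, String> cache = new ConsumerLruCache<>(2);
        cache.put("topic-0", "consumer0");
        cache.put("topic-1", "consumer1");
        cache.get("topic-0");               // touch 0, so 1 becomes eldest
        cache.put("topic-2", "consumer2");  // evicts topic-1
        System.out.println(cache.keySet()); // [topic-0, topic-2]
    }
}
```

Caching the consumer this way is what makes preferred locations pay off: if the same executor keeps getting the same topic-partition, it reuses a warm consumer with prefetched data instead of reconnecting every batch.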
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184346#comment-15184346 ] Mansi Shah commented on SPARK-12177: So it looks like we are dealing with two independent issues here - (a) version support and (b) 0.9.0 plugin design. Should we at least start hashing out the design for the new consumer, and then we can see where it fits? Do the concerned folks still want to get on a call to figure out the design?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177315#comment-15177315 ] Mark Grover commented on SPARK-12177: - One more thing as a potential con for Proposal 1: There are places that have to use the kafka artifact. The 'examples' subproject is a good example of that. The subproject pulls the kafka artifact as a dependency and has an example of Kafka usage. However, it can't depend on the new implementation's artifact at the same time, because they depend on different versions of kafka. Therefore, unless I am missing something, the new implementation's example can't go there. And that's fine, we can put it within the subproject itself, instead of examples, but that won't necessarily work with tooling like run-example, etc.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176225#comment-15176225 ] Mark Grover commented on SPARK-12177: - Let me clarify what I was saying: There are 2 axes here - one is the new/old consumer API and the other is the support for Kafka v0.8 and v0.9. Both Kafka v0.8 and v0.9 provide the old API; only v0.9 provides the new API. bq. The fact that the 0.9 consumer is still considered beta by the Kafka project and that things are going to change in 0.10 is an argument for keeping the existing implementation as it is, not an argument for throwing it away prematurely. I totally agree with you, Cody, that the old API implementation is bug free and I am definitely not proposing to throw away that implementation. My proposal is that both the old implementation and the new will be built against the same version of Kafka - that being 0.9. Based on what I now understand (and please correct me if I am wrong), I think what you are proposing is: Proposal 1: 2 subprojects - one with the old implementation and one with the new. The 'old' subproject will be built against Kafka 0.8 and will have its own assembly, and the new subproject will use the new API, will be built against Kafka 0.9, and will have its own assembly. And what I am proposing is: Proposal 2: 2 subprojects - one with the old implementation and one with the new. Both implementations will be built against Kafka 0.9, and they both end up in one single Kafka assembly artifact. The pro of Proposal 1 is that folks who want to use the old implementation with Kafka 0.8 brokers can use it without upgrading their brokers. The con of Proposal 1 is that it doesn't allow for re-use of any code between the old and new implementations. This can be a good thing if we don't want to share any code in the new implementation, but there is definitely a bunch of test code that, I think, it'd be good to share. The pros of Proposal 2:
test code, etc. can be shared, and there will be a single artifact that folks would need to run either the old direct stream implementation or the new one. The con is, of course, that folks would have to upgrade their brokers to Kafka 0.9 if they want to use Spark 2.0.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176037#comment-15176037 ] Cody Koeninger commented on SPARK-12177: How is it a huge hassle to keep the known working implementation in a separate subproject from the new beta consumer? Any of the kafka version churn hassle would be associated with the new subproject. Work on the existing project would pretty much be limited to changes in Spark that were (somehow?) incompatible with it, or bugfixes. I'm not saying that existing code is bug-free, but we've had 4 point revisions of spark with relatively little change to it. The fact that the 0.9 consumer is still considered beta by the Kafka project and that things are going to change in 0.10 is an argument for keeping the existing implementation as it is, not an argument for throwing it away prematurely.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175999#comment-15175999 ] Mark Grover commented on SPARK-12177: - I think the core of the question is a much broader Spark question - how many past versions to support? To add some more color to the question at hand, Kafka has already decided that the [next version of Kafka will be 0.10.0|https://github.com/apache/kafka/commit/b084c485e25bfe77154e805219b24714d59c396c] (instead of 0.9.1) and this next version will have yet another protocol change. So, where do we go from there? Supporting Kafka 0.8, 0.9 and 0.10.0 in 2.x? I still think Spark 2.0 is a good time to drop support for Kafka 0.8.x. Other projects are doing it, that too in their minor releases (links to the Flume and Storm JIRAs are on the PR), and Kafka is moving fast, with protocol changes in every new non-maintenance release; it will become a huge hassle to keep up with all the past releases.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175466#comment-15175466 ] Sean Owen commented on SPARK-12177: --- I favor updating to support only 0.9 in 2.0.0. My general stance is that nobody is forced to update to Kafka 0.9 since nobody is forced to stop using Spark 1.x. Major releases are exactly for this kind of upgrade. It's also why I personally feel pretty strongly that Java 7 should not be supported, but that's a separate question.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175066#comment-15175066 ] Reynold Xin commented on SPARK-12177: - This thread is getting too long for me to follow, but my instinct is that maybe we should have two subprojects and support both.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174856#comment-15174856 ] Mark Grover commented on SPARK-12177: - Hi [~tdas] and [~rxin], can you help us with your opinion on these questions, so we can unblock this work: 1. Should we support both Kafka 0.8 and 0.9 or just 0.9? The pros and cons are listed [here|https://github.com/apache/spark/pull/11143#issuecomment-182154267] along with what other projects are doing. 2. Should we make a separate project for the implementation using the new kafka consumer API with the same class names (e.g. KafkaRDD, etc.), or create new classes like hadoop did, in the same subproject e.g. NewKafkaRDD, etc. Thanks!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174534#comment-15174534 ] Mansi Shah commented on SPARK-12177: Thanks for the explanation, Cody. I understand committing from within Spark can get ugly. So in spark streaming, is there a way for the executors to communicate back with the driver? I think I am flexible if we can't get this extra optimization, or maybe we can hack something up internally in our system to make this problem easier. But let us definitely design for the long-lived consumers on the executors.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174507#comment-15174507 ] Cody Koeninger commented on SPARK-12177: Thanks for the example of performance numbers. The direct stream RDD batch sizes are, by default, "whatever's left in kafka". The backpressure and maximum limits on batch sizes are in terms of messages, not bytes, because that's easy to calculate with on the driver without having read the messages. You can tune it for your app pretty straightforwardly, as long as you don't have a lot of tiny messages followed by a lot of huge messages in the same topic. Doing kafka offset commits on the executor without user intervention opens up a whole other can of worms; I'd prefer to avoid that if at all possible.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174490#comment-15174490 ] Mansi Shah commented on SPARK-12177: Sorry I forgot to mention that the numbers I quoted above were run on a kafka topic with 100 million records of 100 bytes each.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174485#comment-15174485 ] Mansi Shah commented on SPARK-12177: Caching the new consumer across batches. I did not know how to do this with the current implementation of the spark plugin, so I ran a simple experiment that just reads messages from kafka in batches. So, for a batch that has 100K records, a cached consumer model finishes in 68.3 secs, but its non-cached counterpart, where we recreate the consumer for every batch, takes 165.9 secs. This gets way worse as the batch sizes change. Now if we make the batch sizes restrictive and throw out messages beyond the until offset, then the same job takes 180 secs. I had one unrelated question: how come RDD batch sizes do not have anything to do with the actual message size? If the batch size is fixed at 100K records, wouldn't that mean completely different things for messages that are 100 bytes versus 1M, just in terms of pure memory pressure? Do you have any thoughts on that? Btw the solution where we cache the records in the executor will unfortunately not work for us, as our system is truly distributed and a topic partition does not always reside on the same node like kafka. I mean we can do some predefined static allocation using getPreferredLocations, but that means we can never use locality and always have to use this static assignment even if the Spark cluster is colocated with the MapR cluster. That is where I was suggesting the offsets being communicated back to the driver. We do not need to do this explicitly - just doing a kafka commit on the executor should take care of this.
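The cached-vs-recreated consumer pattern being benchmarked above can be sketched like this. This is a hedged illustration with invented names (`FakeConsumer`, `get_or_create_consumer`), not Spark or Kafka code: an executor-side cache keyed by (group, topic, partition) hands back the same long-lived consumer for every batch instead of paying connection and metadata-lookup cost each time.

```python
# Hypothetical executor-side consumer cache. A real Kafka consumer would pay
# broker connection + metadata lookup cost in __init__; caching means that
# cost is paid once per (group, topic, partition) rather than once per batch.
_consumer_cache = {}

class FakeConsumer:
    """Stand-in for a real consumer; only tracks how often it was constructed."""
    def __init__(self, group_id, topic, partition):
        self.group_id, self.topic, self.partition = group_id, topic, partition
        self.connects = 1  # real code: open sockets, fetch metadata here

def get_or_create_consumer(group_id, topic, partition):
    key = (group_id, topic, partition)
    if key not in _consumer_cache:
        _consumer_cache[key] = FakeConsumer(group_id, topic, partition)
    return _consumer_cache[key]
```

The 68.3 s vs 165.9 s numbers above are consistent with this: the non-cached path constructs a new consumer per batch, the cached path constructs one per partition for the life of the executor.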
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174415#comment-15174415 ] Cody Koeninger commented on SPARK-12177: Mansi are you talking about performance improvements from caching the new consumer, or from caching the old simple consumer? I had a branch of direct stream that cached the old simple consumer, but in my testing it didn't make enough of a difference to be worth the added complexity. Regarding throwing out messages beyond the untilOffset, I agree, that's why I'm saying the iterator would need to be redesigned. I don't think it needs to communicate back to the driver, it needs to cache those messages locally, then we can (hopefully) use getPreferredLocations to encourage future requests for that partition to be scheduled on that executor.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174392#comment-15174392 ] Mansi Shah commented on SPARK-12177: Cody / Mark, I am glad you are discussing caching the consumer on executors instead of creating them for every batch. I am not super familiar with Spark, but I am coming at this from my Kafka experience. I have been running some performance experiments and I see that if the consumer is not recreated on every batch, then depending on the batch size we get between 10x to 2x improvements. I will be happy to share the details of my experiments and the actual numbers. I am guessing this is mostly happening because of metadata lookups that are lost when the consumer is closed. I actually work on MapR Streams, which is a kafka API compliant implementation of message streams, and at least for us this has huge performance implications. I agree that just massaging the 0.8.2 code to work against 0.9.0 is a really bad idea since the whole message fetch architecture itself is different for 0.9.0. Also, if we do not throw out messages that were returned on poll because they fall beyond the "until" offset and make the rdd elastic instead, that means we do not read some messages twice from kafka. I do not know if this last one is even possible - since it will mean that the executor has to communicate back with the driver the actual offset of the last message read, but if we can make it happen that also gives some gains. I will be super happy to jump on a call and iron out details, and am more than happy to contribute in any way possible. Btw if calls are frowned upon - maybe we can just add an update here with detailed minutes of the call, so that there is no lost trail?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174212#comment-15174212 ] Cody Koeninger commented on SPARK-12177: I'm happy to help in whatever way. If people think a call makes sense, I can do that (although I've seen public complaints about issues being discussed on calls instead of mailing lists or jira).
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174149#comment-15174149 ] Mark Grover commented on SPARK-12177: Thanks Cody, I appreciate your thoughts. I have been keeping most of my commentary on the PRs but I will post some parts of it here for the sake of argument.

bq. No one (as far as I can tell) is actually doing integration testing of these existing PRs using the new kafka security features. We need actual manual integration testing and benchmarking, ideally with production loads.

Agreed. The code in [my PR for the new security API|https://github.com/apache/spark/pull/10953/files] was integration tested by me against a distributed Kafka and ZK cluster, albeit manually. Working on adding automated integration tests is on my list of things to do; however, that PR is bit-rotting because it's blocked by [the PR|https://github.com/apache/spark/pull/11143] to upgrade Kafka to 0.9.0.

Your comment about caching consumers on executors is an excellent one. I haven't invested much time there because the way I was thinking of doing this was in several steps:
1. Upgrade Kafka to 0.9 (with or without 0.8 support, pending decision on https://github.com/apache/spark/pull/11143)
2. Add support for the new consumer API (https://github.com/apache/spark/pull/10953/files)
3. Add Kerberos/SASL support for authentication and SSL support for encryption over the wire. This work is blocked until delegation token support is added in Kafka (https://issues.apache.org/jira/browse/KAFKA-1696). I have been following that design discussion closely on the Kafka mailing list.

Thanks for sharing your preference. I understand where you are coming from, and think that's reasonable. I had gotten feedback to the contrary on [this PR|https://github.com/apache/spark/pull/10953/files], so I changed my original implementation, which had separate subprojects, to all be in the same project.
I don't mind changing it back, especially if we are going to keep 0.8 support. As far as not hiding the fact that the consumer is new is concerned: I agree with you. KafkaUtils, for example, has exposed the TopicAndPartition and MessageAndMetadata classes, and I think we may have to expose their new API equivalents, TopicPartition and ConsumerRecord, in KafkaUtils. In any case, I'd appreciate your help in moving this forward. I think the first step is to come to a resolution on https://github.com/apache/spark/pull/11143. Perhaps you, I, [~tdas] and anyone else who's interested could get on a call to sort this out? I will post the call details here so anyone would be able to join in. Other methods of communication work too, my goal is to move that conversation forward.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174092#comment-15174092 ] Cody Koeninger commented on SPARK-12177: My thoughts so far.

Must-haves:
- The major new feature of the kafka 0.9 consumer that may justify upgrading is security.
- No one (as far as I can tell) is actually doing integration testing of these existing PRs using the new kafka security features. We need actual manual integration testing and benchmarking, ideally with production loads.
- Actually making security features usable at low latencies is probably going to require caching consumers on the executors, not just the driver. The current direct stream managed without caching, but you don't want to be doing handshaking/getting a new token/whatever every XXX ms batch. This is going to require redesigning the iterator and getPreferredLocations.
- Even if handshaking doesn't turn out to be that bad, getting full benefit of the new consumer's pipelining will require caching it between batches.

Nice-to-haves that should be considered up front:
- There are some long-standing minor features that would be easier to do with the new consumer (dynamic topicpartitions, making it easier to control offsetrange generation policy in general, making it easier to commit offsets into kafka).
- Those minor features are going to require minor redesign of the DStream, and the interface for creating it.

From what I can tell, the existing PRs and commentary on them don't address any of this (don't get me wrong, I appreciate the work). Until that stuff is addressed, I think contemplating replacing the existing consumers is premature. Now that I've got a production 0.9 cluster going, I'm certainly willing to help work on it. But it doesn't seem like there's consensus (or even committer fiat, which I'd also be fine with) on the approach for dealing with 0.8 vs 0.9.
My strong preference on the bikeshed's color:
- Leave the existing 0.8 consumer integration subproject exactly as is. It's known working, should still work with 0.9 brokers, and the maintenance (if any) necessary to make it work with Spark 2.0 should be minor. Changing the existing consumer to factor out functionality makes it harder to verify that the PR isn't breaking something, so don't do it. The substantive new features would be hard / impossible to backport, so don't do it.
- Make an entirely new subproject, with a differently-named artifact, for the 0.9 consumer integration. Don't name all the packages or classes in the new subproject NewKafkaWhatever, it's unnecessary. Because of broker differences, someone is unlikely to use both artifacts in one project. Different signatures will make it fail at compile time if they use the wrong one.
- Don't copy everything wholesale from the existing subproject; only copy those pieces of the direct stream that still make sense, and modify them. Cleanly factoring out common functionality from the existing subproject would be awkward given the actual differences in the consumer. A lot of the existing pieces don't make sense at all for the new consumer (KafkaCluster as a wrapper should just go away, for instance).
- Don't try to hide the fact that the consumer is different from end users. It really is different, and exposing that will give them more power (e.g. topic subscription by pattern).

That's just my 2 cents. I'm willing to help out regardless of the direction taken, but I think we need some committer weigh-in on direction, especially given there are now 4 separate PRs relating to this issue. [~tdas] [~srowen] [~vanzin] any thoughts?
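One of the end-user powers Cody mentions, topic subscription by pattern, can be sketched as follows. This is a hedged sketch with assumed names, not the Kafka consumer's actual pattern-subscription API: given the broker's current topic list, a regex decides which topics the stream follows, so topics created later that match the pattern are picked up without a code change.

```python
import re

# Sketch: resolve a subscription pattern against the broker's topic list.
# Re-running this between batches is what makes the subscription "dynamic" -
# newly created matching topics simply show up in the next resolution.
def matching_topics(pattern, all_topics):
    """Return the sorted subset of all_topics whose names match pattern."""
    rx = re.compile(pattern)
    return sorted(t for t in all_topics if rx.match(t))
```

This is exactly the kind of capability that is awkward to retrofit onto the old consumer but natural with the new one, which is part of the argument for not hiding the consumer difference from users.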
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137029#comment-15137029 ] Rama Mullapudi commented on SPARK-12177: Does the update include Kerberos support, since 0.9 producers and consumers now support Kerberos (SASL) and SSL?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137771#comment-15137771 ] Mark Grover commented on SPARK-12177: Hi Rama, This particular PR adds support for the new API. There is some small code for SSL support in it too, but I haven't invested much time in testing that, apart from the simple unit test that was written for it. Kerberos (SASL) will have to be done incrementally in another patch because it can't be done until Kafka supports delegation tokens (which are still not there yet: https://issues.apache.org/jira/browse/KAFKA-1696).
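For orientation, the SSL support being discussed amounts to passing security-related consumer properties through to the new Kafka consumer. The sketch below is illustrative only: the keys follow Kafka's 0.9 consumer configuration names, but the values are placeholders and the exact set of supported options should be checked against the PR and Kafka docs.

```python
# Illustrative kafka params for an SSL-enabled new consumer (placeholder
# values). A Kerberos setup would instead use security.protocol=SASL_SSL,
# which is what is blocked on delegation-token support in KAFKA-1696.
ssl_kafka_params = {
    "bootstrap.servers": "broker1:9093",          # SSL listener port, not the plaintext one
    "security.protocol": "SSL",
    "ssl.truststore.location": "/path/to/truststore.jks",
    "ssl.truststore.password": "changeit",
}
```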
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128377#comment-15128377 ] Cody Koeninger commented on SPARK-12177: It's probably worth either waiting for a point release from the kafka project, or at least keeping a careful eye on their jira for possibly relevant issues, e.g. https://issues.apache.org/jira/browse/KAFKA-3159
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128465#comment-15128465 ] Mark Grover commented on SPARK-12177: Thanks Cody, will do!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120113#comment-15120113 ] Mark Grover commented on SPARK-12177: I have issued a new PR https://github.com/apache/spark/pull/10953 for this which contains all of Nikita's changes as well. Please feel free to review and comment there. The python implementation is not in that PR just yet, it's being worked on separately at https://github.com/markgrover/spark/tree/kafka09-integration-python (for now, anyways). The new package is called 'newapi' instead of 'v09'.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120110#comment-15120110 ] Apache Spark commented on SPARK-12177: User 'markgrover' has created a pull request for this issue: https://github.com/apache/spark/pull/10953
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112785#comment-15112785 ] Mark Grover commented on SPARK-12177: Hi Mario, Thanks for checking. I was still hoping to do everything in one assembly; so far it's looking good. Yeah, I'll take care of renaming the packages/python files to something other than v09. The Python part is coming along OK. Will keep you posted on the JIRA.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110994#comment-15110994 ] Mario Briggs commented on SPARK-12177: bq. If one uses the kafka v9 jar even when using the old consumer API, it can only work with a Kafka v9 broker.

I tried it on a single-system setup (v0.9 client talking to v0.8 server-side) and the consumers had a problem (old or new). The producers though worked fine. So you are right. So then we will have kafka-assembly and kafka-assembly-v09/new, each including their version of the kafka jars, right? (I guess you were all along thinking 2 diff assemblies, and I guessed the other way round. Duh, IRC might have been faster.) With the above confirmed, it automatically throws out 'The public API signatures (of KafkaUtils in v0.9 subproject) are different and do not clash (with KafkaUtils in original kafka subproject) and hence can be added to the existing (original kafka subproject) KafkaUtils class.' So the only thing left, it seems, is to use 'new' or a better term instead of 'v09', since we both agree on that. Great, and thanks Mark. How's the 'python/pyspark/streaming/kafka-v09(new).py' going?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110082#comment-15110082 ] Mark Grover commented on SPARK-12177: Hi Mario, I may have misunderstood some parts of your previous comment and if so, I apologize in advance.

bq. i think that is not required when client uses a v0.9 jar though consuming only the older high level/low level API and talking to a v0.8 kafka cluster.

Based on what I understand, that's not the case. If one uses the kafka v9 jar even when using the old consumer API, it can only work with a Kafka v9 broker. So, if we have to support both Kafka v08 and Kafka v09 brokers with Spark (which I believe we do), we have to have both Kafka v08 and Kafka v09 jars in our assembly. As far as I understand, simply having the Kafka v09 jar only will not help.

bq. 1 thought around not introducing the version in the package name or class name (I see that Flink does it in the class name) was to avoid forcing us to create v0.10/v0.11 packages (and customers to change code and recompile), even if those releases of kafka don't have client-api or otherwise such changes that warrant us to make a new version

I totally agree with you on this note. I was actually thinking of renaming all the v09 packages to be something different (like 'new'? But maybe there's a better term) because, as very aptly pointed out, it would be very confusing as we support later kafka versions.

bq. That's why 1 earlier idea i mentioned in this JIRA was 'The public API signatures (of KafkaUtils in v0.9 subproject) are different and do not clash (with KafkaUtils in original kafka subproject) and hence can be added to the existing (original kafka subproject) KafkaUtils class.' This also addresses the issues u mention above. Cody mentioned that we need to get others on the same page for this idea, so i guess we really need the committers to chime in here.
bq. Of course i forgot to answer Nikita's followup question - 'do you mean that we would change the original KafkaUtils by adding new functions for new DirectIS/KafkaRDD but using them from separate module with kafka09 classes'? To be clear, these new public methods added to the original kafka subproject's 'KafkaUtils' will make use of the DirectKafkaInputDStream, KafkaRDD, KafkaRDDPartition, and OffsetRange classes that are in a new v09 package (internal of course). In short, we don't have a new subproject. (I skipped the KafkaCluster class from the list, because i am thinking it makes more sense to call this class something like 'KafkaClient' instead going forward.)

At the core of it, I am not 100% sure if we can hide/abstract away from our users the fact that we have completely changed the consumer API underneath us. I can think more about it but would appreciate more thoughts/insights along this direction, especially if you feel strongly about this. Thanks again, Mario!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109056#comment-15109056 ] Mario Briggs commented on SPARK-12177: -- bq. I totally understand what you mean. However, kafka has its own assembly in Spark and the way the code is structured right now, both the new API and old API would go in the same assembly so it's important to have a different package name. Also, I think for our end users transitioning from old to new API, I foresee them having 2 versions of their spark-kafka app. One that works with the old API and one with the new API. And, I think it would be an easier transition if they could include both the kafka API versions in the spark classpath and pick and choose which app to run without mucking with maven dependencies and re-compiling when they want to switch. Let me know if you disagree. Since this is WIP, I myself have at least one more option, different from what I suggested above... I put just one out to get the conversation rolling, so thanks for chipping in. Thanks for bringing out the kafka-assembly part… the assembly jar does include all dependencies too, so we would include Kafka v0.9's jars, I guess? I remember Cody mentioning that a v0.8 to v0.9 upgrade on the Kafka side involves upgrading the brokers… I think that is not required when the client uses a v0.9 jar though consuming only the older high level/low level API and talking to a v0.8 kafka cluster. So if we go with one kafka-assembly, then my suggestion above is itself broken, since we would have clashes of the same package & class names. 
One thought around not introducing the version in the package name or class name (I see that Flink does it in the class name) was to avoid forcing us to create v0.10/v0.11 packages (and customers to change code and recompile), even if those releases of Kafka don't have client-api or other such changes that warrant us making a new version (also, Spark got away without putting a version # in till now, which means less work in Spark, so I am not sure we want to start forcing this work going forward). Once we introduce the version #, we need to ensure it stays in sync with Kafka. That's why one earlier idea I mentioned in this JIRA was 'The public API signatures (of KafkaUtils in v0.9 subproject) are different and do not clash (with KafkaUtils in original kafka subproject) and hence can be added to the existing (original kafka subproject) KafkaUtils class.' Cody mentioned that we need to get others on the same page for this idea, so I guess we really need the committers to chime in here. Of course I forgot to answer Nikita's followup question - 'do you mean that we would change the original KafkaUtils by adding new functions for new DirectIS/KafkaRDD but using them from separate module with kafka09 classes'? To be clear, these new public methods added to the original kafka subproject's 'KafkaUtils' will make use of the DirectKafkaInputDStream, KafkaRDD, KafkaRDDPartition and OffsetRange classes that are in a new v09 package (internal of course). In short, we don't have a new subproject. 
(I skipped the KafkaCluster class from the list, because I am thinking it makes more sense to call this class something like 'KafkaClient' instead going forward)
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107267#comment-15107267 ] Mark Grover commented on SPARK-12177: - Thanks Mario! bq. We should also have a python/pyspark/streaming/kafka-v09.py as well that matches to our external/kafka-v09 I agree, I will look into this. bq. Why do you have the Broker.scala class? Unless i am missing something, it should be knocked off Yeah, I noticed that too and I agree. This should be pretty simple to take out. I also [noticed|https://issues.apache.org/jira/browse/SPARK-12177?focusedCommentId=15089750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15089750] that the v09 example is picking up some Kafka v08 jars, so I am working on fixing that too. bq. I think the package should be 'org.apache.spark.streaming.kafka' only in external/kafka-v09 and not 'org.apache.spark.streaming.kafka.v09'. This is because we produce a jar with a diff name (user picks which one and even if he/she mismatches, it errors correctly since the KafkaUtils method signatures are different) I totally understand what you mean. However, kafka has its [own assembly in Spark|https://github.com/apache/spark/tree/master/external/kafka-assembly] and, the way the code is structured right now, both the new API and old API would go in the same assembly, so it's important to have a different package name. Also, for our end users transitioning from the old to the new API, I foresee them having two versions of their spark-kafka app: one that works with the old API and one with the new API. And I think it would be an easier transition if they could include both Kafka API versions in the Spark classpath and pick and choose which app to run, without mucking with maven dependencies and re-compiling when they want to switch. Let me know if you disagree. 
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107669#comment-15107669 ] Mark Grover commented on SPARK-12177: - Posting an update. Took out Broker.scala; the example picking up the wrong version of Kafka was already taken care of by Nikita. I am looking into the python stuff now.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105560#comment-15105560 ] Mario Briggs commented on SPARK-12177: -- Hi Nikita, great.
1 - We should also have a python/pyspark/streaming/kafka-v09.py that matches our external/kafka-v09
2 - Why do you have the Broker.scala class? Unless I am missing something, it should be knocked off
3 - I think the package should be 'org.apache.spark.streaming.kafka' only in external/kafka-v09 and not 'org.apache.spark.streaming.kafka.v09'. This is because we produce a jar with a different name (the user picks which one, and even if he/she mismatches, it errors correctly since the KafkaUtils method signatures are different)
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092455#comment-15092455 ] Mark Grover commented on SPARK-12177: - Thanks Nikita. I will be issuing PRs to your kafka09-integration branch so it can become the single source of truth until this change gets merged into Spark. And, I believe the Spark community prefers discussion on PRs once they are filed, so you'll hear more from me there :-)
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091068#comment-15091068 ] Nikita Tarasenko commented on SPARK-12177: -- I created a new PR which is based on the master branch - https://github.com/apache/spark/pull/10681
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091067#comment-15091067 ] Apache Spark commented on SPARK-12177: -- User 'nikit-os' has created a pull request for this issue: https://github.com/apache/spark/pull/10681
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090588#comment-15090588 ] Nikita Tarasenko commented on SPARK-12177: -- Hi, Mark! Of course it is possible to collaborate =) 1. I think I will create a new branch based on master, copy the existing code with some changes (as you said) and create a new pull request. Then you can issue pull requests against this new branch. Is that ok? 2. Yes, this is a problem. I did not notice it. How can we handle it? I don't know how to properly separate those dependencies. Maybe we can put the examples for Kafka 0.9 in the kafka-v09 package while Kafka 0.8 still exists?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090717#comment-15090717 ] Mark Grover commented on SPARK-12177: - #1 Sounds great, thanks! #2 Yeah, that's the only way I can think of for now but let me ponder a bit more. Thanks! Looking forward to it.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089750#comment-15089750 ] Mark Grover commented on SPARK-12177: - Thanks for working on this, Nikita. I'd like to help out. Here is some feedback: 1. I tried rebasing what you have in your current branch onto upstream master (yours still seems to be based off pre-1.6.0 code) but, mostly because of some commits related to import ordering that happened on Spark trunk relatively recently, I found it easier to migrate/copy the code for kafka-v09 and make the minor changes to the examples and root pom instead of doing a 'git rebase'. 2. I also noticed that the v09DirectKafkaWordCount example is pulling at least the ConsumerConfig class from Kafka 0.8.2.1. This is because the examples pom contains both Kafka 0.8.2.1 and 0.9.0 dependencies and somewhat arbitrarily puts 0.8.2.1 ahead. Since the ConsumerConfig class is available in both under the same namespace, we end up pulling 0.8.2.1. We should fix that. In general, I may have a few more changes/fixes that I'd like to contribute to your pull request. Would it be possible for us to collaborate? What's the best way to do so? Reopening the pull request and me adding to it? Or just me issuing pull requests to [your branch|https://github.com/nikit-os/spark/tree/kafka-09-consumer-api]? Thanks!
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089564#comment-15089564 ] Nikita Tarasenko commented on SPARK-12177: -- Thanks! I have incorporated your changes for the frequent KafkaConsumer object creation and the correct consumer.poll() implementation (below). Also, I have removed the receiver-based InputStream, related tests and example.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082674#comment-15082674 ] Mario Briggs commented on SPARK-12177: -- implemented here - https://github.com/mariobriggs/spark/commit/2fcbb721b99b48e336ba7ef7c317c279c9483840
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081565#comment-15081565 ] Mario Briggs commented on SPARK-12177: -- Confirmed that the frequent KafkaConsumer object creation was the reason the consumer.position() method hung. Cleaned up the frequent KafkaConsumer object creation (https://github.com/mariobriggs/spark/commit/e40df7ee70fd72418969e8f9c81a1fee304b8b1c) and it resolved the issue.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075230#comment-15075230 ] Mario Briggs commented on SPARK-12177: -- You could also get just a few of the records you want at a time, i.e. not all in one shot:

override def getNext(): R = {
  if (iter == null || !iter.hasNext) {
    iter = consumer.poll(pollTime).iterator()
  }
  if (!iter.hasNext) {
    if (requestOffset < part.untilOffset) {
      // need to make another poll() and recheck above.
      // So make a recursive call i.e. 'return getNext()' here?
    }
    finished = true
    null.asInstanceOf[R]
  } else {
    ...
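Since a single poll() call is not guaranteed to return all records up to untilOffset (see the timeout discussion elsewhere in this thread), the loop above can be completed roughly as follows. This is a self-contained sketch under stated assumptions: Record, StubConsumer and OffsetRangeIterator are illustrative stand-ins, with the real KafkaConsumer replaced by a stub that hands out records in small batches to mimic partial poll results.

```scala
case class Record(offset: Long, value: String)

// Stub standing in for KafkaConsumer.poll(): returns up to `batch` records
// per call, so a single call may not cover the whole requested offset range.
class StubConsumer(records: Seq[Record], batch: Int) {
  private var pos = 0
  def poll(timeoutMs: Long): Seq[Record] = {
    val out = records.slice(pos, pos + batch)
    pos += out.size
    out
  }
}

// Iterate over [fromOffset, untilOffset), polling repeatedly whenever the
// buffered batch runs out before untilOffset is reached.
class OffsetRangeIterator(consumer: StubConsumer,
                          fromOffset: Long,
                          untilOffset: Long,
                          pollTime: Long) extends Iterator[Record] {
  private var requestOffset = fromOffset
  private var buffered: Iterator[Record] = Iterator.empty

  override def hasNext: Boolean = requestOffset < untilOffset

  override def next(): Record = {
    // poll() may legitimately return an empty or short batch before the data
    // arrives, so keep polling. Real code would bound the number of retries
    // rather than loop indefinitely.
    while (!buffered.hasNext) {
      buffered = consumer.poll(pollTime).iterator
    }
    val r = buffered.next()
    requestOffset = r.offset + 1
    r
  }
}
```

This replaces the recursive `return getNext()` question above with an iterative loop, which avoids growing the stack when many polls come back empty.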
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075172#comment-15075172 ] Mario Briggs commented on SPARK-12177: -- Nikita, thank you. A-C: Looks good to me. (BTW I didn't review the changes related to the receiver-based approach, even in the earlier round) D - I think it is OK for KafkaTestUtils to have a dependency on kafka core, since that is more of our internal test approach (however, I haven't spent time thinking about whether even that can be bettered). On the higher issue, I think Kafka *will* make TopicPartition serializable, which will make this moot, but good that we have tracked it here.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075205#comment-15075205 ] Mario Briggs commented on SPARK-12177: -- Very good point about the frequent creation of KafkaConsumer objects. In fact, Praveen is investigating whether that is the reason the position() method hangs when we have batch intervals at 200ms and below. One way to optimize it is this: since the compute method in DirectKafkaInputDStream runs in the driver, why not store the KafkaConsumer, rather than the KafkaCluster, as a member variable in this class. Of course we will need to mark it transient so that no attempt is made to serialize it, which means we must always check for null and re-initialize if required before use. The only use of the consumer here is to find the new latest offsets, so we will have to massage that method for use with an existing consumer object. Another option is to let KafkaCluster hold a KafkaConsumer instance as a member variable, with the same noted caveats about being transient. This also means moving the part about fetching the leader ipAddress for getPreferredLocations() out of KafkaRDD.getPartitions() into DirectKafkaInputDStream.compute() and passing 'leaders' as a constructor param to KafkaRDD (I now realize that KafkaRDD is private, so we are not putting that on a public API as I thought earlier).
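The transient-member option described above can be sketched as follows. This is a minimal, self-contained illustration, not Spark code: ConsumerLike, DirectStreamLike and newConsumer are hypothetical stand-ins for KafkaConsumer, DirectKafkaInputDStream and the consumer construction logic.

```scala
trait ConsumerLike {
  def position(topic: String, partition: Int): Long
}

class DirectStreamLike(newConsumer: () => ConsumerLike) extends Serializable {
  // Marked transient: not serialized with the DStream, so after
  // deserialization this field comes back as null.
  @transient private var consumer: ConsumerLike = null

  // Always go through this accessor: re-initialize when null, reuse otherwise.
  // This avoids creating a fresh KafkaConsumer on every batch, which the
  // thread above identifies as the cause of the position() hang.
  private def getConsumer(): ConsumerLike = {
    if (consumer == null) consumer = newConsumer()
    consumer
  }

  def latestOffset(topic: String, partition: Int): Long =
    getConsumer().position(topic, partition)
}
```

The null-check-and-reinit pattern is the standard way to pair @transient fields with Java serialization; a lazy val would also work but cannot be reset.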
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074131#comment-15074131 ] Mario Briggs commented on SPARK-12177: -- With regards to review comment 'C' (two comments above), the Kafka folks have clarified that the timeout parameter on the poll() method is the time to spend waiting for the client side to get the data (from the server), not the time to spend waiting for data to become available on the server. This means that one might have to call poll() more than once, until we get to the consumerRecord at 'untilOffset'.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074262#comment-15074262 ] Nikita Tarasenko commented on SPARK-12177: -- Hi, Mario! Thank you for the review! It is very helpful for me. I made a few commits based on your A-C comments and your implementation in the github repo: - https://github.com/nikit-os/spark/commit/44bb56806ab5fb86d0e27ed8108188c5808bae0f - https://github.com/nikit-os/spark/commit/85fad9410ab5133bd92e01af278e2f473c07b745 - https://github.com/nikit-os/spark/commit/2d36c76f1bc8bcba59261d37bbff90e67ece73d7 About the 'D' comment - kafka-core is needed not only for the TopicAndPartition class but also for some classes (such as KafkaConfig, ZkUtils, KafkaServer, Producer) that are needed by KafkaTestUtils. This is the reason why I kept the kafka-core dependency.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074265#comment-15074265 ] Nikita Tarasenko commented on SPARK-12177: -- I'm not against using only the Direct approach and removing the Receiver-based approach from the new implementation. The only thing that bothers me is the frequent creation of new KafkaConsumer instances in KafkaCluster (the withConsumer method). I do not know how efficient that is. From this point of view, the Receiver-based approach is easier. But with ReliableKafkaReceiver we have a problem with multithreaded access to the KafkaConsumer (because it is not thread-safe): one thread from MessageHandler for poll, and a second from GeneratedBlockHandler for committing offsets. And if we keep only the Direct approach, do you mean that we would change the original KafkaUtils by adding new functions for the new DirectIS/KafkaRDD, but use them from a separate module with the kafka09 classes?
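Since KafkaConsumer is not thread-safe, the two threads mentioned above (one polling, one committing offsets) would have to serialize their access to it. One possible shape of such a guard, sketched here as an illustration only (the wrapper class name is made up and this is not code from either PR):

```scala
import org.apache.kafka.clients.consumer.{ConsumerRecords, KafkaConsumer}

// Hypothetical thread-confinement wrapper: KafkaConsumer detects and
// rejects concurrent access from multiple threads, so a coarse lock
// serializing poll and commit calls avoids that failure mode.
// (Only wakeup() is safe to call from another thread without this.)
class SynchronizedConsumer[K, V](consumer: KafkaConsumer[K, V]) {
  private val lock = new Object

  def poll(timeoutMs: Long): ConsumerRecords[K, V] =
    lock.synchronized { consumer.poll(timeoutMs) }

  def commitSync(): Unit =
    lock.synchronized { consumer.commitSync() }
}
```

The cost, of course, is that a long-blocking poll delays the offset commit, which is part of why the receiver-based design is awkward here.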
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074266#comment-15074266 ] Nikita Tarasenko commented on SPARK-12177: -- Should we use a while loop for waiting for data here: https://github.com/nikit-os/spark/blob/44bb56806ab5fb86d0e27ed8108188c5808bae0f/external/kafka-v09/src/main/scala/org/apache/spark/streaming/kafka/v09/KafkaRDD.scala#L157 ?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073035#comment-15073035 ] Mario Briggs commented on SPARK-12177: -- On the issue of lots of duplicate code from the original: one thought was whether we need to upgrade the older Receiver-based approach to the new Consumer API at all. The Direct approach has so many benefits over the older Receiver-based approach, and I can't think of a drawback, that one might make the argument that we don't upgrade the latter at all; it remains on the older Kafka consumer API and gets deprecated over a long period of time. Thoughts? If we do go the above way, then there is very trivial overlap of code between the original and this new consumer implementation. The public API signatures are different and do not clash, and hence can be added to the existing KafkaUtils class.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073020#comment-15073020 ] Mario Briggs commented on SPARK-12177: -- Hi Nikita, thanks. Here are my review comments. I couldn't find a way to add them on the PR review in GitHub, so I added them here. My comments are a little detailed, since I too developed an implementation and tested it along with my colleague Praveen :-) after the initial discussion on the dev list. A - KafkaCluster class: getPartitions(), seek(), callers of the above methods, and all other methods that use withConsumer() should not return an 'Either', but rather just the expected object (the 'Right'). The reason for the 'Either' object in the earlier code was that the earlier Kafka client had to try the operation on all the 'seedBrokers' and handle the case where some of them were down. Similarly, when dealing with 'leaders', the client had to try the operation on all leaders for a TP (TopicAndPartition). When we use the new kafka-clients API, we don't have to deal with trying against all the seed brokers, leaders, etc., since the new KafkaConsumer object internally handles all those details. Notice that in the earlier code, withBrokers() tries to connect() and invoke the passed-in function multiple times with brokers.forEach(), hence the need to accumulate errors. The earlier code also did a 'return' immediately when successful with one of the brokers. This does not apply with the new KafkaConsumer object. getPartitions() - https://github.com/nikit-os/spark/blob/kafka-09-consumer-api/external/kafka-v09/src/main/scala/org/apache/spark/streaming/kafka/v09/KafkaCluster.scala#L62 - the consumer.partitionsFor() Java API returns null if the topic doesn't exist.
If you don't handle that, you run into an NPE when the user specifies a topic that doesn't exist or makes a typo in the topic name (and not returning an exception saying the partition doesn't exist is not right either). Our implementation is at https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala. If it is easier for you that we issue a PR to your repo, let us know. B - KafkaRDD class: getPreferredLocations() - this method is missing in your code. The earlier implementation from Cody had the optimisation that if Kafka and the Spark code (KafkaRDD) were running on the same cluster, then the RDD partition for a particular TopicPartition would be local to that TopicPartition's leader. Could you please add code to bring back this functionality? Our implementation pulled this info inside getPartitions - https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala#L52. It is probably more efficient to do it inside compute() of the DStream, but that meant exposing 'leaders' on the KafkaRDD constructor, as discussed - https://mail-archives.apache.org/mod_mbox/spark-dev/201512.mbox/%3CCAKWX9VV34Dto9irT3ZZfH78EoXO3bV3VHN8YYvTxfnyvGcRsQw%40mail.gmail.com%3E C - KafkaRDDIterator class: getNext() - as mentioned in issue #1 noted here - https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/README.md#issues-noted-with-kafka-090-consumer-api - KafkaConsumer.poll(x) is not guaranteed to return the data when x is a small value.
We are following up on [this issue with Kafka](https://issues.apache.org/jira/browse/KAFKA-3044). I see that you have made this a configurable value in your implementation, which is good, but either way, until this behaviour is clarified (or even otherwise), we need [this assert](https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala#L171), or else we will be silently skipping data without the user knowing it (with either the default value or a user-specified smaller value). D - Non-use of the TopicPartition class of the new consumer: you have already figured out that this class is not Serializable, and hence in the public interface you have used the older TopicAndPartition class. We have raised [this issue](https://issues.apache.org/jira/browse/KAFKA-3029) with Kafka and may be provided with a serializable one (yet to be seen). However, using the older TopicAndPartition class in our public API introduces a dependency on the older kafka-core rather than just the kafka-clients jar, which I would think is not the preferred approach. If we are not provided with a serializable TopicPartition, then we should rather use our own serializable object (or just a tuple of string, int, long) inside DirectKafkaInputDStream to ensure it is Serializable.
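The serializable stand-in suggested in comment D could be as simple as a case class. This is a sketch of the idea only, not code from either PR (the name `SerializableTopicPartition` is made up here):

```scala
import org.apache.kafka.common.TopicPartition

// Hypothetical serializable stand-in for Kafka's TopicPartition, usable
// inside DirectKafkaInputDStream until KAFKA-3029 is resolved. Scala case
// classes are Serializable by default, so it can travel with the RDD.
case class SerializableTopicPartition(topic: String, partition: Int) {
  // Convert back to the non-serializable Kafka class on the executor side.
  def toTopicPartition: TopicPartition = new TopicPartition(topic, partition)
}
```

This keeps kafka-core out of the public API surface while still letting internal code talk to the new consumer in terms of TopicPartition.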
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069579#comment-15069579 ] Nikita Tarasenko commented on SPARK-12177: -- I have added SSL configuration for Kafka. Commit - https://github.com/nikit-os/spark/commit/2379834aa92b127f1635013bbe3046e37464cbed
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062313#comment-15062313 ] Cody Koeninger commented on SPARK-12177: Honestly, SSL / auth is the only compelling thing about 0.9; not sure it makes sense to have 0.9 support without it.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059840#comment-15059840 ] Nikita Tarasenko commented on SPARK-12177: -- The current implementation doesn't include SSL support. I could add SSL support after my current changes are accepted. I think for the Spark integration we could use the spark.ssl.* properties as the global configuration and add new spark.ssl.kafka.* properties for overriding the global ones.
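The override scheme proposed above amounts to a simple two-level lookup. A minimal sketch of that precedence, assuming the proposed (not yet existing) spark.ssl.kafka.* namespace:

```scala
// Hypothetical resolution of a Kafka-specific SSL setting: a
// spark.ssl.kafka.* key, if present, overrides the corresponding
// global spark.ssl.* key; neither key is guaranteed to be set.
def resolveSslOption(conf: Map[String, String], key: String): Option[String] =
  conf.get(s"spark.ssl.kafka.$key").orElse(conf.get(s"spark.ssl.$key"))

val conf = Map(
  "spark.ssl.keyStore"       -> "/global/keystore.jks",
  "spark.ssl.kafka.keyStore" -> "/kafka/keystore.jks")

resolveSslOption(conf, "keyStore")   // the kafka-specific override wins
resolveSslOption(conf, "trustStore") // None: no value at either level
```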
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060218#comment-15060218 ] Tom Waterhouse commented on SPARK-12177: SSL is very important for our deployment, +1 for adding it into the integration.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058856#comment-15058856 ] Dean Wampler commented on SPARK-12177: -- Since the new Kafka 0.9 consumer API supports SSL, would an implementation of #12177 enable the use of SSL, or would additional work be required in Spark's integration?
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055908#comment-15055908 ] Apache Spark commented on SPARK-12177: -- User 'nikit-os' has created a pull request for this issue: https://github.com/apache/spark/pull/10294
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055937#comment-15055937 ] Nikita Tarasenko commented on SPARK-12177: -- Do we still need DirectKafkaInputDStream? We shouldn't use Zookeeper directly any more. Right now, DirectKafkaInputDStream is similar to KafkaInputDStream but with manual offset management.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055911#comment-15055911 ] Nikita Tarasenko commented on SPARK-12177: -- I moved all changes for Kafka 0.9 to a separate subproject.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046975#comment-15046975 ] Cody Koeninger commented on SPARK-12177: I really think this needs to be handled as a separate subproject, or otherwise in a fully backwards-compatible way. The new consumer API will require upgrading Kafka brokers, which is a big ask just in order for people to upgrade Spark versions.
[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044799#comment-15044799 ] Apache Spark commented on SPARK-12177: -- User 'ntarasenko' has created a pull request for this issue: https://github.com/apache/spark/pull/10173