[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949556#comment-16949556 ]

Hyukjin Kwon commented on SPARK-16534:

Actually, this JIRA does not seem likely to be fixed, given the explicit objection from a committer:

{quote}
>> So I would like to -1 this patch. I think it's been a mistake to support
>> dstream in Python -- yes it satisfies a checkbox and Spark could claim
>> there's support for streaming in Python. However, the tooling and maturity
>> for working with streaming data (both in Spark and the more broad ecosystem)
>> is simply not there. It is a big baggage to maintain, and creates the
>> wrong impression that production streaming jobs can be written in Python.
{quote}

> Kafka 0.10 Python support
> -------------------------
>
> Key: SPARK-16534
> URL: https://issues.apache.org/jira/browse/SPARK-16534
> Project: Spark
> Issue Type: Sub-task
> Components: DStreams
> Reporter: Tathagata Das
> Priority: Major
> Labels: bulk-closed

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949554#comment-16949554 ]

Hyukjin Kwon commented on SPARK-16534:

No, it was resolved following the discussion about affected versions and JIRA resolutions for EOL releases. If you're sure that this JIRA still affects Spark 3.0, feel free to reopen it.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949521#comment-16949521 ]

Rakesh commented on SPARK-16534:

Was this actually resolved? And when will the support be available?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537035#comment-16537035 ]

Thomas Graves commented on SPARK-16534:

I agree it seems like a bit of a bad user story to drop support after having it with Kafka 0.8. [~rxin], can you expand on the issues? Is it something that just needs more work, or is it a bad idea due to performance or other issues?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537034#comment-16537034 ]

Thomas Graves commented on SPARK-16534:

If we aren't going to do this, we should close this as Won't Fix with an explanation.

From the pull request, from [~rxin]:

>> So I would like to -1 this patch. I think it's been a mistake to support
>> dstream in Python -- yes it satisfies a checkbox and Spark could claim
>> there's support for streaming in Python. However, the tooling and maturity
>> for working with streaming data (both in Spark and the more broad ecosystem)
>> is simply not there. It is a big baggage to maintain, and creates the
>> wrong impression that production streaming jobs can be written in Python.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327553#comment-16327553 ]

Maciej Bryński commented on SPARK-16534:

[~rxin] I tested this patch with 2.2.1 and everything is working fine. Why isn't it in the main distribution?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008284#comment-16008284 ]

Guangyang Li commented on SPARK-16534:

I also hope there can be Python support for Kafka 0.10. The main issue is that Spark has this support for Kafka 0.8 and we are using it in production. I really don't see why Spark should stop it from being updated to 0.10.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15496823#comment-15496823 ]

Reynold Xin commented on SPARK-16534:

[~maver1ck] you don't need to wait on this, do you? You can just build a module outside Spark to use for now.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495496#comment-15495496 ]

Maciej Bryński commented on SPARK-16534:

[~rxin] I understand your point of view, but I think that DStreams are sometimes the only option (especially when there is no support for Datasets in the Python world).

I'm using df.rdd.map a lot in my ETLs, and DStreams are a natural continuation when moving from the batch to the streaming world.

Right now I have a production streaming environment developed in Python, and I'm looking forward to using the new Kafka API (mostly because of SSL support, but other features too). And I hope that I'm not the only one.

Could we get back to work on this feature?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494955#comment-15494955 ]

Reynold Xin commented on SPARK-16534:

[~maver1ck] thanks for the comment. That's a great point.

That said, there is a big difference between whether we can implement a feature and allow users to use it for demos and learning, and whether we can realistically do a good enough job that it is fit for use in production 24/7. It is very easy to promise the former and just ignore the latter. In this context, it would be a lot simpler in the long term to have the structured streaming architecture working in production 24/7 than the dstream architecture, **for non-JVM languages**.

To clarify, I wasn't suggesting streaming in Python should be killed, but rather that dstream in Python isn't a good architecture for running 24/7 streaming jobs.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491107#comment-15491107 ]

Maciej Bryński commented on SPARK-16534:

[~rxin] Could you explain your decision? I think that dropping Python support is a very bad sign for the Spark community.

In my case, I started with batch jobs in Python. Having the ability to use the same code in a streaming job was the main reason to choose Spark. I have ended up with production code written in Python supporting both batch and streaming, and I won't be able to use new Kafka features like SSL. Never. Am I right? Or maybe I missed something.

As far as I understand, your point is that support for Python in Spark Streaming is wrong and we shouldn't develop new features. I don't agree with that.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401208#comment-15401208 ]

Cody Koeninger commented on SPARK-16534:

It's on the PR. Yes, one committer veto is generally sufficient.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401192#comment-15401192 ]

Jacek Laskowski commented on SPARK-16534:

Mind posting the -1 from [~rxin] ad acta (at the very least)? A single -1 should not be the only reason to ditch a proposal, should it?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15401128#comment-15401128 ]

Cody Koeninger commented on SPARK-16534:

This idea got a -1 from Reynold, so unless anyone's going to argue for it, the ticket should be closed.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391274#comment-15391274 ]

Apache Spark commented on SPARK-16534:

User 'jerryshao' has created a pull request for this issue: https://github.com/apache/spark/pull/14340
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378240#comment-15378240 ]

Cody Koeninger commented on SPARK-16534:

[~jerryshao] if you want to work on this, go for it. Ping me with any questions that come up.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378179#comment-15378179 ]

Tathagata Das commented on SPARK-16534:

No, we don't need to fix this for 2.0. [~c...@koeninger.org], are you already working on this? If not, then Saisai can pick this up.
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376449#comment-15376449 ]

Saisai Shao commented on SPARK-16534:

Maybe I can give it a try if no one is working on this :).

BTW, do we need to fix this in 2.0.0, since it is now in RC?

> Kafka 0.10 Python support
> -------------------------
>
> Key: SPARK-16534
> URL: https://issues.apache.org/jira/browse/SPARK-16534
> Project: Spark
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Tathagata Das
> Fix For: 2.0.0
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376444#comment-15376444 ]

Tathagata Das commented on SPARK-16534:

@koeninger are you working on this?
[jira] [Commented] (SPARK-16534) Kafka 0.10 Python support
[ https://issues.apache.org/jira/browse/SPARK-16534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15376432#comment-15376432 ]

Saisai Shao commented on SPARK-16534:

Is there anyone working on this?