[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2021-01-06 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259801#comment-17259801 ] David Wyles commented on SPARK-33635: - Thanks a lot for getting down to the root cause and

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2021-01-05 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259446#comment-17259446 ] Jungtaek Lim commented on SPARK-33635: -- One more point, though the root cause is actually the

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2021-01-05 Thread Apache Spark (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259407#comment-17259407 ] Apache Spark commented on SPARK-33635: -- User 'HeartSaVioR' has created a pull request for this

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2021-01-05 Thread Jungtaek Lim (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259399#comment-17259399 ] Jungtaek Lim commented on SPARK-33635: -- I've spent some time to trace the issue, and noticed

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2021-01-03 Thread Yukihito X (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257962#comment-17257962 ] Yukihito X commented on SPARK-33635: [~david.wyles], I tried your sample code in my local dev

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-29 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256046#comment-17256046 ] David Wyles commented on SPARK-33635: - [~gsomogyi] I now have my results. I was so unhappy about

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-19 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252153#comment-17252153 ] Gabor Somogyi commented on SPARK-33635: --- {quote}Remember, based on all my testing, and raw kafka

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-18 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251911#comment-17251911 ] David Wyles commented on SPARK-33635: - [~gsomogyi] Just so you know, I'm still doing this. I've

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-13 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248550#comment-17248550 ] Gabor Somogyi commented on SPARK-33635: --- {quote}The collect in this test case is only 13 items of

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-13 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248549#comment-17248549 ] Gabor Somogyi commented on SPARK-33635: --- Mixed up with DStreams, in Structured Streaming and SQL

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-11 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247879#comment-17247879 ] David Wyles commented on SPARK-33635: - [~gsomogyi] "try to turn off Kafka consumer caching. Apart

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-11 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247878#comment-17247878 ] David Wyles commented on SPARK-33635: - "Since you're measuring speed I've ported the Kafka source

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-11 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247875#comment-17247875 ] David Wyles commented on SPARK-33635: - I'll give all those a go and get back to you. > Performance

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246643#comment-17246643 ] Gabor Somogyi commented on SPARK-33635: --- {quote}I no longer believe this is a true regression in

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246640#comment-17246640 ] Gabor Somogyi commented on SPARK-33635: --- I've changed to SQL because you're not executing a

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246633#comment-17246633 ] Gabor Somogyi commented on SPARK-33635: --- Since you're measuring speed I've ported the Kafka source

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246631#comment-17246631 ] Gabor Somogyi commented on SPARK-33635: --- BTW, I'm sure you know but using collect gathers all the

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread Gabor Somogyi (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246625#comment-17246625 ] Gabor Somogyi commented on SPARK-33635: --- [~david.wyles] try to turn off Kafka consumer caching.

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-09 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246430#comment-17246430 ] David Wyles commented on SPARK-33635: - Having performed my tests I can conclude that the Kafka

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-07 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245329#comment-17245329 ] David Wyles commented on SPARK-33635: - The diffs on that library between 2.4.5 and 3.0.0 are very

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-07 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245309#comment-17245309 ] David Wyles commented on SPARK-33635: - Also note, in my example - the only thing that changes is the

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-07 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245306#comment-17245306 ] David Wyles commented on SPARK-33635: - Is anyone even able to confirm my results are not just unique

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-07 Thread David Wyles (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245304#comment-17245304 ] David Wyles commented on SPARK-33635: - Fair point, the library I suspect is spark-sql-kafka-0-10, is

[jira] [Commented] (SPARK-33635) Performance regression in Kafka read

2020-12-07 Thread Sean R. Owen (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245277#comment-17245277 ] Sean R. Owen commented on SPARK-33635: -- I don't think this is actionable by others unless you can