[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-27 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304143#comment-15304143
 ] 

Cody Koeninger commented on SPARK-12177:


This issue already does link to Mark's PR.  In fact this issue probably 
shouldn't link to it unless he updates it, because that PR doesn't handle some 
pretty important things.

I went ahead and merged my 0.10 branch into my linked PR.  I think it's fairly 
obvious releasing something just for 0.9 is not useful at this point.

For anyone asking for this to be merged, have you actually tested any of these 
PRs to see if they meet your needs?

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that 
> is not compatible with the old one. So, I added the new consumer API in 
> separate classes in the package org.apache.spark.streaming.kafka.v09. I did 
> not remove the old classes, to preserve backward compatibility: users will not 
> need to change their old Spark applications when they upgrade to a new Spark 
> version. Please review my changes.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-27 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303874#comment-15303874
 ] 

Grzegorz Rożniecki commented on SPARK-12177:


Any ETA on this? Being forced to work with old (0.8.X) Kafka APIs is not 
pleasant...

Plus, this issue should link to this [GitHub PR|https://github.com/apache/spark/pull/10953].




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-25 Thread Andrey Borisov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300227#comment-15300227
 ] 

Andrey Borisov commented on SPARK-12177:


Kafka 0.10.0.0 has been released; the pull request can probably be merged now.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-06 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274910#comment-15274910
 ] 

Mark Grover commented on SPARK-12177:
-

I spent some time earlier today looking at the latest Kafka 0.10 RC. Thanks 
Cody, looks like you poked at it too.

As Cody found out, the new Kafka consumer API is in flux. I found an 
incompatibility in KafkaConfig and filed KAFKA-3669 for it; that is now fixed 
in Kafka 0.10. But there's more: KAFKA-3633 is of note - it discusses fixing a 
compatibility break between 0.9 and 0.10 and remains unresolved.

So, at this point, I'd say there's no point in committing the PR associated 
with this JIRA (https://github.com/apache/spark/pull/11863) until at least 
Kafka 0.10.0 is released and we have a good sense that Kafka 0.11.0 is not 
going to break compatibility with 0.10.0. Otherwise, we bear the burden of 
adding and maintaining the complexity of building against multiple versions of 
Kafka - something the Storm folks are already suffering from, now that their 
KafkaSpout uses the new Kafka consumer API from 0.9.

Given the timing of the Kafka 0.10.0 release and the Spark 2.0 release, this 
JIRA likely won't get resolved in 2.0.







[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-06 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15274294#comment-15274294
 ] 

Cody Koeninger commented on SPARK-12177:


I have a branch working with RC3 of Kafka 0.10; tests are at least passing.

https://github.com/koeninger/spark-1/tree/kafka-0.10

The changes aren't huge, but I still don't see a reason to publish an artifact 
for 0.9 only to have it immediately be outdated.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-03 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268696#comment-15268696
 ] 

Mark Grover commented on SPARK-12177:
-

bq. 1. Rename existing Kafka connector to include 0.8 version in the Maven 
project name.
That sounds good to me.

bq. 2. Don't support 0.9 client for now (otherwise we will need to support 3 
clients very soon). We should revisit the 0.10 support in 2.1, and most likely 
support that.
This is a little more involved. I agree with Cody that the new Kafka consumer 
API implementation in Spark doesn't really have a benefit right now, since we 
can't use the security features, which are gated on delegation token support in 
Kafka (KAFKA-1696). However, delegation tokens aren't even going to make it 
into Kafka 0.10, so I see little point in us not committing the new Kafka 
consumer API implementation to Spark _because of that_.

Also, I think
bq. 2. Kafka 0.9 changes the client API.
can be better expressed as 
bq. Kafka 0.9 introduces a new client API.

There are two axes here - one is the Kafka version (0.8/0.9) and the other is 
the consumer API version (old/new).
Both Kafka 0.8 and 0.9 support the old API without any modifications (for the 
most part), so the existing Kafka module in Spark will continue to work with 
Kafka 0.8 and 0.9 (and with Kafka 0.10, I'd imagine - I have been working with 
the Kafka community to report issues like KAFKA-3563, which break old-API 
compatibility), because the existing Spark module is based on the old API, 
which is meant to be compatible across all those versions. The new API, on the 
other hand, may change in incompatible ways between 0.9 and 0.10, so we may 
need a new sub-module for a 0.10-based implementation after all.

The point I am trying to make is that there's nothing we'd gain by waiting for 
Kafka 0.10 to come out. It's not any better than Kafka 0.9 in terms of support 
for security features. I suppose the only thing you could save on is not having 
an additional subproject, if the new consumer API in Kafka 0.10 broke 
compatibility significantly compared to 0.9's new consumer API (smaller 
incompatibilities can be dealt with via reflection and such). I haven't played 
with the RC yet, so I don't know if even that is the case. So, really, we 
should either gate this on security features (like KAFKA-1696) going into 
Kafka, which won't happen until at least Kafka 0.11, or put this in right now.

I can definitely look at the latest Kafka 0.10 and see how likely it is that we 
are going to need a new module (probably by the end of this week), if that'd 
help with our decision.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267461#comment-15267461
 ] 

Cody Koeninger commented on SPARK-12177:


The only real selling point of 0.9 for Spark is the security mechanisms, but 
those weren't fully baked as far as I can tell.  There might be some 
performance wins from prefetching, but I haven't noticed a significant 
difference in my limited testing.

0.10 is currently at RC2 and should hopefully work with 0.9 brokers as well, so 
it makes sense to skip the 0.9 client, IMHO.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267419#comment-15267419
 ] 

Sean Owen commented on SPARK-12177:
---

Is the idea here that the 0.8 client covers 0.8/0.9 brokers, but a new 
incompatible client is needed for 0.10 in any event? What is the value of the 
0.9 client -- whatever it adds effectively doesn't matter for Spark? It sounds 
good to get ready, at least, to accommodate another client. CC [~mgrover]




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267347#comment-15267347
 ] 

Reynold Xin commented on SPARK-12177:
-

Talked more with Cody offline. Summary of discussion:

Background information:

1. The Kafka 0.8 client works with both Kafka 0.8 and 0.9 brokers.
2. Kafka 0.9 changes the client API.
3. Kafka 0.10, not yet released, will change the client API again.

Recommended steps:

1. Rename existing Kafka connector to include 0.8 version in the Maven project 
name.
2. Don't support 0.9 client for now (otherwise we will need to support 3 
clients very soon). We should revisit the 0.10 support in 2.1, and most likely 
support that.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267313#comment-15267313
 ] 

Cody Koeninger commented on SPARK-12177:


Yeah.  My desired approach is in my PR, namely leave external/kafka
as-is, and create a new external/kafka-beta (name change TBD)
subproject.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267273#comment-15267273
 ] 

Sean Owen commented on SPARK-12177:
---

I'm not keeping score here, but I suspect it's a question of what you, Mark G 
et al want to do. Is the solution two Kafka modules? 




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-05-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267158#comment-15267158
 ] 

Cody Koeninger commented on SPARK-12177:


Given Reynold's message about a code freeze for 2.0 in the coming weeks, can we 
get some direction from a committer as to whether this is still being 
considered for 2.0?




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-04-26 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259536#comment-15259536
 ] 

Cody Koeninger commented on SPARK-12177:


Well, given that we have working implementations of the RDD and DStream for 0.9
(really 0.10 at this point), and structured streaming is still in design...






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-04-26 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259530#comment-15259530
 ] 

Mario Briggs commented on SPARK-12177:
--

What's the thinking on this one with respect to Structured Streaming? Is the 
thinking that Kafka 0.9 will be supported with both the older DStream API 
(KafkaUtils.createDirectStream) AND the newer Structured Streaming way of doing 
things? Or will Kafka 0.9 be supported only with the new Structured Streaming 
way? I am going to assume only [~tdas] or [~rxin] have a good idea of 
Structured Streaming (sorry [~mgrover], [~c...@koeninger.org], if I have 
insulted you :-) ), so I'd appreciate it if they could chime in.

For my side, I am assuming that 0.9 will be supported with the older DStream 
API as well. [~mgrover], how is it going merging Cody's changes and making a 
new subproject?




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-04-08 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232885#comment-15232885
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks Cody. I agree about the separate subproject and I will review the code 
in your PR. Thank you!




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-04-08 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15232828#comment-15232828
 ] 

Cody Koeninger commented on SPARK-12177:


Ok, since SPARK-13877 has been rejected and we're keeping the Kafka dstreams in 
spark, I'd like to get this moving again.

I've done some basic throughput testing on my PR using 
kafka-producer-perf-test.sh to generate load, and after some tweaks the 
performance is comparable to the existing direct stream.  I've made sure my 
existing transactional / idempotent examples work with the new consumer.

I don't yet have a compelling need to move any of my production jobs to the new 
consumer, but it's at the point that I'd feel comfortable with other people 
testing it out.

Given the issues I've seen with 0.9 / 0.10 (e.g. KAFKA-3135), I'm 100% sure 
that we want this in a totally separate subproject from the existing dstream, 
which should be left at 0.8.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-21 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204421#comment-15204421
 ] 

Cody Koeninger commented on SPARK-12177:


I made a PR with my changes for discussion's sake.

https://github.com/apache/spark/pull/11863







[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-21 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204418#comment-15204418
 ] 

Apache Spark commented on SPARK-12177:
--

User 'koeninger' has created a pull request for this issue:
https://github.com/apache/spark/pull/11863




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-20 Thread Eugene Miretsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203662#comment-15203662
 ] 

Eugene Miretsky commented on SPARK-12177:
-

1) MessageAndMetaData: I was looking at KafkaRDD instead of NewKafkaRDD by 
mistake - my bad.

2) Decoder/Serializer: DirectKafkaInputDStreamBase has U <: Decoder[K]: 
ClassTag, T <: Decoder[V]: ClassTag type parameters.
- NewDirectKafkaInputDStream ignores the Decoder ClassTags. (The old 
DirectKafkaInputDStream passes them to KafkaRDDIterator, which then creates the 
valueDecoder & keyDecoder that are passed to MessageAndMetaData.)
- kafka.serializer.Decoder is part of the old Scala Kafka consumer; the new 
Kafka Java consumer uses org.apache.kafka.common.serialization.Deserializer.

Just to make sure we are on the same page - I'm looking at PR 10953. I may also 
have missed something.
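
For reference, a minimal Scala sketch of the contrast being drawn here. The 
DirectKafkaInputDStreamBase signature is quoted from the comment above; the 
consumer setup below is illustrative, not code from the PR:

{code:scala}
// Old Scala consumer: decoders are type parameters, resolved via ClassTag, e.g.
//   class DirectKafkaInputDStreamBase[K, V, U <: Decoder[K]: ClassTag,
//                                     T <: Decoder[V]: ClassTag](...)

// New Java consumer: deserializers are plain configuration, no decoder type params.
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "example-group")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)
{code}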




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-19 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203027#comment-15203027
 ] 

Cody Koeninger commented on SPARK-12177:


Unless I'm misunderstanding your point, those changes are all in my fork
already. Keeping a message handler for messageandmetadata doesn't make
sense. Backwards compatibility with the existing direct stream isn't really
workable.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-19 Thread Eugene Miretsky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203018#comment-15203018
 ] 

Eugene Miretsky commented on SPARK-12177:
-

The new Kafka Java consumer uses Deserializer instead of Decoder. The 
difference is not too big (extra type safety, and Deserializer::deserialize 
accepts a topic and a byte payload, while Decoder::fromBytes accepts only a 
byte payload), but it would still be nice to align with the new Kafka consumer. 
Would it make sense to replace Decoder with Deserializer in the new 
DirectStream? This would require getting rid of MessageAndMetadata, and hence 
breaking backwards compatibility with the existing DirectStream, but I guess 
it will have to be done at some point.
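
To make the difference concrete, simplified Scala renderings of the two 
interfaces being compared (the real Java Deserializer also has configure() and 
close() methods, omitted here):

{code:scala}
// Old API (kafka.serializer): no topic context available to the decoder.
trait Decoder[T] {
  def fromBytes(bytes: Array[Byte]): T
}

// New API (org.apache.kafka.common.serialization): the topic is passed in,
// so a single deserializer can vary its behavior per topic.
trait Deserializer[T] {
  def deserialize(topic: String, data: Array[Byte]): T
}
{code}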





[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-19 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197536#comment-15197536
 ] 

Cody Koeninger commented on SPARK-12177:


My fork is working at a very basic level for caching consumers, preferred 
locations, dynamic topic-partitions, and being able to commit offsets into 
Kafka.  Taking a configured consumer should also allow some degree of control 
over offset generation policy, just by wrapping a consumer to return different 
values for assignment() or position().  Unit tests are passing, but it would 
need a lot more manual testing; I'm sure there are lots of rough edges.

Given the discussion in SPARK-13877 about moving Kafka integration to a 
separate repo, I'm going to hold off on any more work until that's decided.
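
As a rough sketch of the "wrap a consumer" idea (all names here are 
hypothetical, not the fork's actual code): a wrapper can delegate assignment() 
but cap what position() reports, which bounds the offsets the stream would 
generate.

{code:scala}
import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Hypothetical wrapper: same assignment as the real consumer, but a capped
// position(), e.g. to enforce a per-partition limit on batch generation.
class CappedConsumer[K, V](underlying: KafkaConsumer[K, V],
                           caps: Map[TopicPartition, Long]) {
  def assignment(): ju.Set[TopicPartition] = underlying.assignment()

  def position(tp: TopicPartition): Long =
    math.min(underlying.position(tp), caps.getOrElse(tp, Long.MaxValue))
}
{code}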




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-10 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15190103#comment-15190103
 ] 

Cody Koeninger commented on SPARK-12177:


There are a lot of things I'm really not happy with so far.  Given the variety 
of ways topic subscription works, I think we're better off with a user-provided 
zero-arg function to create a configured consumer for use on the driver, rather 
than taking kafka parameters and trying to wrap a consumer as I'm doing 
currently.

People are going to need some kind of access for committing offsets to kafka, 
but I think that can be handled by submitting finished offset ranges to a 
threadsafe queue that gets drained in compute()

So anyway, for the two or three people actually looking at my repo, expect 
changes ;)
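
A minimal sketch of the queue idea described above (the OffsetRange shape and 
method names are hypothetical, not the actual implementation):

{code:scala}
import java.util.concurrent.ConcurrentLinkedQueue
import org.apache.kafka.common.TopicPartition

// Hypothetical shape of a finished range of offsets for one partition.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// User code enqueues ranges from completed batches (thread-safe)...
val commitQueue = new ConcurrentLinkedQueue[OffsetRange]()
def commitAsync(ranges: Seq[OffsetRange]): Unit = ranges.foreach(r => commitQueue.add(r))

// ...and the driver drains the queue inside compute(), keeping the highest
// finished offset per partition to commit back to Kafka.
def drainForCommit(): Map[TopicPartition, Long] = {
  var toCommit = Map.empty[TopicPartition, Long]
  var r = commitQueue.poll()
  while (r != null) {
    val tp = new TopicPartition(r.topic, r.partition)
    toCommit += tp -> math.max(r.untilOffset, toCommit.getOrElse(tp, 0L))
    r = commitQueue.poll()
  }
  toCommit
}
{code}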




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-10 Thread Praveen Devarao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189410#comment-15189410
 ] 

Praveen Devarao commented on SPARK-12177:
-

>>It's also probably better for Spark users if they don't blindly
cache or collect consumer records, because its a lot of wasted space (e.g.
topic name).<<

I agree

Thanks
Praveen




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-10 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189325#comment-15189325
 ] 

Cody Koeninger commented on SPARK-12177:


Clearly K and V are serializable somehow, because they were in a byte array
in a Kafka message. So it's not really far-fetched to expect a thin wrapper
around K and V to also be serializable, and it would be convenient from a
Spark perspective.

That being said, I agree with you that the Kafka project isn't likely to go
for it. It's also probably better for Spark users if they don't blindly
cache or collect consumer records, because it's a lot of wasted space (e.g.
the topic name). But people are going to be surprised the first time they try
it and it doesn't work, so I wanted to mention it.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-09 Thread Praveen Devarao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188761#comment-15188761
 ] 

Praveen Devarao commented on SPARK-12177:
-

Hi Cody,

The last time, when I got the TopicPartition and OffsetAndMetadata classes made 
serializable, the argument was that these classes are used by end users and are 
metadata classes that would be needed for checkpoint purposes.

As for ConsumerRecord, this class is meant to hold the actual data and would 
usually not be needed for checkpoint purposes... if we need the data, we can 
always go to the respective offset in the respective topic and partition. 
Also, the ConsumerRecord class has members which are of generic type (K and V), 
so serialization really depends on what type of object is flowed in by the 
user and whether that is serializable.

Given this, from the Kafka perspective I am not sure how we can reason about 
why one would want to mark this class as serializable.

Thanks

Praveen




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-09 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187847#comment-15187847
 ] 

Cody Koeninger commented on SPARK-12177:


Anybody want to volunteer to fight the good fight of convincing the Kafka 
project to make ConsumerRecord serializable?  Given that the TopicPartition 
change took over a month, I don't have high hopes of it happening before 0.10 
is released.

I don't think it's a dealbreaker for us if the KafkaRDD is an 
RDD[ConsumerRecord]; it just means users are going to have to map() over it or 
whatever.  Attempts to collect it directly, if it isn't serializable, are going 
to fail.
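
In other words, something like this (a sketch, assuming the new integration 
exposes an RDD[ConsumerRecord]):

{code:scala}
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.rdd.RDD

// ConsumerRecord isn't serializable, so project out plain values before collect():
def keysAndValues(rdd: RDD[ConsumerRecord[String, String]]): Array[(String, String)] =
  rdd.map(r => (r.key, r.value)).collect()

// By contrast, rdd.collect() on the records themselves would fail with a
// NotSerializableException for ConsumerRecord.
{code}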




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-09 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15187613#comment-15187613
 ] 

Cody Koeninger commented on SPARK-12177:


If anyone wants to take a look at the stuff I'm hacking on, it's at
https://github.com/koeninger/spark-1/tree/kafka-0.9/external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka
usage e.g.
https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/BasicStream.scala

Caching and preferred locations are sort of working but could clearly be 
improved.  Dynamic topic subscriptions are, uh, interesting to say the least.
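
For orientation (the linked BasicStream example isn't reproduced here), this is 
roughly the shape the API eventually took in the spark-streaming-kafka-0-10 
module; it is not necessarily the fork's API as of this comment:

{code:scala}
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val ssc = new StreamingContext(new SparkConf().setAppName("basic-stream"), Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean))

// Direct stream with dynamic topic subscription and consistent executor placement.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("some-topic"), kafkaParams))

stream.map(record => (record.key, record.value)).print()
ssc.start()
ssc.awaitTermination()
{code}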




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-08 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185461#comment-15185461
 ] 

Mark Grover commented on SPARK-12177:
-

For a), I think it's a larger discussion that is relevant not just to Kafka - 
it'd be good for Spark to have a policy on how far back it wants to support 
various versions, and how that changes for major vs. minor releases of Spark.

For b), there is this PR: https://github.com/apache/spark/pull/10953, and Cody 
is working on the LRU caching like he said; here's the relevant email on the 
Kafka mailing list:
http://apache-spark-developers-list.1001551.n3.nabble.com/Upgrading-to-Kafka-0-9-x-td16466.html

If you'd like to review the PR, that'd be appreciated. Thanks!




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-07 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184387#comment-15184387
 ] 

Cody Koeninger commented on SPARK-12177:


I've been hacking on a simple LRU cache for consumers, and preferred
locations to take advantage of it. I will update here if it works out.

There are some things about the new consumer that make it awkward for this
purpose - mentioned on the Kafka dev list, but with no real response.
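
A minimal sketch of what such an LRU cache could look like (hypothetical names; 
the real caching lives in the fork):

{code:scala}
import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// Hypothetical executor-side cache: one consumer per TopicPartition, with the
// least recently used consumer closed and evicted once capacity is exceeded.
class ConsumerCache[K, V](capacity: Int,
                          newConsumer: TopicPartition => KafkaConsumer[K, V]) {
  private val cache =
    new ju.LinkedHashMap[TopicPartition, KafkaConsumer[K, V]](capacity, 0.75f, true) {
      override def removeEldestEntry(
          eldest: ju.Map.Entry[TopicPartition, KafkaConsumer[K, V]]): Boolean =
        if (size() > capacity) { eldest.getValue.close(); true } else false
    }

  def get(tp: TopicPartition): KafkaConsumer[K, V] = synchronized {
    Option(cache.get(tp)).getOrElse {
      val c = newConsumer(tp)
      cache.put(tp, c)
      c
    }
  }
}
{code}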






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-07 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15184346#comment-15184346
 ] 

Mansi Shah commented on SPARK-12177:


So it looks like we are dealing with two independent issues here - (a) version 
support and (b) 0.9.0 plugin design. Should we at least start hashing out the 
design for the new consumer, and then we can see where it fits? Do the 
concerned folks still want to get on a call to figure out the design?




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-02 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177315#comment-15177315
 ] 

Mark Grover commented on SPARK-12177:
-

One more thing, as a potential con for Proposal 1:
There are places that have to use the Kafka artifact. The 'examples' subproject 
is a good example of that: it pulls in the Kafka artifact as a dependency and 
has examples of Kafka usage. However, it can't depend on the new 
implementation's artifact at the same time, because the two depend on different 
versions of Kafka. Therefore, unless I am missing something, the new 
implementation's example can't go there.

And that's fine - we can put it within the subproject itself, instead of 
examples, but that won't necessarily work with tooling like run-example, etc.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-02 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176225#comment-15176225
 ] 

Mark Grover commented on SPARK-12177:
-

Let me clarify what I was saying:
There are 2 axes here - one is the new/old consumer API, and the other is 
support for Kafka v0.8 and v0.9. Both Kafka v0.8 and v0.9 provide the old API; 
only v0.9 provides the new API.

bq. The fact that the 0.9 consumer is still considered beta by the Kafka 
project and that things are going to change in 0.10 is an argument for keeping 
the existing implementation as it is, not an argument for throwing it away 
prematurely. 
I totally agree with you, Cody, that the old API implementation is bug-free, 
and I am definitely not proposing to throw away that implementation. My 
proposal is that both the old implementation and the new one will depend on the 
same version of Kafka - that being 0.9.

Based on what I now understand (and please correct me if I am wrong), I think 
what you are proposing is:
Proposal 1:
2 subprojects - one with the old implementation and one with the new. The 'old' 
subproject will be built against Kafka 0.8 and will have its own assembly; the 
new subproject will use the new API, will be built against Kafka 0.9, and will 
have its own assembly.

And what I am proposing is:
Proposal 2:
2 subprojects - one with the old implementation and one with the new. Both 
implementations will be built against Kafka 0.9, and they both end up in one 
single Kafka assembly artifact.

The pro of Proposal 1 is that folks who want to use the old implementation with 
Kafka 0.8 brokers can use it without upgrading their brokers. The con of 
Proposal 1 is that it doesn't allow for reuse of any code between the old and 
new implementations. This can be a good thing if we don't want to share any 
code in the new implementation, but there is definitely a bunch of test code 
that, I think, would be good to share.

The pro of Proposal 2 is that test code, etc. can be shared, and there will be 
a single artifact that folks would need to run either the old direct stream 
implementation or the new one.
The con is, of course, that folks would have to upgrade their brokers to Kafka 
0.9 if they want to use Spark 2.0.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176037#comment-15176037
 ] 

Cody Koeninger commented on SPARK-12177:


How is it a huge hassle to keep the known working implementation in a separate 
subproject from the new beta consumer?  Any of the kafka version churn hassle 
would be associated with the new subproject.  Work on the existing project 
would pretty much be limited to changes in Spark that were (somehow?) 
incompatible with it, or bugfixes.  I'm not saying that existing code is 
bug-free, but we've had 4 point revisions of spark with relatively little 
change to it.

The fact that the 0.9 consumer is still considered beta by the Kafka project 
and that things are going to change in 0.10 is an argument for keeping the 
existing implementation as it is, not an argument for throwing it away 
prematurely.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-02 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175999#comment-15175999
 ] 

Mark Grover commented on SPARK-12177:
-

I think the core of the question is a much broader Spark question - how many 
past versions to support?

To add some more color to the question at hand, Kafka has already decided that 
the [next version of Kafka will be 
0.10.0|https://github.com/apache/kafka/commit/b084c485e25bfe77154e805219b24714d59c396c]
 (instead of 0.9.1) and this next version will have yet another protocol 
change. So, where do we go on from there? Supporting Kafka 0.8, 0.9 and 0.10.0 
in 2.x?

I still think Spark 2.0 is a good time to drop support for Kafka 0.8.x. Other 
projects are doing it too, even in their minor releases (links to the Flume and 
Storm JIRAs are on the PR), and Kafka is moving fast, with protocol changes in 
every new non-maintenance release; it will become a huge hassle to keep up with 
all the past releases.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-02 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175466#comment-15175466
 ] 

Sean Owen commented on SPARK-12177:
---

I favor updating to support only 0.9 in 2.0.0. My general stance is that nobody 
is forced to update to Kafka 0.9 since nobody is forced to stop using Spark 
1.x. Major releases are exactly for this kind of upgrade. It's also why I 
personally feel pretty strongly that Java 7 should not be supported, but that's 
a separate question.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175066#comment-15175066
 ] 

Reynold Xin commented on SPARK-12177:
-

This thread is getting too long for me to follow, but my instinct is that maybe 
we should have two subprojects and support both.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174856#comment-15174856
 ] 

Mark Grover commented on SPARK-12177:
-

Hi [~tdas] and [~rxin], can you help us with your opinion on these questions, 
so we can unblock this work:
1. Should we support both Kafka 0.8 and 0.9 or just 0.9? The pros and cons are 
listed [here|https://github.com/apache/spark/pull/11143#issuecomment-182154267] 
along with what other projects are doing.
2. Should we make a separate subproject for the implementation using the new 
Kafka consumer API, keeping the same class names (e.g. KafkaRDD), or create new 
classes in the same subproject, like Hadoop did (e.g. NewKafkaRDD)?

Thanks!




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174534#comment-15174534
 ] 

Mansi Shah commented on SPARK-12177:


Thanks for the explanation Cody. 

I understand committing from within Spark can get ugly. So in Spark Streaming, 
is there a way for the executors to communicate back with the driver? I am 
flexible if we can't get this extra optimization, or maybe we can hack 
something up internally in our system to make this problem easier.

But let us definitely design for long-lived consumers on the executors.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174507#comment-15174507
 ] 

Cody Koeninger commented on SPARK-12177:


Thanks for the example of performance numbers.

The direct stream RDD batch sizes are, by default, "whatever's left in kafka". 
The backpressure and maximum limits on batch sizes are in terms of messages, 
not bytes, because that's easy to calculate on the driver without having read 
the messages. You can tune it for your app pretty straightforwardly, as long as 
you don't have a lot of tiny messages followed by a lot of huge messages in the 
same topic.
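For illustration, a minimal sketch of how those per-message limits are usually 
set (both config keys are from Spark 1.x streaming; the values are made-up 
examples):

{code:scala}
import org.apache.spark.SparkConf

// Illustrative values only. Both limits count records, not bytes, which is
// why the driver can enforce them without reading any messages.
val conf = new SparkConf()
  .setAppName("kafka-direct-example") // placeholder app name
  // Read at most 10000 records per second from each Kafka partition.
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")
  // Let the rate estimator shrink later batches when processing falls behind.
  .set("spark.streaming.backpressure.enabled", "true")
{code}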

Doing kafka offset commits on the executor without user intervention opens up a 
whole other can of worms, I'd prefer to avoid that if at all possible.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174490#comment-15174490
 ] 

Mansi Shah commented on SPARK-12177:


Sorry I forgot to mention that the numbers I quoted above were run on a kafka 
topic with 100 million records of 100 bytes each. 




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174485#comment-15174485
 ] 

Mansi Shah commented on SPARK-12177:


Caching the new consumer across batches. I did not know how to do this with the 
current implementation of the Spark plugin, so I ran a simple experiment that 
just reads messages from Kafka in batches. For a batch that has 100K records, a 
cached consumer model finishes in 68.3 secs, but its non-cached counterpart, 
where we recreate the consumer for every batch, takes 165.9 secs. This gets way 
worse as the batch sizes change. If we additionally make the batch sizes 
restrictive and throw out messages beyond the until offset, the same job takes 
180 secs.
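For concreteness, a rough sketch of the kind of executor-side cache being 
discussed (all names here are hypothetical; eviction and error handling are 
omitted):

{code:scala}
import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer

// Hypothetical per-executor cache: one 0.9 consumer per
// (group, topic, partition), kept alive across batches so connections and
// metadata are reused instead of rebuilt every batch. The caller's props
// must include the usual bootstrap.servers and deserializer settings.
object ConsumerCache {
  private val cache =
    new ju.HashMap[(String, String, Int), KafkaConsumer[Array[Byte], Array[Byte]]]()

  def getOrCreate(
      group: String,
      topic: String,
      partition: Int,
      props: ju.Properties): KafkaConsumer[Array[Byte], Array[Byte]] = synchronized {
    val key = (group, topic, partition)
    var consumer = cache.get(key)
    if (consumer == null) {
      consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
      cache.put(key, consumer)
    }
    consumer
  }
}
{code}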

I had one unrelated question - how come RDD batch sizes do not have anything to 
do with the actual message size? If the batch size is fixed at 100K records, 
wouldn't that mean completely different things for messages that are 100 bytes 
versus 1M, just in terms of pure memory pressure? Do you have any thoughts on 
that?

Btw, the solution where we cache the records on the executor will unfortunately 
not work for us, as our system is truly distributed and a topic partition does 
not always reside on the same node as it does in Kafka. I mean, we can do some 
predefined static allocation using getPreferredLocations, but that means we can 
never use locality and always have to use this static assignment, even if the 
Spark cluster is colocated with the MapR cluster. That is where I was 
suggesting the offsets being communicated back to the driver. We do not need to 
do this explicitly - just doing a Kafka commit on the executor should take care 
of it.






[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174415#comment-15174415
 ] 

Cody Koeninger commented on SPARK-12177:


Mansi, are you talking about performance improvements from caching the new 
consumer, or from caching the old simple consumer? I had a branch of the direct 
stream that cached the old simple consumer, but in my testing it didn't make 
enough of a difference to be worth the added complexity.

Regarding throwing out messages beyond the untilOffset: I agree, and that's why 
I'm saying the iterator would need to be redesigned. I don't think it needs to 
communicate back to the driver; it needs to cache those messages locally, and 
then we can (hopefully) use getPreferredLocations to encourage future requests 
for that partition to be scheduled on that executor.
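A rough sketch of that getPreferredLocations idea (every name here is 
hypothetical and the RDD body is stubbed out; the real change would live in the 
new subproject's KafkaRDD):

{code:scala}
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type that knows which Kafka topic/partition it reads.
class SketchPartition(val index: Int, val topic: String, val partition: Int)
  extends Partition

class SketchKafkaRDD(
    sc: SparkContext,
    parts: Array[SketchPartition],
    // Assumed bookkeeping: which executor host last held a cached consumer
    // (and locally cached messages) for each topic/partition.
    hostForPartition: Map[(String, Int), String])
  extends RDD[String](sc, Nil) {

  override def getPartitions: Array[Partition] = parts.toArray[Partition]

  // Hint to the scheduler: run each partition where its consumer is warm.
  override def getPreferredLocations(split: Partition): Seq[String] = {
    val p = split.asInstanceOf[SketchPartition]
    hostForPartition.get((p.topic, p.partition)).toSeq
  }

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator.empty // the real iterator would poll the cached consumer
}
{code}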




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174392#comment-15174392
 ] 

Mansi Shah commented on SPARK-12177:


Cody / Mark 

I am glad you are discussing caching the consumer on executors instead of 
creating one for every batch. I am not super familiar with Spark, but I am 
coming at this from my Kafka experience. I have been running some performance 
experiments, and I see that if the consumer is not recreated on every batch, 
then depending on the batch size we get between 2x and 10x improvements. I will 
be happy to share the details of my experiments and the actual numbers. I am 
guessing this is mostly happening because of metadata lookups that are lost 
when the consumer is closed. I actually work on MapR Streams, which is a 
Kafka-API-compliant implementation of message streams, and at least for us this 
has huge performance implications.

I agree that just massaging the 0.8.2 code to work against 0.9.0 is a really 
bad idea since the whole message fetch architecture itself is different for 
0.9.0. 

Also, if we do not throw out messages that were returned on poll because they 
fall beyond the "until" offset, and make the RDD elastic instead, that means we 
do not read some messages twice from Kafka. I do not know if this last one is 
even possible, since it will mean that the executor has to communicate the 
actual offset of the last message read back to the driver, but if we can make 
it happen, that also gives some gains.

I will be super happy to jump on a call and iron out details, and am also more 
than happy to contribute in any way possible. Btw, if calls are frowned upon, 
maybe we can just add an update here with detailed minutes of the call, so that 
there is no lost trail?








[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174212#comment-15174212
 ] 

Cody Koeninger commented on SPARK-12177:


I'm happy to help in whatever way.  If people think a call makes sense, I can 
do that (although I've seen public complaints about issues being discussed on 
calls instead of mailing lists or jira).




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174149#comment-15174149
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks Cody, I appreciate your thoughts. I have been keeping most of my 
commentary on the PRs but I will post some parts of it here for the sake of 
argument.

bq. No one (as far as I can tell) is actually doing integration testing of 
these existing PRs using the new kafka security features.
We need actual manual integration testing and benchmarking, ideally with 
production loads.
Agreed. The code in [my PR for the new security 
API|https://github.com/apache/spark/pull/10953/files] was integration tested by 
me against a distributed Kafka and ZK cluster, albeit manually. Working on 
adding automated integration tests is on my list of things to do; however, that 
PR is bit-rotting because it's blocked by [the 
PR|https://github.com/apache/spark/pull/11143] to upgrade Kafka to 0.9.0.

Your comment about caching consumers on executors is an excellent one. I 
haven't invested much time there because the way I was thinking of doing this 
was in several steps:
1. Upgrade Kafka to 0.9 (with or without 0.8 support, pending decision on 
https://github.com/apache/spark/pull/11143)
2. Add support for the new consumer API 
(https://github.com/apache/spark/pull/10953/files)
3. Add Kerberos/SASL support for authentication and SSL support for encryption 
over the wire. This work is blocked until delegation token support is added in 
Kafka (https://issues.apache.org/jira/browse/KAFKA-1696). I have been following 
that design discussion closely on the Kafka mailing list.

Thanks for sharing your preference. I understand where you are coming from, and 
think that's reasonable. I had gotten feedback to the contrary on [this 
PR|https://github.com/apache/spark/pull/10953/files], so I changed my original 
implementation, which had separate subprojects, to all be in the same project. 
I don't mind changing it back, especially if we are going to keep 0.8 support.

Regarding not hiding the fact that the consumer is new: I agree with you. 
KafkaUtils, for example, has exposed the TopicAndPartition and 
MessageAndMetadata classes, and I think we may have to expose their new-API 
equivalents, TopicPartition and ConsumerRecord, in KafkaUtils.

In any case, I'd appreciate your help in moving this forward. I think the first 
step is to come to a resolution on https://github.com/apache/spark/pull/11143. 
Perhaps you, I, [~tdas] and anyone else who's interested could get on a call to 
sort this out? I will post the call details here so anyone would be able to 
join in. Other methods of communication work too; my goal is to move that 
conversation forward.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174092#comment-15174092
 ] 

Cody Koeninger commented on SPARK-12177:


My thoughts so far

Must-haves:

- The major new feature of the kafka 0.9 consumer that may justify upgrading is 
security.

- No one (as far as I can tell) is actually doing integration testing of these 
existing PRs using the new kafka security features.
We need actual manual integration testing and benchmarking, ideally with 
production loads.

- Actually making security features usable at low latencies is probably going 
to require caching consumers on the executors, not just the driver.
The current direct stream managed without caching, but you don't want to be 
doing handshaking/getting a new token/whatever every XXX ms batch.
This is going to require redesigning the iterator and getPreferredLocations.

- Even if handshaking doesn't turn out to be that bad, getting full benefit of 
the new consumer's pipelining will require caching it between batches.

Nice to haves that should be considered up front:

- There are some long standing minor features that would be easier to do with 
the new consumer
(dynamic topicpartitions, making it easier to control offsetrange generation 
policy in general, making it easier to commit offsets into kafka)

- Those minor features are going to require minor redesign of the Dstream, and 
the interface for creating it.

From what I can tell, the existing PRs and commentary on them don't address any 
of this (don't get me wrong, I appreciate the work).
Until that stuff is addressed, I think contemplating replacing the existing 
consumers is premature.

Now that I've got a production 0.9 cluster going, I'm certainly willing to help 
work on it.
But it doesn't seem like there's consensus (or even committer fiat, which I'd 
also be fine with) on the approach for dealing with 0.8 vs 0.9.

My strong preference on the bikeshed's color:

- Leave the existing 0.8 consumer integration subproject exactly as is.
It's known working, should still work with 0.9 brokers, and the maintenance (if 
any) necessary to make it work with spark 2.0 should be minor.
Changing the existing consumer to factor out functionality makes it harder to 
verify that the PR isn't breaking something, so don't do it.
The substantive new features would be hard / impossible to backport, so don't 
do it.

- Make an entirely new subproject, with a differently-named artifact, for the 
0.9 consumer integration.
Don't name all the packages or classes in the new subproject NewKafkaWhatever, 
it's unnecessary.
Because of broker differences, someone's not going to be likely to use both 
artifacts in one project.
Different signatures will make it fail at compile time if they use the wrong 
one.

- Don't copy everything wholesale from the existing subproject, only copy those 
pieces of the direct stream that still make sense, and modify them.
Cleanly factoring out common functionality from the existing subproject would 
be awkward given the actual differences in the consumer.
A lot of the existing pieces don't make sense at all for the new consumer 
(KafkaCluster as a wrapper should just go away, for instance)

- Don't try to hide the fact that the consumer is different from end users.
It really is different, and exposing that will give them more power (e.g. topic 
subscription by pattern; see the sketch below).
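As a concrete example of that extra power, the 0.9 consumer can subscribe to 
every topic matching a regex (the broker address, group id, and topic pattern 
below are placeholders):

{code:scala}
import java.util.regex.Pattern
import java.util.{Collection => JCollection, Properties}
import org.apache.kafka.clients.consumer.{ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "example-group")
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
// Subscribe to all topics matching the pattern; the listener is a no-op here.
consumer.subscribe(Pattern.compile("events-.*"), new ConsumerRebalanceListener {
  def onPartitionsAssigned(partitions: JCollection[TopicPartition]): Unit = ()
  def onPartitionsRevoked(partitions: JCollection[TopicPartition]): Unit = ()
})
{code}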

That's just my 2 cents. I'm willing to help out regardless of the direction 
taken, but I think we need some committer weigh-in on direction, especially 
given there are now 4 separate PRs relating to this issue. [~tdas] [~srowen] 
[~vanzin] any thoughts?





[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-02-08 Thread Rama Mullapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137029#comment-15137029
 ] 

Rama Mullapudi commented on SPARK-12177:


Does the update include Kerberos support, since 0.9 producers and consumers now 
support Kerberos (SASL) and SSL?




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-02-08 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137771#comment-15137771
 ] 

Mark Grover commented on SPARK-12177:
-

Hi Rama,
This particular PR adds support for the new API. There is some small code for 
SSL support in it too, but I haven't invested much time in testing that, apart 
from the simple unit test that was written for it. Kerberos (SASL) will have to 
be done incrementally in another patch, because it can't be done until Kafka 
supports delegation tokens (which are still not there yet: 
https://issues.apache.org/jira/browse/KAFKA-1696).
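For reference, the client-side settings in question look roughly like this with 
the 0.9 consumer (the file path and password are placeholders; SASL 
additionally needs a JAAS configuration):

{code:scala}
// Kafka 0.9 client security settings; merge these into the usual kafkaParams.
val secureParams = Map(
  "security.protocol" -> "SSL",
  "ssl.truststore.location" -> "/etc/kafka/client.truststore.jks",
  "ssl.truststore.password" -> "changeit"
)
{code}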




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-02-02 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128377#comment-15128377
 ] 

Cody Koeninger commented on SPARK-12177:


It's probably worth either waiting for a point release from the kafka project, 
or at least keeping a careful eye on their jira for possibly relevant issues, 
e.g.
https://issues.apache.org/jira/browse/KAFKA-3159




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-02-02 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128465#comment-15128465
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks Cody, will do!




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-27 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120113#comment-15120113
 ] 

Mark Grover commented on SPARK-12177:
-

I have issued a new PR https://github.com/apache/spark/pull/10953 for this 
which contains all of Nikita's changes as well. Please feel free to review and 
comment there.

The python implementation is not in that PR just yet, it's being worked on 
separately at 
https://github.com/markgrover/spark/tree/kafka09-integration-python (for now, 
anyways).

The new package is called 'newapi' instead of 'v09'.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120110#comment-15120110
 ] 

Apache Spark commented on SPARK-12177:
--

User 'markgrover' has created a pull request for this issue:
https://github.com/apache/spark/pull/10953




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-22 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112785#comment-15112785
 ] 

Mark Grover commented on SPARK-12177:
-

Hi Mario,
Thanks for checking. I was still hoping to do everything in one assembly; so 
far it's looking good.

Yeah, I'll take care of renaming the packages/python files to something other 
than v09.

The python part is coming along OK. Will keep you posted on the JIRA.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-21 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110994#comment-15110994
 ] 

Mario Briggs commented on SPARK-12177:
--

bq. If one uses the kafka v9 jar even when using the old consumer API, it can 
only work with a Kafka v9 broker.

I tried it on a single-system setup (v0.9 client talking to a v0.8 server side) 
and the consumers (old or new) had a problem. The producers, though, worked 
fine. So you are right. So then we will have kafka-assembly and 
kafka-assembly-v09/new, each including their respective version of the kafka 
jars, right? (I guess you were thinking 2 different assemblies all along, and I 
guessed the other way round. Duh, IRC might have been faster.)

With the above confirmed, it automatically throws out 'The public API 
signatures (of KafkaUtils in the v0.9 subproject) are different and do not 
clash (with KafkaUtils in the original kafka subproject) and hence can be added 
to the existing (original kafka subproject) KafkaUtils class.'

So the only thing left, it seems, is to use 'new' or a better term instead of 
'v09', since we both agree on that. Great, and thanks Mark.

How's the 'python/pyspark/streaming/kafka-v09(new).py' going?




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-20 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110082#comment-15110082
 ] 

Mark Grover commented on SPARK-12177:
-

Hi Mario,
I may have misunderstood some parts of your previous comment and if so, I 
apologize in advance.

bq. i think that is not required when client uses a v0.9 jar though consuming 
only the older high level/low level API and talking to a v0.8 kafka cluster.
Based on what I understand, that's not the case. If one uses the Kafka v9 jar, 
even when using the old consumer API, it can only work with a Kafka v9 broker. 
So, if we have to support both Kafka v08 and Kafka v09 brokers with Spark 
(which I believe we do), we have to have both the Kafka v08 and Kafka v09 jars 
in our assembly. As far as I understand, simply having the Kafka v09 jar will 
not help.

bq. 1 thought around not introducing the version in the package name or class 
name (I see that Flink does it in the class name) was to avoid forcing us to 
create v0.10/v0.11 packages (and customers to change code and recompile), even 
if those releases of kafka don’t have client-api’ or otherwise such changes 
that warrant us to make a new version
I totally agree with you on this note. I was actually thinking of renaming all 
the v09 packages to be something different (like 'new'? but maybe there's a 
better term) because, as you very aptly pointed out, it would be very confusing 
as we support later Kafka versions.

bq. That's why 1 earlier idea i mentioned in this JIRA was 'The public API 
signatures (of KafkaUtils in v0.9 subproject) are different and do not clash 
(with KafkaUtils in original kafka subproject) and hence can be added to the 
existing (original kafka subproject) KafkaUtils class.' This also addresses the 
issues u mention above. Cody mentioned that we need to get others on the same 
page for this idea, so i guess we really need the committers to chime in here. 
Of course i forgot to answer Nikita's followup question - 'do you mean that we 
would change the original KafkaUtils by adding new functions for new 
DirectIS/KafkaRDD but using them from separate module with kafka09 classes'? 
To be clear, these new public methods added to the original kafka subproject's 
KafkaUtils will make use of the DirectKafkaInputDStream, KafkaRDD, 
KafkaRDDPartition, OffsetRange classes that are in a new v09 package (internal 
of course). In short we don't have a new subproject. (I skipped the 
KafkaCluster class from the list, because i am thinking it makes more sense to 
call this class something like 'KafkaClient' instead going forward)

At the core of it, I am not 100% sure we can hide/abstract away from our users 
the fact that we have completely changed the consumer API underneath. I can 
think more about it, but I would appreciate more thoughts/insights along this 
direction, especially if you feel strongly about it.

Thanks again, Mario!




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-20 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15109056#comment-15109056
 ] 

Mario Briggs commented on SPARK-12177:
--

bq. I totally understand what you mean. However, kafka has its own assembly in 
Spark and the way the code is structured right now, both the new API and old 
API would go in the same assembly so it's important to have a different package 
name. Also, I think for our end users transitioning from old to new API, I 
foresee them having 2 versions of their spark-kafka app. One that works with 
the old API and one with the new API. And, I think it would be an easier 
transition if they could include both the kafka API versions in the spark 
classpath and pick and choose which app to run without mucking with maven 
dependencies and re-compiling when they want to switch. Let me know if you 
disagree.

Since this is WIP, I myself have at least 1 more option different from what I 
suggested above... I put just one out to get the conversation rolling, so 
thanks for chipping in. Thanks for bringing up the kafka-assembly part: the 
assembly jar does include all dependencies too, so we would include Kafka 
v0.9's jars, I guess? I remember Cody mentioning that a v0.8 to v0.9 upgrade on 
the Kafka side involves upgrading the brokers; I think that is not required 
when a client uses a v0.9 jar but consumes only the older high-level/low-level 
API and talks to a v0.8 Kafka cluster. So we can go with 1 kafka-assembly, and 
then my suggestion above is itself broken, since we would have issues of the 
same package & class names.

One thought around not introducing the version in the package name or class 
name (I see that Flink does it in the class name) was to avoid forcing us to 
create v0.10/v0.11 packages (and customers to change code and recompile) even 
if those releases of Kafka don't have client-API or other such changes that 
warrant us making a new version. (Also, Spark got away without putting a 
version # in till now, which means less work in Spark, so I am not sure we want 
to start forcing this work going forward.) Once we introduce the version #, we 
need to ensure it stays in sync with Kafka.

That's why one earlier idea I mentioned in this JIRA was: 'The public API 
signatures (of KafkaUtils in the v0.9 subproject) are different and do not 
clash (with KafkaUtils in the original kafka subproject) and hence can be added 
to the existing (original kafka subproject) KafkaUtils class.' Cody mentioned 
that we need to get others on the same page for this idea, so I guess we really 
need the committers to chime in here. Of course, I forgot to answer Nikita's 
follow-up question: 'do you mean that we would change the original KafkaUtils 
by adding new functions for the new DirectIS/KafkaRDD but using them from a 
separate module with kafka09 classes'? To be clear, these new public methods 
added to the original kafka subproject's KafkaUtils will make use of the 
DirectKafkaInputDStream, KafkaRDD, KafkaRDDPartition, and OffsetRange classes 
that are in a new v09 package (internal, of course). In short, we don't have a 
new subproject. (I skipped the KafkaCluster class from the list, because I am 
thinking it makes more sense to call this class something like 'KafkaClient' 
going forward.) A sketch of the non-clashing signatures is below.
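A sketch of what those non-clashing overloads could look like (the first 
signature mirrors the shape of the existing 1.x method with type-class bounds 
omitted; the second is invented purely to illustrate that the parameter types 
differ, so the compiler can always tell them apart):

{code:scala}
import java.util.regex.Pattern
import kafka.serializer.Decoder
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.InputDStream

object KafkaUtilsSketch {
  // Shape of the existing old-consumer method (abbreviated).
  def createDirectStream[K, V, KD <: Decoder[K], VD <: Decoder[V]](
      ssc: StreamingContext,
      kafkaParams: Map[String, String],
      topics: Set[String]): InputDStream[(K, V)] = ???

  // Hypothetical new-consumer overload: different parameter and result types,
  // so it could sit in the same KafkaUtils without ambiguity.
  def createDirectStream[K, V](
      ssc: StreamingContext,
      kafkaParams: Map[String, Object],
      topicPattern: Pattern): InputDStream[ConsumerRecord[K, V]] = ???
}
{code}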




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-19 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107267#comment-15107267
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks Mario! 
bq. We should also have a python/pyspark/streaming/kafka-v09.py as well that 
matches to our external/kafka-v09
I agree, I will look into this.
bq. Why do you have the Broker.scala class? Unless i am missing something, it 
should be knocked off
Yeah, I noticed that too and I agree. This should be pretty simple to take out. 
I also 
[noticed|https://issues.apache.org/jira/browse/SPARK-12177?focusedCommentId=15089750=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15089750]
 that the v09 example was picking up some Kafka v08 jars, so I am working on 
fixing that too.
bq. I think the package should be 'org.apache.spark.streaming.kafka' only in 
external/kafka-v09 and not 'org.apache.spark.streaming.kafka.v09'. This is 
because we produce a jar with a diff name (user picks which one and even if 
he/she mismatches, it errors correctly since the KafkaUtils method signatures 
are different)
I totally understand what you mean. However, kafka has its [own assembly in 
Spark|https://github.com/apache/spark/tree/master/external/kafka-assembly] and 
the way the code is structured right now, both the new API and old API would go 
in the same assembly so it's important to have a different package name. Also, 
I think for our end users transitioning from old to new API, I foresee them 
having 2 versions of their spark-kafka app. One that works with the old API and 
one with the new API. And, I think it would be an easier transition if they 
could include both the kafka API versions in the spark classpath and pick and 
choose which app to run without mucking with maven dependencies and 
re-compiling when they want to switch. Let me know if you disagree.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-19 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107669#comment-15107669
 ] 

Mark Grover commented on SPARK-12177:
-

Posting an update: took out Broker.scala; the example picking up the wrong 
version of Kafka had already been taken care of by Nikita. I am looking into 
the python stuff now.




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-18 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105560#comment-15105560
 ] 

Mario Briggs commented on SPARK-12177:
--

Hi Nikita,
 great.

 1 - We should also have a python/pyspark/streaming/kafka-v09.py that matches 
our external/kafka-v09
 2 - Why do you have the Broker.scala class? Unless I am missing something, it 
should be knocked off
 3 - I think the package should be 'org.apache.spark.streaming.kafka' only in 
external/kafka-v09 and not 'org.apache.spark.streaming.kafka.v09'. This is 
because we produce a jar with a different name (the user picks which one, and 
even if he/she mismatches, it errors correctly since the KafkaUtils method 
signatures are different)
  
 




[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-11 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092455#comment-15092455
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks, Nikita. I will be issuing PRs to your kafka09-integration branch
so it can become the single source of truth until this change gets merged into
Spark. And I believe the Spark community prefers discussion on PRs once they are
filed, so you'll hear more from me there :-)

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-10 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091068#comment-15091068
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I created a new PR which is based on the master branch - 
https://github.com/apache/spark/pull/10681

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-10 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091067#comment-15091067
 ] 

Apache Spark commented on SPARK-12177:
--

User 'nikit-os' has created a pull request for this issue:
https://github.com/apache/spark/pull/10681

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-09 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090588#comment-15090588
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Hi, Mark!

Of course, it is possible to collaborate =)

1. I think I will create a new branch based on master, copy the existing code with
some changes (as you said), and create a new pull request. Then you can issue
pull requests against this new branch. Is that OK?

2. Yes, this is a problem. I did not notice it. How can we handle it? I don't
know how to properly separate those dependencies. Maybe we can put the examples for
Kafka 0.9 into the kafka-v09 package for as long as Kafka 0.8 still exists?

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-09 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090717#comment-15090717
 ] 

Mark Grover commented on SPARK-12177:
-

#1 Sounds great, thanks!
#2 Yeah, that's the only way I can think of for now, but let me ponder a bit
more.

Thanks! Looking forward to it.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-08 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089750#comment-15089750
 ] 

Mark Grover commented on SPARK-12177:
-

Thanks for working on this, Nikita. I'd like to help out. Here are a few pieces
of feedback:
1. I tried rebasing what you have in your current branch onto upstream master
(yours still seems to be based off pre-1.6.0 code), but mostly because of
some commits related to import ordering that happened on Spark trunk relatively
recently, I found it easier to migrate/copy the code for kafka-v09 and make the
minor changes to the examples and root pom instead of doing a 'git rebase'.
2. I also noticed that the v09DirectKafkaWordCount example is pulling at least
the ConsumerConfig class from Kafka 0.8.2.1. This is because the examples pom
contains both the Kafka 0.8.2.1 and 0.9.0 dependencies and somewhat arbitrarily
puts 0.8.2.1 first. Since the ConsumerConfig class is available in both
under the same namespace, we end up pulling in the 0.8.2.1 version. We should fix that.


In general, I may have a few more changes/fixes that I'd like to contribute to
your pull request. Would it be possible for us to collaborate? What's the best
way to do so? Reopening the pull request and me adding to it? Or just me
issuing pull requests to [your
branch|https://github.com/nikit-os/spark/tree/kafka-09-consumer-api]? Thanks!

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-08 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089564#comment-15089564
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Thanks! I have picked up your changes that avoid the frequent KafkaConsumer object
creation, along with the corrected consumer.poll() implementation (below). I have also
removed the receiver-based input stream, its related tests, and the example.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-05 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082674#comment-15082674
 ] 

Mario Briggs commented on SPARK-12177:
--

implemented here - 
https://github.com/mariobriggs/spark/commit/2fcbb721b99b48e336ba7ef7c317c279c9483840

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-01-04 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081565#comment-15081565
 ] 

Mario Briggs commented on SPARK-12177:
--

Confirmed that the frequent KafkaConsumer object creation was the reason for the
consumer.position() method hang. Cleaned up the frequent KafkaConsumer object
creation in
https://github.com/mariobriggs/spark/commit/e40df7ee70fd72418969e8f9c81a1fee304b8b1c
 and it resolved the issue.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-30 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075230#comment-15075230
 ] 

Mario Briggs commented on SPARK-12177:
--

You could also fetch just a few of the records you want at a time, i.e. not all in one shot:

override def getNext(): R = {
  if (iter == null || !iter.hasNext) {
    iter = consumer.poll(pollTime).iterator()
  }

  if (!iter.hasNext) {
    if (requestOffset < part.untilOffset) {
      // need to make another poll() and recheck above, so make a
      // recursive call, i.e. 'return getNext()' here?
    }
    finished = true
    null.asInstanceOf[R]
  } else {
    ...

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-30 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075172#comment-15075172
 ] 

Mario Briggs commented on SPARK-12177:
--

Nikita,

thank you.

A-C: Looks good to me. (BTW, I didn't review the changes related to the receiver-based
approach, even in the earlier round.)

D - I think it is OK for KafkaTestUtils to have a dependency on core, since that
is more of an internal test concern (however, I haven't spent time thinking about whether
even that can be improved). On the larger issue, I think Kafka *will* provide
TopicPartition as serializable, which will make this moot, but it's good that we
have tracked it here.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-30 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15075205#comment-15075205
 ] 

Mario Briggs commented on SPARK-12177:
--

Very good point about the frequent creation of KafkaConsumer objects. In fact, Praveen is
investigating whether that is the reason the 'position()' method hangs when we have
batch intervals of 200ms and below.
 So one way to optimize this: since the 'compute' method in
DirectKafkaInputDStream runs in the driver, why not store the 'KafkaConsumer'
rather than the KafkaCluster as a member variable in this class? Of course, we
will need to mark it transient so that there is no attempt to serialize it, and
that means always checking for null and re-initializing if required before use. The
only use of the Consumer here is to find the new latest offsets, so we will
have to massage that method for use with an existing consumer object.
Or another option is to let KafkaCluster have a KafkaConsumer instance as a
member variable, with the same noted aspects about being transient.

This also means moving the part about fetching the leader IP address for
getPreferredLocations() away from KafkaRDD.getPartitions() and into
DirectKafkaInputDStream.compute(), and having 'leaders' as a constructor param to
KafkaRDD (I now realize that KafkaRDD is private, so we are not putting that on
a public API as I thought earlier).
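
To make the transient-consumer idea concrete, here is a minimal sketch (illustrative
only, not code from any of the linked branches; the class and method names are
assumptions) of a driver-side holder that lazily re-creates the 0.9 consumer after
deserialization and uses it only to fetch the latest offsets:

import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

class DriverConsumerHolder(
    kafkaParams: ju.Map[String, Object],  // must include the deserializer configs
    partitions: Seq[TopicPartition]) extends Serializable {

  // @transient: never serialized with the DStream; after deserialization
  // (e.g. checkpoint recovery) it is null and gets re-created on first use.
  @transient private var consumer: KafkaConsumer[Array[Byte], Array[Byte]] = _

  private def getOrCreate(): KafkaConsumer[Array[Byte], Array[Byte]] = {
    if (consumer == null) {
      consumer = new KafkaConsumer[Array[Byte], Array[Byte]](kafkaParams)
      consumer.assign(ju.Arrays.asList(partitions: _*))
    }
    consumer
  }

  // The only driver-side use: find the new latest offset per partition.
  def latestOffsets(): Map[TopicPartition, Long] = {
    val c = getOrCreate()
    c.seekToEnd(partitions: _*)  // 0.9 varargs signature
    partitions.map(tp => tp -> c.position(tp)).toMap
  }
}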
  

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-29 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074131#comment-15074131
 ] 

Mario Briggs commented on SPARK-12177:
--

With regards to review comment 'C' (two comments above), the Kafka folks have
clarified that the timeout parameter on the poll() method is the time to spend
waiting until the client side receives the data (from the server), not the time to
spend waiting for data to become available on the server. This means that one
might have to call poll() more than once, until we get to the ConsumerRecord at
'untilOffset'.
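
For what it's worth, a hedged sketch of that repeated polling (the method and
parameter names here are illustrative, not the PR's code):

import scala.collection.JavaConverters._
import scala.collection.mutable.ArrayBuffer
import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

def readRange(
    consumer: KafkaConsumer[String, String],
    tp: TopicPartition,
    fromOffset: Long,
    untilOffset: Long,
    pollTimeMs: Long): Seq[ConsumerRecord[String, String]] = {
  consumer.seek(tp, fromOffset)
  val buf = ArrayBuffer.empty[ConsumerRecord[String, String]]
  var nextOffset = fromOffset
  // A single poll() may come back empty even though the range still has
  // data on the broker, so keep polling until we reach untilOffset.
  // (A real implementation would also bound the number of empty polls.)
  while (nextOffset < untilOffset) {
    for (record <- consumer.poll(pollTimeMs).records(tp).asScala
         if record.offset() < untilOffset) {
      buf += record
      nextOffset = record.offset() + 1
    }
  }
  buf
}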

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-29 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074262#comment-15074262
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Hi, Mario! Thank you for the review! It is very helpful.
I made a few commits based on your A-C comments and on the implementation in your
GitHub repo:
 - 
https://github.com/nikit-os/spark/commit/44bb56806ab5fb86d0e27ed8108188c5808bae0f
 - 
https://github.com/nikit-os/spark/commit/85fad9410ab5133bd92e01af278e2f473c07b745
 - 
https://github.com/nikit-os/spark/commit/2d36c76f1bc8bcba59261d37bbff90e67ece73d7

About comment 'D': kafka-core is needed not only for the TopicAndPartition class but
also for some classes (such as KafkaConfig, ZkUtils, KafkaServer, and Producer) that
are needed by KafkaTestUtils. That is the reason why I keep the kafka-core dependency.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-29 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074265#comment-15074265
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I'm not against using only the Direct approach and removing the Receiver-based approach
from the new implementation. The only thing that bothers me is the frequent creation of
new KafkaConsumer instances in KafkaCluster (the withConsumer method). I do not
know how efficient that is. From this point of view, the Receiver-based approach is
easier. But with ReliableKafkaReceiver we have a problem with multithreaded access
to the KafkaConsumer (because it is not thread-safe): one thread from MessageHandler for
poll, and a second from GeneratedBlockHandler for committing offsets (see the sketch
below).
And if we keep only the Direct approach, do you mean that we would change the
original KafkaUtils by adding new functions for the new DirectIS/KafkaRDD, but use
them from a separate module with the kafka09 classes?
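
As an aside, a minimal, hypothetical sketch (not code from any of the linked
branches) of serializing access to one shared, non-thread-safe 0.9 KafkaConsumer
between a polling thread and an offset-committing thread:

import java.{util => ju}
import org.apache.kafka.clients.consumer.{ConsumerRecords, KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition

// KafkaConsumer is not thread-safe, so every call is funneled through one lock.
class SynchronizedConsumer[K, V](consumer: KafkaConsumer[K, V]) {
  private val lock = new Object

  // e.g. called from the message-handling thread
  def poll(timeoutMs: Long): ConsumerRecords[K, V] =
    lock.synchronized { consumer.poll(timeoutMs) }

  // e.g. called from the block-handling thread that commits offsets
  def commitOffsets(offsets: ju.Map[TopicPartition, OffsetAndMetadata]): Unit =
    lock.synchronized { consumer.commitSync(offsets) }
}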

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-29 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074266#comment-15074266
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Should we use a while loop here, to wait for the data?
https://github.com/nikit-os/spark/blob/44bb56806ab5fb86d0e27ed8108188c5808bae0f/external/kafka-v09/src/main/scala/org/apache/spark/streaming/kafka/v09/KafkaRDD.scala#L157

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-28 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073035#comment-15073035
 ] 

Mario Briggs commented on SPARK-12177:
--

On the issue of lots of duplicate code from the original, one thought was whether
we need to upgrade the older Receiver-based approach to the new Consumer API at all.
The Direct approach has so many benefits over the older Receiver-based approach,
and I can't think of a drawback, so one might make the argument that we don't
upgrade the latter at all: it remains on the older Kafka consumer API and gets
deprecated over a long period of time. Thoughts?

If we do go that way, then there is only trivial overlap of code between the
original and this new consumer implementation. The public API signatures are
different and do not clash, and hence can be added to the existing KafkaUtils
class.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-28 Thread Mario Briggs (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073020#comment-15073020
 ] 

Mario Briggs commented on SPARK-12177:
--

Hi Nikita,
 thanks. Here are my review comments. I couldn't find a way to add them on the
PR review in GitHub, so I added them here. My comments are a little detailed,
since I too developed an implementation and tested it along with my colleague
Praveen :-) after the initial discussion on the dev list.

A - KafkaCluster class
 getPartitions()
 seek()
callers of the above methods
all other methods that use withConsumer()

 These should not return an ‘Either’, but rather just the expected object (the
‘Right’). The reason for the ‘Either’ object in the earlier code was that the
earlier Kafka client had to deal with trying the operation on all the
‘seedBrokers’ and handle the case where some of them were down. Similarly, when
dealing with ‘leaders’, the client had to try the operation on all leaders for a TP
(TopicAndPartition). When we use the new kafka-clients API, we don’t have to
deal with trying against all the seedBrokers, leaders, etc., since the new
KafkaConsumer object internally handles all those details.
Notice that in the earlier code, withBrokers() tried to connect() and invoke
the passed-in function multiple times via brokers.foreach(), hence the
need to accumulate errors. The earlier code also did a ‘return’ immediately
when it succeeded with one of the brokers. None of this applies with the new
KafkaConsumer object.

getPartitions() - 
https://github.com/nikit-os/spark/blob/kafka-09-consumer-api/external/kafka-v09/src/main/scala/org/apache/spark/streaming/kafka/v09/KafkaCluster.scala#L62
  the consumer.partitionsFor() Java API returns null if the topic doesn’t
exist. If you don’t handle that, you run into an NPE when the user specifies a
topic that doesn’t exist or makes a typo in the topic name (also, not throwing
an exception saying the partition doesn’t exist is not right). See the sketch
after this section.

Our implementation is at
https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala.
 If it is easier for you that we issue a PR to your repo, let us know.
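
A minimal sketch of that null check (illustrative names, assuming the 0.9
kafka-clients API, and returning the value directly rather than an Either, per
point A):

import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

def getPartitions(consumer: KafkaConsumer[_, _], topics: Set[String]): Set[TopicPartition] =
  topics.flatMap { topic =>
    // partitionsFor() returns null (not an empty list) for an unknown topic,
    // so fail with a clear error instead of a later NullPointerException.
    Option(consumer.partitionsFor(topic)) match {
      case None =>
        throw new IllegalArgumentException(s"Topic $topic does not exist")
      case Some(infos) =>
        infos.asScala.map(pi => new TopicPartition(pi.topic(), pi.partition())).toSet
    }
  }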

B - KafkaRDD class
   getPreferredLocations()
  this method is missing in your code. The earlier implementation from Cody had
the optimisation that if Kafka and the Spark code (KafkaRDD) were running on the
same cluster, then the RDD partition for a particular TopicPartition would be
local to that TopicPartition's leader. Could you please add code to bring back
this functionality?

 Our implementation pulled this info inside getPartitions -
https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala#L52.
 It is probably more efficient to do it inside compute() of the DStream, but that
meant exposing ‘leaders’ on the KafkaRDD constructor, as discussed -
https://mail-archives.apache.org/mod_mbox/spark-dev/201512.mbox/%3CCAKWX9VV34Dto9irT3ZZfH78EoXO3bV3VHN8YYvTxfnyvGcRsQw%40mail.gmail.com%3E

C - KafkaRDDIterator class

 getNext()
  As mentioned in issue #1 noted here -
https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/README.md#issues-noted-with-kafka-090-consumer-api
- KafkaConsumer.poll(x) is not guaranteed to return the data when x is a small
value. We are following up on [this issue with
kafka](https://issues.apache.org/jira/browse/KAFKA-3044). I see that you have
made this a configurable value in your implementation, which is good, but either
way, until this behaviour is clarified (and even otherwise), we need [this
assert](https://github.com/mariobriggs/spark/blob/kafka0.9-streaming/external/kafka-newconsumer/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala#L171)
 or else we will be silently skipping data without the user knowing it (with either
the default value or a user-specified smaller value).

D - Non-use of the new Consumer's TopicPartition class

   You have already figured out that this class is not Serializable, and hence
in the public interface you have used the older TopicAndPartition class. We
have raised [this issue](https://issues.apache.org/jira/browse/KAFKA-3029) with
Kafka and may be provided with one (yet to be seen). However, using the older
TopicAndPartition class in our public API, which introduces a dependency on the
older kafka core rather than just the kafka-clients jar, is, I would think, not the
preferred approach. If we are not provided with a serializable TopicPartition,
then we should rather use our own serializable object (or just a tuple of
String, Int, Long) inside DirectKafkaInputDStream to ensure it is
Serializable. A tiny sketch of such a stand-in follows below.
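
For illustration only (this is hypothetical, not code from any of the linked
branches), such a stand-in could be as small as a case class; case classes are
serializable by default:

import org.apache.kafka.common.TopicPartition

// Serializable stand-in for the 0.9 client's (non-Serializable) TopicPartition.
case class SerializableTopicPartition(topic: String, partition: Int) {
  // Convert back to the Kafka client's type at the point of use.
  def toTopicPartition: TopicPartition = new TopicPartition(topic, partition)
}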


> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
>   

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-23 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069579#comment-15069579
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I have added SSL configuration for Kafka. Commit - 
https://github.com/nikit-os/spark/commit/2379834aa92b127f1635013bbe3046e37464cbed

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-17 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062313#comment-15062313
 ] 

Cody Koeninger commented on SPARK-12177:


Honestly, SSL / auth is the only compelling thing about 0.9; I'm not sure it
makes sense to have 0.9 without it.

On Wed, Dec 16, 2015 at 3:16 AM, Nikita Tarasenko (JIRA) 



> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-16 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059840#comment-15059840
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

The current implementation doesn't include SSL support.
I could add SSL support after my current changes are accepted. I think for the Spark
integration we could use the spark.ssl.* properties as the global configuration and add
new spark.ssl.kafka.* properties to override the global ones.
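
For reference, a hypothetical sketch of the 0.9 consumer properties such a
spark.ssl.kafka.* layer would ultimately populate (the keys are standard Kafka 0.9
client settings; all paths and passwords are placeholders):

// Placeholder values; a spark.ssl.kafka.* layer would fill these in.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9093",
  "security.protocol" -> "SSL",
  "ssl.truststore.location" -> "/path/to/truststore.jks",
  "ssl.truststore.password" -> "truststore-password",
  "ssl.keystore.location" -> "/path/to/keystore.jks",
  "ssl.keystore.password" -> "keystore-password",
  "ssl.key.password" -> "key-password"
)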

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-16 Thread Tom Waterhouse (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060218#comment-15060218
 ] 

Tom Waterhouse commented on SPARK-12177:


SSL is very important for our deployment, +1 for adding it into the integration.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-15 Thread Dean Wampler (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15058856#comment-15058856
 ] 

Dean Wampler commented on SPARK-12177:
--

Since the new Kafka 0.9 consumer API supports SSL, would the implementation of
#12177 enable the use of SSL, or would additional work be required in Spark's
integration?

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055908#comment-15055908
 ] 

Apache Spark commented on SPARK-12177:
--

User 'nikit-os' has created a pull request for this issue:
https://github.com/apache/spark/pull/10294

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055937#comment-15055937
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

Do we still need DirectKafkaInputDStream? We shouldn't use ZooKeeper directly any
more. Right now DirectKafkaInputDStream is similar to KafkaInputDStream, just with
manual offset management.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-14 Thread Nikita Tarasenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15055911#comment-15055911
 ] 

Nikita Tarasenko commented on SPARK-12177:
--

I moved all the changes for Kafka 0.9 into a separate subproject.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-08 Thread Cody Koeninger (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046975#comment-15046975
 ] 

Cody Koeninger commented on SPARK-12177:


I really think this needs to be handled as a separate subproject, or otherwise
in a fully backwards-compatible way. The new consumer API will require
upgrading Kafka brokers, which is a big ask just in order for people to upgrade
Spark versions.

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2015-12-07 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044799#comment-15044799
 ] 

Apache Spark commented on SPARK-12177:
--

User 'ntarasenko' has created a pull request for this issue:
https://github.com/apache/spark/pull/10173

> Update KafkaDStreams to new Kafka 0.9 Consumer API
> --
>
> Key: SPARK-12177
> URL: https://issues.apache.org/jira/browse/SPARK-12177
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.6.0
>Reporter: Nikita Tarasenko
>  Labels: consumer, kafka
>
> Kafka 0.9 has already been released, and it introduces a new consumer API that is
> not compatible with the old one. So, I added the new consumer API, in separate
> classes with the changed API in the package org.apache.spark.streaming.kafka.v09. I
> didn't remove the old classes, for backward compatibility: users will not need
> to change their old Spark applications when they upgrade to the new Spark version.
> Please review my changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org