The Scala KafkaRDD uses a trait to handle this problem, but it is not so easy
and straightforward in Python, where we would need a specific API to handle
this. I'm not sure whether there is any simple workaround; maybe we should
think carefully about it.
2015-06-12 13:59 GMT+08:00 Amit Ramesh
Hi,
If you want, I would be happy to work on this. I have worked with
KafkaUtils.createDirectStream before, in a pull request that wasn't
accepted: https://github.com/apache/spark/pull/5367. I'm fluent in Python
and I'm starting to feel comfortable with Scala, so if someone opens a JIRA
I can
Hi Juan,
I have created a ticket for this:
https://issues.apache.org/jira/browse/SPARK-8337
Thanks!
Amit
On Fri, Jun 12, 2015 at 3:17 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
The Scala API has two ways of calling createDirectStream. One of them allows
you to pass a message handler that gets full access to the Kafka
MessageAndMetadata, including the offset.
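To illustrate the idea, here is a minimal stand-alone sketch (no Spark or Kafka required) of what such a per-record handler does: it sees the full record metadata rather than just the key/value pair. The names below are illustrative stand-ins, not Spark's actual API.

```python
# Stand-in for Kafka's MessageAndMetadata: the handler gets the whole
# record, including its offset, not just (key, message).
from collections import namedtuple

MessageAndMetadata = namedtuple(
    "MessageAndMetadata", ["topic", "partition", "offset", "key", "message"]
)

def offset_aware_handler(mmd):
    """Example handler: keep the offset alongside the payload."""
    return (mmd.offset, mmd.message)

# Fake records standing in for a Kafka batch.
records = [
    MessageAndMetadata("clicks", 0, 100, None, b"a"),
    MessageAndMetadata("clicks", 0, 101, None, b"b"),
]
handled = [offset_aware_handler(m) for m in records]
print(handled)  # [(100, b'a'), (101, b'b')]
```

With the handler the application can carry each record's offset through the rest of the pipeline; without it, the offset is lost as soon as the record is mapped to a key/value pair.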
I don't know why the Python API was developed with only one way to call
createDirectStream, but the first thing I'd
Hi Jerry,
Take a look at this example:
https://spark.apache.org/docs/latest/streaming-kafka-integration.html#tab_scala_2
The offsets are needed because, as RDDs get generated within Spark, the
offsets move further along. In direct Kafka mode the current offsets are
no longer persisted in ZooKeeper
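A minimal stand-alone sketch (no Spark required) of the pattern in the linked Scala example: each batch carries the offset ranges it covers, and the application reads them back, for instance to commit offsets itself, since the direct approach no longer stores them in ZooKeeper. The classes below are illustrative stand-ins, not Spark's actual API.

```python
# Stand-in for Spark's OffsetRange: which slice of which topic-partition
# a batch covers.
class OffsetRange:
    def __init__(self, topic, partition, from_offset, until_offset):
        self.topic = topic
        self.partition = partition
        self.from_offset = from_offset
        self.until_offset = until_offset

class DirectBatch:
    """Mimics an RDD that exposes the offset ranges it was built from."""
    def __init__(self, offset_ranges):
        self.offset_ranges = offset_ranges

# The application inspects the ranges after each batch, e.g. to persist
# them in its own store instead of ZooKeeper.
batch = DirectBatch([OffsetRange("clicks", 0, 100, 110)])
for r in batch.offset_ranges:
    print(r.topic, r.partition, r.from_offset, r.until_offset)
```

In the real Scala API this corresponds to casting the batch RDD to the offset-bearing type and reading its ranges inside foreachRDD; the point here is only that the ranges travel with the batch.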
OK, I get it. I think the Python-based Kafka direct API currently does not
provide an equivalent to what Scala offers; maybe we should figure out how
to add this to the Python API as well.
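Purely as a sketch of the shape such an addition could take: a createDirectStream variant that also accepts a message handler, so the caller can capture offsets per record. None of these names existed in PySpark at the time of this thread; this is a hypothetical illustration, not a proposal of concrete signatures.

```python
# Hypothetical Python-side equivalent of the Scala overload that takes a
# message handler. The fake record list stands in for Kafka input.
def create_direct_stream(topics, kafka_params, message_handler=None):
    # Each record: (topic, partition, offset, key, value)
    raw = [("clicks", 0, 100, None, b"a"), ("clicks", 0, 101, None, b"b")]
    if message_handler is None:
        # Current behavior: only (key, value) survives; offsets are lost.
        return [(k, v) for (_t, _p, _o, k, v) in raw]
    # Proposed behavior: the handler sees the full record.
    return [message_handler(rec) for rec in raw]

with_offsets = create_direct_stream(
    ["clicks"], {}, message_handler=lambda rec: (rec[2], rec[4])
)
print(with_offsets)  # [(100, b'a'), (101, b'b')]
```

The default path keeps today's (key, value) output, so existing callers would be unaffected; only callers passing a handler would see the extra metadata.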
2015-06-12 13:48 GMT+08:00 Amit Ramesh a...@yelp.com:
Hi,
What do you mean by getting the offsets from the RDD? From my
understanding, the offsetRange is a parameter you supplied to KafkaRDD, so
why do you still want to get back the one you previously set?
Thanks
Jerry
2015-06-12 12:36 GMT+08:00 Amit Ramesh a...@yelp.com:
Congratulations on the