GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/7185

    [SPARK-8389][Streaming][PySpark] Expose KafkaRDDs offsetRange in Python

    This PR proposes a simple way to expose OffsetRange in Python code. The
usage of offsetRanges is similar to the Scala/Java way; in Python we can get
the OffsetRange like this:
    
    ```
    dstream.foreachRDD(lambda r: KafkaUtils.offsetRanges(r))
    ```
    
    The reason I didn't follow the approach suggested in SPARK-8389 is that
the Python Kafka API has one more step than Scala/Java, to decode the message.
As a result, the Python API returns a transformed RDD/DStream rather than the
directly wrapped JavaKafkaRDD, so it is hard to backtrack to the original RDD
to get the offsetRange.
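
    As a rough illustration of the lineage problem described above, here is a
toy model in plain Python. It is not Spark code and does not show how this PR
recovers the ranges. The names ToyKafkaRDD, ToyMappedRDD, find_offset_ranges
and the OffsetRange tuple are all hypothetical stand-ins. It only shows that
the decoded RDD no longer carries the offset ranges itself, so a helper has to
walk back to the RDD that does.

```python
from collections import namedtuple

# Hypothetical stand-in for an offset range; NOT the pyspark class.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "fromOffset", "untilOffset"])


class ToyKafkaRDD(object):
    """Toy model of the underlying Kafka RDD, which knows its offset ranges."""

    def __init__(self, records, offset_ranges):
        self.records = records
        self.offset_ranges = offset_ranges
        self.parent = None

    def map(self, func):
        # Mapping yields a new object that only keeps a link to its parent.
        return ToyMappedRDD(self, func)


class ToyMappedRDD(object):
    """Toy model of the decoded RDD that the Python API hands to the user."""

    def __init__(self, parent, func):
        self.parent = parent
        self.func = func

    def collect(self):
        return [self.func(r) for r in self.parent.records]


def find_offset_ranges(rdd):
    """Walk parent links until an RDD carrying offset ranges is found."""
    while rdd is not None:
        ranges = getattr(rdd, "offset_ranges", None)
        if ranges is not None:
            return ranges
        rdd = getattr(rdd, "parent", None)
    raise ValueError("no Kafka RDD found in the lineage")


kafka_rdd = ToyKafkaRDD([b"a", b"b"], [OffsetRange("topic1", 0, 10, 12)])
# The extra decode step: the user sees this transformed RDD, not kafka_rdd.
decoded = kafka_rdd.map(lambda raw: raw.decode("utf-8"))

print(hasattr(decoded, "offset_ranges"))  # the ranges are not on the decoded RDD
print(find_offset_ranges(decoded))        # recovered by walking back to the parent
```

    In this toy the parent link is a plain attribute, so backtracking is
trivial. In the real Python API the original RDD lives on the JVM side behind
the decoding transformation, which is what makes the backtracking hard.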

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-8389

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7185
    
----
commit 848c708246a632aea19e3a992d28973b6d31df10
Author: jerryshao <[email protected]>
Date:   2015-06-25T02:28:51Z

    Add HasOffsetRanges for Python

commit 2aabf9e1b7b8414aab3428cbbab772504b66078f
Author: jerryshao <[email protected]>
Date:   2015-07-02T07:58:09Z

    Style fix

----


