[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

jerryshao Tue, 07 Jul 2015 22:59:03 -0700

Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/7185#issuecomment-119447855
  
    Hi @amit-ramesh, what I meant by getting offsetRanges in the 
`transform` function is something like this:
    
    ```python
    dstream.transform(lambda r: r.offsetRanges())
    ```
    
    Here `r.offsetRanges()` is executed on the driver side. If you have follow-up 
transformations inside the `transform` function that need these 
offsetRanges, they will implicitly be sent to the executor side. 
That's what I meant.
    
    Also:
    >1. Events in an RDD partition are ordered by Kafka offset
    >2. The index of an OffsetRanges object in the getOffsets() list 
corresponds to the partition index in the RDD.
    
    These two assumptions are true as far as I know, so you can rely on 
them.
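
A minimal sketch of how the two assumptions could be combined to recover each record's Kafka offset. It assumes the offset ranges were fetched on the driver and captured by the closure, as described above. The `OffsetRange` namedtuple here is a hypothetical stand-in for the PySpark class so the snippet runs without Spark, and `attach_offsets` and the topic name `events` are illustrative names, not part of the API.

```python
from collections import namedtuple

# Hypothetical stand-in for the PySpark Kafka OffsetRange class,
# so this sketch can run without a Spark installation.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "fromOffset", "untilOffset"])


def attach_offsets(partition_index, records, offset_ranges):
    """Pair each record with its inferred Kafka offset.

    Relies on the two assumptions from the comment:
    1. records in an RDD partition are ordered by Kafka offset;
    2. offset_ranges[i] describes RDD partition i.
    """
    rng = offset_ranges[partition_index]
    for position, record in enumerate(records):
        yield (rng.topic, rng.partition, rng.fromOffset + position, record)


# Assumed wiring inside `transform` (not executed here):
#   def process(rdd):
#       ranges = rdd.offsetRanges()  # evaluated on the driver
#       # `ranges` is captured by the lambda and shipped to the executors
#       return rdd.mapPartitionsWithIndex(
#           lambda i, it: attach_offsets(i, it, ranges))
#   dstream.transform(process)

# Local check with plain lists standing in for partitions.
ranges = [OffsetRange("events", 0, 100, 103), OffsetRange("events", 1, 40, 42)]
tagged = list(attach_offsets(1, ["a", "b"], ranges))
```

Because the function takes `(index, iterator)` plus the captured ranges, it has the shape that `mapPartitionsWithIndex` expects once the ranges are bound in a lambda. The offsets are only correct while the RDD still has the original Kafka partitioning, that is, before any shuffle or repartition.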



