GitHub user amit-ramesh commented on the pull request:
https://github.com/apache/spark/pull/7185#issuecomment-119372136
@jerryshao @tdas
I have a few points from a user's perspective regarding the DStream version:
1. Based on this PR, it looks like foreachRDD is the only way to get offsets
in the DStream case. That would mean the data needs to be sent over to the
driver just to obtain the offsets. Is it possible to obtain the offsets on the
workers, right after the data is received from Kafka?
2. We keep state in updateStateByKey(), which necessitates attaching the
corresponding Kafka metadata to every event in the DStream so that the state
can be reconstructed across deployments. Is there a way to attach the Kafka
offset to every event using the Spark API? Essentially, something akin to the
output juanrh had originally proposed in SPARK-8337.
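For concreteness, here is a rough sketch of the two patterns I have in mind, assuming the spark-streaming-kafka 0.8 direct API. `directStream`, `ssc`, `kafkaParams`, and `fromOffsets` are placeholders for values defined elsewhere; this is only an illustration, not a claim about what this PR exposes:

```scala
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// (1) Obtain offsets without collecting records to the driver: capture the
// OffsetRange metadata in transform() (only metadata touches the driver),
// then pair each RDD partition with its range on the workers.
var offsetRanges = Array.empty[OffsetRange]
directStream.transform { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}.mapPartitionsWithIndex { (i, iter) =>
  // RDD partition i corresponds to offsetRanges(i); this closure runs on a worker.
  val range = offsetRanges(i)
  iter.map(record => (range.topic, range.partition, record))
}

// (2) Per-event metadata: the messageHandler argument of createDirectStream
// sees MessageAndMetadata, so each record can carry its own offset.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, Int, Long, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) =>
    (mmd.topic, mmd.partition, mmd.offset, mmd.message()))
```

Pattern (2) is closest to what we want for updateStateByKey(), since the offset travels with the event itself rather than being tracked out of band.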