GitHub user amit-ramesh commented on the pull request:
https://github.com/apache/spark/pull/7185#issuecomment-119372136
@jerryshao @tdas
I have a few points from a user's perspective regarding the DStream version:
1. Based on this PR, it looks like foreachRDD is the only way to get offsets
in the DStream case. That would mean the data needs to be sent over to the
driver just to obtain the offsets. Is it possible to obtain the offsets on the
workers, right after the data is received from Kafka?
2. We keep state in updateStateByKey(), which necessitates attaching the
corresponding Kafka metadata to every event in the DStream so that the state
can be reconstructed across deployments. Is there a way to attach the Kafka
offset to every event using the Spark API? Essentially, something akin to the
output juanrh had originally proposed in SPARK-8337.
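For concreteness, here is a rough sketch of the two patterns I have in mind, assuming the spark-streaming-kafka 0.8 direct API. `directStream`, `ssc`, `kafkaParams`, and `fromOffsets` are placeholders for values defined elsewhere; this is only an illustration, not a claim about what this PR exposes:

```scala
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// (1) Obtain offsets without collecting records to the driver: capture the
// OffsetRange metadata in transform() (only metadata touches the driver),
// then pair each RDD partition with its range on the workers.
var offsetRanges = Array.empty[OffsetRange]
directStream.transform { rdd =>
  offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  rdd
}.mapPartitionsWithIndex { (i, iter) =>
  // RDD partition i corresponds to offsetRanges(i); this closure runs on a worker.
  val range = offsetRanges(i)
  iter.map(record => (range.topic, range.partition, record))
}

// (2) Per-event metadata: the messageHandler argument of createDirectStream
// sees MessageAndMetadata, so each record can carry its own offset.
val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, Int, Long, String)](
  ssc, kafkaParams, fromOffsets,
  (mmd: MessageAndMetadata[String, String]) =>
    (mmd.topic, mmd.partition, mmd.offset, mmd.message()))
```

Pattern (2) is closest to what we want for updateStateByKey(), since the offset travels with the event itself rather than being tracked out of band.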