GitHub user amit-ramesh commented on the pull request:
https://github.com/apache/spark/pull/7185#issuecomment-119412470
@jerryshao
Although the title of SPARK-8337 is worded more broadly, all we really
wanted is a way to attach Kafka offsets to individual events/messages :).
AFAICT, the implementation of transform appears to run executor-side
operations. If the following two assumptions hold, we should be able to
achieve what we need:
1. Events in an RDD partition are ordered by Kafka offset
2. The index of an OffsetRange object in the getOffsets() list corresponds
to the partition index in the RDD.
If that is not the case, it would be great if you could point me to the
lines in transform that use driver-side operations.
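
To make the idea concrete, here is a rough sketch of the per-partition logic,
assuming both points above hold and, additionally, that offsets within a
partition are contiguous. The OffsetRange tuple and attach_offsets helper below
are hypothetical stand-ins for illustration, not part of the Spark API.

```python
from collections import namedtuple

# Hypothetical stand-in for a Kafka offset range; field names are assumptions.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "fromOffset", "untilOffset"]
)


def attach_offsets(partition_index, messages, offset_ranges):
    """Yield (offset, message) pairs for a single RDD partition."""
    # Assumption 2: the list index of a range equals the RDD partition index.
    rng = offset_ranges[partition_index]
    # Assumption 1: messages arrive in offset order, starting at fromOffset.
    # Extra assumption: offsets are contiguous (no gaps within the range).
    for i, message in enumerate(messages):
        yield (rng.fromOffset + i, message)


# Plain-Python demonstration with made-up ranges and messages.
ranges = [
    OffsetRange("events", 0, 100, 103),
    OffsetRange("events", 1, 40, 42),
]
tagged = list(attach_offsets(1, iter(["a", "b"]), ranges))
```

On an actual RDD, something like
rdd.mapPartitionsWithIndex(lambda i, it: attach_offsets(i, it, ranges))
should apply the same logic on the executors, provided the list of ranges can
be captured on the driver first.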