Github user jerryshao commented on the pull request:
https://github.com/apache/spark/pull/7185#issuecomment-118215436
Hi @koeninger and @tdas , thanks a lot for your comments. I think TD
already explained well why I need to use `getNarrowAncestors()` to get the
original KafkaRDD: the Python code not only wraps the original KafkaRDD but
also applies some additional transformations, so the RDD/DStream the user
gets is no longer the original KafkaRDD/DirectKafkaInputDStream.
Also, in the current implementation the distance to the original KafkaRDD
happens to be fixed, but it really depends on the internals of
PySpark/PySpark Streaming; if that logic changes, the distance will change
as well, so we should not assume a known, fixed distance.
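For illustration, here is a minimal sketch of that lineage-search idea. This is not the PR's actual code; it assumes Scala, and it assumes the helper is compiled inside the `org.apache.spark` namespace, since `getNarrowAncestors` is a `private[spark]` API:

```scala
package org.apache.spark.streaming.kafka

import org.apache.spark.rdd.RDD

// Hypothetical helper: walk the narrow-dependency lineage of an
// arbitrarily wrapped RDD and pick out the underlying KafkaRDD, which is
// the only RDD in the chain implementing HasOffsetRanges. Searching the
// ancestors avoids hard-coding how many wrapping transformations
// PySpark has applied.
object OffsetRangesHelper {
  def findOffsetRanges(rdd: RDD[_]): Option[Array[OffsetRange]] =
    (rdd +: rdd.getNarrowAncestors).collectFirst {
      case r: HasOffsetRanges => r.offsetRanges
    }
}
```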
Yes, my current implementation definitely has some semantic differences
compared to Java/Scala, and it is also not strict enough.
@tdas , I tried your way; since the solution you mentioned offers the same
functionality but requires more Python wrappers, I chose the simpler approach
I had implemented. But after thinking a bit more about your suggestion, I
believe it is stricter and closer to the Scala/Java way, and will not lead
users into incorrect usage. I will try your way again.