Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/7185#issuecomment-118215436
  
    Hi @koeninger and @tdas , thanks a lot for your comments. I think TD 
already explained well why I need to use `getNarrowAncestors()` to get the 
original KafkaRDD: the Python code not only wraps the original KafkaRDD, but 
also applies some additional transformations, so the RDD/DStream the user gets 
is no longer the original KafkaRDD/DirectKafkaInputDStream.
    
    Also, in the current implementation the distance to the original KafkaRDD 
is fixed, but it really depends on the internals of PySpark / PySpark 
Streaming: if that logic changes, the distance will change as well, so we 
should not assume a known, fixed distance.
    
    Yes, my current implementation definitely has some semantic differences 
compared to Java/Scala, and it is not as strict.
    
    @tdas , I tried your way; since the solution you mentioned offers the same 
functionality but with more Python wrappers, I originally chose the simpler 
approach I had implemented. But having thought a bit more about it, I think 
your way is stricter, closer to the Scala/Java way, and less likely to lead to 
misuse, so I will try your way again.


