Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/1276#issuecomment-48090143
  
    The cause seems to be that when you do operations like map() followed by 
map(), you get a PipelinedRDD, which does not necessarily have an underlying 
Java RDD until you access its _jrdd property. Creating a Java RDD for every 
PipelinedRDD is probably expensive, so we shouldn't do that until id() is 
actually called on it. On the other hand, we probably want the IDs to match 
what will show up in the web UI, so I think we have to return the Java version 
of the ID, not a new set of numbers we make up in Python.
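
    A minimal sketch of what a lazily-evaluated id() could look like, assuming 
the existing lazy _jrdd property and the underlying Java RDD's id() method 
(names here are illustrative, not the final implementation):

        class RDD(object):
            def __init__(self, jrdd, ctx):
                self._jrdd = jrdd   # for PipelinedRDD this is a lazy property
                self.ctx = ctx
                self._id = None

            def id(self):
                # Touch _jrdd only when the id is actually requested, so a
                # PipelinedRDD is not forced to materialize its Java RDD
                # eagerly; returning the Java-side id keeps it consistent
                # with what the web UI shows.
                if self._id is None:
                    self._id = self._jrdd.id()
                return self._id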

