Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/6352#issuecomment-104848483
  
    Also: if my reasoning above is right and this optimization is incorrect, 
then it's concerning that it didn't cause a test failure.  My hunch is that we 
don't have unit tests for the particular combinations of RDD dependency graphs, 
caching states, and map output availability that would expose this issue. It 
would be nice to write a failing regression test that would have caught the 
problems in the current version of this patch, since that will help us gain 
confidence that the new optimizations are safe.


