Github user holdenk commented on the pull request:
https://github.com/apache/spark/pull/11919#issuecomment-212672336
@MLnick As mentioned in the line comments, that approach turns out to be less
simple than planned: checkpointing discards all of the parent dependency
information we need in order to clean up the shuffle files. I could refactor
this so that we capture the dependency information first - but a count() on a
cached RDD should be low enough cost that I'm not sure it would be worth it.
What are your thoughts?
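(For context, a minimal sketch - not part of this PR - of the lineage truncation being discussed: once an RDD is checkpointed and materialized by an action, Spark replaces its `dependencies` with a dependency on the checkpoint data, so the original `ShuffleDependency` is no longer reachable. The checkpoint directory path here is an arbitrary assumption.)

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointLineageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("checkpoint-lineage-demo"))
    sc.setCheckpointDir("/tmp/ckpt") // assumed scratch directory

    // reduceByKey introduces a ShuffleDependency on the parent RDD
    val shuffled = sc.parallelize(1 to 100).map(x => (x % 10, x)).reduceByKey(_ + _)
    println(shuffled.dependencies) // includes the ShuffleDependency

    shuffled.cache()
    shuffled.checkpoint()
    shuffled.count() // action materializes the checkpoint

    // After checkpointing, the lineage is truncated: the dependencies now
    // point at the checkpoint data, and the shuffle parent is gone
    println(shuffled.dependencies)

    sc.stop()
  }
}
```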