cloud-fan commented on pull request #29331: URL: https://github.com/apache/spark/pull/29331#issuecomment-671381145
Since we already have `DISK_ONLY_2`, I'm fine with adding `DISK_ONLY_3`. I'm just offering a different proposal for this use case. The RDD lineage model relies on recomputation so that Spark can cache data on unreliable storage. I think caching with multiple copies diverges from that original idea. If you don't want to trigger recomputation, you can save the data to reliable storage, which is usually better than keeping 3 copies on disk (object storage is cheaper, and HDFS has Erasure Coding to save space).
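To put rough numbers on the space argument, here is a minimal sketch comparing three full copies with an erasure-coded layout. It assumes the HDFS Reed-Solomon RS(6,3) policy (6 data cells plus 3 parity cells per stripe); the helper name `storage_overhead` and the 100 GB dataset size are made up for illustration and are not part of Spark or HDFS.

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Return raw units stored per unit of user data."""
    return (data_units + redundancy_units) / data_units


# Three full copies, as a 3x-replicated storage level would keep:
# 1 original block plus 2 extra replicas.
three_copies = storage_overhead(1, 2)

# Assumed erasure coding policy RS(6,3): 6 data cells plus 3 parity cells.
rs_6_3 = storage_overhead(6, 3)

dataset_gb = 100  # hypothetical cached dataset size
print(f"3 copies: {three_copies:.1f}x -> {dataset_gb * three_copies:.0f} GB on disk")
print(f"RS(6,3):  {rs_6_3:.1f}x -> {dataset_gb * rs_6_3:.0f} GB on disk")
```

Under these assumptions, three copies store 3 bytes per byte of data and survive the loss of any 2 copies, while RS(6,3) stores 1.5 bytes per byte of data and survives the loss of any 3 cells in a stripe.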
