cloud-fan commented on pull request #29331:
URL: https://github.com/apache/spark/pull/29331#issuecomment-671381145


   Since we already have `DISK_ONLY_2`, I'm fine with adding `DISK_ONLY_3`.
   
   I just want to offer a different proposal for this use case. The RDD lineage 
model relies on recomputation, which is what lets Spark cache data on unreliable 
storage. I think caching multiple copies diverges from that original idea. If 
you don't want to trigger recomputation, you can save the data to reliable 
storage, which is usually better than keeping 3 full copies (object stores are 
cheaper, and HDFS has Erasure Coding to save space).
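
To put a rough number on the space argument, here is a minimal sketch comparing the raw storage used by 3 full copies against an erasure-coded layout. It assumes a Reed-Solomon scheme with 6 data and 3 parity units (the shape of HDFS's built-in RS-6-3 policy). The 100 GB dataset size and the `storage_footprint` helper are hypothetical, chosen only for illustration.

```python
def storage_footprint(logical_gb, data_units, parity_units):
    """Raw storage (GB) needed to hold logical_gb of data.

    Every `data_units` units of real data are stored together with
    `parity_units` redundant units (extra replicas or parity blocks).
    """
    return logical_gb * (data_units + parity_units) / data_units


logical_gb = 100.0  # hypothetical dataset size, for illustration only

# Three full copies: 1 unit of data plus 2 extra replicas.
replicated = storage_footprint(logical_gb, data_units=1, parity_units=2)

# Assumed Reed-Solomon layout: 6 data units plus 3 parity units.
erasure_coded = storage_footprint(logical_gb, data_units=6, parity_units=3)

print(f"3 copies:       {replicated:.0f} GB raw")
print(f"erasure coding: {erasure_coded:.0f} GB raw")
```

Under these assumptions, three copies take 3x the logical size while the erasure-coded layout takes 1.5x, and the RS(6,3) layout can still lose any 3 of its 9 units without losing data.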




