mridulm edited a comment on pull request #30876: URL: https://github.com/apache/spark/pull/30876#issuecomment-751186157
@dongjoon-hyun proactive replication only applies to persisted RDD blocks, not shuffle blocks - not sure if I am missing something here. Even for persisted RDD blocks, it specifically applies when the RDD is persisted with a storage level where `replication > 1` [1]. I view the loss of all replicas of an RDD blockId the same way, whether replication is 1 or higher.

Having said that, for use cases where the Spark cluster might be the source of truth (or the cost of recomputation is prohibitive), applications can of course enable proactive replication via this flag. I am not seeing a concrete reason to turn this on for all applications. Please let me know if I am missing something in my understanding.

[1] ESS serving disk-backed blocks might have some corner cases in this flow which I have not thought through.
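For context, a minimal sketch of what opting in looks like from an application. The config key `spark.storage.replication.proactive` and the `_2` storage levels are part of Spark's public API; the application name and data here are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical application enabling proactive replication of cached
// RDD blocks. Proactive replication only takes effect for blocks whose
// storage level has replication > 1 (e.g. the *_2 levels).
val spark = SparkSession.builder()
  .appName("proactive-replication-example") // made-up name
  .config("spark.storage.replication.proactive", "true")
  .getOrCreate()

val rdd = spark.sparkContext.parallelize(1 to 1000)

// MEMORY_ONLY_2 keeps two replicas of each block; if an executor
// holding one replica is lost, the block manager re-replicates from
// the surviving copy instead of waiting for recomputation.
rdd.persist(StorageLevel.MEMORY_ONLY_2)
rdd.count()
```

With the default storage levels (replication = 1), the flag has no effect, which is why loss of the only replica still falls back to recomputation from lineage.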
