viirya commented on issue #25856: [SPARK-29182][Core] Cache preferred locations of checkpointed RDD
URL: https://github.com/apache/spark/pull/25856#issuecomment-536048562

> It's a significant change and I hesitate to add a new config. Yes, this may mean the preferred locations are 'wrong' sometimes. What's the impact of that, simply loss of locality? I'm trying to get a better sense of whether that's rare or common. What would cause the right answer to change -- data got cached on a different node?

I have discussed this with @dongjoon-hyun. I think the impact is loss of locality. Preferred locations are used to find hosts when scheduling tasks. If the preferred locations are not correct, the tasks are still scheduled and run, just without locality. For example, when Spark executors are not on the same cluster as the DFS cluster, we cannot schedule tasks on the reported block locations either.
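A minimal sketch of the trade-off described above, assuming a deliberately simplified scheduler. Per-partition preferred hosts are looked up once and cached. The lookup stands in for asking the DFS for the block locations of a checkpoint file. A task goes to a preferred host if one is free, otherwise to any free host. `CachedPreferredLocations`, `pick_host`, and the host names are hypothetical and are not Spark's API. Spark's real scheduler is more involved: it waits a configurable time before relaxing locality instead of falling back immediately.

```python
from typing import Callable, Dict, List, Optional, Sequence


class CachedPreferredLocations:
    """Hypothetical cache of per-partition preferred hosts (not a Spark class).

    `lookup` stands in for an expensive DFS block-location query; it runs at
    most once per partition, so later changes to block placement are not seen.
    """

    def __init__(self, lookup: Callable[[int], List[str]]) -> None:
        self._lookup = lookup
        self._cache: Dict[int, List[str]] = {}
        self.lookups = 0  # how many times the expensive lookup actually ran

    def get(self, partition: int) -> List[str]:
        if partition not in self._cache:
            self.lookups += 1
            self._cache[partition] = self._lookup(partition)
        return self._cache[partition]


def pick_host(preferred: Sequence[str], free_hosts: Sequence[str]) -> Optional[str]:
    """Prefer a free host that holds the data, else fall back to any free host.

    A stale or unreachable preference only costs locality: the task is still
    placed as long as some host is free.
    """
    for host in preferred:
        if host in free_hosts:
            return host
    return free_hosts[0] if free_hosts else None


if __name__ == "__main__":
    blocks = {0: ["dn1"], 1: ["dn2"]}
    cache = CachedPreferredLocations(lambda p: list(blocks[p]))
    print(pick_host(cache.get(0), ["dn1", "exec-a"]))    # dn1: data-local
    blocks[0] = ["dn3"]  # block moves after the locations were cached
    print(pick_host(cache.get(0), ["exec-a", "exec-b"]))  # exec-a: stale hint, still scheduled
    print(cache.lookups)                                  # 1: lookup was not repeated
```

The second call shows the case raised in the question. The cached answer is "wrong" after the block moves, yet the task still gets a host. The same fallback applies when executors and DFS datanodes live on separate clusters and no preferred host is ever free.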
