viirya commented on issue #25856: [SPARK-29182][Core] Cache preferred locations 
of checkpointed RDD
URL: https://github.com/apache/spark/pull/25856#issuecomment-536048562
 
 
   > It's a significant change and I hesitate to add a new config. Yes, this 
may mean the preferred locations are 'wrong' sometimes. What's the impact of 
that, simply loss of locality? I'm trying to get a better sense of whether 
that's rare or common. What would cause the right answer to change -- data got 
cached on a different node?
   
   I have discussed this with @dongjoon-hyun. I think the impact is a loss of 
locality. Preferred locations are used to pick hosts when scheduling tasks. 
Tasks will still be scheduled and run even when the preferred locations are 
incorrect. For example, when Spark executors are not on the same cluster as the 
DFS cluster, we cannot schedule tasks on the reported block locations either.
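
   To illustrate the point that preferred locations are only a hint, here is a 
toy sketch. It is not Spark's scheduler, whose real logic is more involved; 
`pick_host`, the host names, and the slot counts are all hypothetical. It only 
shows that a stale or unreachable preferred location costs locality, not 
correctness.

```python
from typing import Dict, List, Optional


def pick_host(preferred: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Return a host with a free slot, trying preferred hosts first.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # First pass: honor the locality hint if a preferred host has capacity.
    for host in preferred:
        if free_slots.get(host, 0) > 0:
            return host
    # Fallback: the hint was stale or unreachable, so take any free host.
    for host, slots in free_slots.items():
        if slots > 0:
            return host
    # No capacity anywhere: the task has to wait.
    return None


# Assumed setup: executors run on exec-1 and exec-2, while the DFS reports
# the block on dfs-7, a host that runs no executor.
executors = {"exec-1": 2, "exec-2": 0}
local = pick_host(["exec-1"], executors)   # hint matches a live executor
remote = pick_host(["dfs-7"], executors)   # hint points outside the cluster
```

   In both calls the task still gets a host; in the second it simply runs 
without data locality.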
