maryannxue commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558916448

@tgravescs

> People are setting this to zero now anyway so changing default makes sense to me.

Not sure how representative these "people" are. So let's bring the whole thing to dev list discussion.

> Again the main issue I have is that once it's introduced anyone can use it in an RDD - therefore I consider it a public interface. You say its limited impact and only used by adaptive execution but once introduced nothing stopping others from using it.

I actually see it the other way. If we do see that a regular ShuffledRowRDD suffers from locality wait when it happens to have a preferred location (because it satisfies `REDUCER_PREF_LOCS_FRACTION` and some other conditions), we might end up finding this new location handy. I don't see why I need to make an argument about how LocalShuffledRowRDD is different from other RDDs. On the contrary, if other RDDs have the same requirement, they can opt to use this approach as well.

But I have to point out that we can't rule out the possibility that there are other RDDs that would not benefit from a world with no locality wait at all. As I pointed out at the very beginning (https://github.com/apache/spark/pull/26633#issuecomment-558379888), each RDD should know its own locality preference as well as the importance of that locality. If we ever had to worry about this location being used improperly, we'd have to worry about whether any other regular location is returned correctly by the RDD as well.

That said, this is only a partial solution. I'd like to see a complete fix as well, but I don't think we should go completely the other way by changing the default wait time to 0.
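
To make the trade-off concrete, here is a toy model of the locality wait. It is a sketch only and is not Spark's scheduler code: `Task`, `WILDCARD`, and `launch_delay` are hypothetical names invented for illustration, and the model assumes a single task, one host that is free right away, and a known time at which the preferred host frees up. The 3-second value mirrors the documented default of `spark.locality.wait`.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker meaning "any host is equally good".
# This is NOT how the PR or Spark represents a wildcard location.
WILDCARD = "*"


@dataclass
class Task:
    name: str
    # None means no preference was reported; WILDCARD means any host is fine.
    preferred_host: Optional[str]


def launch_delay(task: Task, free_host: str,
                 preferred_free_at: float, locality_wait: float) -> float:
    """Seconds the task waits before it launches, in this simplified model.

    free_host:          a host that has a free slot at time 0
    preferred_free_at:  time at which the preferred host gets a free slot
    locality_wait:      how long the scheduler holds out for the preferred host
    """
    if task.preferred_host is None or task.preferred_host == WILDCARD:
        return 0.0  # nothing to wait for, take the free slot
    if task.preferred_host == free_host:
        return 0.0  # the free slot is already the preferred one
    # Hold out for the preferred host, but give up once the wait expires.
    return min(preferred_free_at, locality_wait)


pinned = Task("reduce-0", preferred_host="host1")
anywhere = Task("reduce-1", preferred_host=WILDCARD)

print(launch_delay(pinned, "host2", preferred_free_at=10.0, locality_wait=3.0))
print(launch_delay(pinned, "host2", preferred_free_at=10.0, locality_wait=0.0))
print(launch_delay(anywhere, "host2", preferred_free_at=10.0, locality_wait=3.0))
```

In this model, setting the wait to 0 removes the delay for every task, including those that genuinely benefit from locality, whereas a wildcard location removes it only for the tasks whose RDD declares that locality does not matter.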
