tgravescs commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558658524

I'm not seeing how this is different than any other task in Spark. In my opinion, the locality fallback in Spark is broken. I generally recommend people set the locality delay to 0 on the node side, because you can get very weird results where tasks wait far too long to be scheduled. On most networks these days it's better to just run the task somewhere than to wait for locality. I realize, though, that there are other conditions this was added for, and I've never spent the time to go look for a proper solution to it.

This just seems like a workaround for the right fix, and one that can only be used by these specialized RDDs. Maybe that is OK for now, but I would like to make sure that is stated very clearly, and I somewhat hesitate because this is essentially a public API that would be hard to remove if it starts to get used elsewhere.

I'm assuming the intention is to just have LocalShuffledRowRDD always add it to the preferred locations? I'm a bit surprised that change isn't in here as well, since it seems relatively small and would show the use of it; maybe I'm wrong and it's large, which would make sense to split apart. Are there other specific use cases you have for this?

I think this should be discussed more before going in. @squito
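For context on the recommendation above: the locality delay is controlled by the standard `spark.locality.wait` configuration. A minimal sketch of disabling the fallback delay via `spark-defaults.conf` (the per-level keys shown in comments are the real variants, listed here only as optional alternatives):

```properties
# spark-defaults.conf
# Disable the locality wait entirely so the scheduler launches a task
# on any free executor instead of holding it for a local slot.
spark.locality.wait        0s

# The per-level waits default to spark.locality.wait and can instead be
# tuned individually if zero everywhere is too aggressive:
# spark.locality.wait.process   0s
# spark.locality.wait.node     3s
# spark.locality.wait.rack     3s
```

With the wait at 0s, the scheduler skips the delay-scheduling fallback (PROCESS_LOCAL → NODE_LOCAL → RACK_LOCAL → ANY) and places tasks immediately, which is the behavior the comment argues is usually preferable on modern networks.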
