tgravescs commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-559124754
 
 
   Nobody has answered my questions above as to why this RDD should be treated 
differently and the impact of this.  You just keep saying this is for adaptive 
scheduling.  As far as I can see, this is purely another instance of 
https://issues.apache.org/jira/browse/SPARK-18886 and I don't see why we aren't 
using the same workaround or actually fixing the underlying issue.
   
   > As I pointed out in the very beginning #26633 (comment), each RDD should 
know their own locality preference as well as the importance of such locality. 
If we ever had to worry about this location being used improperly, we'd have to 
worry about if any other regular location is returned correctly by the RDD as 
well.
   
   I don't agree. HadoopRDD, for instance, knows its locality, but how important 
that locality is depends heavily on the user and cluster. I don't see how the 
LocalShuffledRowRDD is any different.  You are saying the user never cares 
about the locality on this - please explain to me why and how it is different 
from HadoopRDD?  If we were to turn this on for HadoopRDD though then we would 
essentially be bypassing the locality settings.
   
   >  Even if we do it, people that run jobs that need delay scheduling still 
need to set the locality wait. For these users, we need this WILDCARD location 
feature to enable AQE.
   
   Again, why is AQE different?  Let's say I really want my HadoopRDD to use 
locality but then the ShuffledRDD hits this issue.  As a user I can't just turn 
locality off for my ShuffledRDD, so what makes the LocalShuffledRowRDD any 
different?  From what has been described here, this is a very particular case: 
you have more nodes and reducers than maps, and the maps finish very quickly 
(probably within 3 seconds). These are the same conditions under which other 
RDDs can hit the same issue.
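
   To make those conditions concrete, here is a toy model of delay scheduling. It is a simplified sketch and not Spark's scheduler code: the function `makespan`, its parameters, and the single "preferred vs. non-preferred slot" split are all assumptions made for illustration. The only behaviour it models is that a non-preferred slot gets work only after the locality wait has elapsed since the last launch on a preferred slot, so short tasks keep resetting the timer.

```python
def makespan(num_tasks, task_secs, locality_wait_secs,
             local_slots=1, remote_slots=1):
    """Toy delay-scheduling model (hypothetical, not Spark's implementation).

    All tasks prefer the "local" slots. A "remote" slot may take a task only
    when at least `locality_wait_secs` have passed since the last local launch.
    Returns the time at which the last task finishes.
    """
    pending = num_tasks
    local_free = [0.0] * local_slots    # time each preferred slot becomes idle
    remote_free = [0.0] * remote_slots  # time each non-preferred slot becomes idle
    last_local_launch = 0.0             # start of the current locality timer
    now = 0.0
    finish = 0.0

    while pending > 0:
        # Preferred slots take a task as soon as they are idle,
        # and every such launch resets the locality timer.
        for i, free_at in enumerate(local_free):
            if pending > 0 and free_at <= now:
                local_free[i] = now + task_secs
                finish = max(finish, local_free[i])
                last_local_launch = now
                pending -= 1

        # Non-preferred slots are used only once the timer has expired.
        if now - last_local_launch >= locality_wait_secs:
            for i, free_at in enumerate(remote_free):
                if pending > 0 and free_at <= now:
                    remote_free[i] = now + task_secs
                    finish = max(finish, remote_free[i])
                    pending -= 1

        if pending == 0:
            break

        # Jump to the next event: a slot becoming idle or the timer expiring.
        events = [t for t in local_free + remote_free if t > now]
        expiry = last_local_launch + locality_wait_secs
        if expiry > now:
            events.append(expiry)
        now = min(events)

    return finish


# 10 one-second tasks with a 3-second wait: the idle slot is never used.
serial = makespan(10, 1.0, 3.0)
# Same tasks with the wait set to 0: both slots share the work.
shared = makespan(10, 1.0, 0.0)
```

   In this model, tasks shorter than the wait run entirely on the preferred slot while the other slot sits idle, which is the pattern described in SPARK-18886 and is independent of which RDD produced the locality preference.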
