tgravescs commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task location URL: https://github.com/apache/spark/pull/26633#issuecomment-558767028

Fixing the scheduler locality algorithm is definitely more changes. The locality delay, to me, should be a per-task delay: if a task doesn't get scheduled in 3 seconds, then fall to the next locality level. Right now it waits for any task to not be scheduled for 3 seconds at that locality. I know Kay has an argument for the FairScheduler use case, but I don't know that I agree with it, or that it isn't handled by the per-task delay. If you really want your tasks to wait that long for locality, you can simply set it higher. I'm not sure what code changes are required to make that change, though, or how ugly the code gets if we really wanted to leave the old behavior in there behind a config.

> But the problem is when we have less mappers (from the shuffle map stage) than the number of worker nodes, e.g., 5 vs. 10, and if we stick to the preferred locations, the LocalShuffledRowRDD will suffer from locality wait and be even slower than the original ShuffledRowRDD.

I'm not sure I follow this statement. If you have fewer mappers, let's say you have 5 and you have 10 worker nodes (assuming this is standalone mode, or do you mean executors?), the 5 maps will run on 5 of those nodes. Your LocalShuffledRowRDD uses the map output locations as the preferred locations, so why wouldn't the scheduler schedule on those nodes? Are you saying the 10 worker nodes (not sure if you mean executors or workers?) are being used by others (either jobs or stages), so some might be busy and the delay from waiting is more than just reading over the network? Is this case with dynamic allocation or not? It sounds to me like the normal case where you ran on some executors, you may not have the same executors when your reduce phase runs, so your scheduling is delayed because you can't get node locality.
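To make the per-taskset vs. per-task distinction concrete, here is a minimal, hypothetical Scala sketch of the current behavior described above. This is not the real `TaskSetManager` code; the names (`allowedLevel`, `levels`) and the single shared launch timestamp are simplifications introduced for illustration. The point is that one launch anywhere in the task set resets the clock for every task, which is what a per-task delay would change.

```scala
// Hypothetical, simplified model of Spark's delay scheduling
// (not the actual TaskSetManager implementation).
object DelaySchedulingSketch {
  // Locality levels from most to least local, as in Spark.
  val levels = Seq("PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY")

  /** Returns the index of the most relaxed level allowed. The key
   *  simplification matching today's behavior: `lastLaunchMs` is the
   *  last time ANY task in the set launched at the current level, so
   *  one launch resets the wait clock for the whole task set. */
  def allowedLevel(currentIdx: Int, lastLaunchMs: Long,
                   nowMs: Long, waitMs: Long): Int =
    if (nowMs - lastLaunchMs >= waitMs && currentIdx < levels.size - 1)
      currentIdx + 1
    else
      currentIdx

  def main(args: Array[String]): Unit = {
    // Some task launched 2s ago with a 3s wait: stay at NODE_LOCAL (index 1).
    assert(allowedLevel(1, lastLaunchMs = 1000, nowMs = 3000, waitMs = 3000) == 1)
    // 4s since any launch: fall through to RACK_LOCAL (index 2).
    assert(allowedLevel(1, lastLaunchMs = 0, nowMs = 4000, waitMs = 3000) == 2)
    println("ok")
  }
}
```

A per-task delay would instead track `lastLaunchMs` per pending task, so a single starved task could fall to the next level while the rest of the set keeps its locality.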
I think you could have the same thing with ShuffledRowRDD with a small number of maps/reducers. The issue I want to understand is why we are special-casing this one RDD for a performance improvement when, in my opinion, the majority of jobs would benefit from not having to wait for locality (as implemented today). Changing the default like Imran mentioned might be a good first step, and fixing the algorithm would be the second, in my opinion. Do you think a default of 0 for node locality wait would solve your problem? Obviously, if a user does set it, then it gets applied again.
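For reference, a user can already get the "0 node locality wait" behavior being proposed as a default with configuration alone. A hedged sketch, assuming the standard `SparkConf` API; the app name is a placeholder, and the two `spark.locality.wait*` keys are real Spark settings:

```scala
import org.apache.spark.SparkConf

// Keep the top-level locality wait at its default, but skip the
// node-level wait entirely, so tasks fall straight through to
// rack/any rather than waiting for a node-local slot.
val conf = new SparkConf()
  .setAppName("example")                 // placeholder name
  .set("spark.locality.wait", "3s")      // overall default per level
  .set("spark.locality.wait.node", "0")  // no wait at NODE_LOCAL
```

If the default itself were changed to 0, users who explicitly set `spark.locality.wait.node` would still get their configured wait applied, as noted above.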
