tgravescs commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558658524
 
 
   I'm not seeing how this is different from any other task in Spark?  In my opinion the locality fallback in Spark is broken.  I generally recommend people set the locality delay to 0 on the node side, because you can get very weird results where tasks wait way too long to be scheduled. On most networks these days it's better to just run the task somewhere than to wait for locality.  I realize, though, that there are other conditions this was added for, and I've never spent the time to go look for a proper solution.
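   For reference, the "locality delay to 0" recommendation above is just the standard `spark.locality.wait` setting (it can also be split per level); a minimal config sketch:

```
# spark-defaults.conf — schedule tasks immediately instead of waiting for locality
spark.locality.wait           0s
# per-level overrides exist too, e.g.:
# spark.locality.wait.node    0s
# spark.locality.wait.rack    0s
```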
   This just seems like a workaround to doing the right fix, one that can only be used by these specialized RDDs.  Maybe that's OK for now, but I would like to make sure it's very clear about that, and I somewhat hesitate because this is essentially a public API that would be hard to remove if it starts to get used elsewhere.
   
   I'm assuming the intention is to just have LocalShuffledRowRDD always add it to its preferred locations?  I'm a bit surprised that change isn't in here as well, since it seems relatively small and would show the use of it. Maybe I'm wrong and it's large, though, which would make sense to split apart.
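   To make the scheduling trade-off concrete, here is a minimal, Spark-free Scala sketch of the behavior being discussed: a task with preferred hosts either waits for one of them, or falls back to any free host once the locality wait has expired (the effect of setting the delay to 0, and roughly what a wildcard location would opt into always). All names here are hypothetical illustrations, not Spark APIs.

```scala
// Hypothetical sketch (not Spark code) of locality wait vs. run-anywhere fallback.
object LocalityFallback {
  // Pick a host for a task: honor preferred hosts while waitMs remains,
  // otherwise schedule on any free host ("wildcard" behavior).
  def chooseHost(preferred: Set[String],
                 freeHosts: Seq[String],
                 waitMs: Long): Option[String] = {
    val local = freeHosts.find(preferred.contains)
    if (local.isDefined) local          // a preferred host is free: use it
    else if (waitMs > 0) None           // keep waiting for a preferred host
    else freeHosts.headOption           // wait expired: run anywhere
  }

  def main(args: Array[String]): Unit = {
    val free = Seq("hostB", "hostC")
    // With a nonzero wait, the task is deferred when no preferred host is free.
    println(chooseHost(Set("hostA"), free, waitMs = 3000)) // None
    // With the wait effectively 0, it runs on the first free host.
    println(chooseHost(Set("hostA"), free, waitMs = 0))    // Some(hostB)
  }
}
```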
   
   Are there other specific use cases you have for this?
   
   I think this should be discussed more before going in.
   
   @squito 
   
   
