maryannxue commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558916448
 
 
   @tgravescs
   
   > People are setting this to zero now anyway so changing default makes sense 
to me.
   
   I'm not sure how representative these "people" are, so let's bring the whole 
thing to the dev list for discussion.
   
   > Again the main issue I have is that once it's introduced anyone can use it 
in an RDD - therefore I consider it a public interface. You say its limited 
impact and only used by adaptive execution but once introduced nothing stopping 
others from using it.
   
   I actually see it the other way. If we do see a regular ShuffledRowRDD 
suffering from locality wait when it happens to have a preferred location 
(because it satisfies `REDUCER_PREF_LOCS_FRACTION` and some other conditions), 
we might end up finding this new location handy.
   I don't see why I need to argue that LocalShuffledRowRDD is 
different from other RDDs. On the contrary, if other RDDs have the same 
requirement, they can opt to use this approach as well. But I have to point out 
that we can't rule out the possibility that there are other RDDs that would 
not do well with no locality wait at all.
   As I pointed out at the very beginning 
https://github.com/apache/spark/pull/26633#issuecomment-558379888, each RDD 
should know its own locality preference as well as the importance of that 
locality. If we had to worry about this location being used improperly, 
we'd also have to worry about whether any other regular location is returned 
correctly by the RDD.
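
To make the per-RDD argument concrete, here is a minimal toy model of the idea. It is not Spark's scheduler code: `WILDCARD`, `Task`, and `can_launch` are hypothetical names chosen for illustration, and the semantics (a wildcard entry means "I have a preference, but it is not worth waiting for") are an assumption about the proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical marker: "any host is acceptable, do not wait for locality".
# The name and representation are assumptions, not Spark's actual constant.
WILDCARD = "*"


@dataclass
class Task:
    task_id: int
    # Hosts the task would like to run on; may include WILDCARD.
    preferred: List[str] = field(default_factory=list)


def can_launch(task: Task, offered_host: str,
               waited_s: float, locality_wait_s: float) -> bool:
    """Decide whether a task may start on offered_host right now (toy model)."""
    if not task.preferred:
        return True   # no preference at all: run anywhere
    if offered_host in task.preferred:
        return True   # the offer matches a preferred host
    if WILDCARD in task.preferred:
        return True   # preference exists, but the task opted out of waiting
    # Otherwise hold the task back until the locality wait has expired.
    return waited_s >= locality_wait_s


# A task that only prefers hostA is held back on hostB until the wait expires.
strict = Task(1, ["hostA"])
print(can_launch(strict, "hostB", waited_s=0.0, locality_wait_s=3.0))
print(can_launch(strict, "hostB", waited_s=3.0, locality_wait_s=3.0))

# A task that also lists the wildcard starts on hostB immediately.
relaxed = Task(2, ["hostA", WILDCARD])
print(can_launch(relaxed, "hostB", waited_s=0.0, locality_wait_s=3.0))
```

In this sketch, passing `locality_wait_s=0` removes the wait for every task, including `strict`, whereas the wildcard only affects the tasks that ask for it.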
   
   That said, this is a partial solution only. I'd like to see a complete fix 
as well, but I don't think we should go completely the other way, by changing 
the default wait time to 0.
   
