tgravescs commented on issue #26633: [SPARK-29994][CORE] Add WILDCARD task 
location
URL: https://github.com/apache/spark/pull/26633#issuecomment-558898559
 
 
   I agree the default setting change needs to happen in a bigger conversation, 
but if that conversation is going to happen, then in my opinion we shouldn't 
check this in until it has taken place. 
   
   I have not seen a real argument for why this RDD is different from any other. 
If we fix the real issue with locality, that helps everything. The argument 
that it's a special version of ShuffledRowRDD and that you sometimes hit this 
locality issue doesn't convince me. I can hit the locality issue with 
ShuffledRowRDD, and I might not hit it with the LocalShuffleRowRDD. Why not 
change ShuffledRowRDD or HadoopRDD to use this as well, since I can hit the 
same issue there? The only argument I can see is limited scope, but then does 
it turn on only when you hit the case described, with mappers < reducers and 
more executors than mappers? If it turns on more often than that, then one 
could argue you aren't following the semantics Spark defines for locality wait.
   
   I don't see any concrete numbers here on the performance impact, on how much 
this affects users, or on why we should special-case this. If it has a huge 
impact then I can see why we would special-case it, but I haven't seen any 
evidence of that. Do we have any cases where this is seen in production? Is 
there a negative impact from a user just setting node locality wait = 0?
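   
   For reference, a minimal sketch of that user-side alternative, assuming it 
is applied through spark-defaults.conf. `spark.locality.wait.node` is the 
documented node-level override of `spark.locality.wait`. The value shown is 
only the setting under discussion, not a recommendation:

```
# spark-defaults.conf (sketch): do not wait for a node-local slot before
# falling back to the next locality level.
spark.locality.wait.node   0s
```

   The same setting can be passed per job with 
`--conf spark.locality.wait.node=0s` on spark-submit. Either way it applies to 
the whole application rather than to a single RDD.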
   
   Again, the main issue I have is that once it's introduced, anyone can use it 
in an RDD, so I consider it a public interface. You say it has limited impact 
and is only used by adaptive execution, but once it's introduced nothing stops 
others from using it.
   
   Adding more people to get opinions.
   @vanzin @dongjoon-hyun @srowen 

