cloud-fan commented on issue #22173: [SPARK-24355] Spark external shuffle 
server improvement to better handle block fetch requests.
URL: https://github.com/apache/spark/pull/22173#issuecomment-570509238
 
 
   We hit a significant performance regression in our internal workload caused by this commit. After this commit, an executor can handle at most N chunk fetch requests at the same time, where N is `spark.shuffle.io.serverThreads` * `spark.shuffle.server.chunkFetchHandlerThreadsPercent`. Previously this was unlimited, and most of the time we could saturate the underlying channel.
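
   To make the cap concrete, here is a minimal sketch of how the limit follows from the two configs. This is not Spark's actual code, and the rounding is an assumption for illustration only:

```scala
// Minimal illustration of the cap described above (not Spark's actual code).
// Assumes the cap is serverThreads scaled by the configured percentage;
// the ceil rounding is a guess for illustration.
val serverThreads = 8                       // spark.shuffle.io.serverThreads
val chunkFetchHandlerThreadsPercent = 100   // spark.shuffle.server.chunkFetchHandlerThreadsPercent

// At most this many chunk fetch requests are handled concurrently after this commit:
val maxConcurrentChunkFetches =
  math.ceil(serverThreads * chunkFetchHandlerThreadsPercent / 100.0).toInt

println(maxConcurrentChunkFetches)  // 8 with the example values above
```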
   
   This commit does fix a nasty problem, and I'm fine with it even if it introduces a perf regression, but there should be a way to turn it off. Unfortunately, this feature can't be disabled. We can set `spark.shuffle.server.chunkFetchHandlerThreadsPercent` to a large value so that many chunk fetch requests can be handled at the same time, but it's hard to pick a value that is not too large yet still saturates the channel. A sketch of that workaround is below.
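
   As a stopgap, one could raise both knobs through ordinary Spark configuration. This is only a hypothetical sketch; the concrete numbers are placeholders, not recommendations, and the "right" values depend on the workload and the network:

```scala
import org.apache.spark.SparkConf

// Hypothetical mitigation sketch: raise the server thread count and the
// chunk-fetch handler percentage so more fetch requests can be served
// concurrently. The numbers below are arbitrary examples.
val conf = new SparkConf()
  .set("spark.shuffle.io.serverThreads", "32")
  .set("spark.shuffle.server.chunkFetchHandlerThreadsPercent", "200")
```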
   
   Looking back at this problem, I think we can either create a dedicated channel for non-chunk-fetch requests, or ask Netty to prioritize channel writes for non-chunk-fetch requests. Both seem hard to implement. Shall we revert this first, and think of a proper fix later?
