HeartSaVioR commented on pull request #30139:
URL: https://github.com/apache/spark/pull/30139#issuecomment-722358480


   The root issue here is that `numChunksBeingTransferred` is called quite 
often.
   
   If I understand correctly, the per-stream counter is only increased or 
decreased via `processFetchRequest` & `processStreamRequest`, and 
`chunksBeingTransferred` is also called beforehand on those same code paths. 
That means the number of counter operations is only about 2x the number of 
calls to `chunksBeingTransferred`, while the cost per operation differs 
significantly: incrementing/decrementing an atomic integer vs. iterating over 
all stream entities, reading each atomic integer, and summing them up. The cost 
of the latter grows linearly with the number of entities in streams, so the 
problem shows up when the number of entities gets huge.
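   To make the cost difference concrete, here is a rough, self-contained sketch 
of the two code paths. It is illustrative only, not the actual 
`OneForOneStreamManager` code; the class and method names are made up for the 
example.
   
   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.atomic.AtomicLong;
   
   // Illustrative sketch of the two cost profiles, not the actual Spark classes.
   class StreamRegistrySketch {
     // One counter per stream entity.
     private final Map<Long, AtomicLong> perStreamCounters = new ConcurrentHashMap<>();
   
     // Cheap path: a single atomic bump when a chunk starts being sent.
     void chunkBeingSent(long streamId) {
       perStreamCounters.computeIfAbsent(streamId, id -> new AtomicLong()).incrementAndGet();
     }
   
     // Cheap path: a single atomic bump when the chunk has been sent.
     void chunkSent(long streamId) {
       AtomicLong counter = perStreamCounters.get(streamId);
       if (counter != null) {
         counter.decrementAndGet();
       }
     }
   
     // Expensive path: every call iterates all stream entities and sums them up,
     // so the cost grows linearly with the number of open streams.
     long chunksBeingTransferred() {
       long sum = 0L;
       for (AtomicLong counter : perStreamCounters.values()) {
         sum += counter.get();
       }
       return sum;
     }
   }
   ```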
   
   So this sounds like a trade-off. It is probably beneficial to leave it as it 
is if we assume the number of stream entities stays small enough, but if we 
assume the number of stream entities can be quite large, the cost of 
synchronizing `numChunksBeingTransferred` becomes the smaller one.
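   For reference, the "synchronized counter" side of the trade-off could look 
roughly like the sketch below; this is only a guess at the shape of that 
approach, not necessarily what this PR does.
   
   ```java
   import java.util.concurrent.atomic.AtomicLong;
   
   // Illustrative sketch of a single shared counter maintained in the same
   // places the per-stream counters are bumped today.
   class SharedCounterSketch {
     private final AtomicLong totalChunksBeingTransferred = new AtomicLong(0L);
   
     void chunkBeingSent() {
       totalChunksBeingTransferred.incrementAndGet();
     }
   
     void chunkSent() {
       totalChunksBeingTransferred.decrementAndGet();
     }
   
     // Reading the total is now O(1) regardless of how many streams are open,
     // at the cost of contention on one atomic shared across all streams.
     long chunksBeingTransferred() {
       return totalChunksBeingTransferred.get();
     }
   }
   ```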
   
   A possible alternative would be to reduce how often 
`numChunksBeingTransferred` is recalculated: use a cached value and only update 
it conditionally, either by rate (e.g. once every 5 calls) or by interval 
(recompute if the cached value was calculated more than XX seconds ago). We 
agreed that the value of `numChunksBeingTransferred` doesn't need to be 
strictly accurate, so this might be acceptable.
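   A rough sketch of the interval-based variant is below. The names 
(`CachedChunkCount`, the TTL, the supplier) are placeholders for the example, 
not existing code.
   
   ```java
   import java.util.concurrent.TimeUnit;
   import java.util.function.LongSupplier;
   
   // Illustrative sketch: reuse the last computed value unless it is older than a TTL.
   class CachedChunkCount {
     private final LongSupplier expensiveSum; // e.g. the linear walk over all stream entities
     private final long ttlNanos;
   
     private volatile long cachedValue = 0L;
     private volatile long lastComputedNanos;
   
     CachedChunkCount(LongSupplier expensiveSum, long ttlSeconds) {
       this.expensiveSum = expensiveSum;
       this.ttlNanos = TimeUnit.SECONDS.toNanos(ttlSeconds);
       this.lastComputedNanos = System.nanoTime() - ttlNanos - 1; // force the first compute
     }
   
     long chunksBeingTransferred() {
       long now = System.nanoTime();
       // A slightly stale value is fine here since strict accuracy is not required.
       if (now - lastComputedNanos > ttlNanos) {
         cachedValue = expensiveSum.getAsLong();
         lastComputedNanos = now;
       }
       return cachedValue;
     }
   }
   ```
   
   Two concurrent callers may recompute at the same time, but since the value is 
only advisory that seems acceptable; a rate-based variant (once in N calls) 
could use a simple atomic call counter instead of the timestamp.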
   
   I'd like to hear others' thoughts. Thanks!

