attilapiros opened a new pull request #23930: [SPARK-27021][CORE] Cleanup of 
Netty event loop group for shuffle chunk fetch requests
URL: https://github.com/apache/spark/pull/23930
 
 
   ## What changes were proposed in this pull request?
   
   Creating a Netty `EventLoopGroup` creates a new thread pool for handling 
events. To stop the threads of that pool, the event loop group must be shut 
down. Transport servers and clients already do this properly, for example by 
calling `shutdownGracefully()` (for details see the `close()` methods of 
`TransportClientFactory` and `TransportServer`). However, there is a separate 
event loop group for shuffle chunk fetch requests. It sits in the pipeline that 
handles fetch requests (shared between the client and server) and is owned by 
the `TransportContext`, and it was never shut down. This PR adds the missing 
cleanup. 
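
The ownership rule described above (whoever creates the worker pool must also shut it down) can be sketched as follows. This is a minimal illustration and not the code of this PR: `ChunkFetchContextSketch`, `handle` and `liveWorkerThreads` are hypothetical names, and a JDK `ExecutorService` stands in for Netty's `EventLoopGroup` so that the example runs without Netty on the classpath. `shutdown()` plus `Thread.join()` play the role of `shutdownGracefully()` followed by waiting for termination.

```java
import java.io.Closeable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a context object that owns a worker pool.
// An ExecutorService is used instead of a Netty EventLoopGroup (assumption,
// made only to keep the sketch runnable with the JDK alone).
class ChunkFetchContextSketch implements Closeable {
  static final String THREAD_PREFIX = "shuffle-chunk-fetch-handler";

  private final List<Thread> workerThreads = new CopyOnWriteArrayList<>();
  private final AtomicInteger threadCounter = new AtomicInteger();
  private final ExecutorService chunkFetchWorkers;

  ChunkFetchContextSketch(int numThreads) {
    // Name the threads so that a leak can be spotted by name, and remember
    // every created thread so that close() can wait for it.
    chunkFetchWorkers = Executors.newFixedThreadPool(numThreads, task -> {
      Thread t = new Thread(task, THREAD_PREFIX + "-" + threadCounter.incrementAndGet());
      t.setDaemon(true);
      workerThreads.add(t);
      return t;
    });
  }

  // Submits one (hypothetical) fetch request to the worker pool.
  void handle(Runnable fetchRequest) {
    chunkFetchWorkers.execute(fetchRequest);
  }

  // The owner shuts the pool down; without this call the threads stay alive.
  @Override
  public void close() {
    chunkFetchWorkers.shutdown();
    for (Thread t : workerThreads) {
      try {
        t.join(5000); // wait until the worker thread has really exited
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  // Counts live threads whose name starts with the worker prefix.
  static long liveWorkerThreads() {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(t -> t.isAlive() && t.getName().startsWith(THREAD_PREFIX))
        .count();
  }
}
```

If `close()` is never called, `liveWorkerThreads()` keeps reporting the pool's threads after the context is no longer used, which is the kind of leak this PR addresses.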
   
   ## How was this patch tested?
   
   With existing unit tests.
   
   This leak is present in production too, but its effect is most visible in 
the unit tests. 
   
   Checking the core unit test logs before the PR:
   ```
   $ grep "LEAK IN SUITE" unit-tests.log | grep -o shuffle-chunk-fetch-handler | wc -l
   381
   ```
   
   And after the PR, without whitelisting these threads in the thread audit and 
with an extra `await` after `chunkFetchWorkers.shutdownGracefully()`:
   ```
   $ grep "LEAK IN SUITE" unit-tests.log | grep -o shuffle-chunk-fetch-handler | wc -l
   0
   ```
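
For readers unfamiliar with the idea behind the `LEAK IN SUITE` log lines, the check amounts to comparing the set of live thread names before and after a suite. The following is a simplified sketch of that idea, not Spark's actual thread audit: `ThreadAuditSketch`, `liveThreadNames` and `leakedSince` are hypothetical names, and whitelisting is left out entirely.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical, simplified thread audit: snapshot the live thread names,
// run some code, then report the names that were not there before.
class ThreadAuditSketch {
  // Names of all currently live threads, sorted for stable output.
  static Set<String> liveThreadNames() {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(Thread::isAlive)
        .map(Thread::getName)
        .collect(Collectors.toCollection(TreeSet::new));
  }

  // Thread names that are alive now but were absent from the snapshot.
  static Set<String> leakedSince(Set<String> before) {
    Set<String> leaked = liveThreadNames();
    leaked.removeAll(before);
    return leaked;
  }
}
```

Taking a snapshot with `liveThreadNames()` before a suite and calling `leakedSince(snapshot)` afterwards would list any thread named `shuffle-chunk-fetch-handler-...` that is still running, which is what the `grep` counts above measure in the real logs.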
   
