Github user dhruve commented on the issue:

    https://github.com/apache/spark/pull/18487
  
    @jinxing64 I performed a few runs to see whether we observe any performance issues with the change. I ran a simple word count job over a random 3TB set of text. I couldn't get hundreds of executors hammering a single NM, but each NM ended up serving approximately a gigabyte of shuffle data (which may be smaller in magnitude than the job you are running). I ran it with different `maxBlocksInFlightPerAddress` values - 10, 20 and 50 - keeping the number of executors roughly the same, and didn't notice any difference in the running time of the job across the runs.
    
    For a proper value of `maxBlocksInFlightPerAddress`, my understanding is that we want to set it as high as possible without maxing out the NM, and that depends a lot on your workloads and on how your data ends up being distributed. Setting it to a lower value mitigates the issue.
    
    Since the changes are on the reducer side, I would recommend running your job against the changes with the limit set to a low value, and doing a couple of runs to see whether you observe any performance hit with a large number of connections hitting a node, as I couldn't reproduce it with the 3TB word count.


