Correct! It is the TCP listen overflow issue!
Thanks for your help, Gopal V and Kuhu!



Using the command below, I can see many overflows.


# netstat -s | grep -i overflow


[root@node-ana-coreLKpD0001 ~]# netstat -s | grep -i overflow
    96282 times the listen queue of a socket overflowed
    TCPTimeWaitOverflow: 2499680
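
A quick way to check whether the counter is still climbing (rather than just being historical) is to sample it twice a few seconds apart; the 10-second interval here is arbitrary:

# netstat -s | grep -i "listen queue"
# sleep 10
# netstat -s | grep -i "listen queue"

If the overflow count grows between the two samples, the listen queue is still being exhausted under the current load.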



The net.core.somaxconn default is 128 on my cluster, which is too small.
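
For reference, the current limit and the effective backlog of the listening socket can be checked like this (in ss output for a LISTEN socket, Send-Q is the configured accept-queue limit and Recv-Q is its current occupancy; 13562 is assumed to be the shuffle handler port, i.e. the mapreduce.shuffle.port default):

# sysctl net.core.somaxconn
# ss -tln | grep 13562

If Send-Q for that socket is still a small number like 128, the larger backlog (from the MAPREDUCE-6763 fix plus the sysctl change mentioned below) has not taken effect yet.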


I ran about 10 queries concurrently with the following setting:
set hive.tez.auto.reducer.parallelism=true
which made the situation worse.
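
Something along the following lines reproduces that kind of concurrency (a rough sketch; the JDBC URL and the query are placeholders, not the real ones):

# for i in $(seq 1 10); do beeline -u "jdbc:hive2://<hs2-host>:10000" --hiveconf hive.tez.auto.reducer.parallelism=true -e "<query>" & done; wait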



After following Gopal V's instructions below, the long pause in the fetch phase disappeared.
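
For anyone hitting the same thing, the steps from Gopal's mail below boil down to roughly this (persisting the values in /etc/sysctl.conf is an extra step on top of his commands, so they survive a reboot; adjust to taste):

# sysctl -w net.core.somaxconn=16384
# sysctl -w net.ipv4.tcp_fin_timeout=2
# echo "net.core.somaxconn = 16384" >> /etc/sysctl.conf
# echo "net.ipv4.tcp_fin_timeout = 2" >> /etc/sysctl.conf

then restart the daemons that listen on the overflowing ports (the NodeManagers hosting the shuffle handler, in our case), and check that dmesg stops logging "possible SYN flooding ... sending cookies" messages and that the overflow counters stop growing.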


------------------ Original ------------------
From: &nbsp;"Gopal V";<gop...@apache.org&gt;;
Send time:&nbsp;Monday, Jun 1, 2020 1:49 PM
To:&nbsp;"user"<user@tez.apache.org&gt;; 

Subject: &nbsp;Re: tez shuffle fetch phase has long pause





> In Reduce5, I see a long pause during fetch occasionally

This is likely the TCP listen overflow issue, but just doesn't get 
reported as a packet loss issue because the retry works okay.

https://issues.apache.org/jira/browse/MAPREDUCE-6763

That's the fix to be applied on the YARN Shuffle handler.

You can confirm the change by running

# ss -tln

I usually diagnose it by checking for TCP cookies in the dmesg or 
looking at the snmp data.

# netstat -s | grep -i overflow


This issue also affects the HDFS namenode, which likewise usually goes 
unreported by users.

https://issues.apache.org/jira/browse/HADOOP-16504

The delay is usually 2 * tcp max-segment-length & is usually reduced by 
increasing the OS half-open connection count.

I end up doing

# sysctl -w net.core.somaxconn=16384
# sysctl -w net.ipv4.tcp_fin_timeout=2

to speed up the retries & restarting daemons.

This affects Tez a little worse than MRv2, because the same JVM runs 
multiple instances of the same vertex sequentially, instead of a new JVM 
for every task (which runs way slower, reducing the concurrency of 
connections).

Cheers,
Gopal
