+1 to looking at the remote end of each socket to see where the connections are going to or coming from. I've seen a few HDFS JIRA issues filed about sockets left in CLOSE_WAIT.
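
As a rough illustration of "looking at the remote end", here is a minimal Python sketch that tallies CLOSE_WAIT sockets by remote address from text in the Linux `/proc/net/tcp` format. The function names are made up for this example, and it assumes IPv4 and a little-endian host (e.g. x86); `/proc/net/tcp6` would need separate handling.

```python
import socket
import struct
from collections import Counter

# Linux encodes the TCP state as a hex code in /proc/net/tcp; 08 is CLOSE_WAIT.
CLOSE_WAIT = "08"


def decode_ipv4(hex_addr):
    """Decode an 'ADDR:PORT' field from /proc/net/tcp.

    Assumption: the host is little-endian, so the address bytes are reversed.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def close_wait_by_remote(proc_net_tcp_text):
    """Count CLOSE_WAIT sockets per remote (ip, port) pair."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # fields: sl, local_address, rem_address, st, ...
        if len(fields) < 4 or fields[3] != CLOSE_WAIT:
            continue
        counts[decode_ipv4(fields[2])] += 1
    return counts
```

Passing it the contents of `/proc/net/tcp` and printing `most_common(10)` on the result should show which remote hosts and ports account for most of the stuck sockets.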

Lucky you, this is a fun Linux rabbit hole to go down :)

(https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/ covers some of the technical details)

On 1/24/18 6:37 PM, Christopher wrote:
I haven't seen that, but I'm curious which OS and which Hadoop, ZooKeeper, and Accumulo versions you're running. I'm assuming you verified that it was the TabletServer process holding these TCP sockets open using `netstat -p` and cross-referencing the PID with `jps -ml` (or similar)? Are you able to confirm based on the port number that these were Thrift connections, or could they be ZooKeeper or Hadoop connections? Do you have any special non-default Accumulo RPC configuration (SSL or SASL)?
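
The PID and port cross-check could be scripted along these lines. This is only a sketch: it assumes the Linux net-tools column layout of `netstat -tnp`, the function name is invented for this example, and the port-to-service map is a guess based on commonly seen defaults (9997 for the TabletServer client port, 2181 for ZooKeeper, 50010 for a Hadoop 2.x DataNode), so adjust it to the actual cluster configuration.

```python
from collections import Counter

# Assumed defaults, not taken from the thread; edit to match your cluster.
KNOWN_PORTS = {
    9997: "accumulo-tserver (Thrift)",
    2181: "zookeeper",
    50010: "hdfs-datanode",
}


def summarize_close_wait(netstat_text, pid):
    """Count CLOSE_WAIT sockets owned by `pid`, grouped by a service label.

    Expects lines shaped like `netstat -tnp` output:
    proto recv-q send-q local foreign state pid/program
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        f = line.split()
        if len(f) < 7 or not f[0].startswith("tcp") or f[5] != "CLOSE_WAIT":
            continue
        if f[6].split("/", 1)[0] != str(pid):
            continue  # socket belongs to another process (or owner is hidden)
        # rsplit keeps this working for IPv6-style addresses such as :::9997
        local_port = int(f[3].rsplit(":", 1)[1])
        remote_port = int(f[4].rsplit(":", 1)[1])
        label = KNOWN_PORTS.get(local_port) or KNOWN_PORTS.get(remote_port) or "other"
        counts[label] += 1
    return counts
```

The PID would come from `jps -ml` (the TabletServer's entry), and the text from running `netstat -tnp` with enough privileges to see process ownership.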

On Wed, Jan 24, 2018 at 3:46 PM Adam J. Shook <adamjsh...@gmail.com> wrote:

    Hello all,

    Has anyone come across an issue with a TabletServer occupying a
    large number of ports in a CLOSE_WAIT state?  The 'normal' number
    of used ports on a 12-node cluster is around 12,000 to 20,000.
    In one instance, there were over 68k, which kept other
    applications from getting a free port, so they would fail to start
    (which is how we found this in the first place).

    Thank you,
    --Adam
