Thanks for the detail. Unless you've changed it, port 50010 is the *DataNode* data transfer socket. I'm surprised the HDFS tunings suggested by others on this thread have not had an impact.
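To double-check where those sockets actually point, here is a rough sketch of grouping the master's CLOSE_WAIT sockets by remote port (assuming the master PID 793 from your pastebin and the default ports, 60020 for RegionServer RPC and 50010 for DataNode data transfer; adjust for your setup):

  # Group the master's CLOSE_WAIT sockets by remote port (793 = master PID).
  lsof -nP -iTCP -p 793 | grep CLOSE_WAIT \
    | awk '{print $9}' | awk -F: '{print $NF}' \
    | sort | uniq -c | sort -rn
  # Mostly 50010 -> the remote is the DataNode data transfer socket (DFSClient side).
  # Mostly 60020 -> the remote is the RegionServer RPC port (would point more at an HBase IPC issue).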
I filed https://issues.apache.org/jira/browse/HBASE-11142 to track this report.


On Mon, May 5, 2014 at 5:19 PM, Hansi Klose <[email protected]> wrote:

> Hi Andrew,
>
> here is the output from our testing environment.
> There we can see the same behavior as in our production environment.
>
> Sorry if my description was not clear.
> The connection source is the hbase master process PID 793 and the targets
> are the datanode ports of our 3 regionservers.
>
> hbase master: lsof | grep TCP | grep CLOSE_WAIT
>
> http://pastebin.com/BTyiVgb2
>
> Here are 40 connections in state CLOSE_WAIT to our 3 regionservers.
> These connections have been there since last week.
>
> Regards Hansi
>
> > Sent: Wednesday, 30 April 2014 at 18:48
> > From: "Andrew Purtell" <[email protected]>
> > To: "[email protected]" <[email protected]>
> > Subject: Re: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
> >
> > Let's circle back to the original mail:
> >
> > > When I ran lsof I saw that there were a lot of TCP CLOSE_WAIT handles
> > > open with the regionserver as target.
> >
> > Is that right? *Regionserver*, not another process (datanode or whatever)?
> > Or did I miss where somewhere along this thread there was evidence
> > confirming a datanode was the remote?
> >
> > If you are sure that the stuck connections are to the regionserver process
> > (maybe pastebin the lsof output so we can double-check the port numbers
> > involved?) then the regionserver is closing the connection but the master
> > somehow is not, by definition of what CLOSE_WAIT means. HDFS settings won't
> > matter if it is the master that is failing to close a socket; maybe this is
> > an IPC bug.
> >
> >
> > On Wed, Apr 30, 2014 at 12:38 AM, Hansi Klose <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > sorry I missed that :-(
> > >
> > > I tried that parameter in my hbase-site.xml and restarted the hbase master
> > > and all regionservers.
> > >
> > > <property>
> > >   <name>dfs.client.socketcache.expiryMsec</name>
> > >   <value>900</value>
> > > </property>
> > >
> > > No change, the CLOSE_WAIT sockets to the regionservers' datanodes still
> > > persist on the hbase master after taking snapshots.
> > >
> > > Because it was not clear to me where the setting has to go,
> > > I put it in our hdfs-site.xml too and restarted all datanodes.
> > > I thought that settings starting with dfs.client maybe have to go there.
> > > But this did not change the behavior either.
> > >
> > > Regards Hansi
> > >
> > > > Sent: Tuesday, 29 April 2014 at 19:21
> > > > From: Stack <[email protected]>
> > > > To: Hbase-User <[email protected]>
> > > > Subject: Re: Re: taking snapshots creates too many TCP CLOSE_WAIT handles on the hbase master server
> > > >
> > > > On Tue, Apr 29, 2014 at 8:15 AM, Hansi Klose <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > sorry for the late answer.
> > > > >
> > > > > I configured the hbase-site.conf like this
> > > > >
> > > > > <property>
> > > > >   <name>dfs.client.socketcache.capacity</name>
> > > > >   <value>0</value>
> > > > > </property>
> > > > > <property>
> > > > >   <name>dfs.datanode.socket.reuse.keepalive</name>
> > > > >   <value>0</value>
> > > > > </property>
> > > > >
> > > > > and restarted the hbase master and all regionservers.
> > > > > I still can see the same behavior. Each snapshot creates
> > > > > new CLOSE_WAIT sockets which stay there until the hbase master is restarted.
> > > > >
> > > > > Is there any other setting I can try?
> > > >
> > > > You saw my last suggestion about "...dfs.client.socketcache.expiryMsec to
> > > > 900 in your HBase client configuration.."?
> > > >
> > > > St.Ack
> > >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)


--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
