> My current working theory is that
> too many sockets are in CLOSE_WAIT state (leading to
> ClosedChannelException?). We're going to try to adjust some OS
> parameters.

How many sockets are in that state? You can count them with: netstat -an | grep CLOSE_WAIT | wc -l
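A total count won't tell you whether the CLOSE_WAIT sockets pile up against a single peer, such as one datanode, so a per-host breakdown may be more useful. Here is a rough Python sketch that does this from `netstat -an` text. It assumes the Linux net-tools layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) with host:port addresses. BSD/macOS netstat prints host.port instead and would need a different split. The function name is just illustrative.

```python
from collections import Counter


def count_close_wait(netstat_output: str) -> Counter:
    """Count CLOSE_WAIT TCP sockets per remote host in `netstat -an` text.

    Assumes Linux net-tools columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    per_remote = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/unix sockets, and anything not in CLOSE_WAIT.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "CLOSE_WAIT":
            continue
        # Foreign address is host:port; drop the port so counts group by host.
        host = fields[4].rsplit(":", 1)[0]
        per_remote[host] += 1
    return per_remote
```

Feed it the stdout of `subprocess.run(["netstat", "-an"], capture_output=True, text=True)` and look at `.most_common(10)`. If one host (for example a datanode on its data-transfer port) dominates, that would suggest the kind of single-resource bottleneck discussed below.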

CDH3U1 contains HDFS-1836... https://issues.apache.org/jira/browse/HDFS-1836

Best regards,

       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


>________________________________
>From: Geoff Hendrey <[email protected]>
>To: [email protected]
>Cc: Tony Wang <[email protected]>; Rohit Nigam <[email protected]>; Parmod Mehta <[email protected]>; James Ladd <[email protected]>
>Sent: Tuesday, September 13, 2011 9:49 AM
>Subject: RE: scanner deadlock?
>
>Thanks Stack - 
>
>Answers to all your questions below. My current working theory is that
>too many sockets are in CLOSE_WAIT state (leading to
>ClosedChannelException?). We're going to try to adjust some OS
>parameters.
>
>" I'm asking if regionservers are bottlenecking on a single network
>resource; a particular datanode, dns?"
>
>Gotcha. I'm gathering some tools now to collect and analyze netstat
>output.
>
>" the regionserver is going slow getting data out of
>hdfs.  What's iowait like at the time of slowness?  Has it changed from
>when all was running nicely?"
>
>iowait is high (20% above cpu), but not increasing. I'll try to quantify
>that better.
>
>" You talk to hbase in the reducer?   Reducers don't start writing hbase
>until job is 66% complete IIRC.    Perhaps it's slowing as soon as it
>starts writing hbase?  Is that so?"
>
>My statement about "running fine" applies to after the reducer has
>completed sort. We have metrics produced by the reducer that log the
>results of scans and Puts. So we know that scans and puts proceed
>without issue for hours.
> 
