As expected, J-D's suggestion basically causes the system to collapse almost 
immediately. All the region servers show stack traces such as these:

2011-09-13 23:34:43,151 WARN  [IPC Reader 0 on port 60020] 
ipc.HBaseServer$Listener(526): IPC Server listener on 60020: readAndProcess 
threw exception java.io.IOException: Connection reset by peer. Count of bytes 
read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(Unknown Source)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.read(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
        at java.util.concurrent.ThreadPoolExe


2011-09-13 23:34:43,293 WARN  [IPC Server handler 0 on 60020] 
ipc.HBaseServer$Handler(1100): IPC Server handler 0 on 60020 caught: 
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1341)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
        at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)

-----Original Message-----
From: Geoff Hendrey [mailto:[email protected]] 
Sent: Tuesday, September 13, 2011 10:56 PM
To: [email protected]
Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
Subject: RE: scanner deadlock?

answers below

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Tuesday, September 13, 2011 10:41 PM
To: [email protected]
Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
Subject: Re: scanner deadlock?

On Tue, Sep 13, 2011 at 10:25 PM, Geoff Hendrey <[email protected]> wrote:
> I've upgraded to HotSpot 64 Bit Server VM, with HBase 90.4 and all 
> recommended config changes (100 region server handlers, mslab enabled, etc). 
> No change, if anything it dies faster. Count of sockets in CLOSE_WAIT on 
> 50010 increases linearly. I logged netstat from a random node in the cluster, 
> periodically. Then dumped the output into excel using a pivot table to look 
> at a behavior of TCP. Number of connections from the given node to others on 
> 50010 was relatively uniform (no hotspot). Connections on 50010 from given 
> node to *self* was much way higher than to other nodes, but that's probably a 
> good thing. My guess is it's HBase leveraging locality of files for the 
> region server. Just a guess.
>
Yes.  You have good locality.  So maybe you are not bound up on a
single network resource.

So when you jstack and you see that regionserver has its threads all
stuck in next -- are they? -- then we are likely going to the local
datanode.  

ANSWER: yes, I believe they are stuck in next. I can see from logs on the 
client that the call to next periodically takes very long to return. You had 
earlier commented that my 5 seconds regionserver least was low. I had set it 
low intentionally to get a fail-fast on the call to next. If I don't set it 
low, it will just take whatever the timeout is to fail (60 seconds default). 


Anything in its logs when regionserver slows down?

ANSWER: Yes. I see ScannerTimeoutException, and unknown scanner, and then 
ClosedChannelException. Stack trace shows the ClosedChannelException occurs 
when the server tries to write the response to the scanner. This seems like a 
bug to me. Once you close the channel you cannot write it, no? If you try to 
write it after you close it, you will get ClosedChannelException.


> next step will be to test with JD Cryans suggestion:
> " In order to completely rule out at least one thing, can you set 
> ipc.server.max.queue.size to 1 and hbase.regionserver.handler.count to a low 
> number (let's say 10)? If payload is putting too much memory pressure, we'll 
> know."
>
> ...though I'm not sure what I'm supposed to observe with these settings...but 
> I'll try it and report on the outcome.
>

Well, you have GC logging already.  If you look at the output do you
see big pauses? 

ANSWER: Nope, no long pauses at all. I've periodically run a few greps with a 
regex to try to find pauses one second or longer, and I haven't seen any of 
late. HOWEVER, one thing I don't understand at all is why ganglia reports HUGE 
gc pauses (like 1000 seconds!). But in reality I can *never* find such a pause 
in the GC log. Is there any known issue with ganglia graphs being on the wrong 
vertical scale for GC pauses? I know that sounds odd, but I just can't 
correlate the Ganglia graphs for GC to reality.

 I think J-D was thinking that regionservers would be
using less memory if you make the queues smaller.  You could try that.
 Maybe when queues are big, its taking a while to process them and
client times out.  What size are these rows?

ANSWER: rows are about 100KB-750KB. Trying J-D's suggestion now.

St.Ack


> -geoff
>
> -----Original Message-----
> From: Geoff Hendrey [mailto:[email protected]]
> Sent: Tuesday, September 13, 2011 4:50 PM
> To: [email protected]; Andrew Purtell
> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: RE: scanner deadlock?
>
> 1019 sockets on 50010 in CLOSED_WAIT state.
>
> -geoff
>
> -----Original Message-----
> From: Andrew Purtell [mailto:[email protected]]
> Sent: Tuesday, September 13, 2011 4:00 PM
> To: [email protected]
> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: Re: scanner deadlock?
>
>
>
>> My current working theory is that
>> too many sockets are in CLOSE_WAIT state (leading to
>> ClosedChannelException?). We're going to try to adjust some OS
>> parameters.
>
> How many sockets are in that state? netstat -an | grep CLOSE_WAIT | wc -l
>
> CDH3U1 contains HDFS-1836... https://issues.apache.org/jira/browse/HDFS-1836
>
> Best regards,
>
>        - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
>
>
>>________________________________
>>From: Geoff Hendrey <[email protected]>
>>To: [email protected]
>>Cc: Tony Wang <[email protected]>; Rohit Nigam <[email protected]>; Parmod 
>>Mehta <[email protected]>; James Ladd <[email protected]>
>>Sent: Tuesday, September 13, 2011 9:49 AM
>>Subject: RE: scanner deadlock?
>>
>>Thanks Stack -
>>
>>Answers to all your questions below. My current working theory is that
>>too many sockets are in CLOSE_WAIT state (leading to
>>ClosedChannelException?). We're going to try to adjust some OS
>>parameters.
>>
>>" I'm asking if regionservers are bottlenecking on a single network
>>resource; a particular datanode, dns?"
>>
>>Gotcha. I'm gathering some tools now to collect and analyze netstat
>>output.
>>
>>" the regionserver is going slow getting data out of
>>hdfs.  Whats iowait like at the time of slowness?  Has it changed from
>>when all was running nicely?"
>>
>>iowait is high (20% above cpu), but not increasing. I'll try to quantify
>>that better.
>>
>>" You talk to hbase in the reducer?   Reducers don't start writing hbase
>>until job is 66% complete IIRC.    Perhaps its slowing as soon as it
>>starts writing hbase?  Is that so?"
>>
>>My statement about "running fine" applies to after the reducer has
>>completed sort. We have metrics produced by the reducer that log the
>>results of scans ant Puts. So we know that scans and puts proceed
>>without issue for hours.
>>
>

Reply via email to