Yeah. Ten handlers and no queue for the RPCs; it'll just reject calls once the number of concurrent calls exceeds the handler count.
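A minimal sketch of the two settings in question (the ipc.server.max.queue.size=1 and hbase.regionserver.handler.count=10 that J-D suggested further down the thread); the class name and the programmatic form are illustrative only, since in practice these go in hbase-site.xml on each region server and take effect on restart:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LowPressureRpcSettings {
  public static void main(String[] args) {
    // Sketch only: with no call queue, an RPC that arrives while all ten
    // handlers are busy is rejected rather than parked behind other calls.
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.regionserver.handler.count", 10); // ten RPC handler threads
    conf.setInt("ipc.server.max.queue.size", 1);         // effectively no call queue
    System.out.println("handler.count="
        + conf.getInt("hbase.regionserver.handler.count", -1)
        + ", max.queue.size="
        + conf.getInt("ipc.server.max.queue.size", -1));
  }
}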
Also send a listing of your hbase rootdir: hadoop fs -lsr /hbase

St.Ack

On Tue, Sep 13, 2011 at 11:40 PM, Geoff Hendrey <[email protected]> wrote:
> As expected, J-D's suggestion basically causes the system to collapse
> almost immediately. All the region servers show stack traces such as these:
>
> 2011-09-13 23:34:43,151 WARN [IPC Reader 0 on port 60020]
> ipc.HBaseServer$Listener(526): IPC Server listener on 60020: readAndProcess
> threw exception java.io.IOException: Connection reset by peer. Count of
> bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.read(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExe
>
> 2011-09-13 23:34:43,293 WARN [IPC Server handler 0 on 60020]
> ipc.HBaseServer$Handler(1100): IPC Server handler 0 on 60020 caught:
> java.nio.channels.ClosedChannelException
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source)
>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1341)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>
> -----Original Message-----
> From: Geoff Hendrey [mailto:[email protected]]
> Sent: Tuesday, September 13, 2011 10:56 PM
> To: [email protected]
> Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: RE: scanner deadlock?
>
> Answers below.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> Sent: Tuesday, September 13, 2011 10:41 PM
> To: [email protected]
> Cc: Andrew Purtell; Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
> Subject: Re: scanner deadlock?
>
> On Tue, Sep 13, 2011 at 10:25 PM, Geoff Hendrey <[email protected]> wrote:
>> I've upgraded to the HotSpot 64-bit Server VM, with HBase 0.90.4 and all
>> recommended config changes (100 region server handlers, MSLAB enabled, etc.).
>> No change; if anything it dies faster. The count of sockets in CLOSE_WAIT on
>> 50010 increases linearly. I logged netstat from a random node in the
>> cluster, periodically, then dumped the output into Excel and used a pivot
>> table to look at the behavior of TCP. The number of connections from the
>> given node to others on 50010 was relatively uniform (no hotspot).
>> Connections on 50010 from the given node to *itself* were much higher than
>> to other nodes, but that's probably a good thing. My guess is it's HBase
>> leveraging locality of files for the region server. Just a guess.
>>
> Yes. You have good locality. So maybe you are not bound up on a
> single network resource.
>
> So when you jstack and you see that the regionserver has its threads all
> stuck in next -- are they? -- then we are likely going to the local
> datanode.
>
> ANSWER: Yes, I believe they are stuck in next. I can see from logs on the
> client that the call to next periodically takes a very long time to return.
> You had earlier commented that my 5-second regionserver lease was low. I set
> it low intentionally to get a fail-fast on the call to next. If I don't set
> it low, it just takes whatever the timeout is to fail (60 seconds by
> default).
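A minimal client-side sketch of that fail-fast timing, assuming the 0.90-era client API; the table name, class name, and "slow" threshold are placeholders, and the lease property is set here only to make the 5-second fail-fast intent explicit (it is normally configured on the region servers):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class SlowNextLogger {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // The 5-second lease mentioned in the thread; in 0.90 this is
    // hbase.regionserver.lease.period (default 60000 ms).
    conf.setLong("hbase.regionserver.lease.period", 5000);

    HTable table = new HTable(conf, "mytable");      // "mytable" is a placeholder
    ResultScanner scanner = table.getScanner(new Scan());
    try {
      long rows = 0;
      for (;;) {
        long start = System.currentTimeMillis();
        Result r = scanner.next();                   // the call that stalls
        long elapsed = System.currentTimeMillis() - start;
        if (elapsed > 1000) {                        // arbitrary "slow" threshold
          System.err.println("slow next(): " + elapsed + " ms at row " + rows);
        }
        if (r == null) break;
        rows++;
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}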
> Anything in its logs when the regionserver slows down?
>
> ANSWER: Yes. I see ScannerTimeoutException and "unknown scanner" errors, and
> then ClosedChannelException. The stack trace shows the ClosedChannelException
> occurs when the server tries to write the response to the scanner. This seems
> like a bug to me. Once you close the channel you cannot write to it, no? If
> you try to write to it after you close it, you will get
> ClosedChannelException.
>
>> Next step will be to test with J-D Cryans' suggestion:
>> "In order to completely rule out at least one thing, can you set
>> ipc.server.max.queue.size to 1 and hbase.regionserver.handler.count to a low
>> number (let's say 10)? If payload is putting too much memory pressure, we'll
>> know."
>>
>> ...though I'm not sure what I'm supposed to observe with these
>> settings... but I'll try it and report on the outcome.
>>
>
> Well, you have GC logging already. If you look at the output do you
> see big pauses?
>
> ANSWER: Nope, no long pauses at all. I've periodically run a few greps with a
> regex to try to find pauses of one second or longer, and I haven't seen any
> of late. HOWEVER, one thing I don't understand at all is why Ganglia reports
> HUGE GC pauses (like 1000 seconds!), yet in reality I can *never* find such a
> pause in the GC log. Is there any known issue with Ganglia graphs being on
> the wrong vertical scale for GC pauses? I know that sounds odd, but I just
> can't correlate the Ganglia graphs for GC with reality.
>
> I think J-D was thinking that regionservers would be
> using less memory if you make the queues smaller. You could try that.
> Maybe when queues are big, it's taking a while to process them and the
> client times out. What size are these rows?
>
> ANSWER: Rows are about 100 KB-750 KB. Trying J-D's suggestion now.
>
> St.Ack
>
>
>> -geoff
>>
>> -----Original Message-----
>> From: Geoff Hendrey [mailto:[email protected]]
>> Sent: Tuesday, September 13, 2011 4:50 PM
>> To: [email protected]; Andrew Purtell
>> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
>> Subject: RE: scanner deadlock?
>>
>> 1019 sockets on 50010 in CLOSE_WAIT state.
>>
>> -geoff
>>
>> -----Original Message-----
>> From: Andrew Purtell [mailto:[email protected]]
>> Sent: Tuesday, September 13, 2011 4:00 PM
>> To: [email protected]
>> Cc: Tony Wang; Rohit Nigam; Parmod Mehta; James Ladd
>> Subject: Re: scanner deadlock?
>>
>>> My current working theory is that
>>> too many sockets are in CLOSE_WAIT state (leading to
>>> ClosedChannelException?). We're going to try to adjust some OS
>>> parameters.
>>
>> How many sockets are in that state? netstat -an | grep CLOSE_WAIT | wc -l
>>
>> CDH3U1 contains HDFS-1836: https://issues.apache.org/jira/browse/HDFS-1836
>>
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
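A small sketch of the per-peer tally described above: pipe netstat -an into it and it counts CLOSE_WAIT connections whose remote side is the datanode data port 50010, grouped by remote host. The class name is a placeholder and the usual Linux netstat column layout (proto, recv-q, send-q, local address, foreign address, state) is assumed:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class CloseWaitTally {
  public static void main(String[] args) throws IOException {
    Map<String, Integer> byPeer = new HashMap<String, Integer>();
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      String[] f = line.trim().split("\\s+");
      if (f.length < 6 || !"CLOSE_WAIT".equals(f[5])) continue;
      String foreign = f[4];                       // e.g. 10.1.2.3:50010
      if (!foreign.endsWith(":50010")) continue;   // only the datanode data port
      String host = foreign.substring(0, foreign.lastIndexOf(':'));
      Integer n = byPeer.get(host);
      byPeer.put(host, n == null ? 1 : n + 1);
    }
    int total = 0;
    for (Map.Entry<String, Integer> e : byPeer.entrySet()) {
      System.out.println(e.getKey() + "\t" + e.getValue());
      total += e.getValue();
    }
    System.out.println("total\t" + total);
  }
}

Run as: netstat -an | java CloseWaitTally. A uniform spread across peers points at a cluster-wide leak, while a climbing count against one peer (or against the node itself) points at a hotspot.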
>>
>>
>>> ________________________________
>>> From: Geoff Hendrey <[email protected]>
>>> To: [email protected]
>>> Cc: Tony Wang <[email protected]>; Rohit Nigam <[email protected]>; Parmod Mehta <[email protected]>; James Ladd <[email protected]>
>>> Sent: Tuesday, September 13, 2011 9:49 AM
>>> Subject: RE: scanner deadlock?
>>>
>>> Thanks Stack -
>>>
>>> Answers to all your questions below. My current working theory is that
>>> too many sockets are in CLOSE_WAIT state (leading to
>>> ClosedChannelException?). We're going to try to adjust some OS
>>> parameters.
>>>
>>> "I'm asking if regionservers are bottlenecking on a single network
>>> resource; a particular datanode, DNS?"
>>>
>>> Gotcha. I'm gathering some tools now to collect and analyze netstat
>>> output.
>>>
>>> "The regionserver is going slow getting data out of
>>> hdfs. What's iowait like at the time of slowness? Has it changed from
>>> when all was running nicely?"
>>>
>>> iowait is high (20% above CPU), but not increasing. I'll try to quantify
>>> that better.
>>>
>>> "You talk to hbase in the reducer? Reducers don't start writing hbase
>>> until the job is 66% complete IIRC. Perhaps it's slowing as soon as it
>>> starts writing hbase? Is that so?"
>>>
>>> My statement about "running fine" applies to after the reducer has
>>> completed its sort. We have metrics produced by the reducer that log the
>>> results of scans and Puts, so we know that scans and Puts proceed
>>> without issue for hours.
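The reducer-side metrics mentioned in that last paragraph could look roughly like this hypothetical helper (class name, threshold, and logging are placeholders): each Put is wrapped with a timer so a stall shows up in the task logs with a timestamp that can be lined up against the region server logs.

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class TimedPuts {
  private static final long SLOW_MS = 1000;  // arbitrary "slow" threshold

  private final HTable table;
  private long puts = 0;
  private long slowPuts = 0;

  public TimedPuts(HTable table) {
    this.table = table;
  }

  public void put(Put put) throws IOException {
    long start = System.currentTimeMillis();
    table.put(put);
    long elapsed = System.currentTimeMillis() - start;
    puts++;
    if (elapsed > SLOW_MS) {
      slowPuts++;
      System.err.println("slow put: " + elapsed + " ms ("
          + slowPuts + "/" + puts + " slow so far)");
    }
  }
}

In the reducer, one instance would be constructed per task with its HTable and every write routed through put().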
