On Mon, Jan 30, 2012 at 6:39 AM, Galed Friedmann <[email protected]> wrote:
> Lately we're having weird issues with Thrift, after several hours the
> Thrift server "hangs" - the scripts that are using it to access HBase get
> connection timeouts, we're also using Heroku and ruby on rails apps that
> use Thrift and they simply get stuck. Only when restarting the Thrift
> process everything goes back to normal.
>

Can you thread dump the thrift server when it's all hung up? Have you enabled

> 2012-01-30 10:52:08,823 INFO org.apache.zookeeper.server.NIOServerCnxn:
> Established session 0x1352a393d18051e with negotiated timeout 90000 for
> client /10.217.55.193:35940
> 2012-01-30 10:52:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
> Expiring session 0x1352a393d18051b, timeout of 90000ms exceeded
> 2012-01-30 10:52:28,001 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> termination for sessionid: 0x1352a393d18051b

ZK is establishing a session with a 90-second timeout and then timing out
immediately?

> 2012-01-30 10:51:36,382 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> listener on 60020: readAndProcess threw exception java.io.IOException:
> Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:210)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> 2012-01-30 10:52:24,016 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -4511393305838866925 lease expired
> 2012-01-30 10:52:24,016 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -5818959718437063034 lease expired
> 2012-01-30 10:52:24,016 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> -1408921590864341720 lease expired
>

Client went away? Do the lease expirations happen all the time, or only
around the time of the hangup? (You are closing scanners when done?)

St.Ack
