Hi,
It doesn't look like the servers are loaded; we're not passing that much
traffic through the cluster at the moment.
Can you explain how to take the dump from the Thrift server? I couldn't
find how to do that.

At the moment we have only one Thrift gateway; I'm going to add some more
with load balancing.
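For what it's worth, a TCP load balancer in front of several gateways can be as simple as the following haproxy sketch (the hostnames and the default Thrift port 9090 are assumptions, for illustration only):

```
listen hbase-thrift
    bind *:9090
    mode tcp
    balance roundrobin
    server gw1 thrift-gw1:9090 check
    server gw2 thrift-gw2:9090 check
```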

Thanks again.

On Wed, Feb 1, 2012 at 6:57 PM, Stack <[email protected]> wrote:

> On Wed, Feb 1, 2012 at 1:00 AM, Galed Friedmann
> <[email protected]> wrote:
> > 1. I've taken a dump from the HMaster when we felt some timeouts, I hope
> > that's what you're looking for, attached.
>
> I was looking for dumps of the hung up thrift server.
>
> The master dump shows it idle.
>
> > 2. The timeout occurs around 10-12 hours after ZK established the
> > connection with the Thrift server, so it's not immediate. The Thrift
> > logs show nothing happening; you only see the timeouts in the ZK logs.
> > Actually we haven't had errors or ZK timeouts for Thrift in the last
> > 15 hours, but it'll happen again, I'm sure.
>
> OK.  Thread dump it when it's hung up.  Thrift is getting stuck going
> against the cluster, it seems.  How many gateways are you running?
> Run more?
>
> > 3. The lease expiration happens all the time. We're using mostly JRuby
> > scripts and closing the scans when we're done.
> >
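The close-when-done pattern mentioned above can be sketched in Ruby. The method names mirror the HBase Thrift IDL (scannerOpen, scannerGet, scannerClose); StubClient is a hypothetical stand-in for the Thrift-generated Hbase::Client so the sketch is self-contained:

```ruby
# Stand-in for the Thrift-generated client; tracks open scanner leases so
# the cleanup pattern below can be demonstrated without a live cluster.
class StubClient
  def initialize
    @open = {}
  end

  def scannerOpen(_table, _start_row, _columns)
    id = @open.size + 1
    @open[id] = ['row-a', 'row-b'] # pretend result set
    id
  end

  def scannerGet(id)
    row = @open.fetch(id).shift
    row ? [row] : []
  end

  def scannerClose(id)
    @open.delete(id)
  end

  def open_scanners
    @open.size
  end
end

# Scan a whole table, always releasing the scanner lease: the ensure block
# runs even if scannerGet raises, so the region server lease is freed
# instead of waiting out its expiration.
def scan_all(client, table)
  id = client.scannerOpen(table, '', ['cf:qual'])
  rows = []
  loop do
    batch = client.scannerGet(id)
    break if batch.empty?
    rows.concat(batch)
  end
  rows
ensure
  client.scannerClose(id) if id
end
```

With this shape, a script that dies mid-scan still closes its scanner, so "lease expired" messages should only appear when a client truly goes away.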
>
> Could it be the client is taking a long time to get back to the
> server?  Or maybe the server is taking a long time to respond because
> it's heavily loaded (is it?).
>
> St.Ack
>
> > Thanks again,
> > Galed.
> >
> >
> > On Tue, Jan 31, 2012 at 10:51 PM, Stack <[email protected]> wrote:
> >>
> >> On Mon, Jan 30, 2012 at 6:39 AM, Galed Friedmann
> >> <[email protected]> wrote:
> >> > Lately we're having weird issues with Thrift: after several hours the
> >> > Thrift server "hangs" - the scripts that use it to access HBase get
> >> > connection timeouts. We're also using Heroku and Ruby on Rails apps
> >> > that use Thrift, and they simply get stuck. Only restarting the
> >> > Thrift process brings everything back to normal.
> >> >
> >>
> >> Can you thread dump the thrift server when it's all hung up?
> >>
> >> Have you enabled
> >>
> >>
> >> > 2012-01-30 10:52:08,823 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >> > Established session 0x1352a393d18051e with negotiated timeout 90000
> >> > for client /10.217.55.193:35940
> >> > 2012-01-30 10:52:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
> >> > Expiring session 0x1352a393d18051b, timeout of 90000ms exceeded
> >> > 2012-01-30 10:52:28,001 INFO org.apache.zookeeper.server.PrepRequestProcessor:
> >> > Processed session termination for sessionid: 0x1352a393d18051b
> >>
> >> ZK is establishing a session with a 90-second timeout and then timing
> >> it out immediately?
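As an aside, the negotiated 90000ms figure in those log lines is governed on the HBase side by zookeeper.session.timeout (and clamped by the ZK server's own min/max session timeouts). A hypothetical hbase-site.xml fragment, values for illustration only:

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds; the ZK server clamps this to its min/max session timeout -->
  <value>90000</value>
</property>
```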
> >>
> >>
> >>
> >>
> >> > 2012-01-30 10:51:36,382 WARN org.apache.hadoop.ipc.HBaseServer: IPC
> >> > Server listener on 60020: readAndProcess threw exception
> >> > java.io.IOException: Connection reset by peer. Count of bytes read: 0
> >> > java.io.IOException: Connection reset by peer
> >> >        at sun.nio.ch.FileDispatcher.read0(Native Method)
> >> >        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >> >        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
> >> >        at sun.nio.ch.IOUtil.read(IOUtil.java:210)
> >> >        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> >> >        at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
> >> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
> >> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> >> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> >> >        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >        at java.lang.Thread.run(Thread.java:619)
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -4511393305838866925 lease expired
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -5818959718437063034 lease expired
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -1408921590864341720 lease expired
> >> >
> >>
> >> Client went away?  Do the lease expirations happen all the time, or
> >> just around the time of the hangup?  (You are closing scanners when
> >> done?)
> >>
> >> St.Ack
> >
> >
>
