Hi,
It doesn't look like the servers are loaded; we're not passing that much traffic through the cluster at the moment. Can you explain how to take the dump from the Thrift server? I couldn't find how to do that.
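(For reference, a thread dump of a Thrift gateway JVM can usually be taken with jstack, or with SIGQUIT if jstack isn't handy. A minimal sketch — the `ThriftServer` process name and the output filename are assumptions, adjust to your deployment:)

```shell
#!/bin/sh
# Sketch: capture a thread dump of the HBase Thrift gateway JVM.
# Assumes the process shows up in `jps` output as "ThriftServer";
# adjust the awk pattern to match your deployment.
THRIFT_PID=$(jps 2>/dev/null | awk '/ThriftServer/ {print $1}')

if [ -n "$THRIFT_PID" ]; then
  # jstack ships with the JDK and writes the dump to stdout.
  jstack "$THRIFT_PID" > thrift-threaddump.txt
  echo "dump written to thrift-threaddump.txt"
else
  # Fallback: sending SIGQUIT makes a JVM print its thread dump to its
  # own stdout (typically the gateway's .out log) without killing it.
  echo "ThriftServer not found in jps; run: kill -QUIT <thrift pid>"
fi
```

Either way the dump lands in a plain text file/log that can be attached to the thread.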
At the moment we have only 1 Thrift gateway; I'm going to add some more with load balancing. Thanks again.

On Wed, Feb 1, 2012 at 6:57 PM, Stack <[email protected]> wrote:
> On Wed, Feb 1, 2012 at 1:00 AM, Galed Friedmann
> <[email protected]> wrote:
> > 1. I've taken a dump from the HMaster when we felt some timeouts, I hope
> > that's what you're looking for, attached.
>
> I was looking for dumps of the hung up thrift server.
>
> The master dump shows it idle.
>
> > 2. The timeout occurs around 10-12 hours after the ZK established the
> > connection with the Thrift server so it's not immediate. In the Thrift logs
> > you see that nothing happened, and you only see the timeouts in the ZK logs.
> > Actually we hadn't had errors in the last 15 hours nor ZK timeouts for
> > Thrift but it'll happen again I'm sure..
>
> OK. Thread dump it when it's hung up. Thrift is getting stuck going
> against the cluster it seems. How many gateways are you running? Run
> more?
>
> > 3. The lease expiration happens all the time; we're using mostly JRuby
> > scripts and closing the scans when we're done.
>
> Could it be the client is taking a long time to get back to the
> server? Or maybe the server is taking a long time to respond because
> it's heavily loaded (is it?).
>
> St.Ack
>
> > Thanks again,
> > Galed.
> >
> > On Tue, Jan 31, 2012 at 10:51 PM, Stack <[email protected]> wrote:
> >>
> >> On Mon, Jan 30, 2012 at 6:39 AM, Galed Friedmann
> >> <[email protected]> wrote:
> >> > Lately we're having weird issues with Thrift: after several hours the
> >> > Thrift server "hangs" - the scripts that are using it to access HBase
> >> > get connection timeouts. We're also using Heroku and Ruby on Rails apps
> >> > that use Thrift and they simply get stuck. Only when restarting the
> >> > Thrift process does everything go back to normal.
> >>
> >> Can you thread dump the thrift server when it's all hung up?
> >>
> >> Have you enabled
> >>
> >> > 2012-01-30 10:52:08,823 INFO org.apache.zookeeper.server.NIOServerCnxn:
> >> > Established session 0x1352a393d18051e with negotiated timeout 90000 for
> >> > client /10.217.55.193:35940
> >> > 2012-01-30 10:52:28,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
> >> > Expiring session 0x1352a393d18051b, timeout of 90000ms exceeded
> >> > 2012-01-30 10:52:28,001 INFO
> >> > org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> >> > termination for sessionid: 0x1352a393d18051b
> >>
> >> ZK is establishing a session w/ 90second timeout and then timing out
> >> immediately?
> >>
> >> > 2012-01-30 10:51:36,382 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
> >> > listener on 60020: readAndProcess threw exception java.io.IOException:
> >> > Connection reset by peer. Count of bytes read: 0
> >> > java.io.IOException: Connection reset by peer
> >> >     at sun.nio.ch.FileDispatcher.read0(Native Method)
> >> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >> >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:237)
> >> >     at sun.nio.ch.IOUtil.read(IOUtil.java:210)
> >> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer.channelRead(HBaseServer.java:1359)
> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:900)
> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> >     at java.lang.Thread.run(Thread.java:619)
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -4511393305838866925 lease expired
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -5818959718437063034 lease expired
> >> > 2012-01-30 10:52:24,016 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -1408921590864341720 lease expired
> >>
> >> Client went away? Do the lease expireds happen all the time, or just around
> >> the time of the hangup? (You are closing scanners when done?)
> >>
> >> St.Ack
