Thanks, Qiang Tian

-----Original Message-----
From: Qiang Tian [mailto:[email protected]]
Sent: June 23, 2014 10:55
To: [email protected]
Subject: Re: Re: Re: regionserver 60020 timeout
Hi WeiWei,
FYI - the problem should be HBASE-11277.

On Thu, Jun 12, 2014 at 9:04 PM, sunweiwei <[email protected]> wrote:
> I checked again. The timed-out regionserver is only sometimes the namespace
> server, not always.
>
> -----Original Message-----
> From: sunweiwei [mailto:[email protected]]
> Sent: June 12, 2014 20:35
> To: [email protected]
> Subject: Re: Re: regionserver 60020 timeout
>
> Thanks, Qiang Tian.
>
> I'm using HBase 0.96.0. It's a heavy-write cluster and read requests are
> rare.
> I notice that when the timeout happens, the timed-out regionserver always
> seems to be the namespace server.
>
> This is from the master jstack log. Maybe this connection is what blocks the
> regionserver?
>
> "RpcServer.handler=57,port=60000" daemon prio=10 tid=0x00007f12994cf800
> nid=0x7991 in Object.wait() [0x00007f12714f2000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1432)
>         - locked <0x000000071ae56260> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308)
>         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1281)
>         at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:752)
>         at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:750)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>         - locked <0x000000071d0f6cd8> (a org.apache.hadoop.hbase.client.RpcRetryingCaller)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:755)
>         at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:134)
>         at org.apache.hadoop.hbase.master.TableNamespaceManager.get(TableNamespaceManager.java:118)
>         - locked <0x000000071e1dd7c8> (a org.apache.hadoop.hbase.master.TableNamespaceManager)
>         at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3103)
>         at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3123)
>         at org.apache.hadoop.hbase.master.HMaster.listTableDescriptorsByNamespace(HMaster.java:3024)
>         at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38261)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>
> -----Original Message-----
> From: Qiang Tian [mailto:[email protected]]
> Sent: June 12, 2014 16:39
> To: [email protected]
> Subject: Re: Re: regionserver 60020 timeout
>
> Hi Ted, Stack,
> not sure if we could add a guide for the information that needs to be
> provided when describing a problem...? Many times I see Ted asking for the
> HBase version etc. :-P Maybe it could help save some time.
>
> On Thu, Jun 12, 2014 at 4:30 PM, Qiang Tian <[email protected]> wrote:
>
> > another occurrence of this problem:
> > http://mail-archives.apache.org/mod_mbox/hbase-dev/201402.mbox/%3ccahkeahxaajfyc6xnv+cbwupqakotmkj349cpquk02fgs8qc...@mail.gmail.com%3E
> >
> > According to the javadoc, the read call should return immediately
> > after reading the data available in the socket buffer.
> > There is only 1 case in which it can be blocked:
> > "If another thread has already initiated a read operation upon this
> > channel, however, then an invocation of this method will block until the
> > first operation is complete."
> >
> > From the regionserver jstack output, only 1 reader is in read, so that is
> > not the case.
> >
> > There is a slight code path difference between the 2 stacks:
> >
> > firstly:
> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >
> > and later:
> > sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >
> > i.e. first inside a channelIO call, and later reading directly. There is a
> > while loop in channelIO; perhaps it could take time when there is much data
> > to read?
> >
> > Questions for WeiWei:
> > 1) What is the HBase version?
> > 2) How did the problem happen? What operation did you run?
> >
> > thanks.
> >
> > On Sat, Jun 7, 2014 at 9:47 PM, Qiang Tian <[email protected]> wrote:
> >
> >> It does not look like busy-system behavior:
> >> 1) From the regionserver stack traces, only reader 4 is reading; all other
> >> readers are waiting.
> >> 2) Weiwei mentioned "ipc.RpcServer: RpcServer.listener,port=60020: count
> >> of bytes read: 0 java.nio.channels.ClosedByInterruptException" -- the
> >> reader read nothing before it was interrupted.
> >>
> >> We set the socket to be non-blocking, and the code looks fine for doing
> >> the read:
> >>       if (key.isValid()) {
> >>         if (key.isReadable()) {
> >>           doRead(key);
> >>         }
> >>       }
> >>
> >> Googling the issue only turns up cases where a non-blocking socket blocks
> >> in read on the client side (because finishConnect was not called).
> >>
> >> For our case, maybe there is something wrong with the corresponding
> >> client connection?
> >> Need to dig more.
> >>
> >> On Sat, Jun 7, 2014 at 7:38 AM, sunweiwei <[email protected]> wrote:
> >>
> >>> Hi
> >>>
> >>> The regionserver jstack log is http://paste2.org/2wJc5CJx
> >>> The master jstack log is http://paste2.org/t4KLpbXw. Maybe this
> >>> regionserver is the hbase:namespace server.
> >>> The master dump log (from the 60010 web UI) is http://paste2.org/vB2UVPyO
> >>>
> >>> I saw that this regionserver's Requests Per Second was 0 at that time,
> >>> while the other regionservers' Requests Per Second was about 10000~50000,
> >>> and some clients failed.
> >>> I'm not sure about the load at that time. The regionserver's load usually
> >>> is not high.
> >>>
> >>> I restarted the regionserver and the master, and then HBase went back to
> >>> normal.
> >>>
> >>> Thanks
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:[email protected]] on behalf of Stack
> >>> Sent: June 7, 2014 0:16
> >>> To: Hbase-User
> >>> Subject: Re: regionserver 60020 timeout
> >>>
> >>> On Thu, Jun 5, 2014 at 11:35 PM, sunweiwei <[email protected]> wrote:
> >>>
> >>> > Hi
> >>> >
> >>> > I found some 60000 millis timeouts in the master log when the master
> >>> > connects to the regionserver's 60020 port. The log looks like this:
> >>> >
> >>>
> >>> Could we see the full stack trace? It would help confirm Qiang's
> >>> prognosis. Do you have monitoring of this server set up? What did it show
> >>> for loading and handler count at this time?
> >>>
> >>> Thanks,
> >>> St.Ack
> >>>
> >>>
> >>> > because java.net.SocketTimeoutException: 60000 millis timeout while
> >>> > waiting for channel to be ready for read.
> >>> > ch : java.nio.channels.SocketChannel[connected local=/132.228.248.61:34835
> >>> > remote=dsjhd074/132.228.248.84:60020]
> >>> >
> >>> >         at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >>> >         at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:20595)
> >>> >         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:628)
> >>> >         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1901)
> >>> >         at org.apache.hadoop.hbase.master.AssignmentManager.access$300(AssignmentManager.java:106)
> >>> >         at org.apache.hadoop.hbase.master.AssignmentManager$2.process(AssignmentManager.java:660)
> >>> >         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> >>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>> >         at java.lang.Thread.run(Thread.java:662)
> >>> >
> >>> > And I found that listener,port=60020 is waiting to lock <0x000000041ae290d8>
> >>> > in the regionserver's jstack log. Is that ok?
> >>> > Thanks
> >>> >
> >>> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007fade53c0000
> >>> > nid=0x22d8c runnable [0x00007fadb8b32000]
> >>> >    java.lang.Thread.State: RUNNABLE
> >>> >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> >>> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> >>> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> >>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> >>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> >>> >         - locked <0x000000041b1c78c0> (a java.lang.Object)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >>> >         - locked <0x000000041ae290d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>> >         at java.lang.Thread.run(Thread.java:662)
> >>> >
> >>> > "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007fadd4426800
> >>> > nid=0x22db3 waiting for monitor entry [0x00007fadb658a000]
> >>> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:593)
> >>> >         - waiting to lock <0x000000041ae290d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:745)
> >>> >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:668)
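
The two regionserver jstack excerpts quoted above show a Reader that acquired its own monitor in doRunLoop() and is then stuck in a socket read, while the Listener is BLOCKED in registerChannel() waiting for that same monitor, so no new connection can be handed to that reader. Below is a minimal, self-contained sketch of that contention pattern; it is not HBase code, and all class, method, and thread names in it are illustrative assumptions.

```java
import java.util.concurrent.TimeUnit;

public class ReaderListenerContention {

    static class Reader {
        // Stands in for RpcServer$Listener$Reader.doRunLoop(), which holds the Reader's
        // monitor while it performs the read (simulated here by a long sleep).
        synchronized void doRunLoop() throws InterruptedException {
            TimeUnit.SECONDS.sleep(5);   // simulated slow or stuck read
        }

        // Stands in for RpcServer$Listener$Reader.registerChannel(), which needs the
        // same monitor before a newly accepted connection can be handed to this reader.
        synchronized void registerChannel(String channel) {
            System.out.println("registered " + channel);
        }
    }

    public static void main(String[] args) throws Exception {
        Reader reader = new Reader();

        // "reader": RUNNABLE inside its read, holding the Reader monitor.
        Thread readerThread = new Thread(() -> {
            try {
                reader.doRunLoop();
            } catch (InterruptedException ignored) {
            }
        }, "reader");

        // "listener": tries to register a new channel and blocks on the same monitor.
        Thread listenerThread = new Thread(() -> reader.registerChannel("new-connection"), "listener");

        readerThread.start();
        TimeUnit.MILLISECONDS.sleep(100);   // let the reader grab the monitor first
        listenerThread.start();
        TimeUnit.MILLISECONDS.sleep(500);

        // While the reader holds the monitor, the listener is BLOCKED, matching the
        // quoted jstack line "waiting to lock <0x000000041ae290d8>".
        System.out.println("listener state: " + listenerThread.getState());

        readerThread.join();
        listenerThread.join();
    }
}
```

If the reader's read really can stall for tens of seconds, this pattern could explain why the listener stops accepting new work and the master's calls to port 60020 time out, but that remains a hypothesis based only on the stack traces above.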
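On the channelIO hypothesis discussed earlier in the thread: even though read() on a non-blocking channel returns immediately with whatever is available, a surrounding loop that keeps reading until a whole request has arrived can still keep the reader busy for a while on a large payload. The sketch below is only an illustration of such a bounded read loop under that assumption; it is not the actual RpcServer.channelIO implementation, and the helper name readFully is made up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public final class ChannelReadLoop {

    /**
     * Reads from the channel into buf in chunks of at most chunkSize bytes, looping
     * until the buffer is full, EOF is reached, or a read returns 0 bytes (no data
     * currently available on the non-blocking channel).
     *
     * @return total bytes read, or -1 on immediate EOF
     */
    public static int readFully(ReadableByteChannel channel, ByteBuffer buf, int chunkSize)
            throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            // Cap this read at chunkSize bytes by temporarily lowering the limit.
            int originalLimit = buf.limit();
            buf.limit(Math.min(originalLimit, buf.position() + chunkSize));
            int n = channel.read(buf);   // non-blocking: returns only what is available now
            buf.limit(originalLimit);

            if (n < 0) {
                return total == 0 ? -1 : total;   // EOF
            }
            if (n == 0) {
                break;   // nothing more to read right now
            }
            total += n;   // data keeps arriving, so the loop keeps going
        }
        return total;
    }
}
```

The point of the sketch is only that the time spent in such a loop scales with how much data the peer keeps sending, which is consistent with Qiang Tian's question about whether the while loop in channelIO could take time when there is much data to read.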
