Re: Master timeout during list tables

Ted Yu Wed, 25 Jun 2014 10:46:25 -0700

Looks like the master was stuck in a FileSystem.listStatus() call.
I also noticed the following. Does it still show up if you take a jstack one
more time?


        at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
        - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)



On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <[email protected]> wrote:

> Before I was able to acquire a stack trace, I restarted the master.
> However, the issue has just happened again and I was able to get a stack
> trace:
>
> http://pastebin.com/Mz5c6AML
>
> (The pastebin is set to never expire, so anyone viewing an archived version
> of this message should still be able to see the stack)
>
> The version of hbase is 0.94.10.
>
> Thanks!
>
> --Tom
>
>
> On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <[email protected]> wrote:
>
> > Hi Tom,
> > Can you collect your master JVM stack trace when the problem happens and
> > put it on pastebin?
> > What is your HBase version?
> >
> >
> > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <[email protected]> wrote:
> >
> > > Could this happen if the master is running too many RPC tasks and can't
> > > keep up? What about if there are too many connections to the server?
> > >
> > > --Tom
> > >
> > >
> > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <[email protected]> wrote:
> > >
> > > > That server is the master and is not a regionserver.
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <[email protected]> wrote:
> > > >
> > > >> Have you checked region server log on 10.100.101.221
> > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > > >>
> > > >> Cheers
> > > >>
> > > >>
> > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <[email protected]> wrote:
> > > >>
> > > >> > Hello all,
> > > >> >
> > > > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
> > > > >> > cluster but I keep getting a timeout exception.
> > > > >> >
> > > > >> > The rest of the cluster is operating quite normally. From the exception, it
> > > > >> > seems like the "list tables" function (required to display the web UI) is
> > > > >> > timing out for some reason.
> > > > >> >
> > > > >> > From the shell, I'm able to scan the entire .META. table, so the table
> > > > >> > information is conceivably available. I don't understand the rest of the
> > > > >> > architecture well enough to know what might be causing this timeout during
> > > > >> > "list".
> > > > >> >
> > > > >> > Any suggestions?
> > > >> >
> > > > >> > java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > > >> >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > > >> >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > > > >> >         at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > > > >> >         at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > > > >> >         at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > > > >> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > > > >> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > > > >> >         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > >> >         at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >> >         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > > > >> >         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >> >         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > >> >         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > >> >         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > >> >         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > >> >         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > >> >         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > >> >         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > >> >         at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > >> >         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > >> >         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > >> >         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > >> >         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > >> >         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > >> >         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > >> >         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > >> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > >> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > >> >         at java.io.FilterInputStream.read(Unknown Source)
> > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > > > >> >         at java.io.BufferedInputStream.fill(Unknown Source)
> > > > >> >         at java.io.BufferedInputStream.read(Unknown Source)
> > > > >> >         at java.io.DataInputStream.readInt(Unknown Source)
> > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > > > >> >         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > > >> >
> > > >> >
> > > >> >
> > > > >> > The master log file is unhelpful. Almost all of it is notices about
> > > > >> > skipping load balancing, but at least the exception appears in the
> > > > >> > log:
> > > >> >
> > > >> >
> > > > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=20 average=3.3333333 mostloaded=4 leastloaded=3
> > > > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=1 average=0.16666667 mostloaded=1 leastloaded=0
> > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=17 average=2.8333333 mostloaded=3 leastloaded=2
> > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=12 average=2.0 mostloaded=2 leastloaded=2
> > > > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=68 average=11.333333 mostloaded=12 leastloaded=11
> > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status: java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36674 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --Tom
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>
