Before I was able to acquire a stack trace, I restarted the master. However, the issue has just happened again and I was able to get a stack trace:

http://pastebin.com/Mz5c6AML

(The pastebin is set to never expire, so anyone viewing an archived version of this message should still be able to see the stack)

The version of hbase is 0.94.10.

Thanks!

--Tom

On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <[email protected]> wrote:

> Hi Tom,
> Can you collect your master jvm stacktrace when problem happens and put it
> to pastbin?
> what is your hbase version?
>
>
> On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <[email protected]> wrote:
>
> > Could this happen if the master is running too many RPC tasks and can't
> > keep up? What about if there's too many connections to the server?
> >
> > --Tom
> >
> >
> > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <[email protected]> wrote:
> >
> > > That server is the master and is not a regionserver.
> > >
> > > --Tom
> > >
> > >
> > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <[email protected]> wrote:
> > >
> > >> Have you checked region server log on 10.100.101.221
> > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > >>
> > >> Cheers
> > >>
> > >>
> > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <[email protected]> wrote:
> > >>
> > >> > Hello all,
> > >> >
> > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
> > >> > cluster but I keep getting a timeout exception.
> > >> >
> > >> > The rest of the cluster is operating quite normally. From the exception, it
> > >> > seems like the "list tables" function (required to display the web UI) is
> > >> > timing out for some reason.
> > >> >
> > >> > From the shell, I'm able to scan the entire .META. table, so the table
> > >> > information is conceivably available. I don't understand the rest of the
> > >> > architecture well enough to know what might be causing this timeout during
> > >> > "list".
> > >> >
> > >> > Any suggestions?
> > >> >
> > >> > java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > >> >     at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > >> >     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > >> >     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > >> >     at $Proxy11.getHTableDescriptors(Unknown Source)
> > >> >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > >> >     at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > >> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > >> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > >> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > >> >     at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > >> >     at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > >> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > >> >     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > >> >     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > >> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > >> >     at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > >> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> >     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > >> >     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> >     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > >> >     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >> >     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > >> >     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >> >     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > >> >     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >> >     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >> >     at org.mortbay.jetty.Server.handle(Server.java:326)
> > >> >     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > >> >     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > >> >     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > >> >     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > >> >     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > >> >     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > >> >     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > >> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > >> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > >> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > >> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > >> >     at java.io.FilterInputStream.read(Unknown Source)
> > >> >     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > >> >     at java.io.BufferedInputStream.fill(Unknown Source)
> > >> >     at java.io.BufferedInputStream.read(Unknown Source)
> > >> >     at java.io.DataInputStream.readInt(Unknown Source)
> > >> >     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > >> >     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > >> >
> > >> >
> > >> > The master log file is unhelpful. Almost all of it is notices about
> > >> > skipping load balancing, but at least the exception appears in the
> > >> > log:
> > >> >
> > >> >
> > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=20 average=3.3333333 mostloaded=4 leastloaded=3
> > >> > 2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=1 average=0.16666667 mostloaded=1 leastloaded=0
> > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=17 average=2.8333333 mostloaded=3 leastloaded=2
> > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=12 average=2.0 mostloaded=2 leastloaded=2
> > >> > 2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing because balanced cluster; servers=6 regions=68 average=11.333333 mostloaded=12 leastloaded=11
> > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status: java.net.SocketTimeoutException: Call to hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.100.101.221:36674 remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > >> >
> > >> >
> > >> > --Tom
