Looks like master was stuck in FileSystem.listStatus() call. I noticed the following - did this show up if you take jstack one more time ?
1. at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372) 2. - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch) 3. at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290) 4. at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371) 5. at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337) 6. at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282) 7. at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127) On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <[email protected]> wrote: > Before I was able to acquire a stack trace, I restarted the master. > However, the issue has just happened again and I was able to get a stack > trace: > > http://pastebin.com/Mz5c6AML > > (The pastebin is set to never expire, so anyone viewing an archived version > of this message should still be able to see the stack) > > The version of hbase is 0.94.10. > > Thanks! > > --Tom > > > On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <[email protected]> wrote: > > > Hi Tom, > > Can you collect your master jvm stacktrace when problem happens and put > it > > to pastbin? > > what is your hbase version? > > > > > > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <[email protected]> wrote: > > > > > Could this happen if the master is running too many RPC tasks and can't > > > keep up? What about if there's too many connections to the server? > > > > > > --Tom > > > > > > > > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <[email protected]> > > wrote: > > > > > > > That server is the master and is not a regionserver. > > > > > > > > --Tom > > > > > > > > > > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <[email protected]> > wrote: > > > > > > > >> Have you checked region server log on 10.100.101.221 > > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ? > > > >> > > > >> Cheers > > > >> > > > >> > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <[email protected]> > > > wrote: > > > >> > > > >> > Hello all, > > > >> > > > > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop > > > 1.1.2) > > > >> > cluster but I keep getting a timeout exception. > > > >> > > > > >> > The rest of the cluster is operating quite normally. From the > > > >> exception, it > > > >> > seems like the "list tables" function (required to display the web > > UI) > > > >> is > > > >> > timing out for some reason. > > > >> > > > > >> > From the shell, I'm able to scan the entire .META. table, so the > > table > > > >> > information is conceivably available. I don't understand the rest > of > > > the > > > >> > architecture well enough to know what might be causing this > timeout > > > >> during > > > >> > "list". > > > >> > > > > >> > Any suggestions? > > > >> > > > > >> > java.net.SocketTimeoutException: Call to > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on > > socket > > > >> > timeout exception: java.net.SocketTimeoutException: 60000 millis > > > >> > timeout while waiting for channel to be ready for read. ch : > > > >> > java.nio.channels.SocketChannel[connected local=/ > > 10.100.101.221:36722 > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000] > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026) > > > >> > at > > > >> > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86) > > > >> > at $Proxy11.getHTableDescriptors(Unknown Source) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870) > > > >> > at > > > >> > > > > >> > > > > org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82) > > > >> > at > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > > > >> > at > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > > > >> > at > > > >> > > > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > > > >> > at > > > >> > > > > > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > > > >> > at > > > >> > > > > > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > > > >> > at > > > >> > > > > > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > > > >> > at > > > >> > > > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > > > >> > at > > > >> > > > > > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > > > >> > at org.mortbay.jetty.Server.handle(Server.java:326) > > > >> > at > > > >> > > > > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > > > >> > at > > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > > > >> > at > > > >> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > > > >> > at > > > >> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > > > >> > at > > > >> > > > > >> > > > > > > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout > > while > > > >> > waiting for channel to be ready for read. ch : > > > >> > java.nio.channels.SocketChannel[connected local=/ > > 10.100.101.221:36722 > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000] > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) > > > >> > at > > > >> > > > > > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) > > > >> > at > > > >> > > > > > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) > > > >> > at java.io.FilterInputStream.read(Unknown Source) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373) > > > >> > at java.io.BufferedInputStream.fill(Unknown Source) > > > >> > at java.io.BufferedInputStream.read(Unknown Source) > > > >> > at java.io.DataInputStream.readInt(Unknown Source) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646) > > > >> > at > > > >> > > > > >> > > > > > > org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580) > > > >> > > > > >> > > > > >> > > > > >> > The master log file is unhelpful. Almost all of it is notices > about > > > >> > skipping load balancing, but at least the exception appears in the > > > >> > log: > > > >> > > > > >> > > > > >> > 2014-06-18 16:18:50,359 INFO > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load > balancing > > > >> > because balanced cluster; servers=6 regions=20 average=3.3333333 > mos > > > >> > tloaded=4 leastloaded=3 > > > >> > 2014-06-18 16:18:50,359 INFO > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load > balancing > > > >> > because balanced cluster; servers=6 regions=1 average=0.16666667 > mos > > > >> > tloaded=1 leastloaded=0 > > > >> > 2014-06-18 16:18:50,360 INFO > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load > balancing > > > >> > because balanced cluster; servers=6 regions=17 average=2.8333333 > mos > > > >> > tloaded=3 leastloaded=2 > > > >> > 2014-06-18 16:18:50,360 INFO > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load > balancing > > > >> > because balanced cluster; servers=6 regions=12 average=2.0 > mostloade > > > >> > d=2 leastloaded=2 > > > >> > 2014-06-18 16:18:50,360 INFO > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load > balancing > > > >> > because balanced cluster; servers=6 regions=68 average=11.333333 > mos > > > >> > tloaded=12 leastloaded=11 > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status: > > > >> > java.net.SocketTimeoutException: Call to > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on s > > > >> > ocket timeout exception: java.net.SocketTimeoutException: 60000 > > millis > > > >> > timeout while waiting for channel to be ready for read. ch : > > > >> > java.nio.channels.SocketChannel[co > > > >> > nnected local=/10.100.101.221:36674 > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000] > > > >> > > > > >> > > > > >> > > > > >> > > > > >> > --Tom > > > >> > > > > >> > > > > > > > > > > > > > >
