Thanks JD. Sorry - I got swamped and was going to file a Jira, but sounds like you don't need it now.
On Tue, Jun 28, 2011 at 3:14 PM, Jean-Daniel Cryans <[email protected]> wrote:

> Looks like I'll be fixing it in the context of HBASE-3984.
>
> J-D
>
> On Mon, Jun 27, 2011 at 10:21 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
> > Yeah, the BulkAssigner uses an AssignmentManager.assign method that
> > won't retry if it gets an exception, because originally it was only
> > used during startup (e.g. if you can't assign the regions properly at
> > startup, then kill yourself).
> >
> > In this case it happened to you when creating a table (it also uses
> > the BulkAssigner now). To me it's a bug; it should just be assigned
> > somewhere else. Mind filing a jira?
> >
> > As to how you got a 60 secs timeout, you should take a look at the RS
> > log (and next time it happens, also do a jstack).
> >
> > Thx,
> >
> > J-D
> >
> > On Mon, Jun 27, 2011 at 9:54 AM, Matt Davies <[email protected]> wrote:
> >> Hey all,
> >>
> >> Interesting problem I'm running into. It appears that our master has died
> >> after a failed region assignment. Any ideas as to why?
> >>
> >> We are running version 0.90.1 in this cluster.
> >> H3M1 = master
> >> H3S* = regionservers
> >>
> >> Out on H3S3 the regionserver process is running (but didn't respond, per the log):
> >>
> >> 2011-06-27 10:18:22,517 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance
> >> 2011-06-27 10:18:22,522 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions.
> >> 2011-06-27 10:18:23,527 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
> >> 2011-06-27 10:18:24,536 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance
> >> 2011-06-27 10:18:24,584 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1298676132327.9d63b6173c7307c27aa9728f738f0ef3. from META
> >> 2011-06-27 10:19:17,296 WARN org.apache.hadoop.hbase.zookeeper.ZKTable: Moving table ContentGuidance state to enabled but was already enabled
> >> 2011-06-27 10:19:17,313 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: HLog configuration: blocksize=64 MB, rollsize=60.8 MB, enabled=true, flushlogentries=1, optionallogflushinternal=1000ms
> >> 2011-06-27 10:19:17,349 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
> >> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: New hlog /hbase/ContentGuidance/f38bbc9bc672b5ea415b6dcc7e1da43a/.logs/hlog.1309191557313
> >> 2011-06-27 10:19:17,350 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Using getNumCurrentReplicas--HDFS-826
> >> 2011-06-27 10:19:17,433 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.; next sequenceid=1
> >> 2011-06-27 10:19:17,438 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Added region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. to META
> >> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a.
> >> 2011-06-27 10:19:17,439 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 87 on 60000.logSyncer exiting
> >> 2011-06-27 10:19:17,470 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning 1 region(s) round-robin across 4 server(s)
> >> 2011-06-27 10:19:17,472 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
> >> 2011-06-27 10:19:17,480 INFO org.apache.hadoop.hbase.master.AssignmentManager: H3S4,60020,1308946657703 unassigned znodes=1 of total=1
> >> 2011-06-27 10:19:21,530 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Attemping to disable table ContentGuidance
> >> 2011-06-27 10:19:21,536 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Offlining 1 regions.
> >> 2011-06-27 10:19:22,543 INFO org.apache.hadoop.hbase.master.handler.DisableTableHandler: Disabled table is done=true
> >> 2011-06-27 10:19:23,544 INFO org.apache.hadoop.hbase.master.handler.TableEventHandler: Handling table operation C_M_DELETE_TABLE on table ContentGuidance
> >> 2011-06-27 10:19:23,551 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted region ContentGuidance,,1309191557292.f38bbc9bc672b5ea415b6dcc7e1da43a. from META
> >> 2011-06-27 10:20:17,479 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=H3S3,60020,1308946657608, load=(requests=0, regions=7, usedHeap=976, maxHeap=1011)
> >> java.net.SocketTimeoutException: Call to H3S3/10..x.x.x3:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904 remote=H3S3/10.x.x.x:60020]
> >>         at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:802)
> >>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:775)
> >>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> >>         at $Proxy7.openRegions(Unknown Source)
> >>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566)
> >>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
> >>         at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>         at java.lang.Thread.run(Thread.java:619)
> >> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10..x.x.x:34904 remote=H3S3/10..x.x.x:60020]
> >>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:299)
> >>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> >>         at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> >>         at java.io.DataInputStream.readInt(DataInputStream.java:370)
> >>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:539)
> >>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:477)
> >> 2011-06-27 10:20:17,482 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> >> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
> >> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.CatalogJanitor: H3M1:60000-CatalogJanitor exiting
> >> 2011-06-27 10:20:18,390 INFO org.apache.hadoop.hbase.master.HMaster$1: H3M1:60000-BalancerChore exiting
> >> 2011-06-27 10:20:18,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
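[Editor's note] The no-retry behavior J-D describes above (an RPC failure during bulk assignment propagating up and aborting the master, instead of the region being assigned to another server) can be sketched roughly as follows. This is a hypothetical simplification, not the actual 0.90 AssignmentManager code; `RegionServer`, `assignNoRetry`, and `assignWithFallback` are illustrative names.

```java
import java.net.SocketTimeoutException;
import java.util.List;

public class BulkAssignSketch {
    // Stand-in for the openRegion RPC: throws when the target
    // regionserver is unresponsive (the 60s socket timeout in the log).
    interface RegionServer {
        void openRegion(String region) throws SocketTimeoutException;
    }

    // Startup-style assignment, as in the quoted thread: no retry,
    // a failed RPC is treated as fatal (the master aborts).
    static boolean assignNoRetry(RegionServer rs, String region) {
        try {
            rs.openRegion(region);
            return true;
        } catch (SocketTimeoutException e) {
            // corresponds to the FATAL log line followed by "Aborting"
            return false;
        }
    }

    // What J-D suggests the table-creation path should do instead:
    // fall back to another server rather than aborting.
    static boolean assignWithFallback(List<RegionServer> servers, String region) {
        for (RegionServer rs : servers) {
            try {
                rs.openRegion(region);
                return true; // assigned somewhere else, master survives
            } catch (SocketTimeoutException e) {
                // this server timed out; try the next one
            }
        }
        return false; // no server could open the region
    }
}
```

The fix tracked in HBASE-3984 addresses this in the real code path; the sketch only illustrates why a single unresponsive regionserver was enough to take the master down here.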
