Sujee: Can you explain why zookeeper.session.timeout should be increased ? I don't see the following in server logs: We slept 117947ms, ten times longer than scheduled: 10000
Our HBase cluster has 7 region servers. Each one is 8 core Intel(R) Xeon(R) with 16 GB memory. On Fri, Jun 11, 2010 at 8:51 AM, Sujee Maniyam <[email protected]> wrote: > Can you post some details about your cluster : number of machines, type of > machines, how much data you are going through ..etc > > try increasing the zookeeper timeout > > in Hbase/conf/hbase-site.xml > ---- > <property> > <name>zookeeper.session.timeout</name> > <value>60000</value> > </property> > > <property> > <name>hbase.regionserver.handler.count</name> > <value>100</value> > </property> > <property> > ----- > > The second property increases handlers on region servers to handle data > requests. might worth a shot. > > > > http://sujee.net > > > On Thu, Jun 10, 2010 at 10:46 PM, Ted Yu <[email protected]> wrote: > > > Hi, > > We use hbase 0.20.3 > > HBase went down when we ran map-reduce job. > > > > Your comment / help is appreciated. > > > > I didn't find such WARN in master or region server log: > > We slept 117947ms, ten times longer than scheduled: 10000 > > > > Here is master log snippet: > > > > 2010-06-11 04:32:50,077 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: Updated row > > > > > T3_HBASE_NO_INCR_ORCL_SQLLDR-THREEGPPSPEECHCALLS-1276229903001-0,BCBF1C0DB6FCE3695709CC6175CF512D,1276230743333 > > in region .META.,,1 with startcode=1276204602423, server= > > 10.201.8.208:60020 > > 2010-06-11 04:33:16,002 INFO > org.apache.hadoop.hbase.master.ServerManager: > > sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 znode expired > > 2010-06-11 04:33:16,003 INFO > org.apache.hadoop.hbase.master.RegionManager: > > -ROOT- region unset (but not set to be reassigned) > > 2010-06-11 04:33:16,003 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of > > server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: > > false, > > rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1 > > 2010-06-11 04:33:16,005 INFO org.apache.hadoop.hbase.regionserver.HLog: > > Splitting 3 hlog(s) in hdfs:// > > > > > sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 > > 2010-06-11 04:33:16,098 WARN org.apache.hadoop.hbase.regionserver.HLog: > > Empty hlog, continuing: org.apache.hadoop.fs.filesta...@bc89a24a count=0 > > java.io.EOFException > > at java.io.DataInputStream.readFully(DataInputStream.java:180) > > at java.io.DataInputStream.readFully(DataInputStream.java:152) > > at > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417) > > at > > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412) > > at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:962) > > at org.apache.hadoop.hbase.regionserver.HLog.splitLog(HLog.java:872) > > at > > > > > org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:286) > > at > > org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494) > > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429) > > 2010-06-11 04:33:17,024 INFO org.apache.hadoop.hbase.regionserver.HLog: > > hlog > > file splitting completed in 1020 millis for hdfs:// > > > > > sjc1-hadoop0.sjc1.carrieriq.com:9000/hbase/.logs/sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195 > > 2010-06-11 04:33:17,024 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete, > > meta reassignment and scanning: > > 2010-06-11 04:33:17,024 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: > ProcessServerShutdown > > reassigning ROOT region > > 2010-06-11 04:33:17,024 INFO > org.apache.hadoop.hbase.master.RegionManager: > > -ROOT- region unset (but not set to be reassigned) > > 2010-06-11 04:33:17,024 INFO > org.apache.hadoop.hbase.master.RegionManager: > > ROOT inserted into regionsInTransition > > 2010-06-11 04:33:17,064 INFO > org.apache.hadoop.hbase.master.RegionManager: > > Assigning region -ROOT-,,0 to sjc1-hadoop6.sjc1.carrieriq.com > > ,60020,1276204602430 > > 2010-06-11 04:33:20,070 INFO > org.apache.hadoop.hbase.master.ServerManager: > > Processing MSG_REPORT_PROCESS_OPEN: -ROOT-,,0 from > > sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; 1 of 2 > > 2010-06-11 04:33:20,070 INFO > org.apache.hadoop.hbase.master.ServerManager: > > Processing MSG_REPORT_OPEN: -ROOT-,,0 from > > sjc1-hadoop6.sjc1.carrieriq.com,60020,1276204602430; > > 2 of 2 > > 2010-06-11 04:33:25,891 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.metaScanner scanning meta region {server: > 10.201.8.208:60020 > > , > > regionname: .META.,,1, startKey: <>} > > 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.metaScanner scan of 161 row(s) of meta region {server: > > 10.201.8.208:60020, regionname: .META.,,1, startKey: <>} complete > > 2010-06-11 04:33:26,502 INFO org.apache.hadoop.hbase.master.BaseScanner: > > All > > 1 .META. region(s) scanned > > 2010-06-11 04:33:29,826 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 7 on 60000, call regionServerReport(address: 10.201.8.205:60020, > > startcode: 1276204602195, load: (requests=19880, regions=24, > usedHeap=533, > > maxHeap=3983), [Lorg.apache.hadoop.hbase.HMsg;@6c9a819d, > > [Lorg.apache.hadoop.hbase.HRegionInfo;@13b3a21f) from 10.201.8.205:60755 > : > > error: org.apache.hadoop.hbase.Leases$LeaseStillHeldException > > org.apache.hadoop.hbase.Leases$LeaseStillHeldException > > at > > > > > org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:252) > > at > > > org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:724) > > at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > 2010-06-11 04:33:40,350 INFO > org.apache.hadoop.hbase.master.ServerManager: > > 5 > > region servers, 1 dead, average load 27.0[ > sjc1-hadoop1.sjc1.carrieriq.com > > ,60020,1276204602195] > > 2010-06-11 04:33:45,892 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.rootScanner scanning meta region {server: > 10.201.8.210:60020 > > , > > regionname: -ROOT-,,0, startKey: <>} > > 2010-06-11 04:34:17,044 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: process shutdown of > > server sjc1-hadoop1.sjc1.carrieriq.com,60020,1276204602195: logSplit: > > true, > > rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1 > > 2010-06-11 04:34:17,044 INFO > > org.apache.hadoop.hbase.master.RegionServerOperation: Log split complete, > > meta reassignment and scanning: > > 2010-06-11 04:34:24,195 INFO org.apache.hadoop.hbase.master.BaseScanner: > > RegionManager.rootScanner scan of 1 row(s) of meta region {server: > > 10.201.8.210:60020, regionname: -ROOT-,,0, startKey: <>} complete > > > > 10.201.8.205 is one of the region servers which went down. > > Here is region server log from 10.201.8.205 > > > > 2010-06-11 04:32:42,734 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN: > > > > > T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561 > > 2010-06-11 04:32:42,734 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: > > MSG_REGION_OPEN: > > > > > T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561 > > 2010-06-11 04:32:43,441 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > region > > > > > T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561/1645281535 > > available; sequence id is 342755 > > 2010-06-11 04:32:43,515 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Starting compaction on region > > > > > T3_HBASE_NO_INCR_ORCL_SQLLDR-ERRORS-1276229903001-0,7FE653FC85CC4242C46672EED004EC87.4,1276230744561 > > 2010-06-11 04:33:10,793 WARN org.apache.zookeeper.ClientCnxn: Exception > > closing session 0x12923b8c6b50001 to sun.nio.ch.selectionkeyi...@69267649 > > java.io.IOException: TIMED OUT > > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) > > 2010-06-11 04:33:10,895 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > > state: Disconnected, type: None, path: null > > 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Attempting > > connection to server sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181 > > 2010-06-11 04:33:29,676 INFO org.apache.zookeeper.ClientCnxn: Priming > > connection to java.nio.channels.SocketChannel[connected local=/ > > 10.201.8.205:31261 remote= > > sjc1-hadoop0.sjc1.carrieriq.com/10.201.8.204:2181] > > 2010-06-11 04:33:29,677 INFO org.apache.zookeeper.ClientCnxn: Server > > connection successful > > 2010-06-11 04:33:29,678 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > > state: Expired, type: None, path: null > > 2010-06-11 04:33:29,678 WARN org.apache.zookeeper.ClientCnxn: Exception > > closing session 0x12923b8c6b50001 to sun.nio.ch.selectionkeyi...@32bc4106 > > java.io.IOException: Session Expired > > at > > > > > org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) > > at > org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) > > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > > 2010-06-11 04:33:29,678 ERROR > > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session > > expired > > 2010-06-11 04:33:29,678 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 0 on 60020, call put([...@2ba504e0, > > [Lorg.apache.hadoop.hbase.client.Put;@7b420664) from 10.201.8.205:24720: > > error: java.io.IOException: Server not running, aborting > > java.io.IOException: Server not running, aborting > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771) > > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 7 on 60020, call put([...@1d014ece, > > [Lorg.apache.hadoop.hbase.client.Put;@40dcc542) from 10.201.8.210:3754: > > error: java.io.IOException: Server not running, aborting > > java.io.IOException: Server not running, aborting > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771) > > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > 2010-06-11 04:33:29,679 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 4 on 60020, call put([...@2d4ed3d0, > > [Lorg.apache.hadoop.hbase.client.Put;@123413ed) from 10.201.8.210:3755: > > error: java.io.IOException: Server not running, aborting > > java.io.IOException: Server not running, aborting > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2304) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1771) > > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > > > Later, I restarted region server but I saw: > > > > 2010-06-11 05:05:34,087 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Closed TED-THREEGPPSPEECHCALLS-1276213362068-17,,1276213463847 > > 2010-06-11 05:05:34,087 INFO > > org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: > > MSG_REGION_CLOSE: MetaTable-15,,1276213160085: Overloaded > > 2010-06-11 05:05:34,088 INFO > org.apache.hadoop.hbase.regionserver.HRegion: > > Closed MetaTable-15,,1276213160085 > > 2010-06-11 05:06:07,818 ERROR > > org.apache.hadoop.hbase.regionserver.HRegionServer: > > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > 2010-06-11 05:06:07,820 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server > > handler 0 on 60020, call getClosestRowBefore([...@51f88fbd, [...@15a4eb56, > > [...@7e566633) from 10.201.8.212:2687: error: > > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 > > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266) > > at > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1715) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > > > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657) > > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > > >
