Can you post the full log somewhere? You talk about several Exceptions but we can't see them.
J-D On Tue, May 31, 2011 at 4:41 AM, bijieshan <[email protected]> wrote: > It occurred in an RegionServer with an un-known reason. I have check this > RegionServer logs, there's no prev-aborting, and no other infos showed the > RegionServer has aborted. So I saw the following msg showed in a sudden. > >>>[logs] > 2011-05-25 09:15:44,232 INFO > org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of > failed split of > ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.; > java.io.IOException: Server not running, aborting > at > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) > >>>[description] > Before that time, everything seemed ok(it received so many open-region > requests, and handled them), except several Exception happened during the > time. > >>>[logs] > 2011-05-25 09:15:36,214 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: ufdr,110018,1305617626846.f22eafaf06c3be7f7e5a447cec40c7fa. > 2011-05-25 09:15:36,214 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: ufdr,111336,1305617636659.b47f1cf09b10aecaee6805a985646b30. > 2011-05-25 09:15:36,215 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: ufdr,280884,1305617952747.7cac8ccc9c9770ef0e48807e07bc8bec. > 2011-05-25 09:15:36,215 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: ufdr,280446,1305617949439.b332cfff565d77b2e2d4f1a2c52494b0. > 2011-05-25 09:15:36,216 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open > region: ufdr,101974,1305617623149.b60c71565f2c73b81477fe25ba80378b. > 2011-05-25 09:15:36,413 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node > 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:36,423 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node > 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:37,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node > c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:37,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node > d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:37,660 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node > c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:37,662 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node > bd07db88de68c1022c5bba9e5471c9f9 from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > 2011-05-25 09:15:37,668 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node > d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to > RS_ZK_REGION_OPENING > >>>[description] > From the code, I saw that , only the Server has aborted, the msg of "Server > not running, aborting" could be shown. It determined by the param of > HRegionServer#abortRequested, if it was true, and if somewhere called the > method of HRegionServer#checkOpen, this msg would been printed. > > /** > * Called to verify that this server is up and running. > * > * @throws IOException > */ > protected void checkOpen() throws IOException { > if (this.stopped || this.abortRequested) { > throw new IOException("Server not running" > + (this.abortRequested ? ", aborting" : "")); > } > if (!fsOk) { > throw new IOException("File system not available"); > } > } > > For I said there's no aborted info been found in Logs: > > public void abort(String reason, Throwable cause) { > if (cause != null) { > LOG.fatal("ABORTING region server " + this + ": " + reason, cause); > } else { > LOG.fatal("ABORTING region server " + this + ": " + reason); > } > this.abortRequested = true; > this.reservedSpace.clear(); > if (this.metrics != null) { > LOG.info("Dump of metrics: " + this.metrics); > } > stop(reason); > } > > For there's no prev-aborting found in the logs, how did the sudden msg of " > java.io.IOException: Server not running, aborting " occurred? > Thanks! > > Jieshan Bean. > > > > >
