Oh I'm so dumb, I didn't see that this is coming from _another_ region
server. What happens during a split is that we try to update the
.META. table, which requires talking to another region server; in this
case that RS (the one carrying .META.) was aborting, so the update
failed and the split transaction had to be rolled back.
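
Roughly, the flow looks like this (a simplified sketch along the lines
of 0.90's CompactSplitThread/SplitTransaction; class and method names
follow those classes but the exact signatures may differ between
versions, and the sketch class itself is mine, not HBase code). The
point is that execute() includes the .META. edit, which is a put sent
over RPC to whatever RS carries .META., so an IOException raised by
that server's checkOpen() surfaces here and triggers the rollback:

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.Server;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.RegionServerServices;
import org.apache.hadoop.hbase.regionserver.SplitTransaction;

// Simplified sketch of the split-and-rollback flow; not the exact
// region server code, signatures approximate to the 0.90 line.
public class SplitFlowSketch {
  private static final Log LOG = LogFactory.getLog(SplitFlowSketch.class);

  void split(HRegion parent, byte[] midKey, Server server,
      RegionServerServices services) throws IOException {
    SplitTransaction st = new SplitTransaction(parent, midKey);
    if (!st.prepare()) {
      return; // parent is not splittable right now
    }
    try {
      // Offlines the parent, creates the daughters, and edits .META.;
      // the .META. edit is a remote put, so the "Server not running,
      // aborting" thrown by the RS carrying .META. shows up here.
      st.execute(server, services);
    } catch (IOException ioe) {
      // This is what produced the "Running rollback of failed split"
      // line in your log: undo the steps already done and keep
      // serving the parent region.
      LOG.info("Running rollback of failed split of "
          + parent.getRegionNameAsString() + "; " + ioe.getMessage());
      st.rollback(server, services);
    }
  }
}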

Looking at the logs of the RS that carried .META. at that time would
give you more clues about what really happened.
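
If you're not sure which server that is, something like this shows the
current holder of .META. by reading its info:server cell out of -ROOT-
(my own quick sketch, not something shipped with HBase, and it assumes
the usual catalog column names; it only tells you where .META. is now,
the master log is still the place to see where it was at the time of
the failure):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Quick sketch: read .META.'s info:server cell from -ROOT- to see
// which region server currently hosts it.
public class FindMetaHolder {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable root = new HTable(conf, "-ROOT-");
    try {
      Result r = root.get(new Get(Bytes.toBytes(".META.,,1")));
      byte[] server = r.getValue(HConstants.CATALOG_FAMILY,
          HConstants.SERVER_QUALIFIER);
      System.out.println(".META. is currently on: "
          + (server == null ? "unknown" : Bytes.toString(server)));
    } finally {
      root.close();
    }
  }
}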

J-D

On Tue, May 31, 2011 at 5:24 PM, bijieshan <[email protected]> wrote:
> I paste the logs below. To keep them short, I deleted some similar,
> adjacent log lines:
>
>> 2011-05-25 09:15:44,232 INFO 
>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running rollback of 
>> failed split of 
>> ufdr,0065286138106876#4228000,1306260358978.37875b35a870957da534ad29fd2944d5.;
>>  java.io.IOException: Server not running, aborting
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2352)
>>        at 
>> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1653)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>        at 
>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>>
>>>>[description]
>> Before that time, everything seemed OK (the server received many 
>> open-region requests and handled them), except that several exceptions 
>> occurred during that period.
>>
>>>>[logs]
>> 2011-05-25 09:15:36,214 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
>> region: ufdr,110018,1305617626846.f22eafaf06c3be7f7e5a447cec40c7fa.
>> 2011-05-25 09:15:36,214 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
>> region: ufdr,111336,1305617636659.b47f1cf09b10aecaee6805a985646b30.
>> 2011-05-25 09:15:36,215 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
>> region: ufdr,280884,1305617952747.7cac8ccc9c9770ef0e48807e07bc8bec.
>> 2011-05-25 09:15:36,215 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
>> region: ufdr,280446,1305617949439.b332cfff565d77b2e2d4f1a2c52494b0.
>> 2011-05-25 09:15:36,216 INFO 
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 
>> region: ufdr,101974,1305617623149.b60c71565f2c73b81477fe25ba80378b.
>> 2011-05-25 09:15:36,413 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node 
>> 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:36,423 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node 
>> 11dc72d94c7a5a3d19b0c0c3c49624a5 from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:37,645 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node 
>> c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:37,653 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node 
>> d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:37,660 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node 
>> c4c657deda864fa2018b38388ee928e6 from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:37,662 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Attempting to transition node 
>> bd07db88de68c1022c5bba9e5471c9f9 from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>> 2011-05-25 09:15:37,668 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
>> regionserver:20020-0x32ffbc7e6e00321 Successfully transitioned node 
>> d2546cde1d6546133af74f25252b652c from RS_ZK_REGION_OPENING to 
>> RS_ZK_REGION_OPENING
>>
>>>>[description]
>> From the code, I saw that the message "Server not running, aborting" can 
>> only be shown after the server has aborted. It is determined by the field 
>> HRegionServer#abortRequested: if it is true and something calls 
>> HRegionServer#checkOpen, this message is printed.
>>
>>  /**
>>   * Called to verify that this server is up and running.
>>   *
>>   * @throws IOException
>>   */
>>  protected void checkOpen() throws IOException {
>>    if (this.stopped || this.abortRequested) {
>>      throw new IOException("Server not running"
>>          + (this.abortRequested ? ", aborting" : ""));
>>    }
>>    if (!fsOk) {
>>      throw new IOException("File system not available");
>>    }
>>  }
>>
>> This is why I said no abort info was found in the logs; abort() would have 
>> logged a FATAL message:
>>
>> public void abort(String reason, Throwable cause) {
>>    if (cause != null) {
>>      LOG.fatal("ABORTING region server " + this + ": " + reason, cause);
>>    } else {
>>      LOG.fatal("ABORTING region server " + this + ": " + reason);
>>    }
>>    this.abortRequested = true;
>>    this.reservedSpace.clear();
>>    if (this.metrics != null) {
>>      LOG.info("Dump of metrics: " + this.metrics);
>>    }
>>    stop(reason);
>>  }
>>
>> Since there is no prior abort found in the logs, how could the message 
>> "java.io.IOException: Server not running, aborting" suddenly occur?
>> Thanks!
>>
>> Jieshan Bean.
>>
