I am pretty sure that is the case but will double check.
I found another case where the RS died without an apparent stop-the-world GC pause.
RS grv-hadoopc05
*** rs.log ***
2010-07-30 10:43:36,028 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll
/hbase/.logs/grv-hadoopc05.local,60020,1280508235347/hlog.dat.1280508235693,
entries=2, calcsize=438, filesize=349. New hlog
/hbase/.logs/grv-hadoopc05.local,60020,1280508235347/hlog.dat.1280511816023
2010-07-30 10:43:36,031 INFO org.apache.hadoop.hbase.regionserver.HLog:
removing old hlog file
/hbase/.logs/grv-hadoopc05.local,60020,1280508235347/hlog.dat.1280508235693
whose highest sequence/edit id is 3894
2010-07-30 10:43:36,739 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x12a243c71170001 after 1ms
.
.
2010-07-30 11:08:56,823 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x12a243c71170001 after 1ms
2010-07-30 11:09:16,823 DEBUG org.apache.zookeeper.ClientCnxn: Got ping
response for sessionid:0x12a243c71170001 after 1ms
2010-07-30 11:10:29,113 WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x12a243c71170001 to sun.nio.ch.selectionkeyi...@5421e554
java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0
lim=4 cap=4]
at
org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701)
at
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2010-07-30 11:14:27,274 WARN
org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to
master for 300966 milliseconds - retrying
2010-07-30 11:14:27,274 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
*** gc-hbase.log ***
2010-07-30T11:08:37.743-0700: 5111.742: [GC 5111.742: [ParNew:
5032K->28K(5568K), 0.0012490 secs] 30199K->25198K(42684K) icms_dc=0 ,
0.0013410 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-07-30T11:10:11.619-0700: 5205.618: [GC 5205.618: [ParNew:
5020K->37K(5568K), 0.0011360 secs] 30190K->25210K(42684K) icms_dc=0 ,
0.0012350 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-07-30T11:11:56.776-0700: 5310.775: [GC 5310.775: [ParNew:
5029K->25K(5568K), 0.0018500 secs] 30202K->25202K(42684K) icms_dc=0 ,
0.0019450 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2010-07-30T11:13:28.876-0700: 5402.874: [GC 5402.874: [ParNew:
5017K->22K(5568K), 0.0015400 secs] 30194K->25202K(42684K) icms_dc=0 ,
0.0016590 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-07-30T11:14:37.276-0700: 5471.274: [GC 5471.274: [ParNew:
5014K->158K(5568K), 0.0059290 secs] 30194K->25341K(42684K) icms_dc=0 ,
0.0060060 secs] [Times: user=0.00 sys=0.01, real=0.01 secs]
Heap
par new generation total 5568K, used 264K [0x00007fd7eda90000,
0x00007fd7ee090000, 0x00007fd7ee090000)
eden space 4992K, 2% used [0x00007fd7eda90000, 0x00007fd7edaaa838,
0x00007fd7edf70000)
from space 576K, 27% used [0x00007fd7ee000000, 0x00007fd7ee027978,
0x00007fd7ee090000)
to space 576K, 0% used [0x00007fd7edf70000, 0x00007fd7edf70000,
0x00007fd7ee000000)
concurrent mark-sweep generation total 37116K, used 25183K
[0x00007fd7ee090000, 0x00007fd7f04cf000, 0x00007fd8eda90000)
concurrent-mark-sweep perm gen total 25932K, used 18303K
[0x00007fd8eda90000, 0x00007fd8ef3e3000, 0x00007fd8f2e90000)
On master node
*** zookeeper.log ***
2010-07-30 11:10:20,000 INFO org.apache.zookeeper.server.SessionTrackerImpl:
Expiring session 0x12a243c71170001
2010-07-30 11:10:20,000 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x12a243c71170001
2010-07-30 11:10:20,001 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination request for id: 0x12a243c71170001
2010-07-30 11:10:20,001 DEBUG
org.apache.zookeeper.server.FinalRequestProcessor: Processing request::
sessionid:0x12a243c71170001 type:closeSession cxid:0x0 zxid:0x1121
txntype:-11 n/a
2010-07-30 11:10:20,002 DEBUG org.apache.zookeeper.server.DataTree: Deleting
ephemeral node /hbase/rs/1280508235347 for session 0x12a243c71170001
2010-07-30 11:10:20,002 INFO org.apache.zookeeper.server.NIOServerCnxn:
closing session:0x12a243c71170001 NIOServerCnxn:
java.nio.channels.SocketChannel[connected local=/10.128.135.100:2181 remote=/10.128.135.107:33396]
*** HMaster.log ***
2010-07-30 11:10:20,002 DEBUG org.apache.zookeeper.ClientCnxn: Got
notification sessionid:0x12a243c71170000
2010-07-30 11:10:20,003 DEBUG org.apache.zookeeper.ClientCnxn: Got
WatchedEvent: Znode change. Path: /hbase/rs/1280508235347 Type: NodeDeleted
for sessionid 0x12a243c71170000
2010-07-30 11:10:20,003 INFO org.apache.hadoop.hbase.master.ServerManager:
grv-hadoopc05.local,60020,1280508235347 znode expired
2010-07-30 11:10:20,003 INFO org.apache.hadoop.hbase.master.RegionManager:
-ROOT- region unset (but not set to be reassigned)
Did HMaster log the wrong sessionid (0x12a243c71170000, whereas the expired session is 0x12a243c71170001)?
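
To cross-check the two observations above (no long pause in gc-hbase.log, yet the ZooKeeper session was expired), a rough Python sketch like the following could be run against the logs. It assumes the GC log reports wall-clock pauses as "real=N.NN secs" and that log4j timestamps look like "2010-07-30 11:09:16,823", as in the excerpts. The function names max_gc_pause and seconds_between are made up for illustration and are not part of HBase or ZooKeeper.

```python
import re
from datetime import datetime

# Matches the wall-clock pause reported at the end of each GC entry,
# e.g. "[Times: user=0.00 sys=0.00, real=0.01 secs]".
REAL_PAUSE = re.compile(r"real=(\d+\.\d+) secs")

# log4j-style timestamp as seen in rs.log and zookeeper.log.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def max_gc_pause(gc_log_text):
    """Return the largest 'real' GC pause in seconds (0.0 if none found)."""
    pauses = [float(m.group(1)) for m in REAL_PAUSE.finditer(gc_log_text)]
    return max(pauses, default=0.0)


def seconds_between(earlier, later):
    """Return the number of seconds from one log timestamp to another."""
    start = datetime.strptime(earlier, LOG_TS_FORMAT)
    end = datetime.strptime(later, LOG_TS_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # Last ping response seen by the RS vs. session expiry on the ZK server.
    gap = seconds_between("2010-07-30 11:09:16,823", "2010-07-30 11:10:20,000")
    print("seconds from last ping response to expiry:", gap)
```

With the timestamps above, the gap from the last ping response in rs.log to the expiry in zookeeper.log comes out at roughly 63 seconds, while the largest pause in the gc-hbase.log excerpt is 0.01 secs.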