Hi all,

I've had a Hadoop system with HBase working for quite a long time now.  We've 
got hadoop-hbase-master-0.90.6+84.73-1 installed on Red Hat 5, with four 
regionservers on slave nodes, and the REST and Thrift servers running on the 
master.  Just today, and pretty much without warning, the master crashed.  Now 
we can't restart it.  It starts, and then almost immediately dies.  No error 
message appears in the log, though it appears to be cleaning itself up 
normally.  The log contains only:

2013-08-02T14:34:40.142-0400: [GC [ParNew: 17024K->1334K(19136K), 0.0052490 
secs] 17024K->1334K(83008K), 0.0053100 secs] [Times: user=0.02 sys=0.01, 
real=0.01 secs]
2013-08-02T14:34:40.347-0400: [GC [1 CMS-initial-mark: 0K(63872K)] 
9036K(83008K), 0.0071700 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2013-08-02T14:34:40.471-0400: [GC [ParNew: 18358K->1234K(19136K), 0.0265690 
secs] 18358K->2644K(83008K), 0.0266550 secs] [Times: user=0.12 sys=0.00, 
real=0.03 secs]
2013-08-02T14:34:40.630-0400: [CMS-concurrent-mark: 0.013/0.275 secs] [Times: 
user=0.53 sys=0.01, real=0.27 secs]
2013-08-02T14:34:40.645-0400: [CMS-concurrent-preclean: 0.014/0.015 secs] 
[Times: user=0.01 sys=0.00, real=0.02 secs]
2013-08-02T14:34:40.645-0400: [CMS-concurrent-abortable-preclean: 0.000/0.000 
secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.645-0400: [GC[YG occupancy: 7584 K (19136 K)][Rescan 
(parallel) , 0.0030240 secs][weak refs processing, 0.0000090 secs] [1 
CMS-remark: 1410K(63872K)] 8994K(83008K), 0.0031230 secs] [Times: user=0.02 
sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.649-0400: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: 
user=0.00 sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.726-0400: [CMS-concurrent-reset: 0.077/0.077 secs] [Times: 
user=0.02 sys=0.05, real=0.08 secs]
Heap
par new generation   total 19136K, used 7584K [0x00002b7281fe0000, 
0x00002b72834a0000, 0x00002b72957e0000)
  eden space 17024K,  37% used [0x00002b7281fe0000, 0x00002b7282613928, 
0x00002b7283080000)
  from space 2112K,  58% used [0x00002b7283080000, 0x00002b72831b4838, 
0x00002b7283290000)
  to   space 2112K,   0% used [0x00002b7283290000, 0x00002b7283290000, 
0x00002b72834a0000)
concurrent mark-sweep generation total 63872K, used 1410K [0x00002b72957e0000, 
0x00002b7299640000, 0x00002b7475fe0000)
concurrent-mark-sweep perm gen total 26256K, used 15758K [0x00002b7475fe0000, 
0x00002b7477984000, 0x00002b747b3e0000)

And if I restart I get essentially the same log overwriting this one 
(with new timestamps, of course).  The REST and Thrift servers and all the 
regionservers appear fine.  There are no issues with disk space or resources 
on the server box, and HDFS appears fine.  Any advice on other places I can 
look for more data, or how I might get more granularity in the logs?  Or does 
someone see an error I'm missing in what's already being logged?
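
In case it helps anyone reading the GC output above, here is a minimal sketch 
(plain Python, my own illustration, not anything that ships with HBase) that 
pulls the whole-heap figures out of the ParNew lines.  The function name 
heap_after_gc and the regex are assumptions based only on the line format 
shown in this message, with the mail-wrapped lines rejoined first.

```python
import re

# Matches the whole-heap figures in a ParNew line such as
# "[GC [ParNew: 17024K->1334K(19136K), 0.0052490 secs] 17024K->1334K(83008K), ..."
# Assumption: the format is exactly the one pasted in this message.
PARNEW = re.compile(
    r"\[GC \[ParNew: (\d+)K->(\d+)K\((\d+)K\), [\d.]+ secs\] "
    r"(\d+)K->(\d+)K\((\d+)K\)"
)

def heap_after_gc(log_text):
    """Return (heap_used_after_K, heap_total_K) for each ParNew collection."""
    # Mail clients wrap long log lines; rejoin them before matching.
    joined = re.sub(r"\s*\n\s*", " ", log_text)
    return [(int(m.group(5)), int(m.group(6))) for m in PARNEW.finditer(joined)]

# The two ParNew entries from the log above, wrapped as they were pasted.
sample = """2013-08-02T14:34:40.142-0400: [GC [ParNew: 17024K->1334K(19136K), 0.0052490
secs] 17024K->1334K(83008K), 0.0053100 secs] [Times: user=0.02 sys=0.01,
real=0.01 secs]
2013-08-02T14:34:40.471-0400: [GC [ParNew: 18358K->1234K(19136K), 0.0265690
secs] 18358K->2644K(83008K), 0.0266550 secs] [Times: user=0.12 sys=0.00,
real=0.03 secs]"""

for used, total in heap_after_gc(sample):
    print("%dK of %dK in use after GC (%.1f%%)" % (used, total, 100.0 * used / total))
```

For the two ParNew entries above this shows only 1334K and 2644K of the 
83008K heap in use after collection, so as far as I can tell the GC output 
itself does not point to memory pressure.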

Thanks in advance,
Trevor Antczak
