I kept running into long stop-the-world GC pauses during a batch import of
data into HBase. Each node in the 8-node cluster is configured as follows.
* 4-core
* 64-bit JVM
* 8 GB of memory
* Hadoop from CDH2, HBase 0.20.5
* TT: 128 MB
* DN: 128 MB
* 2 mappers at 512 MB each
* 2 reducers at 512 MB each
* 1 regionserver at 4096 MB
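As a quick sanity check on headroom, the configured heaps above sum to most of the box:

```shell
# Configured JVM heap sizes per node, in MB: TT + DN + 2 mappers + 2 reducers
# + regionserver. The remainder of the 8 GB must still cover JVM overhead,
# the OS, and the page cache.
echo $((128 + 128 + 2*512 + 2*512 + 4096))   # prints 6400
```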
The import job was map-only, so only the TT, DN, two mappers, and the
regionserver were running. Below is the JMX output for the dead
regionserver.
Time:      2010-07-29 12:25:47
Used:      224,949 kbytes
Committed: 670,728 kbytes
Max:       4,185,792 kbytes
GC time:   5 minutes on ParNew (2,126 collections)
           0.000 seconds on ConcurrentMarkSweep (0 collections)
Clearly the regionserver spent all of its GC time in ParNew, which was not
surprising since I was importing tons of data. But I could not figure out
why the same collection that usually takes well under a second took 299
seconds in the third entry of the GC log below. Any enlightenment is
greatly appreciated.
I will change the ParNew (new generation) size to 6 MB as documented on the
Performance Tuning page and give it another shot.
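In hbase-env.sh that change would look roughly like the following. This is only a sketch: the CMS/incremental-mode flags are assumed from the icms_dc markers in the GC log, and the log path is illustrative, not my exact config.

```shell
# hbase-env.sh (sketch): pin the new generation at 6 MB as suggested by the
# Performance Tuning page. The CMS/iCMS flags are assumed from the icms_dc
# markers in the GC log below; adjust to match the actual setup.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
  -XX:NewSize=6m -XX:MaxNewSize=6m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hbase/regionserver-gc.log"
```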
2010-07-28T12:06:57.249-0700: 2406.986: [GC 2406.986: [ParNew:
17786K->755K(19136K), 0.0015410 secs] 348288K->331394K(620416K) icms_dc=27 ,
0.0016330 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-07-28T12:06:57.268-0700: 2407.004: [GC 2407.004: [ParNew:
17580K->761K(19136K), 0.0016710 secs] 348154K->331343K(620416K) icms_dc=27 ,
0.0017610 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2010-07-28T12:06:57.288-0700: 2407.024: [GC 2407.088: [ParNew:
17564K->757K(19136K), 299.1513910 secs] 348081K->331283K(620416K) icms_dc=27
, 299.1515120 secs] [Times: user=0.17 sys=0.04, real=299.23 secs]
2010-07-28T12:11:56.558-0700: 2706.294: [GC 2706.294: [ParNew:
17735K->925K(19136K), 0.0094600 secs] 348197K->331458K(620416K) icms_dc=27 ,
0.0095670 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2010-07-28T12:11:56.606-0700: 2706.343: [GC 2706.343: [ParNew:
17940K->932K(19136K), 0.0085750 secs] 348473K->331474K(620416K) icms_dc=27 ,
0.0086710 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
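For hunting pauses like this in a full log, a short filter over the wall-clock ("real=") times helps. The file name is hypothetical; the format is the -XX:+PrintGCDetails output shown above.

```shell
# Print every GC log line whose wall-clock pause ("real=N secs") exceeds
# one second; works with plain awk on the PrintGCDetails format above.
awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^real=/) {
      split($i, a, "=")
      if (a[2] + 0 > 1.0) print
    }
}' regionserver-gc.log
```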