We are frequently seeing "flush storms" like the following:

2012-03-29 07:44:32,743 INFO
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using
syncFs -- HDFS-200
2012-03-29 07:44:32,749 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Roll /hbase/.logs/alf-data1001.ve.box.net,60020,1329866131456/
alf-data1001.ve.box.net%3A60020.1333030764796, entries=64807, filesize=63774
984. New hlog /hbase/.logs/alf-data1001.ve.box.net,60020,1329866131456/
alf-data1001.ve.box.net%3A60020.1333032272737
2012-03-29 07:44:32,749 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
moving old hlog file /hbase/.logs/alf-data1001.ve.box.net
,60020,1329866131456/alf-data1001.ve.box.net%3A60020.1332965480857 whose
highest
sequenceid is 539044450 to /hbase/.oldlogs/alf-data1001.ve.box.net
%3A60020.1332965480857
2012-03-29 07:44:32,758 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
Too many hlogs: logs=33, maxlogs=32; *forcing flush of 146 regions*(s):
0266c1e46d2ddb16fc78720d8ff27d54, 0276c53e422628bc40d00e3623e37eea, *{truncated
for clarity}*
2012-03-29 07:44:32,998 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
(hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/.tmp/6451618467415382295):
23.9k, 20396/20396 (100%)
2012-03-29 07:44:33,015 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/.tmp/6451618467415382295to
hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/actions/1224153605422917866
2012-03-29 07:44:33,019 INFO
org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded row bloom
filter metadata for hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/actions/1224153605422917866
2012-03-29 07:44:33,019 INFO org.apache.hadoop.hbase.regionserver.Store:
Added hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/actions/1224153605422917866,
entries=20396, sequenceid=541865234, memsize=19.9m, filesize=4.1m
2012-03-29 07:44:33,021 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Starting compaction on region
actionlog_all,\xC9\x07_\x00\xC2\xA6\xB3_\x00\x00\x013\xB37\x85,,1331710865818.0266c1e46d2ddb16fc78720d8ff27d54.
2012-03-29 07:44:33,023 INFO org.apache.hadoop.hbase.regionserver.Store:
Started compaction of 3 file(s) in cf=actions  into hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/0266c1e46d2ddb16fc78720d8ff27d54/.tmp,
seqid=541865234, totalSize=10.9m
2012-03-29 07:44:33,108 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
(hdfs://
alf-name1001.ve.box.net/hbase/actionlog_changes/0276c53e422628bc40d00e3623e37eea/.tmp/7544373528311273725):
6.6k, 5578/5578 (100%)
2012-03-29 07:44:33,115 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at hdfs://
alf-name1001.ve.box.net/hbase/actionlog_changes/0276c53e422628bc40d00e3623e37eea/.tmp/7544373528311273725to
hdfs://
alf-name1001.ve.box.net/hbase/actionlog_changes/0276c53e422628bc40d00e3623e37eea/actions/2305627845824500441
2012-03-29 07:44:33,139 INFO
org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded row bloom
filter metadata for hdfs://
alf-name1001.ve.box.net/hbase/actionlog_changes/0276c53e422628bc40d00e3623e37eea/actions/2305627845824500441
2012-03-29 07:44:33,139 INFO org.apache.hadoop.hbase.regionserver.Store:
Added hdfs://
alf-name1001.ve.box.net/hbase/actionlog_changes/0276c53e422628bc40d00e3623e37eea/actions/2305627845824500441,
entries=5578, sequenceid=541865381, memsize=5.5m, filesize=870.6k
2012-03-29 07:44:33,252 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
(hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/051f919e7810fae600843859f45a4ab8/.tmp/3436496607242466108):
7.5k, 6328/6328 (100%)
2012-03-29 07:44:33,278 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/051f919e7810fae600843859f45a4ab8/.tmp/3436496607242466108to
hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/051f919e7810fae600843859f45a4ab8/actions/2542339108025847509
2012-03-29 07:44:33,283 INFO
org.apache.hadoop.hbase.regionserver.StoreFile$Reader: Loaded row bloom
filter metadata for hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/051f919e7810fae600843859f45a4ab8/actions/2542339108025847509
2012-03-29 07:44:33,284 INFO org.apache.hadoop.hbase.regionserver.Store:
Added hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/051f919e7810fae600843859f45a4ab8/actions/2542339108025847509,
entries=6328, sequenceid=541865393, memsize=6.3m, filesize=1002.1k
2012-03-29 07:44:33,408 INFO
org.apache.hadoop.hbase.regionserver.StoreFile: Bloom added to HFile
(hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/09d91fd5d9823106ff63946e3b35f257/.tmp/4799647439140326440):
7.0k, 5967/5967 (100%)
2012-03-29 07:44:33,415 INFO org.apache.hadoop.hbase.regionserver.Store:
Renaming flushed file at hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/09d91fd5d9823106ff63946e3b35f257/.tmp/4799647439140326440to
hdfs://
alf-name1001.ve.box.net/hbase/actionlog_all/09d91fd5d9823106ff63946e3b35f257/actions/2390220271020153320
...

Our region server are each hosting ~270 regions. Our writes are extremely
well distributed (our HBase keys are output from a hash function) and small
(~100s of bytes). I believe that the writes are being so well distributed
across the regions that no one MemStore grows large enough to be written to
HDFS, forcing the RegionServer to retain the old WALs. When these flush
storms happen, we also see a compaction storm as all of the StoreFiles are
written out at once for the flushed regions, as well as increased latency.
What would be the recommended way to deal with this situation? Increase *
hbase*.regionserver.*maxlogs? Decrease *hbase.hregion.memstore.flush.size?
Are there other configuration knobs to tweak to control how long MemStores
stick around before being flushed?

Reply via email to