I'm running some YCSB tests and am seeing a performance loss when I enable lzo on the table during
the "load" phase. There are times when the insert rate drops to 0 operations per second.
The drop in ops/sec is caused by:
16:17:51,410 INFO HRegion: Blocking updates for 'IPC Server handler 72 on 60020' on region
usertable,user1754755504,1301436104287.779d91282740038221bca426493d9ae0.: memstore size 192.0m is >=
than blocking 192.0m size
16:17:51,808 INFO HRegion: Blocking updates for 'IPC Server handler 80 on 60020' on region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7.: memstore size 192.2m is >=
than blocking 192.0m size
16:17:51,909 INFO HRegion: Blocking updates for 'IPC Server handler 39 on 60020' on region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7.: memstore size 192.2m is >=
than blocking 192.0m size
16:17:51,983 INFO HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7.: memstore size 192.2m is >=
than blocking 192.0m size
16:17:52,127 INFO HRegion: Blocking updates for 'IPC Server handler 34 on 60020' on region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7.: memstore size 192.2m is >=
than blocking 192.0m size
16:17:52,223 INFO HRegion: Blocking updates for 'IPC Server handler 61 on 60020' on region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7.: memstore size 192.2m is >=
than blocking 192.0m size
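If I'm reading the blocking messages right, the 192.0m threshold is just
hbase.hregion.memstore.flush.size multiplied by my hbase.hregion.memstore.block.multiplier of 3 (see
the config below). I haven't overridden the flush size, so it should still be at its 64m default,
effectively:

<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>67108864</value>
</property>

64m x 3 = 192m, which matches the threshold in the messages above.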
During the same time period I see:
16:17:50,116 INFO HRegion: Finished memstore flush of ~199.9m for region
usertable,user1020000000,1301436193579.b2ad4a4b18f02a0d4c104dfd445cb81c. in 14882ms,
sequenceid=434372473, compaction requested=false
16:17:50,277 INFO HRegion: Finished memstore flush of ~5.5k for region
response-index,R_cwrJ0DhNp9u5lgo,1298774642178.043cabc33b864e030253cd692a608792. in 161ms,
sequenceid=434372741, compaction requested=true
16:18:04,439 INFO HRegion: Finished memstore flush of ~177.1m for region
usertable,user1264540941,1301436193579.97d94c6fed18aa0c35afcc4e4e3505d7. in 14161ms,
sequenceid=434372759, compaction requested=true
16:18:18,289 INFO HRegion: Finished memstore flush of ~192.0m for region
usertable,user1754755504,1301436104287.779d91282740038221bca426493d9ae0. in 13850ms,
sequenceid=434373121, compaction requested=true
16:18:39,521 INFO HRegion: Finished memstore flush of ~168.5m for region
usertable,user150997537,1301436104287.15f426db2efc810aa23ab753fa7b8965. in 21228ms,
sequenceid=434373508, compaction requested=false
It looks like flushing the memstore takes a lot longer with lzo enabled. For comparison, here are
similar memstore flushes with and without lzo.
With lzo:
15:55:45,385 INFO HRegion: Finished memstore flush of ~64.1m for region
usertable,user9\x09p:\xDA\xDA\xDA\xDA\xDA\x9C,1301435595000.9c47aed1f43b745ade509c4e390c14d7. in
4191ms, sequenceid=434307849, compaction requested=false
15:58:50,836 INFO HRegion: Finished memstore flush of ~64.1m for region
usertable,user7\x08p\xBA\xDA\xDA\xDA\xDA\xDA\xAC,1301435594999.8f3df80dab5b6df73761f203f48c1bad. in
3773ms, sequenceid=434312129, compaction requested=false
16:00:04,289 INFO HRegion: Finished memstore flush of ~64.1m for region
usertable,user5\x07q:\xDA\xDA\xDA\xDA\xDA\xBC,1301435594998.68f3fbd6fe3c364031b0dc0a7d4fbddc. in
3613ms, sequenceid=434313422, compaction requested=false
16:00:07,916 INFO HRegion: Finished memstore flush of ~66.1m for region
usertable,user9\x09p:\xDA\xDA\xDA\xDA\xDA\x9C,1301435595000.9c47aed1f43b745ade509c4e390c14d7. in
3627ms, sequenceid=434313616, compaction requested=false
Without lzo:
16:41:54,415 INFO HRegion: Finished memstore flush of ~65.0m for region
usertable,user20\xB1\xF000000(,1301438473877.3740787fab9695d18f77dc3ec08f5ce9. in 892ms,
sequenceid=434359154, compaction requested=false
16:42:03,846 INFO HRegion: Finished memstore flush of ~64.0m for region
usertable,user41\xB1p00000\x18,1301438473878.2f2de0ac90c3718bc6cfef2df4dd9398. in 808ms,
sequenceid=434359429, compaction requested=false
16:42:08,658 INFO HRegion: Finished memstore flush of ~64.1m for region
usertable,user62\xB0\xF000000\x08,1301438473879.914eaad85cfbf6f4d60f943dfbcc2221. in 1369ms,
sequenceid=434359568, compaction requested=false
16:42:09,593 INFO HRegion: Finished memstore flush of ~65.3m for region
usertable,user20\xB1\xF000000(,1301438473877.3740787fab9695d18f77dc3ec08f5ce9. in 935ms,
sequenceid=434359632, compaction requested=false
Averaging the four flushes in each sample above, an lzo flush takes ~3801ms versus ~1001ms without
lzo, so lzo flushes are roughly 3.8x slower. Any ideas on how to prevent this problem? I'm guessing it
might be fixable with some parameter tuning (one idea I'm considering is sketched after the config
below). Here are the relevant parts of my hbase-site.xml.
<property>
<name>hbase.regionserver.handler.count</name>
<value>100</value>
</property>
<property>
<name>hbase.zookeeper.property.maxClientCnxns</name>
<value>100</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>536870912</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>3</value>
</property>
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>14</value>
</property>
<property>
<name>hbase.hstore.compactionThreshold</name>
<value>5</value>
</property>
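One idea I've been toying with (untested, just a guess) is giving each memstore more headroom before
updates block, so the slower lzo flushes have more time to finish before writers stall, e.g. bumping
the multiplier:

<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>

I don't know yet whether that just moves the stall around or actually avoids it, which is part of why
I'm asking.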
Any suggestions would be great.
Thanks,
~Jeff
--
Jeff Whiting
Qualtrics Senior Software Engineer
[email protected]