Hi there
I'm investigating a problem we have with an MR job, and I discovered that
the tasks that fail (scan lease expired while fetching next row) were
processing one particular region.
I've written a small app that scans that region and counts its rows, and
ran it on the same machine where the region is hosted. The result is very
poor: scan speed is on average 7 rows/sec, and sometimes, when scan
caching is increased, it gets a lease expired exception. By contrast,
scanning the other regions of the same table on the same machine with the
same caching value gets ~3800 rows/sec. Any idea what can cause such
disastrous scan performance on one particular region?
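
As a rough sanity check on the lease errors, here is a minimal sketch of
the arithmetic, assuming a single next() call has to fill its whole
caching batch before the lease runs out (that is my assumption about the
cause, not something confirmed for 0.90.4). The class and method names
are made up for illustration; the numbers are the measured rates above
and the 4 minute lease timeout listed below.

```java
// Back-of-envelope arithmetic only; no HBase calls are made here.
// LeaseBudget and its methods are hypothetical names used for illustration.
final class LeaseBudget {

    // Seconds needed to fill one next() batch of `caching` rows
    // at the observed scan rate.
    static double secondsPerBatch(int caching, double rowsPerSec) {
        return caching / rowsPerSec;
    }

    // Number of rows that can be read within one lease period
    // at the observed scan rate.
    static long rowsWithinLease(double rowsPerSec, long leaseSeconds) {
        return (long) Math.floor(rowsPerSec * leaseSeconds);
    }

    public static void main(String[] args) {
        long leaseSeconds = 4 * 60;   // lease timeout is 4 minutes
        double slowRate = 7.0;        // rows/sec on the problem region
        double normalRate = 3800.0;   // rows/sec on the other regions

        System.out.println("slow region, rows per lease:   "
                + rowsWithinLease(slowRate, leaseSeconds));
        System.out.println("normal region, rows per lease: "
                + rowsWithinLease(normalRate, leaseSeconds));
        System.out.println("slow region, seconds for 1000 rows:   "
                + secondsPerBatch(1000, slowRate));
        System.out.println("normal region, seconds for 1000 rows: "
                + secondsPerBatch(1000, normalRate));
    }
}
```

Under that assumption, a caching value of about 1680 already takes the
full 4 minutes per batch at 7 rows/sec, while the same batch takes well
under a second at ~3800 rows/sec.
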
Some extra info:
- HBase version is 0.90.4
- lease timeout is 4 minutes
- table has 1 family, cell values are empty, row keys and qualifiers are
  small strings, biggest row has 146 columns
- row sizes are almost identical, since the table was created by a load
  tool and each row has almost the same number of columns with the same
  kind of values
- all regions have 1 store file of ~655MB
- cluster has no activity except the test app
- GC activity looks normal
- regions might have many deleted KVs (we were testing data cleanup with
  MR jobs)
- major compaction is disabled and we haven't run one for some time
Could this problem be caused by the last two points above, i.e. many
deleted KVs concentrated in that region that the StoreScanners have to skip?
Any other thoughts?
Thanks
Daniel