Hi,
Quick question regarding the split.
Let's consider the table "work_proposed" below:
275164921921 hdfs://node3:9000/hbase/work_proposed
This is a 256GB table. I think there are more than 1B rows in it, but I
have not counted them for a while.
This table has a pretty default definition:
hbase(main):001:0> describe 'work_proposed'
DESCRIPTION                                                                   ENABLED
 'work_proposed', {NAME => '@', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER    true
 => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
 TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}, {NAME => 'a', DATA_BLOCK_ENCODING => 'NONE',
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3',
 COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
1 row(s) in 0.7590 seconds
Those are all default parameters, which means the default MAX_FILESIZE of
10GB (hbase.hregion.max.filesize) applies.
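Just to double-check that assumption, this is the kind of client-side check I
would run; a rough sketch assuming the 0.94-era HBaseAdmin/HTableDescriptor
API, and it only sees this client's hbase-site.xml, not the region servers':

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Table-level MAX_FILESIZE override; -1 should mean "not set, fall back
      // to the cluster-wide hbase.hregion.max.filesize".
      HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("work_proposed"));
      System.out.println("MAX_FILESIZE on table: " + desc.getMaxFileSize());

      // Cluster default as seen from this client's configuration.
      System.out.println("hbase.hregion.max.filesize: "
          + conf.getLong("hbase.hregion.max.filesize", -1L));
    } finally {
      admin.close();
    }
  }
}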
If I look in Hannibal, everything looks fine: I can see my table, the
regions, the red line at 10GB showing the max size before a split, etc. All
the regions are under this line... except one!
hadoop@buldo:~/hadoop-1.0.3$ bin/hadoop fs -ls
/hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@
Found 1 items
-rw-r--r-- 3 hbase supergroup 22911054018 2013-08-03 20:57
/hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@/404fcf681e5e4fdbac99db80345b011b
This region is 21GB, and it doesn't want to split. The first thing you will
say is that it's because I have a single 21GB row in this region, but I don't
think so: my rows are URLs, and I would be surprised if I had a 21GB URL ;)
I triggered major_compact many times and stopped/started the cluster many
times, but nothing. I could most probably ask for a manual split and that
would work, but I want to take this opportunity to figure out why it's not
splitting, whether it should be, and whether there is a defect behind that.
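For the record, this is more or less the manual split I would trigger if I
end up going that route; a sketch assuming the 0.94-era HBaseAdmin API, where
REGION_NAME is a placeholder for the full region name from the master web UI
(not just the encoded suffix from the HDFS path above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceSplit {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Ask the master to split one specific region; passing a plain table
      // name instead would ask every region of the table to split.
      admin.split("REGION_NAME");
      // split() is asynchronous: the request is queued on the region server,
      // so I would check the UI (or Hannibal) afterwards to see the result.
    } finally {
      admin.close();
    }
  }
}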
I have not found any exceptions in the logs. I just started another
major_compaction and will grep for the region name in the logs, but any idea
why I'm facing this, and where in the code I should start looking? I can
deploy customized code to add more logging if required. I'll also start to
look at the split policies...
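In case it helps, this is the kind of extra logging I have in mind; a rough
sketch assuming the 0.94-era RegionSplitPolicy/Store API, and the class name
LoggingSplitPolicy is just something I made up. It wraps the default
IncreasingToUpperBoundRegionSplitPolicy and logs, for each store, its size
and whether it can split at all (my understanding is that a store still
holding reference files from an earlier split vetoes the whole region):

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.regionserver.IncreasingToUpperBoundRegionSplitPolicy;
import org.apache.hadoop.hbase.regionserver.Store;

// Hypothetical logging wrapper around the default split policy, to see why
// shouldSplit() keeps answering "no" for this region.
public class LoggingSplitPolicy extends IncreasingToUpperBoundRegionSplitPolicy {
  private static final Log LOG = LogFactory.getLog(LoggingSplitPolicy.class);

  @Override
  protected boolean shouldSplit() {
    boolean decision = super.shouldSplit();
    String regionName = region.getRegionInfo().getRegionNameAsString();
    // Log each store's aggregate size and whether it is splittable at all.
    for (Store store : region.getStores().values()) {
      LOG.info("Region " + regionName
          + " store " + store.getFamily().getNameAsString()
          + " size=" + store.getSize()
          + " canSplit=" + store.canSplit());
    }
    LOG.info("Region " + regionName + " shouldSplit=" + decision);
    return decision;
  }
}

If I go that way, I believe I can point the table's SPLIT_POLICY attribute
(or hbase.regionserver.region.split.policy) at that class, redeploy, and then
watch the region server log during the next compaction.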
JM