Regards, Jean-Marc.
What version of HBase are you using?
In the new version of the platform (0.94), there are a lot of improvements
for auto-splitting and pre-splitting regions.
The great Hortonworks team published an amazing post on this
particular topic:
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/

2013/8/9, Jean-Marc Spaggiari <[email protected]>:
> Hi,
>
> Quick question regarding the split.
>
> Let's consider the table "work_proposed" below:
>
> 275164921921  hdfs://node3:9000/hbase/work_proposed
>
> This is a 256GB table. I think there are more than 1B rows in it but I
> have not counted them for a while.
>
> This table has a pretty default definition:
>
>
> hbase(main):001:0> describe 'work_proposed'
> DESCRIPTION                                                      ENABLED
>  'work_proposed',                                                true
>  {NAME => '@', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
>   REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3',
>   TTL => '2147483647', MIN_VERSIONS => '0',
>   KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>   ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
>  {NAME => 'a', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
>   REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>   MIN_VERSIONS => '0', TTL => '2147483647',
>   KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
>   IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}
> 1 row(s) in 0.7590 seconds
>
> Those are all default parameters, which means the default FILE_SIZE value
> is 10GB.
>
> If I look into Hannibal, it's fine. I can see my table, the regions, the
> red line at 10GB showing the max size before the split, etc. All the
> regions are under this line.... except one!
>
> hadoop@buldo:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@
> Found 1 items
> -rw-r--r--   3 hbase supergroup 22911054018 2013-08-03 20:57 /hbase/work_proposed/46f8ea6e24982fbeb249a4516c879109/@/404fcf681e5e4fdbac99db80345b011b
>
> This region is 21GB. And it doesn't want to split. The first thing you will
> say is it's because I have one single 21GB row in this region, but I don't
> think so. My rows are URLs. I would be surprised if I had a 21GB URL ;)
>
> I triggered major_compact many times, I stopped/started the cluster many
> times, nothing. I can most probably ask for a manual split and that will
> work, but I want to take this opportunity to figure out why it's not
> splitting, if it should be, and if there is any defect behind that.
>
> I have not found any exception in the logs. I just started another
> major_compaction and will grep the region name from the logs, but any idea
> why I'm facing that, and where in the code I should start looking? I can
> deploy customized code to show more logs if required. I will start by
> looking at the split policies...
>
> JM
>
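
The sizes quoted above can be sanity-checked with a few lines of Python. This
is only a rough sketch: the byte counts are copied from the quoted message,
the 10GB limit is the value stated there, and reading "GB" as 1024^3 bytes is
an assumption. The names to_gib and MAX_FILESIZE_BYTES are made up for this
illustration and are not HBase identifiers.

```python
GIB = 1024 ** 3  # assumption: "GB" in the message means 1024^3 bytes


def to_gib(num_bytes):
    """Convert a raw byte count to GiB."""
    return num_bytes / GIB


# Byte counts copied from the quoted output.
TABLE_BYTES = 275164921921   # size reported for the whole work_proposed table
REGION_BYTES = 22911054018   # the single store file in the oversized region

# Assumed split threshold: the 10GB default mentioned in the message.
MAX_FILESIZE_BYTES = 10 * GIB

if __name__ == "__main__":
    print("table : %.2f GiB" % to_gib(TABLE_BYTES))
    print("region: %.2f GiB" % to_gib(REGION_BYTES))
    print("region is %.2fx the assumed split threshold"
          % (REGION_BYTES / MAX_FILESIZE_BYTES))
```

Under those assumptions the table rounds to 256 and the region to 21, so the
region is a bit more than twice the size at which a split would be expected.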


-- 
Marcos Ortiz Valmaseda
Product Manager at PDVSA
http://about.me/marcosortiz
