Hi,
I was trying to find out whether HBase can be used in a real-time processing
scenario. To do so, I set in_memory to true for a table and set the TTL for
the table to 10 minutes.
The data comes in chronological order. I let the test run for 1 day.
The idea is that we are only interested in the last 10 minutes of data.
As data gets older, it will be purged, and the amount of memory and disk
usage will remain low.
What I found is that the number of regions continues to grow, and overnight
it created 46 regions. HDFS shows 8.6G of disk space used. This is one
order of magnitude higher than what I estimated for the ideal case. The data
rate that I am pumping is only enough to fill 3 regions/hour. I would expect
fewer than 3 regions in HBase for this kind of situation, and only 700M of
HDFS usage, regardless of how long I run the test.
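
As a rough sanity check on the numbers above, here is a minimal sketch of the
arithmetic. The class and method names are hypothetical and only the figures
quoted in this mail are used. It assumes ingest is steady at 3 regions/hour
and that expired data would ideally free its space right away, which is the
ideal case and not what HBase actually did in the test.

```java
// Back-of-the-envelope check of the figures in this mail.
// RegionEstimate and its methods are illustrative names, not HBase APIs.
final class RegionEstimate {

    // Regions' worth of unexpired data at steady state:
    // ingest rate (regions/hour) times the TTL window expressed in hours.
    static double liveRegions(double regionsPerHour, double ttlMinutes) {
        return regionsPerHour * (ttlMinutes / 60.0);
    }

    // Hours of ingest needed to create the given number of regions
    // if no region is ever reclaimed.
    static double hoursToCreate(int regions, double regionsPerHour) {
        return regions / regionsPerHour;
    }

    // Average on-disk size per region in MB, given total usage in GB.
    static double avgRegionMb(double totalGb, int regions) {
        return totalGb * 1024.0 / regions;
    }

    public static void main(String[] args) {
        System.out.printf("live data (regions' worth): %.2f%n", liveRegions(3.0, 10.0));
        System.out.printf("hours to reach 46 regions:  %.1f%n", hoursToCreate(46, 3.0));
        System.out.printf("avg MB per region:          %.0f%n", avgRegionMb(8.6, 46));
    }
}
```

With a 10 minute TTL, the unexpired data amounts to well under one region's
worth, while 46 regions at 3 regions/hour corresponds to roughly 15 hours of
ingest. That is consistent with regions being created overnight and none of
them being reclaimed once their data expired.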
I understand that a region merge request has already been filed. Does anybody
know when it will be implemented?
Jimmy.