Hi,
I was trying to find out whether HBase can be used in a real-time processing scenario. To do so, I set in_memory to true for a table and set the table's TTL to 10 minutes. The data arrives in chronological order, and I let the test run for one day. The idea is that we are only interested in the last 10 minutes of data: as data gets older it is purged, so memory and disk usage should remain low.

What I found is that the number of regions continues to grow. Overnight it created 46 regions, and HDFS shows 8.6G of disk space used. This is an order of magnitude higher than what I estimated for the ideal case. The rate at which I am pumping data is only 3 regions/hour. I would expect fewer than 3 regions in HBase in this situation, and only about 700M of HDFS usage, regardless of how long I run the test.

I understand that a region merge request has already been filed. Does anybody know when it will be implemented?
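
For reference, the figures above can be checked with a rough back-of-the-envelope calculation. This is only a sketch: the function names are made up for illustration, it assumes 1G = 1000M, and it assumes expired data frees its regions immediately, which is the ideal case and not what was observed.

```python
# Back-of-the-envelope check of the numbers quoted in this message.
# All helper names are hypothetical; nothing here calls HBase.

TTL_MINUTES = 10                 # table TTL from the test setup
INGEST_REGIONS_PER_HOUR = 3      # reported ingest rate
OBSERVED_REGIONS = 46            # regions seen after the overnight run
OBSERVED_HDFS_MB = 8.6 * 1000    # 8.6G reported by HDFS (assumes 1G = 1000M)
EXPECTED_HDFS_MB = 700           # estimate for the ideal case


def live_regions(rate_per_hour, ttl_minutes):
    """Regions' worth of unexpired data at steady state (ideal case)."""
    return rate_per_hour * ttl_minutes / 60


def overshoot(observed, expected):
    """How many times larger the observed value is than the expected one."""
    return observed / expected


if __name__ == "__main__":
    live = live_regions(INGEST_REGIONS_PER_HOUR, TTL_MINUTES)
    print(f"live data at steady state: {live:.2f} regions' worth")
    print(f"observed regions: {OBSERVED_REGIONS}")
    ratio = overshoot(OBSERVED_HDFS_MB, EXPECTED_HDFS_MB)
    print(f"HDFS usage vs. estimate: {ratio:.1f}x")
```

Under these assumptions the unexpired data at any moment amounts to well under 3 regions' worth, which is consistent with the estimate above, while the observed disk usage is roughly an order of magnitude larger.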

Jimmy.
