Hi everyone,

I am a complete novice with HBase and with this community, and this is my first 
email. Please forgive me if I cause any trouble.

Here is the issue:

We use HBase to store file information, and we compose the rowkey from the 
user id and the file path. 
        For example: if a user’s id is 1000 and he has a file “a.txt” stored in 
“/root/data/”, then the rowkey will be “1000_/root/data/a.txt”.
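
To make the rowkey layout concrete, here is a minimal sketch in plain Python. 
The helper names make_rowkey and folder_scan_range are hypothetical and are not 
part of HBase; the sketch only shows how a rowkey is composed and how the rowkeys 
of one folder form a contiguous key range.

```python
from typing import Tuple


def make_rowkey(user_id: int, path: str) -> bytes:
    """Compose '<user id>_<file path>' as in the example above."""
    return f"{user_id}_{path}".encode("utf-8")


def folder_scan_range(user_id: int, folder: str) -> Tuple[bytes, bytes]:
    """Return (start, stop) such that start <= key < stop for every
    rowkey under `folder` (hypothetical helper, for illustration only)."""
    if not folder.endswith("/"):
        folder += "/"
    prefix = make_rowkey(user_id, folder)
    # The prefix always ends in b"/", so incrementing the last byte is safe.
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop


key = make_rowkey(1000, "/root/data/a.txt")
start, stop = folder_scan_range(1000, "/root/data")
print(key)                  # b'1000_/root/data/a.txt'
print(start <= key < stop)  # True
```

Because all files of one folder share a prefix, deleting a folder with millions 
of files marks one long, contiguous run of rowkeys as deleted.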

A user may store a very large number of files in our system, on the order of 
millions or billions. Sometimes he will delete a folder that may contain 
millions of files. After this kind of delete, scans often hit a “timeout 
issue” until we run a major compaction.

To understand this issue, I read the Google Bigtable paper, “HBase in Action”, 
the blog posts about the block cache written by Nick, many other articles about 
HBase, and also the source code. I ran some tests, and my conclusions are 
listed below:

The test table has only one column family, and this cf has only one column.

1. Three aspects influence the read latency: key searching, disk I/O, and 
   network I/O.
2. Making the HBase client caching smaller reduces the latency attributable to 
   “network I/O”.
3. Compared to a normal scan, the “delete” scenario spends more time on 
   searching and disk I/O, and I think mainly on searching. Consider this 
   scenario: I put a large amount of data into HBase, and it is flushed into 
   one hfile. Then I delete the majority of this data, beginning from the 
   start key. The deletes are recorded in another hfile. At this point, if I 
   scan from the start key (supposing there is no compaction), the scan has to 
   read the deleted data one by one until it reaches the first item that is 
   not deleted. 
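
The delete scenario can be illustrated with a toy model in plain Python. This 
is only a sketch of the behaviour as I understand it, not HBase's actual read 
path: the function scan_first_live and the data are made up, with one sorted 
list standing in for the hfile of puts and one set standing in for the hfile 
of delete markers.

```python
def scan_first_live(puts, deleted):
    """Walk the put keys in sorted order from the start key.

    Returns (first key that is not deleted, number of cells examined).
    """
    examined = 0
    for key in sorted(puts):
        examined += 1
        if key not in deleted:
            return key, examined
    return None, examined


# 100,000 puts in one "hfile"; delete markers for the first 99,990 in another.
puts = [f"1000_/root/data/{i:06d}.txt" for i in range(100_000)]
deleted = set(puts[:99_990])

first, examined = scan_first_live(puts, deleted)
print(first)     # 1000_/root/data/099990.txt
print(examined)  # 99991
```

In this model the scan examines 99,991 cells before it can return its first 
row, which would explain why most of the time goes to searching rather than to 
network I/O.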

So, running a compaction is the most effective way to resolve this kind of issue.
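
As a rough sketch of why a major compaction helps, the toy function below 
(major_compact is a hypothetical name, not an HBase API) merges the puts and 
the delete markers into a single file that keeps only the live cells. It 
assumes the default behaviour, in which a major compaction drops deleted cells 
together with their markers.

```python
def major_compact(puts, deleted):
    """Merge everything into one sorted file, dropping deleted cells
    and their delete markers (toy model only)."""
    return sorted(k for k in puts if k not in deleted)


puts = [f"row{i:05d}" for i in range(10_000)]
deleted = set(puts[:9_990])

compacted = major_compact(puts, deleted)
print(len(puts) + len(deleted), "->", len(compacted))  # 19990 -> 10
print(compacted[0])                                    # row09990
```

After the compaction, a scan from the start key meets a live row immediately 
and no longer has to step over the deleted ones.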

I still have some doubts and hope someone can clear them up.
First, I am not sure that the scan process for the “delete scenario” I 
described in “number 3” is correct. 
Second, the block cache seems to have little effect in this scenario.

P.S. I have not attached my test results because I am afraid of confusing 
others. I will tidy them up and send them if necessary. 

Br, Great Soul