Hi everyone,
I am completely new to HBase and to this community, and this is my first email
to the list. Please forgive me if I cause any trouble.
Here is the issue:
We use HBase to store file information, and we compose the rowkey from the
user ID and the file path.
For example, if a user's ID is 1000 and he has a file “a.txt” stored in
“/root/data/”, then the rowkey is “1000_/root/data/a.txt”.
A user may store a very large number of files in our system, on the order of
millions or even billions. Sometimes a user deletes a folder that may contain
millions of files. After this kind of delete, scans often hit a timeout until
we run a major compaction.
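A minimal sketch of the rowkey layout described above, in plain Python with no
HBase dependency. The helper names make_rowkey and folder_prefix are
hypothetical, and the sketch assumes that deleting a folder means deleting
every row whose key starts with that folder's prefix.

```python
def make_rowkey(user_id: int, file_path: str) -> str:
    """Compose the rowkey as '<userid>_<file path>', as in the example above."""
    return f"{user_id}_{file_path}"


def folder_prefix(user_id: int, folder: str) -> str:
    """Rowkey prefix shared by every file under `folder` (hypothetical helper)."""
    if not folder.endswith("/"):
        folder += "/"
    return make_rowkey(user_id, folder)


rowkey = make_rowkey(1000, "/root/data/a.txt")
prefix = folder_prefix(1000, "/root/data")

print(rowkey)                     # 1000_/root/data/a.txt
print(prefix)                     # 1000_/root/data/
print(rowkey.startswith(prefix))  # True
```

Because rowkeys are sorted, all files of one folder sit next to each other, so
a folder delete marks one long contiguous range of rows as deleted.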
To understand this issue, I read the Google Bigtable paper, “HBase in Action”,
Nick's blog posts about the block cache, many other articles about HBase, and
also the source code. I ran some tests, and my conclusions are as follows:
The test table has only one column family, and this column family has only one
column.
1. Three factors influence read latency: key search, disk I/O, and network
I/O.
2. Making the HBase client scan caching smaller reduces latency, because it
reduces the network I/O per request.
3. Compared with a normal scan, the delete scenario spends more time on
searching and disk I/O, and I think mainly on searching. Consider this
scenario: I put a number of rows into HBase, and they are flushed into a single
HFile. Then I delete the majority of these rows, beginning from the start key.
The deletes are recorded in another HFile. At this point, if I scan from the
start key (supposing there has been no compaction), the scan reads the deleted
data one by one until it reaches the first item that is not deleted.
4. So running a major compaction is the most effective way to resolve this
kind of issue.
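To make the delete scenario concrete, here is a toy model in plain Python. It
is only a sketch of my hypothesis, not HBase's actual scanner: it ignores
timestamps, versions, blocks, and bloom filters, and the names
scan_first_live and major_compact are made up for illustration. Each "file" is
a sorted list of (rowkey, kind) cells, and a scan merges the files and counts
how many cells it must examine before it can return the first live row.

```python
import heapq


def scan_first_live(store_files):
    """Return (first live rowkey, number of cells examined).

    Each store file is a sorted list of (rowkey, kind) cells, where kind is
    "delete" or "put". For the same rowkey, "delete" sorts before "put", so
    the marker is seen first and masks the put that follows it.
    """
    examined = 0
    deleted_row = None
    for rowkey, kind in heapq.merge(*store_files):
        examined += 1
        if kind == "delete":
            deleted_row = rowkey
        elif rowkey != deleted_row:
            return rowkey, examined
    return None, examined


def major_compact(store_files):
    """Rewrite all files into one, dropping delete markers and masked puts."""
    live = []
    deleted_row = None
    for rowkey, kind in heapq.merge(*store_files):
        if kind == "delete":
            deleted_row = rowkey
        elif rowkey != deleted_row:
            live.append((rowkey, kind))
    return [live]


rows = [f"row{i:04d}" for i in range(1000)]
puts = [(r, "put") for r in rows]              # first file: the flushed puts
deletes = [(r, "delete") for r in rows[:900]]  # second file: the delete markers

before = scan_first_live([puts, deletes])
after = scan_first_live(major_compact([puts, deletes]))

print(before)  # ('row0900', 1801)
print(after)   # ('row0900', 1)
```

In this model the scan before compaction walks through every delete marker and
every masked put (1800 cells) before it reaches the first live row, while after
compaction the first cell it reads is already live.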
I still have some doubts and hope someone can clear them up.
First, I am not sure that my description of the scan process in the delete
scenario (number 3) is correct.
Second, the block cache seems to have little effect in this scenario.
P.S. I have not attached my test results because I am afraid they would
confuse others. I will tidy them up and send them if necessary.
Br, Great Soul