You may have read http://hbase.apache.org/book.html#version.delete
Please see 'Scan Improvements in HBase 1.1.0' under https://blogs.apache.org/hbase/

Cheers

On Thu, Jul 2, 2015 at 6:54 PM, Song Geng <[email protected]> wrote:

> Hi everyone,
>
> I am a complete novice with HBase and the community, and this is my
> first email. Please forgive me if I cause any trouble.
>
> Here is the issue:
>
> We use HBase to store file information, composing the rowkey from the
> user id and the file path.
> For example: a user's id is 1000, and he has a file "a.txt" stored in
> "/root/data/", so the rowkey will be "1000_/root/data/a.txt".
>
> A user may store a large number of files in our system, millions or
> even billions. Sometimes he will delete a folder that may hold
> millions of files. After this kind of delete, scans often hit a
> timeout until we run a major compaction.
>
> To understand this issue, I read the Google Bigtable paper, "HBase in
> Action", the blog posts about the block cache written by Nick, many
> other articles about HBase, and also the source code. I ran some tests
> and my conclusions are listed below.
>
> The test table has only one column family, and this cf has only one
> column.
>
> 1. Three aspects influence the read latency: key search, disk I/O,
>    and network I/O.
> 2. Making the HBase client caching smaller reduces the latency
>    caused by network I/O.
> 3. Compared to a normal scan, the "delete" scenario spends more time
>    on searching and disk I/O, and I think mainly on searching.
>    Consider this scenario: I put a large amount of data into HBase
>    and it is flushed into one HFile. Then I delete the majority of
>    this data, beginning from the start key. The deletes are recorded
>    in another HFile. At this point, if I scan from the start key
>    (suppose there is no compaction), the scan reads the data one by
>    one until it reaches the first item that is not deleted.
> 4. So, running a compaction is the most effective way to resolve this
>    kind of issue.
>
> I still have some doubts. I hope someone can clear them up.
> First, I am not sure about the scan process of the "delete scenario"
> I described in number 3.
> Second, the block cache seems to have little effect in this scenario.
>
> P.S. I did not attach my test results because I am afraid of
> confusing others. I will tidy them up if necessary.
>
> Br, Great Soul
> [email protected]
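
To make the scan behaviour in the "delete scenario" concrete, here is a toy model in Python. It is not HBase code: the `Cell` tuple and the functions `merged_view`, `scan_first_live` and `major_compact` are hypothetical names, and the model assumes a single store where a delete marker masks puts with an equal or older timestamp until a major compaction drops both. It only counts how many cells a scan has to step over before it can return its first live row.

```python
from collections import namedtuple

# Hypothetical, simplified cell: kind is "put" or "delete" (a delete marker).
Cell = namedtuple("Cell", ["row", "ts", "kind"])


def merged_view(cells):
    """Order cells by row, newest timestamp first (a rough stand-in for a merged read)."""
    return sorted(cells, key=lambda c: (c.row, -c.ts))


def scan_first_live(cells):
    """Return (first live row or None, number of cells examined to find it)."""
    examined = 0
    deleted_at = {}  # row -> timestamp of the newest delete marker seen so far
    for cell in merged_view(cells):
        examined += 1
        if cell.kind == "delete":
            deleted_at[cell.row] = max(deleted_at.get(cell.row, -1), cell.ts)
        elif cell.ts > deleted_at.get(cell.row, -1):
            return cell.row, examined  # this put is not masked by a delete marker
    return None, examined


def major_compact(cells):
    """Drop delete markers and the puts they mask, keeping only live puts."""
    newest_delete = {}
    for c in cells:
        if c.kind == "delete":
            newest_delete[c.row] = max(newest_delete.get(c.row, -1), c.ts)
    return [c for c in cells
            if c.kind == "put" and c.ts > newest_delete.get(c.row, -1)]


# 1000 puts written first, then deletes for the first 990 rows written later.
puts = [Cell("row%04d" % i, 1, "put") for i in range(1000)]
deletes = [Cell("row%04d" % i, 2, "delete") for i in range(990)]
store = puts + deletes

row_before, examined_before = scan_first_live(store)
row_after, examined_after = scan_first_live(major_compact(store))
print(row_before, examined_before)  # row0990 1981
print(row_after, examined_after)    # row0990 1
```

In this model the scan walks over 990 delete markers and 990 masked puts before it reaches the first live row, and only one cell after the compaction. That matches the observation that the timeout goes away after a major compaction, but the real read path is more involved than this sketch.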

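
As a small illustration of the rowkey layout in the quoted message, the sketch below shows why deleting one folder touches a contiguous run of rowkeys. It assumes the "<userid>_<path>" format from the example; `make_rowkey` and `prefix_range` are hypothetical helper names, not part of any HBase API. The stop key is derived by incrementing the last byte of the prefix, which is a common way to bound a prefix scan.

```python
def make_rowkey(user_id, path):
    """Build a rowkey in the '<userid>_<path>' form used in the example."""
    return ("%d_%s" % (user_id, path)).encode("utf-8")


def prefix_range(prefix):
    """Return (start, stop) with start <= key < stop for every key starting with prefix."""
    stop = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        # No finite upper bound; an empty stop key here means "to the end".
        return bytes(prefix), b""
    stop[-1] += 1
    return bytes(prefix), bytes(stop)


folder = make_rowkey(1000, "/root/data/")
start, stop = prefix_range(folder)
print(start, stop)

inside = make_rowkey(1000, "/root/data/a.txt")
outside = make_rowkey(1000, "/root/datab/x.txt")
print(start <= inside < stop, start <= outside < stop)
```

Because every file under "/root/data/" for user 1000 sorts between `start` and `stop`, a folder delete leaves its delete markers in one unbroken range, which is the range a later scan from the start key has to step through.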