If you read the HFile v2 document on the HBase site, you will understand exactly how the search for a record works, and why the search inside a block is linear while the search for the right block is binary. Also bear in mind that the number of keys in a block is not large: an HFile block is 64 KB by default, so even with a 10 GB HFile you only fully scan about 64 KB of it.
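To make the two-step lookup concrete, here is a minimal sketch in plain Java. It is not HBase code: the class `BlockSeekSketch` and all of its methods are hypothetical names made up for illustration, the index is a flat array holding the first key of each block, and each block is a byte buffer of `<keylength><valuelength><key><value>` records, assuming 4-byte lengths and keys sorted by unsigned byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustration only, not HBase code. Every name in this class is hypothetical. */
public class BlockSeekSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Packs sorted key/value pairs as <keylength><valuelength><key><value> records. */
    static ByteBuffer encodeBlock(String[][] sortedKvs) {
        int size = 0;
        for (String[] kv : sortedKvs) {
            size += 8 + utf8(kv[0]).length + utf8(kv[1]).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String[] kv : sortedKvs) {
            byte[] k = utf8(kv[0]);
            byte[] v = utf8(kv[1]);
            buf.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        buf.flip();
        return buf;
    }

    /** Binary search: index of the last block whose first key is <= key, or -1. */
    static int findBlock(byte[][] firstKeys, byte[] key) {
        int lo = 0, hi = firstKeys.length - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(firstKeys[mid], key) <= 0) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    /** Linear scan: records are variable length, so we walk them from the block start. */
    static String scanBlock(ByteBuffer block, byte[] key) {
        ByteBuffer buf = block.duplicate(); // independent position, shared bytes
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valueLen = buf.getInt();
            byte[] k = new byte[keyLen];
            buf.get(k);
            int cmp = compare(k, key);
            if (cmp == 0) {
                byte[] v = new byte[valueLen];
                buf.get(v);
                return new String(v, StandardCharsets.UTF_8);
            }
            if (cmp > 0) {
                return null; // keys are sorted, so we have passed the target
            }
            buf.position(buf.position() + valueLen); // skip this value
        }
        return null;
    }

    static String get(byte[][] firstKeys, ByteBuffer[] blocks, String key) {
        byte[] k = utf8(key);
        int i = findBlock(firstKeys, k);
        return i < 0 ? null : scanBlock(blocks[i], k);
    }

    /** Builds a tiny two-block "file" and looks up one key in it. */
    static String demoLookup(String key) {
        ByteBuffer[] blocks = {
            encodeBlock(new String[][] {{"row1", "v1"}, {"row2", "v2"}}),
            encodeBlock(new String[][] {{"row3", "v3"}, {"row4", "v4"}})
        };
        byte[][] firstKeys = {utf8("row1"), utf8("row3")};
        return get(firstKeys, blocks, key);
    }

    public static void main(String[] args) {
        System.out.println(demoLookup("row3")); // prints "v3"
        System.out.println(demoLookup("row0")); // prints "null"
    }
}
```

Because the records are variable length, there is no way to jump to the middle record of a block without walking the length prefixes, which is why the scan inside the block is linear. For scale: 10 GB / 64 KB is 163,840 blocks, so a binary search over a flat index like the one above would need about 18 comparisons before the single block scan. The real HFile v2 index is multi-level, which this sketch does not model.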
On Wednesday, June 5, 2013, Pankaj Gupta wrote:

> Thanks for the replies. I'll take a look at
> src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
>
> @ramkrishna: I do want to have bloom filter and block index all the time.
> For good read performance they're critical in my workflow. The worry is
> that when HBase is restarted it will take a long time for them to get
> populated again and performance will suffer. If there was a way of loading
> them quickly and warm up the table then we'll be able to restart HBase
> without causing slow down in processing.
>
>
> On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <[email protected]> wrote:
>
> > bq. But I am not very sure if we can control the files getting selected
> > for compaction in the older versions.
> >
> > Same mechanism is available in 0.94
> >
> > Take a look at
> > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > where you would find the following methods (and more):
> >
> >   public void preCompactSelection(final
> >       ObserverContext<RegionCoprocessorEnvironment> c,
> >       final Store store, final List<StoreFile> candidates, final
> >       CompactionRequest request)
> >   public InternalScanner
> >       preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
> >       final Store store, final InternalScanner scanner) throws IOException
> >   {
> >
> > Cheers
> >
> > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > >>Does Minor compaction remove HFiles in which all entries are out of
> > > TTL or does only Major compaction do that
> > > Yes it applies for Minor compactions.
> > > >>Is there a way of configuring major compaction to compact only files
> > > older than a certain time or to compress all the files except the
> > > latest few?
> > > In the latest trunk version the compaction algo itself can be plugged.
> > > There are some coprocessor hooks that give control on the scanner that
> > > gets created for compaction with which we can control the KVs being
> > > selected. But I am not very sure if we can control the files getting
> > > selected for compaction in the older versions.
> > > >> The above excerpt seems to imply to me that the search for key
> > > inside a block is linear and I feel I must be reading it wrong. I
> > > would expect the scan to be a binary search.
> > > Once the data block is identified for a key, we seek to the beginning
> > > of the block and then do a linear search until we reach the exact key
> > > that we are looking out for. Because internally the data (KVs) are
> > > stored as byte buffers per block and it follows this pattern
> > > <keylength><valuelength><keybytearray><valuebytearray>
> > > >>Is there a way to warm up the bloom filter and block index cache for
> > > a table?
> > > You always want the bloom and block index to be in cache?
> > >
> > >
> > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <[email protected]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have a few small questions regarding HBase. I've searched the
> > > > forum but couldn't find clear answers hence asking them here:
> > > >
> > > >
> > > >    1. Does Minor compaction remove HFiles in which all entries are
> > > >    out of TTL or does only Major compaction do that? I found this
> > > >    jira: https://issues.apache.org/jira/browse/HBASE-5199 but I
> > > >    don't know if the compaction being talked about there is minor
> > > >    or major.
> > > >    2. Is there a way of configuring major compaction to compact only
> > > >    files older than a certain time or to compress all the files
> > > >    except the latest few? We basically want to use the time based
> > > >    filtering optimization in HBase to get the latest additions to
> > > >    the table and since major compaction bunches everything into one
> > > >    file, it would defeat the optimization.
> > > >    3. Is there a way to warm up the bloom filter and block index
> > > >    cache for a table? This is for a case where I always want the
> > > >    bloom filters and index to be all in memory, but not the
