As for whether you can warm up the bloom filter and block cache: I don't think that is possible at the moment.
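
To illustrate the in-block linear search discussed further down in this thread, here is a minimal sketch in Java. It is not HBase's actual code: the class BlockScanSketch and its methods are hypothetical, and it assumes 4-byte int lengths, keys sorted in unsigned byte order, and the <keylength><valuelength><keybytearray><valuebytearray> record layout quoted below.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; real HFile blocks carry more metadata.
public class BlockScanSketch {

    // Serialize already-sorted key/value pairs as
    // <keylength><valuelength><keybytes><valuebytes> records.
    public static byte[] buildBlock(String[][] sortedPairs) {
        int size = 0;
        for (String[] kv : sortedPairs) {
            size += 8 + kv[0].getBytes(StandardCharsets.UTF_8).length
                      + kv[1].getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String[] kv : sortedPairs) {
            byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
            byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
            buf.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        return buf.array();
    }

    // Walk the records from the start of the block. Because records have
    // variable length, the only way to reach record N is to read the
    // lengths of the records before it, hence the linear scan.
    public static byte[] seek(byte[] block, byte[] wanted) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            byte[] key = new byte[keyLen];
            buf.get(key);
            int cmp = compareUnsigned(key, wanted);
            if (cmp == 0) {
                byte[] value = new byte[valLen];
                buf.get(value);
                return value;
            }
            if (cmp > 0) {
                return null; // keys are sorted, so the key is not in this block
            }
            buf.position(buf.position() + valLen); // skip this record's value
        }
        return null;
    }

    // Lexicographic comparison treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] block = buildBlock(new String[][] {{"a", "1"}, {"c", "3"}, {"e", "5"}});
        byte[] hit = seek(block, "c".getBytes(StandardCharsets.UTF_8));
        System.out.println(hit == null ? "not found" : new String(hit, StandardCharsets.UTF_8));
    }
}
```

The scan stops early once it passes the position where the key would have been, so a miss does not always read the whole block.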
Regards
Ram

On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <[email protected]> wrote:

> If you read the HFile v2 document on the HBase site you will understand
> completely how the search for a record works and why there is a linear
> search in the block but a binary search to get to the right block.
> Also bear in mind the number of keys in a block is not big, since a block
> in an HFile is 65k by default; thus from a 10GB HFile you are only fully
> scanning 65k of it.
>
> On Wednesday, June 5, 2013, Pankaj Gupta wrote:
>
> > Thanks for the replies. I'll take a look at
> > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> >
> > @ramkrishna: I do want to have bloom filter and block index all the time.
> > For good read performance they're critical in my workflow. The worry is
> > that when HBase is restarted it will take a long time for them to get
> > populated again and performance will suffer. If there was a way of
> > loading them quickly and warming up the table then we'd be able to
> > restart HBase without causing a slowdown in processing.
> >
> > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <[email protected]> wrote:
> >
> > > bq. But i am not very sure if we can control the files getting
> > > selected for compaction in the older versions.
> > >
> > > Same mechanism is available in 0.94
> > >
> > > Take a look at
> > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > where you would find the following methods (and more):
> > >
> > >   public void preCompactSelection(final
> > >       ObserverContext<RegionCoprocessorEnvironment> c,
> > >       final Store store, final List<StoreFile> candidates, final
> > >       CompactionRequest request)
> > >   public InternalScanner
> > >       preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
> > >       final Store store, final InternalScanner scanner) throws IOException {
> > >
> > > Cheers
> > >
> > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > [email protected]> wrote:
> > >
> > > > >> Does Minor compaction remove HFiles in which all entries are out
> > > > >> of TTL or does only Major compaction do that
> > > > Yes, it applies to Minor compactions.
> > > >
> > > > >> Is there a way of configuring major compaction to compact only
> > > > >> files older than a certain time or to compress all the files
> > > > >> except the latest few?
> > > > In the latest trunk version the compaction algo itself can be
> > > > plugged. There are some coprocessor hooks that give control over the
> > > > scanner that gets created for compaction, with which we can control
> > > > the KVs being selected. But i am not very sure if we can control the
> > > > files getting selected for compaction in the older versions.
> > > >
> > > > >> The above excerpt seems to imply to me that the search for key
> > > > >> inside a block is linear and I feel I must be reading it wrong.
> > > > >> I would expect the scan to be a binary search.
> > > > Once the data block is identified for a key, we seek to the beginning
> > > > of the block and then do a linear search until we reach the exact key
> > > > that we are looking for. This is because internally the data (KVs)
> > > > are stored as byte buffers per block, following this pattern:
> > > > <keylength><valuelength><keybytearray><valuebytearray>
> > > >
> > > > >> Is there a way to warm up the bloom filter and block index cache
> > > > >> for a table?
> > > > You always want the bloom and block index to be in cache?
> > > >
> > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a few small questions regarding HBase. I've searched the
> > > > > forum but couldn't find clear answers, hence asking them here:
> > > > >
> > > > > 1. Does Minor compaction remove HFiles in which all entries are
> > > > >    out of TTL or does only Major compaction do that? I found this
> > > > >    jira: https://issues.apache.org/jira/browse/HBASE-5199 but I
> > > > >    don't know if the compaction being talked about there is minor
> > > > >    or major.
> > > > > 2. Is there a way of configuring major compaction to compact only
> > > > >    files older than a certain time or to compress all the files
> > > > >    except the latest few? We basically want to use the time based
> > > > >    filtering optimization in HBase to get the latest additions to
> > > > >    the table and since major compaction bunches everything into
> > > > >    one file, it would defeat the optimization.
> > > > > 3. Is there a way to warm up the bloom filter and block index
> > > > >    cache for a table? This is for a case where I always want the
> > > > >    bloom filters and index to be all in memory, but not the
