When you do the first read of this region, wouldn't this load all bloom filters?
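
For anyone following along, the lookup described in the quoted replies below (binary search to get to the right block, then a linear scan inside it) can be sketched roughly like this. This is not HBase code: the class BlockSeekSketch and its methods are made up for illustration, the lengths are assumed to be plain 4-byte ints, and keys are assumed to compare as unsigned bytes. The real HFile encoding carries more fields. It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: not HBase code. 4-byte lengths and unsigned byte-wise
// key comparison are assumptions of this sketch.
final class BlockSeekSketch {

    // Encode sorted key/value pairs as <keylength><valuelength><key><value>.
    static ByteBuffer encodeBlock(String[][] sortedPairs) {
        int size = 0;
        for (String[] kv : sortedPairs) {
            size += 8 + bytes(kv[0]).length + bytes(kv[1]).length;
        }
        ByteBuffer block = ByteBuffer.allocate(size);
        for (String[] kv : sortedPairs) {
            byte[] k = bytes(kv[0]);
            byte[] v = bytes(kv[1]);
            block.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        block.flip();
        return block;
    }

    // Binary search over the first key of each block: returns the index of
    // the last block whose first key is <= the wanted key, or -1 if none.
    static int findBlock(String[] firstKeys, String wanted) {
        byte[] w = bytes(wanted);
        int lo = 0, hi = firstKeys.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(bytes(firstKeys[mid]), w) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // Linear scan from the start of the block. Entries are variable length,
    // so each header must be read to learn where the next entry begins.
    static String seekInBlock(ByteBuffer block, String wanted) {
        ByteBuffer buf = block.duplicate();
        byte[] w = bytes(wanted);
        while (buf.remaining() >= 8) {
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            byte[] k = new byte[keyLen];
            buf.get(k);
            int cmp = Arrays.compareUnsigned(k, w);
            if (cmp == 0) {
                byte[] v = new byte[valLen];
                buf.get(v);
                return new String(v, StandardCharsets.UTF_8);
            }
            if (cmp > 0) {
                return null; // passed the spot where the key would be
            }
            buf.position(buf.position() + valLen); // skip the value
        }
        return null;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] firstKeys = {"apple", "mango"};
        ByteBuffer[] blocks = {
            encodeBlock(new String[][] {{"apple", "1"}, {"banana", "2"}}),
            encodeBlock(new String[][] {{"mango", "3"}, {"peach", "4"}})
        };
        int i = findBlock(firstKeys, "peach");
        System.out.println(seekInBlock(blocks[i], "peach"));
    }
}
```

Because the entries are variable length there is no way to jump to the middle of a block, which is why the scan inside the block is linear. With small blocks that scan stays cheap.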

On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <[email protected]> wrote:

> For the question of whether you will be able to do a warm-up for the bloom
> and block cache, I don't think it is possible now.
>
> Regards
> Ram
>
>
> On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <[email protected]> wrote:
>
> > If you read the HFile v2 document on the HBase site you will understand
> > completely how the search for a record works, and why there is a linear
> > search in the block but a binary search to get to the right block.
> > Also bear in mind that the number of keys in a block is not big, since a
> > block in HFile is 65k by default; thus from a 10GB HFile you are only
> > fully scanning 65k out of it.
> >
> > On Wednesday, June 5, 2013, Pankaj Gupta wrote:
> >
> > > Thanks for the replies. I'll take a look at
> > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> > >
> > > @ramkrishna: I do want to have the bloom filter and block index in
> > > cache all the time. For good read performance they're critical in my
> > > workflow. The worry is that when HBase is restarted it will take a
> > > long time for them to get populated again and performance will suffer.
> > > If there was a way of loading them quickly and warming up the table,
> > > then we'd be able to restart HBase without causing a slowdown in
> > > processing.
> > >
> > >
> > > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <[email protected]> wrote:
> > >
> > > > bq. But i am not very sure if we can control the files getting
> > > > selected for compaction in the older versions.
> > > >
> > > > Same mechanism is available in 0.94
> > > >
> > > > Take a look at
> > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > > where you would find the following methods (and more):
> > > >
> > > >   public void preCompactSelection(final
> > > >       ObserverContext<RegionCoprocessorEnvironment> c,
> > > >       final Store store, final List<StoreFile> candidates,
> > > >       final CompactionRequest request)
> > > >   public InternalScanner
> > > >       preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
> > > >       final Store store, final InternalScanner scanner)
> > > >       throws IOException {
> > > >
> > > > Cheers
> > > >
> > > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan
> > > > <[email protected]> wrote:
> > > >
> > > > > >>Does Minor compaction remove HFiles in which all entries are out
> > > > > of TTL or does only Major compaction do that
> > > > > Yes, it applies for Minor compactions.
> > > > > >>Is there a way of configuring major compaction to compact only
> > > > > files older than a certain time or to compress all the files except
> > > > > the latest few?
> > > > > In the latest trunk version the compaction algo itself can be
> > > > > plugged. There are some coprocessor hooks that give control over
> > > > > the scanner that gets created for compaction, with which we can
> > > > > control the KVs being selected. But I am not very sure if we can
> > > > > control the files getting selected for compaction in the older
> > > > > versions.
> > > > > >> The above excerpt seems to imply to me that the search for key
> > > > > inside a block is linear and I feel I must be reading it wrong. I
> > > > > would expect the scan to be a binary search.
> > > > > Once the data block is identified for a key, we seek to the
> > > > > beginning of the block and then do a linear search until we reach
> > > > > the exact key that we are looking for. This is because internally
> > > > > the data (KVs) are stored as byte buffers per block, following this
> > > > > pattern:
> > > > > <keylength><valuelength><keybytearray><valuebytearray>
> > > > > >>Is there a way to warm up the bloom filter and block index cache
> > > > > for a table?
> > > > > You always want the bloom and block index to be in cache?
> > > > >
> > > > >
> > > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I have a few small questions regarding HBase. I've searched the
> > > > > > forum but couldn't find clear answers, hence asking them here:
> > > > > >
> > > > > >
> > > > > >    1. Does Minor compaction remove HFiles in which all entries
> > > > > >    are out of TTL or does only Major compaction do that? I found
> > > > > >    this jira: https://issues.apache.org/jira/browse/HBASE-5199
> > > > > >    but I don't know if the compaction being talked about there
> > > > > >    is minor or major.
> > > > > >    2. Is there a way of configuring major compaction to compact
> > > > > >    only files older than a certain time or to compress all the
> > > > > >    files except the latest few? We basically want to use the
> > > > > >    time based filtering optimization in HBase to get the latest
> > > > > >    additions to the table and since major compaction bunches
> > > > > >    everything into one file, it would defeat the optimization.
> > > > > >    3. Is there a way to warm up the bloom filter and block index
> > > > > >    cache for a table? This is for a case where I always want the
> > > > > >    bloom filters and index to be all in memory, but not the
