I'm not sure what caused so many index block misses. At the time I ran the experiment, I had over 12 GB of RAM assigned to the block cache. My understanding is that since I had restarted HBase before running this experiment, it was loading index blocks as and when needed, and thus the index misses were spread over a period of time. I monitored the region server while running this debugging session and didn't see a single block eviction, so it couldn't be that the index blocks were being kicked out by something else.
I've got some really good information in this thread and I thank you all. The blockSeek function in HFileReaderV2 clearly confirms the linear nature of the scan for finding a key within a block. I feel that warming up the block and index cache could be a useful feature for many workflows. Would it be a good idea to open a JIRA for that?

Thanks,
Pankaj

On Wed, Jun 5, 2013 at 1:24 AM, Anoop John <[email protected]> wrote:
> Why are there so many misses for the index blocks? What is the block cache
> memory you use?
>
> On Wed, Jun 5, 2013 at 12:37 PM, ramkrishna vasudevan <
> [email protected]> wrote:
> > I get your point Pankaj.
> > Going through the code to confirm it:
> >
> >     // Data index. We also read statistics about the block index written
> >     // after the root level.
> >     dataBlockIndexReader.readMultiLevelIndexRoot(
> >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> >         trailer.getDataIndexCount());
> >
> >     // Meta index.
> >     metaBlockIndexReader.readRootIndex(
> >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> >         trailer.getMetaIndexCount());
> >
> > We read the root level of the multilevel index and the actual root index.
> > So as and when we need new index blocks we will be hitting the disk, and
> > your observation is correct. Sorry if I had confused you on this.
> > The new version of HFile was mainly to address the concern in the
> > previous version, where the entire index was in memory. Version V2
> > addressed that concern by keeping only the root level (something like
> > metadata of the indices) in memory; from there you can get to new index
> > blocks.
> > But there is a chance that if your region size is small you may have
> > only one level, and the entire thing may be in memory.
> >
> > Regards
> > Ram
> >
> > On Wed, Jun 5, 2013 at 11:56 AM, Pankaj Gupta <[email protected]>
> > wrote:
> > > Sorry, forgot to mention that I added the log statements to the method
> > > readBlock in HFileReaderV2.java. I'm on hbase 0.94.2.
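Ram's point about the multi-level index can be illustrated with a small, self-contained sketch. This is hypothetical code, not HBase's actual classes: only the root level of the index is held in memory, and leaf index blocks are fetched (and counted here) on demand, which is exactly where on-demand index reads like the ones in this thread come from.

```java
// Hypothetical sketch (not HBase code) of a two-level block index in the
// spirit of HFile v2. Keys are single bytes for brevity.
public class MultiLevelIndexSketch {
    // Root level, always in memory: first key covered by each leaf index block.
    static final byte[][] ROOT_KEYS = { {10}, {40}, {70} };
    // Leaf level, "on disk": first key of each data block, per leaf index block.
    static final byte[][][] LEAF_KEYS = {
        { {10}, {20}, {30} },
        { {40}, {50}, {60} },
        { {70}, {80}, {90} },
    };
    static int leafLoads = 0; // counts simulated on-demand leaf block reads

    // Binary search for the last entry whose key is <= target.
    static int floorIndex(byte[][] keys, byte[] target) {
        int lo = 0, hi = keys.length - 1, ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(keys[mid], target) <= 0) { ans = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return ans;
    }

    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Returns {leafIdx, dataBlockIdx} for the block that may contain target.
    static int[] locateDataBlock(byte[] target) {
        int leaf = floorIndex(ROOT_KEYS, target); // root level: in-memory search
        leafLoads++;                              // leaf level: read on demand
        int block = floorIndex(LEAF_KEYS[leaf], target);
        return new int[] { leaf, block };
    }

    public static void main(String[] args) {
        int[] loc = locateDataBlock(new byte[] {55});
        System.out.println("leaf=" + loc[0] + " block=" + loc[1]
            + " leafLoads=" + leafLoads);
    }
}
```

With a small region there may be only one level, as Ram notes, and then nothing is read on demand; with a large one, every cold leaf block is a disk read until it lands in the block cache.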
> > >
> > > On Tue, Jun 4, 2013 at 11:16 PM, Pankaj Gupta <[email protected]>
> > > wrote:
> > > > Some context on how I observed bloom filters being loaded constantly.
> > > > I added the following logging statements to HFileReaderV2.java:
> > > >
> > > >       }
> > > >       if (!useLock) {
> > > >         // check cache again with lock
> > > >         useLock = true;
> > > >         continue;
> > > >       }
> > > >
> > > >       // Load block from filesystem.
> > > >       long startTimeNs = System.nanoTime();
> > > >       HFileBlock hfileBlock = fsBlockReader.readBlockData(dataBlockOffset,
> > > >           onDiskBlockSize, -1, pread);
> > > >       hfileBlock = dataBlockEncoder.diskToCacheFormat(hfileBlock,
> > > >           isCompaction);
> > > >       validateBlockType(hfileBlock, expectedBlockType);
> > > >       passSchemaMetricsTo(hfileBlock);
> > > >       BlockCategory blockCategory =
> > > >           hfileBlock.getBlockType().getCategory();
> > > >
> > > >       // My logging statements ---->
> > > >       if (blockCategory == BlockCategory.INDEX) {
> > > >         LOG.info("index block miss, reading from disk " + cacheKey);
> > > >       } else if (blockCategory == BlockCategory.BLOOM) {
> > > >         LOG.info("bloom block miss, reading from disk " + cacheKey);
> > > >       } else {
> > > >         LOG.info("block miss other than index or bloom, reading from disk "
> > > >             + cacheKey);
> > > >       }
> > > >       // -------------->
> > > >       final long delta = System.nanoTime() - startTimeNs;
> > > >       HFile.offerReadLatency(delta, pread);
> > > >       getSchemaMetrics().updateOnCacheMiss(blockCategory, isCompaction,
> > > >           delta);
> > > >
> > > >       // Cache the block if necessary
> > > >       if (cacheBlock && cacheConf.shouldCacheBlockOnRead(
> > > >           hfileBlock.getBlockType().getCategory())) {
> > > >         cacheConf.getBlockCache().cacheBlock(cacheKey, hfileBlock,
> > > >             cacheConf.isInMemory());
> > > >       }
> > > >
> > > >       if (hfileBlock.getBlockType() == BlockType.DATA) {
> > > >         HFile.dataBlockReadCnt.incrementAndGet();
> > > >       }
> > > >
> > > > With these in place I saw the following statements in the log:
> > > > 2013-06-05 01:04:55,281 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30361506
> > > > 2013-06-05 01:05:00,579 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28779560
> > > > 2013-06-05 01:07:41,335 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_4199735
> > > > 2013-06-05 01:08:58,460 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_8519720
> > > > 2013-06-05 01:11:01,545 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_12838948
> > > > 2013-06-05 01:11:03,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_3973250
> > > > 2013-06-05 01:11:36,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_17159812
> > > > 2013-06-05 01:12:35,398 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_21478349
> > > > 2013-06-05 01:13:02,572 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_25798003
> > > > 2013-06-05 01:13:03,260 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_8068381
> > > > 2013-06-05 01:13:20,265 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30118048
> > > > 2013-06-05 01:13:20,522 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60833137
> > > > 2013-06-05 01:13:32,261 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_34545951
> > > > 2013-06-05 01:13:48,504 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_38865311
> > > > 2013-06-05 01:13:49,951 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_12161793
> > > > 2013-06-05 01:14:02,073 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_43185677
> > > > 2013-06-05 01:14:12,956 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_47506066
> > > > 2013-06-05 01:14:25,132 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_51825831
> > > > 2013-06-05 01:14:25,946 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_16257519
> > > > 2013-06-05 01:14:34,478 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_56145793
> > > > 2013-06-05 01:14:45,319 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60466405
> > > > 2013-06-05 01:14:45,998 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_91304775
> > > > 2013-06-05 01:14:58,203 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_64893493
> > > > 2013-06-05 01:14:58,463 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_20352561
> > > > 2013-06-05 01:15:09,299 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_69214092
> > > > 2013-06-05 01:15:32,944 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_73533616
> > > > 2013-06-05 01:15:46,903 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_77865906
> > > > 2013-06-05 01:15:47,273 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_24448138
> > > > 2013-06-05 01:15:55,312 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_82185687
> > > > 2013-06-05 01:16:07,591 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_86506129
> > > > 2013-06-05 01:16:20,728 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_90825624
> > > > 2013-06-05 01:16:22,551 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28542144
> > > > 2013-06-05 01:16:22,810 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121777484
> > > > 2013-06-05 01:16:23,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_57670002
> > > > 2013-06-05 01:16:33,196 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_95253904
> > > > 2013-06-05 01:16:48,187 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_99574899
> > > > 2013-06-05 01:17:06,648 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_103895087
> > > > 2013-06-05 01:17:10,526 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_32744846
> > > > 2013-06-05 01:17:22,939 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_108214936
> > > > 2013-06-05 01:17:36,010 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_112535209
> > > > 2013-06-05 01:17:46,028 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_116855742
> > > > 2013-06-05 01:17:47,029 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_36838416
> > > > 2013-06-05 01:17:54,472 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121174753
> > > > 2013-06-05 01:17:55,491 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_152248177
> > > > 2013-06-05 01:18:05,912 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_125601238
> > > > 2013-06-05 01:18:15,417 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_129921797
> > > > 2013-06-05 01:18:16,713 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_40933856
> > > > 2013-06-05 01:18:29,521 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_134242324
> > > > 2013-06-05 01:18:38,653 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_138561860
> > > > 2013-06-05 01:18:49,280 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_142881436
> > > > 2013-06-05 01:18:50,052 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_45029905
> > > > 2013-06-05 01:18:58,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_147201737
> > > > 2013-06-05 01:19:06,371 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_151533253
> > > > 2013-06-05 01:19:07,782 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_182719269
> > > >
> > > > I kept seeing these statements appearing constantly
> > > > over a long period; this seemed to confirm to me that bloom filter
> > > > blocks are being loaded over a period of time, which also matched what
> > > > I read about HFileV2. Maybe I am wrong about both. I would love to
> > > > understand what's really going on.
> > > >
> > > > Thanks in advance,
> > > > Pankaj
> > > >
> > > > On Tue, Jun 4, 2013 at 11:05 PM, ramkrishna vasudevan <
> > > > [email protected]> wrote:
> > > > > Whenever the region is opened, all the bloom filter metadata are
> > > > > loaded into memory. I think his concern is that every time, all the
> > > > > store files are read and then loaded into memory, and he wants some
> > > > > faster way of doing it.
> > > > > Asaf you are right.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > On Wed, Jun 5, 2013 at 11:22 AM, Asaf Mesika <[email protected]>
> > > > > wrote:
> > > > > > When you do the first read of this region, wouldn't this load all
> > > > > > bloom filters?
> > > > > >
> > > > > > On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <
> > > > > > [email protected]> wrote:
> > > > > > > For the question whether you will be able to do a warm-up of the
> > > > > > > bloom and block cache: I don't think it is possible now.
> > > > > > >
> > > > > > > Regards
> > > > > > > Ram
> > > > > > >
> > > > > > > On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <[email protected]>
> > > > > > > wrote:
> > > > > > > > If you read the HFile v2 document on the HBase site you will
> > > > > > > > understand completely how the search for a record works, and
> > > > > > > > why there is a linear search in the block but a binary search
> > > > > > > > to get to the right block.
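The two-phase lookup described in this thread, a binary search over block start keys followed by a linear scan inside the chosen block over the `<keylength><valuelength><keybytearray><valuebytearray>` layout, can be sketched in self-contained Java. This is hypothetical illustration code, not the actual HFileReaderV2 implementation; ASCII-only keys are assumed so String.length() equals the byte length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not HBase code) of "binary search to the block,
// linear scan within the block".
public class BlockSeekSketch {
    // Serialize key/value pairs in the on-disk layout the thread quotes:
    // <keylength><valuelength><keybytearray><valuebytearray>, back to back.
    static ByteBuffer encodeBlock(String[][] kvs) {
        int size = 0;
        for (String[] kv : kvs) size += 8 + kv[0].length() + kv[1].length();
        ByteBuffer b = ByteBuffer.allocate(size);
        for (String[] kv : kvs) {
            byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
            byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
            b.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        b.flip();
        return b;
    }

    // Step 1: binary search over block start keys -> index of the last
    // block whose first key is <= target.
    static int findBlock(String[] firstKeys, String target) {
        int lo = 0, hi = firstKeys.length - 1, ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(target) <= 0) { ans = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return ans;
    }

    // Step 2: linear scan within the block, decoding entry by entry,
    // as blockSeek does. There is no per-entry offset table to jump to.
    static String seekInBlock(ByteBuffer block, String target) {
        ByteBuffer b = block.duplicate();
        while (b.hasRemaining()) {
            int keyLen = b.getInt();
            int valLen = b.getInt();
            byte[] k = new byte[keyLen];
            b.get(k);
            byte[] v = new byte[valLen];
            b.get(v);
            if (new String(k, StandardCharsets.UTF_8).equals(target))
                return new String(v, StandardCharsets.UTF_8);
        }
        return null; // not in this block
    }

    public static void main(String[] args) {
        String[] firstKeys = { "apple", "mango", "tomato" };
        ByteBuffer[] blocks = {
            encodeBlock(new String[][] { { "apple", "1" }, { "fig", "2" } }),
            encodeBlock(new String[][] { { "mango", "3" }, { "pear", "4" } }),
            encodeBlock(new String[][] { { "tomato", "5" } }),
        };
        int i = findBlock(firstKeys, "pear"); // binary search picks block 1
        System.out.println(seekInBlock(blocks[i], "pear")); // prints 4
    }
}
```

The variable-length encoding is why the in-block scan must be linear: without decoding each entry's lengths, there is no way to know where the next entry starts.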
> > > > > > > > Also bear in mind that the number of keys in a block is not
> > > > > > > > big, since a block in an HFile is 65 KB by default; thus from a
> > > > > > > > 10 GB HFile you are only fully scanning 65 KB of it.
> > > > > > > >
> > > > > > > > On Wednesday, June 5, 2013, Pankaj Gupta wrote:
> > > > > > > > > Thanks for the replies. I'll take a look at
> > > > > > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> > > > > > > > >
> > > > > > > > > @ramkrishna: I do want to have the bloom filter and block
> > > > > > > > > index in memory all the time. For good read performance
> > > > > > > > > they're critical in my workflow. The worry is that when HBase
> > > > > > > > > is restarted it will take a long time for them to get
> > > > > > > > > populated again, and performance will suffer. If there were a
> > > > > > > > > way of loading them quickly and warming up the table, then we
> > > > > > > > > would be able to restart HBase without causing a slowdown in
> > > > > > > > > processing.
> > > > > > > > >
> > > > > > > > > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <[email protected]>
> > > > > > > > > wrote:
> > > > > > > > > > bq. But I am not very sure if we can control the files
> > > > > > > > > > getting selected for compaction in the older versions.
> > > > > > > > > >
> > > > > > > > > > The same mechanism is available in 0.94.
> > > > > > > > > >
> > > > > > > > > > Take a look at
> > > > > > > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java
> > > > > > > > > > where you would find the following methods (and more):
> > > > > > > > > >
> > > > > > > > > >   public void preCompactSelection(final
> > > > > > > > > >       ObserverContext<RegionCoprocessorEnvironment> c,
> > > > > > > > > >       final Store store, final List<StoreFile> candidates,
> > > > > > > > > >       final CompactionRequest request)
> > > > > > > > > >   public InternalScanner preCompact(
> > > > > > > > > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > > > > > > > > >       final Store store, final InternalScanner scanner)
> > > > > > > > > >       throws IOException {
> > > > > > > > > >
> > > > > > > > > > Cheers
> > > > > > > > > >
> > > > > > > > > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > > > > > > > > [email protected]> wrote:
> > > > > > > > > > > >> Does Minor compaction remove HFiles in which all
> > > > > > > > > > > entries are out of TTL, or does only Major compaction do
> > > > > > > > > > > that?
> > > > > > > > > > > Yes, it applies to Minor compactions as well.
> > > > > > > > > > > >> Is there a way of configuring major compaction to
> > > > > > > > > > > compact only files older than a certain time, or to
> > > > > > > > > > > compact all the files except the latest few?
> > > > > > > > > > > In the latest trunk version the compaction algorithm
> > > > > > > > > > > itself can be plugged in. There are also coprocessor
> > > > > > > > > > > hooks that give control over the scanner that gets
> > > > > > > > > > > created for compaction, with which we can control the
> > > > > > > > > > > KVs being selected.
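For illustration only, the policy a preCompactSelection hook could apply, dropping young store files from the candidate list so the newest files survive and time-range filtering keeps paying off, can be sketched in self-contained Java. FileInfo, dropYoungFiles, and the one-hour threshold are all invented stand-ins; a real hook would edit the List<StoreFile> candidates that HBase passes in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of an age-based compaction selection policy.
// Not HBase code: FileInfo stands in for StoreFile.
public class CompactSelectionSketch {
    static class FileInfo {
        final String name;
        final long ageMs;
        FileInfo(String name, long ageMs) { this.name = name; this.ageMs = ageMs; }
    }

    // Mimics editing the 'candidates' list a preCompactSelection hook
    // receives: files younger than minAgeMs are removed from selection.
    static void dropYoungFiles(List<FileInfo> candidates, long minAgeMs) {
        candidates.removeIf(f -> f.ageMs < minAgeMs);
    }

    public static void main(String[] args) {
        List<FileInfo> candidates = new ArrayList<>(Arrays.asList(
            new FileInfo("hfile-old", 7_200_000L),  // 2 hours old
            new FileInfo("hfile-new", 60_000L)));   // 1 minute old
        dropYoungFiles(candidates, 3_600_000L);     // keep files >= 1 hour old
        System.out.println(candidates.size() + " " + candidates.get(0).name);
        // prints: 1 hfile-old
    }
}
```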
> > > > > > > > > > > But I am not very sure if we can control the files
> > > > > > > > > > > getting selected for compaction in the older versions.
> > > > > > > > > > > >> The above excerpt seems to imply to me that the search
> > > > > > > > > > > for a key inside a block is linear, and I feel I must be
> > > > > > > > > > > reading it wrong. I would expect the scan to be a binary
> > > > > > > > > > > search.
> > > > > > > > > > > Once the data block is identified for a key, we seek to
> > > > > > > > > > > the beginning of the block and then do a linear search
> > > > > > > > > > > until we reach the exact key that we are looking for.
> > > > > > > > > > > This is because internally the data (KVs) are stored as
> > > > > > > > > > > byte buffers per block, following this pattern:
> > > > > > > > > > > <keylength><valuelength><keybytearray><valuebytearray>
> > > > > > > > > > > >> Is there a way to warm up the bloom filter and block
> > > > > > > > > > > index cache for a table?
> > > > > > > > > > > You always want the bloom and block index to be in cache?
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <
> > > > > > > > > > > [email protected]> wrote:
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I have a few small questions regarding HBase. I've
> > > > > > > > > > > > searched the forum but couldn't find clear answers,
> > > > > > > > > > > > hence asking them here:
> > > > > > > > > > > >
> > > > > > > > > > > > 1. Does Minor compaction remove HFiles in which all
> > > > > > > > > > > > entries are out of TTL, or does only Major compaction
> > > > > > > > > > > > do that? I found this jira:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/HBASE-5199 but I
> > > > > > > > > > > > don't know if the compaction being talked about there
> > > > > > > > > > > > is minor or major.
> > > > > > > > > > > > 2. Is there a way of configuring major compaction to
> > > > > > > > > > > > compact only files older than a certain time, or to
> > > > > > > > > > > > compact all the files except the latest few? We
> > > > > > > > > > > > basically want to use the time-based filtering
> > > > > > > > > > > > optimization in HBase to get the latest additions to
> > > > > > > > > > > > the table, and since major compaction bunches
> > > > > > > > > > > > everything into one file, it would defeat the
> > > > > > > > > > > > optimization.
> > > > > > > > > > > > 3. Is there a way to warm up the bloom filter and block
> > > > > > > > > > > > index cache for a table? This is for a case where I
> > > > > > > > > > > > always want the bloom filters and index to be all in
> > > > > > > > > > > > memory, but not the
> > > > > >
> > > > > > --
> > > > > > *P* | (415) 677-9222 ext. 205 *F *| (415) 677-0895 | [email protected]
> > > > > > Pankaj Gupta | Software Engineer
> > > > > > *BrightRoll, Inc.
*| Smart Video Advertising | www.brightroll.com
> > > > > > United States | Canada | United Kingdom | Germany
> > > > > > We're hiring<http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7>!