> I feel that warming up the block and
index cache could be a useful feature for many workflows. Would it be a
good idea to have a JIRA for that?
I would suggest yes.  You can log the details that you observed and
discuss them over there.
In your case you had multi-level index blocks, so after restarting
HBase they had to be loaded on demand.

Thanks for the useful discussion.

Regards
Ram



On Thu, Jun 6, 2013 at 8:22 AM, Pankaj Gupta <[email protected]> wrote:

> I'm not sure what caused so many index block misses. At the time I ran the
> experiment I had over 12 GB of RAM assigned to the block cache. My
> understanding is that since I had restarted HBase before running this
> experiment, it was basically loading index blocks as and when needed, and
> thus index misses were spread over a period of time. I monitored the region
> server while running this debugging session and didn't see a single block
> eviction, so it couldn't be that the index blocks were being kicked out by
> something else.
>
> I've got some really good information in this thread and I thank you all.
> The blockSeek function in HFileReaderV2 clearly confirms the linear nature
> of scan for finding a key in a block. I feel that warming up the block and
> index cache could be a useful feature for many workflows. Would it be a
> good idea to have a JIRA for that?
>
> Thanks,
> Pankaj
>
>
> On Wed, Jun 5, 2013 at 1:24 AM, Anoop John <[email protected]> wrote:
>
> > Why are there so many misses for the index blocks? What block cache
> > memory do you use?
> >
> > On Wed, Jun 5, 2013 at 12:37 PM, ramkrishna vasudevan <
> > [email protected]> wrote:
> >
> > > I get your point Pankaj.
> > > Going through the code to confirm it:
> > >
> > >     // Data index. We also read statistics about the block index
> > >     // written after the root level.
> > >     dataBlockIndexReader.readMultiLevelIndexRoot(
> > >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> > >         trailer.getDataIndexCount());
> > >
> > >     // Meta index.
> > >     metaBlockIndexReader.readRootIndex(
> > >         blockIter.nextBlockWithBlockType(BlockType.ROOT_INDEX),
> > >         trailer.getMetaIndexCount());
> > >
> > > We read the root level of the multi-level index and the actual root
> > > index. So as and when we need new index blocks we will be hitting the
> > > disk, and your observation is correct.  Sorry if I had confused you on
> > > this.
> > > The new version of HFile was mainly to address the concern in the
> > > previous version, where the entire indices were in memory.  Version V2
> > > addressed that concern by keeping only the root level (something like
> > > metadata of the indices) in memory; from there you can load new index
> > > blocks on demand.
> > > But there is a chance that if your region size is small you may have
> > > only one level and the entire thing may be in memory.
> > >
> > > Regards
> > > Ram
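To make the multi-level index behavior above concrete, here is a minimal, self-contained Java sketch. It is an illustration only, not HBase's HFileBlockIndex code, and all names and data are invented: a root index held in memory is binary-searched to pick a leaf index block, and the leaf is "read from disk" only the first time it is needed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MultiLevelIndexSketch {
    // Root index: first key of each leaf index block (kept in memory after open).
    static final String[] ROOT_FIRST_KEYS = {"a", "m"};
    // Leaf index blocks: first key of each data block (loaded on demand).
    static final String[][] LEAF_FIRST_KEYS = {{"a", "d", "h"}, {"m", "q", "u"}};

    static final Set<Integer> loadedLeaves = new HashSet<Integer>();

    /** Index of the last entry whose first key is <= key (binary search). */
    static int floorIndex(String[] firstKeys, String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        return pos >= 0 ? pos : -pos - 2; // insertion point minus one
    }

    /** Returns {leafIndexBlock, dataBlock} that may contain the key. */
    static int[] locate(String key) {
        int leaf = floorIndex(ROOT_FIRST_KEYS, key);
        if (loadedLeaves.add(leaf)) {
            // First use of this leaf index block: simulate the disk read.
            System.out.println("leaf index block " + leaf + " read from disk");
        }
        return new int[]{leaf, floorIndex(LEAF_FIRST_KEYS[leaf], key)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(locate("e"))); // [0, 1]
        System.out.println(Arrays.toString(locate("r"))); // [1, 1]
        System.out.println(Arrays.toString(locate("f"))); // leaf 0 already cached: no disk read
    }
}
```

In the real HFile v2 format the root and leaf entries also carry block offsets and on-disk sizes; the sketch keeps only first keys to show the shape of the lookup and why a restart leaves the leaf level to be faulted in on demand.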
> > >
> > >
> > > On Wed, Jun 5, 2013 at 11:56 AM, Pankaj Gupta <[email protected]>
> > > wrote:
> > >
> > > > Sorry, forgot to mention that I added the log statements to the
> method
> > > > readBlock in HFileReaderV2.java. I'm on hbase 0.94.2.
> > > >
> > > >
> > > > On Tue, Jun 4, 2013 at 11:16 PM, Pankaj Gupta <[email protected]>
> > > > wrote:
> > > >
> > > > > Some context on how I observed bloom filters being loaded
> > > > > constantly. I added the following logging statements to
> > > > > HFileReaderV2.java:
> > > > >
> > > > >         if (!useLock) {
> > > > >           // check cache again with lock
> > > > >           useLock = true;
> > > > >           continue;
> > > > >         }
> > > > >
> > > > >         // Load block from filesystem.
> > > > >         long startTimeNs = System.nanoTime();
> > > > >         HFileBlock hfileBlock = fsBlockReader.readBlockData(dataBlockOffset,
> > > > >             onDiskBlockSize, -1, pread);
> > > > >         hfileBlock = dataBlockEncoder.diskToCacheFormat(hfileBlock,
> > > > >             isCompaction);
> > > > >         validateBlockType(hfileBlock, expectedBlockType);
> > > > >         passSchemaMetricsTo(hfileBlock);
> > > > >         BlockCategory blockCategory = hfileBlock.getBlockType().getCategory();
> > > > >
> > > > >         // My logging statements ---->
> > > > >         if (blockCategory == BlockCategory.INDEX) {
> > > > >           LOG.info("index block miss, reading from disk " + cacheKey);
> > > > >         } else if (blockCategory == BlockCategory.BLOOM) {
> > > > >           LOG.info("bloom block miss, reading from disk " + cacheKey);
> > > > >         } else {
> > > > >           LOG.info("block miss other than index or bloom, reading from disk " + cacheKey);
> > > > >         }
> > > > >         // -------------->
> > > > >
> > > > >         final long delta = System.nanoTime() - startTimeNs;
> > > > >         HFile.offerReadLatency(delta, pread);
> > > > >         getSchemaMetrics().updateOnCacheMiss(blockCategory, isCompaction, delta);
> > > > >
> > > > >         // Cache the block if necessary
> > > > >         if (cacheBlock && cacheConf.shouldCacheBlockOnRead(
> > > > >             hfileBlock.getBlockType().getCategory())) {
> > > > >           cacheConf.getBlockCache().cacheBlock(cacheKey, hfileBlock,
> > > > >               cacheConf.isInMemory());
> > > > >         }
> > > > >
> > > > >         if (hfileBlock.getBlockType() == BlockType.DATA) {
> > > > >           HFile.dataBlockReadCnt.incrementAndGet();
> > > > >         }
> > > > >
> > > > > With these in place I saw the following statements in the log:
> > > > >
> > > > > 2013-06-05 01:04:55,281 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30361506
> > > > > 2013-06-05 01:05:00,579 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28779560
> > > > > 2013-06-05 01:07:41,335 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_4199735
> > > > > 2013-06-05 01:08:58,460 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_8519720
> > > > > 2013-06-05 01:11:01,545 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_12838948
> > > > > 2013-06-05 01:11:03,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_3973250
> > > > > 2013-06-05 01:11:36,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_17159812
> > > > > 2013-06-05 01:12:35,398 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_21478349
> > > > > 2013-06-05 01:13:02,572 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_25798003
> > > > > 2013-06-05 01:13:03,260 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_8068381
> > > > > 2013-06-05 01:13:20,265 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_30118048
> > > > > 2013-06-05 01:13:20,522 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60833137
> > > > > 2013-06-05 01:13:32,261 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_34545951
> > > > > 2013-06-05 01:13:48,504 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_38865311
> > > > > 2013-06-05 01:13:49,951 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_12161793
> > > > > 2013-06-05 01:14:02,073 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_43185677
> > > > > 2013-06-05 01:14:12,956 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_47506066
> > > > > 2013-06-05 01:14:25,132 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_51825831
> > > > > 2013-06-05 01:14:25,946 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_16257519
> > > > > 2013-06-05 01:14:34,478 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_56145793
> > > > > 2013-06-05 01:14:45,319 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_60466405
> > > > > 2013-06-05 01:14:45,998 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_91304775
> > > > > 2013-06-05 01:14:58,203 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_64893493
> > > > > 2013-06-05 01:14:58,463 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_20352561
> > > > > 2013-06-05 01:15:09,299 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_69214092
> > > > > 2013-06-05 01:15:32,944 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_73533616
> > > > > 2013-06-05 01:15:46,903 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_77865906
> > > > > 2013-06-05 01:15:47,273 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_24448138
> > > > > 2013-06-05 01:15:55,312 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_82185687
> > > > > 2013-06-05 01:16:07,591 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_86506129
> > > > > 2013-06-05 01:16:20,728 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_90825624
> > > > > 2013-06-05 01:16:22,551 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_28542144
> > > > > 2013-06-05 01:16:22,810 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121777484
> > > > > 2013-06-05 01:16:23,035 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_57670002
> > > > > 2013-06-05 01:16:33,196 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_95253904
> > > > > 2013-06-05 01:16:48,187 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_99574899
> > > > > 2013-06-05 01:17:06,648 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_103895087
> > > > > 2013-06-05 01:17:10,526 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_32744846
> > > > > 2013-06-05 01:17:22,939 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_108214936
> > > > > 2013-06-05 01:17:36,010 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_112535209
> > > > > 2013-06-05 01:17:46,028 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_116855742
> > > > > 2013-06-05 01:17:47,029 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_36838416
> > > > > 2013-06-05 01:17:54,472 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_121174753
> > > > > 2013-06-05 01:17:55,491 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_152248177
> > > > > 2013-06-05 01:18:05,912 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_125601238
> > > > > 2013-06-05 01:18:15,417 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_129921797
> > > > > 2013-06-05 01:18:16,713 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_40933856
> > > > > 2013-06-05 01:18:29,521 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_134242324
> > > > > 2013-06-05 01:18:38,653 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_138561860
> > > > > 2013-06-05 01:18:49,280 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_142881436
> > > > > 2013-06-05 01:18:50,052 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 52cded0c399b48fdbccd8b3d4e25502f_45029905
> > > > > 2013-06-05 01:18:58,339 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_147201737
> > > > > 2013-06-05 01:19:06,371 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: bloom block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_151533253
> > > > > 2013-06-05 01:19:07,782 INFO org.apache.hadoop.hbase.io.hfile.HFileReaderV2: index block miss, reading from disk 11958ab7a4a1492e853743b02e1bd7b1_182719269
> > > > >
> > > > > I kept seeing these statements appear constantly over a long
> > > > > period, which seemed to confirm that bloom filter blocks are being
> > > > > loaded over a period of time; this also matched what I read about
> > > > > HFileV2. Maybe I am wrong about both. Would love to understand
> > > > > what's really going on.
> > > > >
> > > > > Thanks in Advance,
> > > > > Pankaj
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jun 4, 2013 at 11:05 PM, ramkrishna vasudevan <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> Whenever the region is opened, all the bloom filter metadata is
> > > > >> loaded into memory.  I think his concern is that every time, all
> > > > >> the store files are read and the metadata is loaded into memory,
> > > > >> and he wants some faster way of doing it.
> > > > >> Asaf, you are right.
> > > > >>
> > > > >> Regards
> > > > >> Ram
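As background on why these bloom blocks are worth keeping warm: a bloom filter answers "might this file contain the key?" without touching any data blocks, so a negative answer lets a Get skip the HFile entirely. Below is a minimal, self-contained Java sketch of the idea only; it is not HBase's bloom filter implementation, and the bit count, hash count, and hash function are invented (HBase uses stronger hashing).

```java
import java.util.BitSet;

public class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Simple seeded string hash, for illustration only.
    private int hash(String key, int seed) {
        int h = seed;
        for (int i = 0; i < key.length(); i++) {
            h = h * 31 + key.charAt(i);
        }
        return Math.floorMod(h, numBits);
    }

    void add(String key) {
        for (int s = 1; s <= numHashes; s++) {
            bits.set(hash(key, s));
        }
    }

    /** false = definitely absent, so this file's data blocks can be skipped;
     *  true = possibly present (false positives are allowed). */
    boolean mightContain(String key) {
        for (int s = 1; s <= numHashes; s++) {
            if (!bits.get(hash(key, s))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch bloom = new BloomSketch(1024, 3);
        bloom.add("row1");
        bloom.add("row2");
        System.out.println(bloom.mightContain("row1")); // true: must check the file
        System.out.println(bloom.mightContain("rowX")); // false: the file can be skipped
    }
}
```

The trade-off the thread is circling: this structure only pays off once its blocks are in memory, which is why misses right after a restart hurt.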
> > > > >>
> > > > >>
> > > > >> On Wed, Jun 5, 2013 at 11:22 AM, Asaf Mesika <[email protected]>
> > > > >> wrote:
> > > > >>
> > > > >> > When you do the first read of this region, wouldn't this load
> > > > >> > all bloom filters?
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Jun 5, 2013 at 8:43 AM, ramkrishna vasudevan <
> > > > >> > [email protected]> wrote:
> > > > >> >
> > > > >> > > As for the question of whether you can warm up the bloom
> > > > >> > > and block cache: I don't think it is possible now.
> > > > >> > >
> > > > >> > > Regards
> > > > >> > > Ram
> > > > >> > >
> > > > >> > >
> > > > >> > > On Wed, Jun 5, 2013 at 10:57 AM, Asaf Mesika <[email protected]>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > If you read the HFile v2 document on the HBase site you
> > > > >> > > > will understand completely how the search for a record
> > > > >> > > > works, and why there is a linear search within the block
> > > > >> > > > but a binary search to get to the right block.
> > > > >> > > > Also bear in mind that the number of keys in a block is
> > > > >> > > > not big, since a block in an HFile is 64 KB by default;
> > > > >> > > > thus from a 10 GB HFile you are only linearly scanning
> > > > >> > > > 64 KB of it.
> > > > >> > > >
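The two-phase lookup described above can be shown in a tiny, self-contained Java sketch (invented data and names, not HBase code): a binary search over the block first-keys picks one block, and only that block's handful of entries is scanned linearly.

```java
import java.util.Arrays;

public class TwoPhaseSeekSketch {
    // First key of each block (what the block index stores).
    static final String[] BLOCK_FIRST_KEYS = {"a", "g", "p"};
    // The sorted entries of each block (tiny stand-ins for ~64 KB of KVs).
    static final String[][] BLOCKS = {
        {"a", "c", "e"}, {"g", "j", "n"}, {"p", "s", "x"}
    };

    /** Binary search: the block with the greatest first key <= key. */
    static int pickBlock(String key) {
        int pos = Arrays.binarySearch(BLOCK_FIRST_KEYS, key);
        return pos >= 0 ? pos : -pos - 2; // insertion point minus one
    }

    /** Linear scan inside the one chosen block only. */
    static boolean contains(String key) {
        for (String entry : BLOCKS[pickBlock(key)]) {
            if (entry.equals(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(pickBlock("j")); // 1
        System.out.println(contains("j"));  // true
        System.out.println(contains("k"));  // false
    }
}
```

The sketch assumes the key is not smaller than the first block's first key; the point is that the linear phase is bounded by one block's size regardless of how large the file grows.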
> > > > >> > > > On Wednesday, June 5, 2013, Pankaj Gupta wrote:
> > > > >> > > >
> > > > >> > > > > Thanks for the replies. I'll take a look at
> > > > >> > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java.
> > > > >> > > > >
> > > > >> > > > > @ramkrishna: I do want to have the bloom filter and block
> > > > >> > > > > index in memory all the time. For good read performance
> > > > >> > > > > they're critical in my workflow. The worry is that when
> > > > >> > > > > HBase is restarted it will take a long time for them to get
> > > > >> > > > > populated again, and performance will suffer. If there were
> > > > >> > > > > a way of loading them quickly and warming up the table,
> > > > >> > > > > then we would be able to restart HBase without causing a
> > > > >> > > > > slowdown in processing.
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > > On Tue, Jun 4, 2013 at 9:29 PM, Ted Yu <[email protected]>
> > > > >> > > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > bq. But I am not very sure if we can control the files
> > > > >> > > > > > getting selected for compaction in the older versions.
> > > > >> > > > > >
> > > > >> > > > > > The same mechanism is available in 0.94.
> > > > >> > > > > >
> > > > >> > > > > > Take a look at
> > > > >> > > > > > src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserver.java,
> > > > >> > > > > > where you will find the following methods (and more):
> > > > >> > > > > >
> > > > >> > > > > >   public void preCompactSelection(
> > > > >> > > > > >       final ObserverContext<RegionCoprocessorEnvironment> c,
> > > > >> > > > > >       final Store store, final List<StoreFile> candidates,
> > > > >> > > > > >       final CompactionRequest request)
> > > > >> > > > > >
> > > > >> > > > > >   public InternalScanner preCompact(
> > > > >> > > > > >       ObserverContext<RegionCoprocessorEnvironment> e,
> > > > >> > > > > >       final Store store, final InternalScanner scanner)
> > > > >> > > > > >       throws IOException
> > > > >> > > > > >
> > > > >> > > > > > Cheers
> > > > >> > > > > >
> > > > >> > > > > > On Tue, Jun 4, 2013 at 8:14 PM, ramkrishna vasudevan <
> > > > >> > > > > > [email protected]> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > >> Does Minor compaction remove HFiles in which all
> > > > >> > > > > > > >> entries are out of TTL, or does only Major compaction
> > > > >> > > > > > > >> do that?
> > > > >> > > > > > > Yes, it applies to Minor compactions too.
> > > > >> > > > > > >
> > > > >> > > > > > > >> Is there a way of configuring major compaction to
> > > > >> > > > > > > >> compact only files older than a certain time, or to
> > > > >> > > > > > > >> compress all the files except the latest few?
> > > > >> > > > > > > In the latest trunk version the compaction algorithm
> > > > >> > > > > > > itself can be plugged in.  There are some coprocessor
> > > > >> > > > > > > hooks that give control over the scanner that gets
> > > > >> > > > > > > created for compaction, with which we can control the
> > > > >> > > > > > > KVs being selected.  But I am not very sure if we can
> > > > >> > > > > > > control the files getting selected for compaction in
> > > > >> > > > > > > the older versions.
> > > > >> > > > > > >
> > > > >> > > > > > > >> The above excerpt seems to imply to me that the
> > > > >> > > > > > > >> search for a key inside a block is linear, and I feel
> > > > >> > > > > > > >> I must be reading it wrong. I would expect the scan
> > > > >> > > > > > > >> to be a binary search.
> > > > >> > > > > > > Once the data block is identified for a key, we seek to
> > > > >> > > > > > > the beginning of the block and then do a linear search
> > > > >> > > > > > > until we reach the exact key that we are looking for.
> > > > >> > > > > > > This is because internally the data (KVs) is stored as
> > > > >> > > > > > > byte buffers per block, following the pattern
> > > > >> > > > > > > <keylength><valuelength><keybytearray><valuebytearray>.
> > > > >> > > > > > >
> > > > >> > > > > > > >> Is there a way to warm up the bloom filter and block
> > > > >> > > > > > > >> index cache for a table?
> > > > >> > > > > > > You always want the bloom and block index to be in
> > > > >> > > > > > > cache?
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Wed, Jun 5, 2013 at 7:45 AM, Pankaj Gupta <[email protected]>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hi,
> > > > >> > > > > > > >
> > > > >> > > > > > > > I have a few small questions regarding HBase. I've
> > > > >> > > > > > > > searched the forum but couldn't find clear answers,
> > > > >> > > > > > > > hence asking them here:
> > > > >> > > > > > > >
> > > > >> > > > > > > >    1. Does Minor compaction remove HFiles in which all
> > > > >> > > > > > > >    entries are out of TTL, or does only Major
> > > > >> > > > > > > >    compaction do that? I found this JIRA:
> > > > >> > > > > > > >    https://issues.apache.org/jira/browse/HBASE-5199
> > > > >> > > > > > > >    but I don't know if the compaction being talked
> > > > >> > > > > > > >    about there is minor or major.
> > > > >> > > > > > > >    2. Is there a way of configuring major compaction
> > > > >> > > > > > > >    to compact only files older than a certain time, or
> > > > >> > > > > > > >    to compress all the files except the latest few? We
> > > > >> > > > > > > >    basically want to use the time-based filtering
> > > > >> > > > > > > >    optimization in HBase to get the latest additions
> > > > >> > > > > > > >    to the table, and since major compaction bunches
> > > > >> > > > > > > >    everything into one file, it would defeat the
> > > > >> > > > > > > >    optimization.
> > > > >> > > > > > > >    3. Is there a way to warm up the bloom filter and
> > > > >> > > > > > > >    block index cache for a table? This is for a case
> > > > >> > > > > > > >    where I always want the bloom filters and index to
> > > > >> > > > > > > >    be all in memory, but not the
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *P* | (415) 677-9222 ext. 205 *F* | (415) 677-0895 | [email protected]
> > > > > Pankaj Gupta | Software Engineer
> > > > > *BrightRoll, Inc.* | Smart Video Advertising | www.brightroll.com
> > > > > United States | Canada | United Kingdom | Germany
> > > > > We're hiring! <http://newton.newtonsoftware.com/career/CareerHome.action?clientId=8a42a12b3580e2060135837631485aa7>
> > > >
> > > >
> > > >
> > >
> >
>
>
>
