Thanks Jean-Daniel. I did go through the documentation, but there was no clear answer about interleaving puts from two or more row keys, or whether there was a way to reserve contiguous blocks per rowkey. I made some derivations, but clearly I was incorrect in some of them, as you pointed out. The questions were partly validation and partly doubt-riddance. :)
Thanks
Abhishek

Sent from my iPad with iMstakes

On Aug 23, 2012, at 17:19, "Jean-Daniel Cryans" <[email protected]> wrote:

> Inline. In general I'd recommend you read the documentation more
> closely and/or get the book.
>
> J-D
>
> On Thu, Aug 23, 2012 at 4:21 PM, Pamecha, Abhishek <[email protected]> wrote:
>> 1. Can there be multiple row keys per block and then per HFile? Or is
>> a block or Hfile dedicated to a single row key?
>
> Multiple row keys per HFile block. Read
> http://hbase.apache.org/book.html#hfilev2
>
>> I have a scenario, where for the same column family, some rowkeys will have
>> very wide rows, say rowkey W, and some rowkeys will have very narrow rows,
>> say rowkey N. In my case, puts for rowkeys W and N are interleaved with a
>> ratio of say 90 rowkeyW puts vs 10 rowkeyN puts. On the get side, my app
>> works on getting data for a single rowkey at a time.
>> Will that mean for a rowkeyN, the entries will be scattered across regions
>> on that same region server, given there are interleaved puts? Or Is there a
>> way I can enforce contiguous writes to a region/Hfile reserved for rowkey
>> N. This way, I can leverage the block cache and have the entire/most of
>> rowkeyN fit in there for that session.
>
> The row keys are sorted according to their lexicographical order. See
> http://hbase.apache.org/book.html#row
>
> If you don't want the big rows coexisting with the small rows, put
> them in different column families or different tables.
>
>> 2. Is there a limit on number of HFiles that can exist per region?
>
> I think your understanding of HFiles being a bit wrong prompted you to
> ask this, my previous answers probably make it so that you don't need
> this answer anymore, but there it is just in case:
>
> The HFiles are compacted when reaching
> hbase.hstore.compactionThreshold (default of 3) per family, and you
> can have no more than hbase.hstore.blockingStoreFiles (default of 7).
>
>> Basically, on what criteria does a rowkey data gets split in two
>> regions [on the same region server]. I am assuming there can be many
>> regions per region server. And multiple regions for the same table can
>> belong in the same region server.
>
> A row key only lives in a single region since the regions are split
> based on row keys.
>
>> 3. Also, is there a limit on the number of blocks that are created per
>> HFile?
>
> No.
>
>> What determines whether a split is required?
>
> hbase.hregion.max.filesize, also see
> http://hbase.apache.org/book.html#disable.splitting if you want to
> change that.
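
For anyone following the thread, J-D's two points (row keys are kept in lexicographical order, and a row key lives in exactly one region) can be illustrated with a rough Python sketch. This is a toy model and not HBase code: the names sorted_cells, region_for, and start_keys, the sample keys, and the split point "rowP" are all made up for illustration.

```python
from bisect import bisect_right

# Toy model, not HBase code: cells are kept sorted by (row key, qualifier),
# so the arrival order of puts does not decide where a row's data ends up.
def sorted_cells(puts):
    return sorted(puts, key=lambda cell: (cell[0], cell[1]))

# Hypothetical region layout: each region is identified by its start key,
# the first region starts at the empty key, and start keys are sorted.
def region_for(row_key, start_keys):
    # The owning region is the last one whose start key is <= row_key.
    return bisect_right(start_keys, row_key) - 1

# Nine puts for the wide row with one put for the narrow row interleaved.
puts = [(b"rowW", b"c%03d" % i, b"v") for i in range(9)]
puts.insert(4, (b"rowN", b"c000", b"v"))

cells = sorted_cells(puts)
start_keys = [b"", b"rowP"]  # assumed split point between the two rows

for row in (b"rowN", b"rowW"):
    print(row, "-> region", region_for(row, start_keys))
```

In this model the rowN cell sorts ahead of every rowW cell even though it was written in the middle of them, and each row key maps to a single region, which matches the answers quoted above.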
