Hi Ryan,

Thanks for your reply. So even if I use get.addColumn(byte[] family, byte[] qualifier) for a specific cell, HBase still has to traverse from the beginning of the column family up to the qualifier I specified? Is that because HBase has to traverse all the blocks in the HFile to find the row key or the qualifier? This is where I am confused: in the key-value pairs in a data block, does the key refer to the row key or to the qualifier? Where are the row key and the qualifier stored? This has bothered me for a while, and it would be nice to figure it out.

Many thanks.
William

On Tue, Oct 12, 2010 at 1:12 AM, Ryan Rawson <[email protected]> wrote:

>
> Yes, this is spot on. When HBase scans, we read a block, iterate through the
> keys in the block, then go to the next block. We try to be as efficient as
> possible, but the inescapable fact remains that we must read all the
> intervening data. We can do tricks (in 0.90) to use the block index to skip
> some blocks, but it is not always possible.
> On Oct 11, 2010 5:01 PM, "Sean Bigdatafun" <[email protected]>
> wrote:
> > I think this is a good suggestion too.
> >
> > HBase linearly scans through the 64KB block that is brought into memory. If
> > a big data payload (unused in a query/scan) is mixed with a small data
> > payload, it will be rather inefficient, I think?
> >
> > On Mon, Oct 11, 2010 at 9:43 AM, Ryan Rawson <[email protected]> wrote:
> >
> >> The reason I talk about value size is that one area where multiple
> >> families are good is when you have really large values in one column and
> >> smaller values in different columns. If you want to read just the
> >> small values without scanning through the big values, you can use
> >> separate column families.
> >>
> >> -ryan
> >>
> >> On Mon, Oct 11, 2010 at 9:32 AM, Jean-Daniel Cryans <[email protected]>
> >> wrote:
> >> >> Yes, I agree, an OOME is unlikely. I misinterpreted my current problem.
> >> >> I found that this (GC timeout) on my 0.89-stumbleupon HBase occurs
> >> >> only if writeToWAL=false. My RS eats all available memory (5GB), but
> >> >> doesn't get an OOME. I'm trying to figure out what is going on.
> >> >
> >> > Long GC pauses happen for many different reasons; first make sure
> >> > that your IO, CPU, and RAM aren't overcommitted and that there's no
> >> > swap.
> >> >
> >> >> Hm... How can I flush a family from the client side? I don't see any
> >> >> API for it in 0.20.x. Is it an 0.89 API change? (I haven't dug into
> >> >> 0.89 yet.)
> >> >>
> >> >
> >> > You can't; I was talking about a possible fix in the code.
> >> >
> >> >>
> >> >> Sorry for the wrong information.
> >> >
> >> > No problem :)
> >> >
> >> > J-D
> >> >
> >>
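To make the thread concrete: as I understand the KeyValue format described in the HBase javadoc, every cell in a data block is stored as key length, value length, key, value, and the key itself is row length, row, family length, family, qualifier, timestamp and key type. So the "key" in a data block is neither just the row key nor just the qualifier; it carries both. Cells are sorted by row, then family, then qualifier, then newest timestamp first, and each column family has its own store files. The Java sketch below is a toy model, not HBase code: the class and method names, the 50,000-byte and 10-byte values, and the 1,000 rows are made up for illustration, and it models a plain front-to-back scan with none of the block-index skipping Ryan mentions for 0.90. It compares the bytes read to collect a small column when it shares a family with a large column versus when it lives in its own family.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model (NOT HBase code) of cells in one store file, used to compare the
 * bytes a front-to-back scan reads when a small column shares a family with
 * a large column versus when it lives in its own family.
 */
public class BlockScanSketch {

    /** One cell: the key carries row, family, qualifier and timestamp; the value follows it. */
    static final class Cell implements Comparable<Cell> {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;
        final int valueLength; // only the size of the value matters in this model

        Cell(String row, String family, String qualifier, long timestamp, int valueLength) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.valueLength = valueLength;
        }

        /** Approximate serialized size: key length (4) + value length (4) + key + value. */
        int sizeBytes() {
            // key = row length (2) + row + family length (1) + family
            //       + qualifier + timestamp (8) + key type (1); ASCII names assumed
            int keyLength = 2 + row.length() + 1 + family.length() + qualifier.length() + 8 + 1;
            return 4 + 4 + keyLength + valueLength;
        }

        /** Sort order: row, then family, then qualifier, then newest timestamp first. */
        @Override
        public int compareTo(Cell other) {
            int c = row.compareTo(other.row);
            if (c != 0) return c;
            c = family.compareTo(other.family);
            if (c != 0) return c;
            c = qualifier.compareTo(other.qualifier);
            if (c != 0) return c;
            return Long.compare(other.timestamp, timestamp);
        }
    }

    /**
     * Walks a sorted store file from the first cell to the last, adding cells whose
     * qualifier matches to {@code out}. Returns the bytes read: every cell is read,
     * because its key must be examined to know whether it matches.
     */
    static long scanForQualifier(List<Cell> storeFile, String qualifier, List<Cell> out) {
        long bytesRead = 0;
        for (Cell cell : storeFile) {
            bytesRead += cell.sizeBytes();
            if (cell.qualifier.equals(qualifier)) {
                out.add(cell);
            }
        }
        return bytesRead;
    }

    public static void main(String[] args) {
        List<Cell> mixed = new ArrayList<>();     // one family "d" holds both columns
        List<Cell> smallOnly = new ArrayList<>(); // family "s" holds only the small column;
                                                  // the big column would sit in another family's store file
        for (int i = 0; i < 1000; i++) {
            String row = String.format("row%04d", i);
            mixed.add(new Cell(row, "d", "big", 1L, 50_000));
            mixed.add(new Cell(row, "d", "small", 1L, 10));
            smallOnly.add(new Cell(row, "s", "small", 1L, 10));
        }
        Collections.sort(mixed);
        Collections.sort(smallOnly);

        List<Cell> found = new ArrayList<>();
        long mixedRead = scanForQualifier(mixed, "small", found);
        found.clear();
        long splitRead = scanForQualifier(smallOnly, "small", found);
        // prints: found=1000 mixedRead=50074000 splitRead=43000
        System.out.println("found=" + found.size() + " mixedRead=" + mixedRead + " splitRead=" + splitRead);
    }
}
```

In the mixed layout, "big" sorts before "small" within each row, so the scan reads through every large value to reach the small ones; with a separate family, the large values sit in a different store file that this scan never opens. That is the trade-off Ryan describes; real numbers will depend on block size and on how much the block index lets a read skip.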
