Re: Pack rows into a wide row for better performance?

林煒清 Wed, 28 Aug 2013 19:42:58 -0700

Oh, so they have the data points packed into one cell. If so, it is now
reasonable that they claim it speeds up row seeking.
Thanks a lot.
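
To put rough numbers on the space argument quoted below, here is a minimal sketch. It assumes the uncompressed KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type, plus the row, family, qualifier and value bytes) and ignores block compression and prefix encoding. The 40-byte row key, 1-byte family, 2-byte qualifier and 8-byte value are hypothetical sizes chosen for illustration, not figures from the presentation, and the packed cell is assumed to simply concatenate the per-point qualifiers and values.

```python
# Fixed bytes per KeyValue: key len, value len, row len, family len,
# timestamp, type (assumed uncompressed on-disk layout).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Approximate size in bytes of one uncompressed KeyValue."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def unpacked_size(points, row_len, family_len, qualifier_len, value_len):
    """One cell per data point: the row key is repeated for every point."""
    return points * keyvalue_size(row_len, family_len, qualifier_len, value_len)


def packed_size(points, row_len, family_len, qualifier_len, value_len):
    """One cell for the whole period: the row key is stored once, and the
    qualifiers and values of all points are concatenated (an assumption)."""
    return keyvalue_size(
        row_len, family_len, points * qualifier_len, points * value_len
    )


if __name__ == "__main__":
    # Hypothetical sizes: 40-byte row key, 1-byte family,
    # 2-byte qualifier, 8-byte value, one point per second for an hour.
    sizes = dict(row_len=40, family_len=1, qualifier_len=2, value_len=8)
    before = unpacked_size(3600, **sizes)
    after = packed_size(3600, **sizes)
    print(before, after, round(before / after, 2))
```

The longer the row key is relative to the value, the larger the ratio becomes, which is the effect Chris describes.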



2013/8/28 Chris Perluss <[email protected]>

> Sorry, accidentally hit send. I'm guessing a 10 minute time slice would
> drop their space savings from 4-8x down to 2-4x.
> On Aug 27, 2013 11:30 PM, "Chris Perluss" <[email protected]> wrote:
>
> > I'm still kinda new to HBase so please excuse me if I am wrong.  I suspect
> > the reason has to do with a different slide from their presentation where
> > they run a job every hour to combine all the cells from the previous hour
> > into one cell.
> >
> > OpenTSDB has quite a long row key. It contains the metric name, the
> > timestamp, and numerous optional tags. If you wrote one metric every second
> > then you would write 3600 columns per row key. Since the row key is very
> > long, it uses quite a bit of space to store the same row key 3600 times.
> > By combining an hour's worth of data into one cell, OpenTSDB claims they
> > save 4-8x of their storage.
> >
> > If they stayed with their original 10 minute time slice then they would
> > have to store their giant row key 6 times per hour instead of once. I'm
> > going to guess this
> > On Aug 27, 2013 10:50 PM, "林煒清" <[email protected]> wrote:
> >
> >> *Context*:
> >>
> >> Recently I saw that OpenTSDB packs its rows by time period, ending up with
> >> tens to hundreds of columns per row. It claims that this design makes row
> >> seeking more efficient (slide: "Lessons learned from OpenTSDB").
> >>
> >> *My argument*:
> >>
> >> If *a block of an HFile* is indexed by its own start key, and that key is
> >> made of {row, cf, cq}, then I think the read time for a specific key
> >> should be the same for tall and wide tables alike, since the physical
> >> storage is sorted by the whole key, not only by the rowkey.
> >>
> >> So under one column family, rowkey+column acts as a single key: shifting
> >> part of the rowkey into the column is the same as shifting it to the tail
> >> of the rowkey, and vice versa.
> >>
> >> Following this logic, in the physical view all OpenTSDB did was change the
> >> key index by shifting a portion of the timestamp bytes to the position
> >> right after the rowkey, that is, the column qualifier.
> >>
> >> *Question*:
> >>
> >> 1. When getting a packed row worth one hour (a get is a special scan,
> >> right?), or scanning over a one-hour range of rows, I don't see how there
> >> could be any performance improvement. So why does OpenTSDB say packed rows
> >> perform better for row seeking?
> >>
> >> 2. Almost all docs and books recommend a tall table design. The book
> >> "HBase in Action" in particular says that read performance is, among
> >> other considerations, a reason for adopting the tall design, though I
> >> still can't see why.
> >>
> >> 3. Also, the KeyValues inside a block are searched by *linear* scan, and
> >> the start keys of blocks by binary search, right?
> >>
> >> Any hint is much appreciated.
> >>
> >
>
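
The ordering argument in the original question can be checked with a small model. This is only a sketch: it treats a key as a (row, family, qualifier) tuple of bytes, a block as a fixed number of sorted keys, and a lookup as a binary search over block start keys followed by a linear scan inside the block. It leaves out Bloom filters, multi-level indexes, caching and everything else a real region server does. The metric name, family name and block size are hypothetical.

```python
import bisect
import struct

FAMILY = b"t"          # hypothetical column family
METRIC = b"cpu.user"   # hypothetical metric prefix of the row key


def tall_key(ts):
    """Tall layout: the full timestamp lives in the row key."""
    return (METRIC + struct.pack(">I", ts), FAMILY, b"")


def wide_key(ts, span=3600):
    """Wide layout: the row key holds the start of the period and the
    qualifier holds the offset in seconds within that period."""
    base = ts - ts % span
    return (METRIC + struct.pack(">I", base), FAMILY, struct.pack(">H", ts - base))


def build_blocks(keys, block_size):
    """Sort keys and cut them into blocks of block_size keys each."""
    keys = sorted(keys)
    return [keys[i:i + block_size] for i in range(0, len(keys), block_size)]


def seek(blocks, key):
    """Return (block index, keys examined) for key, or None if absent."""
    start_keys = [block[0] for block in blocks]
    i = bisect.bisect_right(start_keys, key) - 1   # binary search on the index
    if i < 0:
        return None
    for steps, candidate in enumerate(blocks[i], start=1):  # linear scan
        if candidate == key:
            return i, steps
    return None


if __name__ == "__main__":
    timestamps = range(1377648000, 1377648000 + 7200)  # two hours of points
    tall = build_blocks([tall_key(t) for t in timestamps], 64)
    wide = build_blocks([wide_key(t) for t in timestamps], 64)
    t = 1377648000 + 100
    print(seek(tall, tall_key(t)), seek(wide, wide_key(t)))
```

In this model both layouts find a given point in the same block after the same number of steps, which matches the reasoning in the question. The difference discussed in the replies comes from packing many points into one cell, so that far fewer KeyValues exist to begin with.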
