Oh, so they have them packed into one cell. If so, it now makes sense that they claim it speeds up row seeking. Thanks a lot.
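To check the storage argument for myself, here is a rough back-of-the-envelope sketch in Python. It assumes the usual uncompressed KeyValue layout (key length, value length, row length, row, family length, family, qualifier, timestamp, type) and ignores compression and data block encoding. The row key length, qualifier length, and value length are hypothetical numbers chosen for illustration, not OpenTSDB's real sizes.

```python
# Fixed per-cell bytes in an uncompressed KeyValue:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_len, family_len, qualifier_len, value_len):
    """Approximate on-disk size of one cell, before any compression."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def unpacked_bytes(points, row_len, family_len=1, qualifier_len=2, value_len=8):
    """One cell per data point: the row key is repeated for every point."""
    return points * cell_size(row_len, family_len, qualifier_len, value_len)


def packed_bytes(points, row_len, family_len=1, qualifier_len=2, value_len=8):
    """All points merged into a single cell: qualifiers and values are
    concatenated, so the row key and fixed overhead are paid only once."""
    return cell_size(row_len, family_len,
                     points * qualifier_len, points * value_len)


# Hypothetical example: one point per second for an hour, 40-byte row key.
before = unpacked_bytes(3600, row_len=40)
after = packed_bytes(3600, row_len=40)
print(f"unpacked: {before} bytes, packed: {after} bytes, "
      f"ratio: {before / after:.1f}x")
```

The longer the row key is relative to the value, the larger the ratio gets, which matches the explanation that the repeated row key is what dominates the space.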
2013/8/28 Chris Perluss <[email protected]>

> Sorry, accidentally hit send. I'm guessing a 10 minute time slice would
> drop their space savings from 4-8x down to 2-4x.
>
> On Aug 27, 2013 11:30 PM, "Chris Perluss" <[email protected]> wrote:
>
> > I'm still kinda new to HBase so please excuse me if I am wrong. I
> > suspect the reason has to do with a different slide from their
> > presentation where they run a job every hour to combine all the cells
> > from the previous hour into one cell.
> >
> > OpenTSDB has quite a long row key. It contains the metric name, the
> > timestamp, and numerous optional tags. If you wrote one metric every
> > second then you would write 3600 columns per row key. Since the row key
> > is very long, it uses quite a bit of space to store the same row key
> > 3600 times. By combining an hour's worth of data into one cell,
> > OpenTSDB claims they save 4-8x of their storage.
> >
> > If they stayed with their original 10 minute time slice then they would
> > have to store their giant row key 6 times per hour instead of once. I'm
> > going to guess this
> >
> > On Aug 27, 2013 10:50 PM, "林煒清" <[email protected]> wrote:
> >
> >> *Context*:
> >>
> >> Recently, I saw that OpenTSDB packs its rows by period, ending up with
> >> ten to a hundred columns per row. It claims that this design is more
> >> efficient for row seeking (on slide: Lessons learned from OpenTSDB).
> >>
> >> *My argument*:
> >>
> >> If *a block of an HFile* is indexed by its own start key, and that key
> >> is made of {row, cf, cq}, then I think the read time for a specific key
> >> should be the same for both tall and wide tables, since the physical
> >> storage is sorted by key, not only by rowkey.
> >>
> >> So under one column family, the rowkey+column is a key as a whole;
> >> shifting a part of the rowkey to the column is the same as shifting a
> >> part of the rowkey to the tail of the rowkey, and vice versa.
> >>
> >> Following this logic, from the physical view, all OpenTSDB did was
> >> change the key index by shifting a portion of the timestamp bytes to a
> >> position behind the rowkey, that is, into the column qualifier.
> >>
> >> *Question*:
> >>
> >> 1. When getting (a get is a special scan, right?) a packed row worth
> >> one hour, or scanning over a one-hour range of rows, I don't see how
> >> there could be any performance improvement. So why does OpenTSDB say
> >> packed rows have better performance for row seeking?
> >>
> >> 2. Almost all docs and books recommend the tall table design, and the
> >> book "HBase in Action" in particular says that, among other things,
> >> read performance is the reason tall is adopted, though I still don't
> >> get why.
> >>
> >> 3. Also, the KeyValues inside a block are searched by *linear* scan,
> >> and the start keys of blocks by binary search, right?
> >>
> >> Any hint is much appreciated.
> >>
> >
>
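For reference, the argument in the original question (a tall key and a wide key sort the same way, so a seek costs the same) can be sketched as a toy model in Python. This is only an illustration under stated assumptions, not HBase code: keys are modeled as (row, qualifier) byte tuples within one column family, a "block" is a fixed number of sorted keys, the index holds each block's first key, and the lookup is a binary search over the index followed by a linear scan inside the block, as described in question 3. All function names and sizes here are hypothetical.

```python
import bisect

SECONDS_PER_ROW = 3600  # assumed row width for the wide layout


def tall_key(metric: bytes, ts: int) -> tuple:
    """Tall layout: the full timestamp lives in the row key."""
    return (metric + ts.to_bytes(4, "big"), b"")


def wide_key(metric: bytes, ts: int) -> tuple:
    """Wide layout: the hour goes in the row key, the offset in the qualifier."""
    base = ts - ts % SECONDS_PER_ROW
    return (metric + base.to_bytes(4, "big"), (ts - base).to_bytes(2, "big"))


def build_blocks(sorted_keys, keys_per_block):
    """Split sorted keys into blocks and index each block by its first key."""
    blocks = [sorted_keys[i:i + keys_per_block]
              for i in range(0, len(sorted_keys), keys_per_block)]
    index = [block[0] for block in blocks]
    return blocks, index


def seek(blocks, index, target):
    """Binary search the block index, then scan the chosen block linearly.

    Returns (block position, keys examined in the block), or None if absent.
    """
    pos = bisect.bisect_right(index, target) - 1
    if pos < 0:
        return None
    for steps, key in enumerate(blocks[pos], start=1):
        if key == target:
            return pos, steps
    return None


# Two hours of points, one every 10 seconds, for a hypothetical metric.
timestamps = list(range(1377640800, 1377640800 + 7200, 10))
tall_blocks, tall_index = build_blocks(
    sorted(tall_key(b"m", ts) for ts in timestamps), 64)
wide_blocks, wide_index = build_blocks(
    sorted(wide_key(b"m", ts) for ts in timestamps), 64)

ts = timestamps[500]
print(seek(tall_blocks, tall_index, tall_key(b"m", ts)))
print(seek(wide_blocks, wide_index, wide_key(b"m", ts)))
```

In this model both layouts land in the same block after the same number of steps, because the keys sort identically. It deliberately leaves out the effect discussed in the replies: once an hour of cells is merged into one cell, there are far fewer, smaller keys to index and scan.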
