Thanks Michael for this information. FYI, CDH4 (as of now) is based on HBase 0.92.x, which doesn't have the two features I cited below.
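Once you do get onto 0.94.x, Data Block Encoding is enabled per column family through the admin API (or the shell's alter command). A minimal sketch against the 0.94 Java client follows -- the table name "mytable" and family "cf" are placeholders, not your actual schema:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableFastDiff {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] table = Bytes.toBytes("mytable");   // placeholder table name

    // In 0.94, altering a column family requires the table to be disabled.
    admin.disableTable(table);

    // Fetch the existing descriptor so other CF settings are preserved.
    HTableDescriptor desc = admin.getTableDescriptor(table);
    HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf")); // placeholder CF name

    // FAST_DIFF delta-encodes keys within a block, which pays off when
    // adjacent ~100-byte qualifiers share long prefixes and values are null.
    cf.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    admin.modifyColumn(table, cf);

    admin.enableTable(table);
    admin.close();
  }
}

The shell equivalent would be: alter 'mytable', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF'}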
On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <[email protected]> wrote:

> There is only one CF in this schema.
>
> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
> cannot have cluster downtime. Our current upgrade plan involves additional
> hardware with side-by-side clusters until everything is exported/imported.
>
> Thanks,
> Mike
>
> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>
> > How many column families are involved?
> >
> > Have you considered upgrading to 0.94.4, where you would be able to
> > benefit from lazy seek, Data Block Encoding, etc.?
> >
> > Thanks
> >
> > On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[email protected]> wrote:
> >
> >> I'm looking for some advice about per-row CQ (column qualifier) count
> >> guidelines. Our current schema design means we have a HIGHLY variable CQ
> >> count per row -- some rows have one or two CQs and some rows have
> >> upwards of 1 million. Each CQ is on the order of 100 bytes (for round
> >> numbers) and the cell values are null. We see highly variable and too
> >> often unacceptable read performance using this schema. I don't know for
> >> a fact that the CQ count variability is the source of our problems, but
> >> I am suspicious.
> >>
> >> I'm curious about others' experience with CQ counts per row -- are
> >> there some best practices/guidelines about how to optimally size the
> >> number of CQs per row? The other obvious solution will involve breaking
> >> this data into finer-grained rows, which means shifting from GETs to
> >> SCANs. Are there performance trade-offs in such a change?
> >>
> >> We are currently using CDH3u4, if that is relevant. All of our loading
> >> is done via HFile loading (bulk), so we have not had to tune write
> >> performance beyond using bulk loads. Any advice is appreciated,
> >> including what metrics we should be looking at to further diagnose our
> >> read performance challenges.
> >>
> >> Thanks,
> >> Mike Ellery
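As for the GET-to-SCAN question quoted above: the usual pattern is to fold the qualifier into the row key (rowkey = originalRow + separator + qualifier) and read a logical row's members back with a bounded scan. A rough sketch against the 0.94-era client follows; the 0x00 separator and the names here are illustrative assumptions, not your actual schema:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowRowScan {
  // Read every member of a logical "wide row" after its qualifiers have
  // been folded into the row key as <originalRow> 0x00 <qualifier>.
  public static void scanMembers(HTable table, byte[] originalRow)
      throws IOException {
    byte[] startRow = Bytes.add(originalRow, new byte[] {0}); // inclusive lower bound
    byte[] stopRow  = Bytes.add(originalRow, new byte[] {1}); // exclusive upper bound
    Scan scan = new Scan(startRow, stopRow);
    scan.setCaching(1000); // fetch rows in batches to cut RPC round trips

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // The former qualifier is now the row-key suffix.
        byte[] row = r.getRow();
        byte[] member = Bytes.tail(row, row.length - originalRow.length - 1);
        // process member ...
      }
    } finally {
      scanner.close();
    }
  }
}

A scan bounded by start/stop keys like this reads the same data a batch of GETs would, and with scanner caching the round trips are amortized, so in principle the trade-off is modest. The main win is that no single row has to carry a million cells.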
