How many column families are involved? Have you considered upgrading to 0.94.4, where you would be able to benefit from lazy seek, Data Block Encoding, etc.?
Thanks

On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[email protected]> wrote:

> I'm looking for some advice about per row CQ (column qualifier) count
> guidelines. Our current schema design means we have a HIGHLY variable CQ
> count per row -- some rows have one or two CQs and some rows have upwards
> of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
> the cell values are null. We see highly variable and too often
> unacceptable read performance using this schema. I don't know for a fact
> that the CQ count variability is the source of our problems, but I am
> suspicious.
>
> I'm curious about others' experience with CQ counts per row -- are there
> some best practices/guidelines about how to optimally size the number of
> CQs per row. The other obvious solution will involve breaking this data
> into finer grained rows, which means shifting from GETs to SCANs - are
> there performance trade-offs in such a change?
>
> We are currently using CDH3u4, if that is relevant. All of our loading is
> done via HFILE loading (bulk), so we have not had to tune write performance
> beyond using bulk loads. Any advice appreciated, including what metrics we
> should be looking at to further diagnose our read performance challenges.
>
> Thanks,
> Mike Ellery
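
For reference, the "finer grained rows" option mentioned above could look roughly like the sketch below. It is only an illustration of the key layout, not HBase client code: the helper names (tall_row_key, next_prefix, prefix_scan_range, scan) and the 0x00 separator are my own assumptions, and it assumes the original row key never contains the separator byte. Each (row, CQ) pair becomes its own row key, and the old single-row GET becomes a SCAN over the half-open key range [row + sep, next prefix).

```python
import bisect

SEP = b"\x00"  # assumed separator; the original row key must not contain it


def tall_row_key(row: bytes, qualifier: bytes) -> bytes:
    """Fold the column qualifier into the row key (one cell per row)."""
    return row + SEP + qualifier


def next_prefix(prefix: bytes) -> bytes:
    """Smallest key greater than every key starting with `prefix`.

    Returns b"" (meaning "no upper bound") if the prefix is all 0xff bytes.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def prefix_scan_range(row: bytes):
    """Start (inclusive) and stop (exclusive) keys covering one logical row."""
    start = row + SEP
    return start, next_prefix(start)


def scan(sorted_keys, start: bytes, stop: bytes):
    """Stand-in for a range scan over lexicographically sorted keys."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]


# Hypothetical data: "user1" has two qualifiers; "user10" shares the prefix
# "user1" but is excluded because the separator is part of the scan range.
keys = sorted([
    tall_row_key(b"user1", b"a"),
    tall_row_key(b"user1", b"b"),
    tall_row_key(b"user10", b"a"),
    tall_row_key(b"user2", b"x"),
])
start, stop = prefix_scan_range(b"user1")
user1_cells = scan(keys, start, stop)
```

With a layout like this a scan can be bounded or paged, so a reader is not forced to pull a million-qualifier row in one request. Whether that actually helps read latency here would still need to be measured.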
