Thanks for reminding me of the HBase version in CDH4; that's something we'll definitely take into consideration.
-Mike

On Feb 7, 2013, at 5:09 PM, Ted Yu wrote:

> Thanks, Michael, for this information.
>
> FYI, CDH4 (as of now) is based on HBase 0.92.x, which doesn't have the two
> features I cited below.
>
> On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery <[email protected]> wrote:
>
>> There is only one CF in this schema.
>>
>> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
>> cannot have cluster downtime. Our current upgrade plan involves additional
>> hardware with side-by-side clusters until everything is exported/imported.
>>
>> Thanks,
>> Mike
>>
>> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>>
>>> How many column families are involved?
>>>
>>> Have you considered upgrading to 0.94.4, where you would be able to
>>> benefit from lazy seek, Data Block Encoding, etc.?
>>>
>>> Thanks
>>>
>>> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[email protected]>
>>> wrote:
>>>
>>>> I'm looking for some advice about per-row CQ (column qualifier) count
>>>> guidelines. Our current schema design means we have a HIGHLY variable CQ
>>>> count per row -- some rows have one or two CQs and some rows have upwards
>>>> of 1 million. Each CQ is on the order of 100 bytes (for round numbers)
>>>> and the cell values are null. We see highly variable and, too often,
>>>> unacceptable read performance using this schema. I don't know for a fact
>>>> that the CQ count variability is the source of our problems, but I am
>>>> suspicious.
>>>>
>>>> I'm curious about others' experience with CQ counts per row. Are there
>>>> any best practices or guidelines for optimally sizing the number of CQs
>>>> per row? The other obvious solution would involve breaking this data
>>>> into finer-grained rows, which means shifting from GETs to SCANs. Are
>>>> there performance trade-offs in such a change?
>>>>
>>>> We are currently using CDH3u4, if that is relevant. All of our loading
>>>> is done via HFile (bulk) loading, so we have not had to tune write
>>>> performance beyond using bulk loads. Any advice is appreciated,
>>>> including which metrics we should be looking at to further diagnose our
>>>> read performance challenges.
>>>>
>>>> Thanks,
>>>> Mike Ellery

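Mike's alternative of breaking the data into finer-grained rows is commonly done by folding the qualifier into the row key. Each former cell becomes its own small row, and the old GET becomes a bounded SCAN. The sketch models this in plain Python, with a sorted list standing in for a table. The 0x00 separator, the helper names (`composite_key`, `prefix_scan_range`, `scan`) and the sample keys are all hypothetical. The sketch also assumes the original row keys never contain a 0x00 byte. The start/stop pair follows HBase's Scan convention of an inclusive start row and an exclusive stop row.

```python
import bisect

SEPARATOR = b"\x00"  # assumption: original row keys never contain 0x00


def composite_key(row: bytes, qualifier: bytes) -> bytes:
    """Turn one (row, qualifier) cell of the wide schema into its own row key."""
    return row + SEPARATOR + qualifier


def prefix_scan_range(row: bytes) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) keys covering every composite
    key derived from `row`, mirroring HBase Scan start/stop row semantics."""
    return row + SEPARATOR, row + b"\x01"


def scan(sorted_keys: list[bytes], start: bytes, stop: bytes) -> list[bytes]:
    """Hypothetical in-memory stand-in for a Scan over a sorted key space."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    # Hypothetical wide rows: row key -> list of qualifiers (values are null).
    wide = {
        b"user1": [b"q-a", b"q-b", b"q-c"],
        b"user10": [b"q-x"],
        b"user2": [b"q-y"],
    }
    tall = sorted(composite_key(r, q) for r, qs in wide.items() for q in qs)
    start, stop = prefix_scan_range(b"user1")
    for key in scan(tall, start, stop):
        row, _, qualifier = key.partition(SEPARATOR)
        print(row, qualifier)
```

The stop key keeps the separator position and bumps it to 0x01. This matters: a naive stop key of `user2`, made by incrementing the last byte of `user1`, would also pull in every `user10` row. In exchange, regions can now split inside what used to be one huge row. However, a read that used to be a single GET becomes a range scan, and that scan may cross a region boundary.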