The reason I mentioned 0.94.4 was that it is the most recent 0.94 release. For the features, you can refer to the following JIRAs:

HBASE-4465 Lazy-seek optimization for StoreFile scanners
HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
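To make the second JIRA concrete, here is a toy sketch of the prefix-compression idea behind HBASE-4218. HBase's real PREFIX/FAST_DIFF encoders operate on serialized KeyValues inside data blocks and are more involved; this illustration only shows why sorted keys with long shared prefixes (e.g. many qualifiers under one row) compress well. The function names are mine, not HBase's.

```python
# Toy prefix (delta) encoding over a sorted list of byte-string keys.
# Each key is stored as (shared_prefix_len, suffix) relative to the
# previous key, so long common prefixes are stored only once.

def prefix_encode(sorted_keys):
    """Replace each key with (shared_prefix_len, suffix) vs. the previous key."""
    encoded, prev = [], b""
    for key in sorted_keys:
        common = 0
        limit = min(len(prev), len(key))
        while common < limit and prev[common] == key[common]:
            common += 1
        encoded.append((common, key[common:]))
        prev = key
    return encoded

def prefix_decode(encoded):
    """Invert prefix_encode: rebuild each key from the previous one."""
    keys, prev = [], b""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys
```

With rows holding up to ~1M qualifiers of ~100 bytes each, consecutive keys inside a block share the entire row key plus much of the qualifier, so each stored entry shrinks to roughly its distinct suffix.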
Cheers

On Fri, Feb 8, 2013 at 8:25 AM, Asaf Mesika <[email protected]> wrote:

> Can you elaborate more on those features? I thought 4 was just for bug
> fixes.
>
> Sent from my iPhone
>
> On Feb 8, 2013, at 02:34, Ted Yu <[email protected]> wrote:
>
> How many column families are involved?
>
> Have you considered upgrading to 0.94.4, where you would be able to
> benefit from lazy seek, Data Block Encoding, etc.?
>
> Thanks
>
> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[email protected]>
> wrote:
>
> > I'm looking for some advice about per-row CQ (column qualifier) count
> > guidelines. Our current schema design means we have a HIGHLY variable
> > CQ count per row -- some rows have one or two CQs and some rows have
> > upwards of 1 million. Each CQ is on the order of 100 bytes (for round
> > numbers) and the cell values are null. We see highly variable and too
> > often unacceptable read performance using this schema. I don't know
> > for a fact that the CQ count variability is the source of our
> > problems, but I am suspicious.
> >
> > I'm curious about others' experience with CQ counts per row -- are
> > there some best practices/guidelines about how to optimally size the
> > number of CQs per row? The other obvious solution will involve
> > breaking this data into finer-grained rows, which means shifting from
> > GETs to SCANs -- are there performance trade-offs in such a change?
> >
> > We are currently using CDH3u4, if that is relevant. All of our loading
> > is done via HFile loading (bulk), so we have not had to tune write
> > performance beyond using bulk loads. Any advice is appreciated,
> > including what metrics we should be looking at to further diagnose our
> > read performance challenges.
> >
> > Thanks,
> > Mike Ellery
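The "finer-grained rows" option Michael raises can be sketched as a row-key redesign: instead of one row holding up to ~1M column qualifiers, each (row, qualifier) pair becomes its own narrow row keyed by row + separator + qualifier, and reads change from a single Get to a prefix Scan over that key range. The separator and helper names below are hypothetical (not from this thread), and the stop-key computation glosses over the 0xFF-last-byte edge case for brevity.

```python
# Sketch: turn one wide row into many narrow rows with composite keys,
# then read them back with a [start, stop) range scan, mirroring how
# HBase orders rows lexicographically by key.

SEP = b"#"  # assumed separator; must not occur inside the original row key

def narrow_key(row: bytes, qualifier: bytes) -> bytes:
    """Compose a scan-friendly key from the old row key and one qualifier."""
    return row + SEP + qualifier

def scan_range(row: bytes) -> tuple[bytes, bytes]:
    """Start/stop keys covering every narrow row derived from `row`.
    Stop key = prefix with its last byte incremented (exclusive bound);
    a real implementation must handle a trailing 0xFF byte."""
    prefix = row + SEP
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop

# A toy sorted "table" standing in for HBase's lexicographic row order:
table = sorted(narrow_key(b"user42", q) for q in [b"a", b"b", b"zz"])
start, stop = scan_range(b"user42")
hits = [k for k in table if start <= k < stop]
```

On the trade-off Michael asks about: a prefix Scan only touches the region(s) holding that key range, so cost stays roughly proportional to the entries actually read, but it adds per-batch RPC and seek overhead compared to a single Get, and rows derived from one wide row can now split across regions.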
