The reason I mentioned 0.94.4 was that it is the most recent 0.94 release. For the features, you can refer to the following JIRAs:

HBASE-4465 Lazy-seek optimization for StoreFile scanners
HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix compression)
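To make the second JIRA concrete, here is a toy sketch of the prefix-compression idea behind HBASE-4218. HBase's real PREFIX/FAST_DIFF encoders operate on serialized KeyValues inside data blocks and are more involved; this illustration only shows why sorted keys with long shared prefixes (e.g. many qualifiers under one row) compress well. The function names are mine, not HBase's.

```python
# Toy prefix (delta) encoding over a sorted list of byte-string keys.
# Each key is stored as (shared_prefix_len, suffix) relative to the
# previous key, so long common prefixes are stored only once.

def prefix_encode(sorted_keys):
    """Replace each key with (shared_prefix_len, suffix) vs. the previous key."""
    encoded, prev = [], b""
    for key in sorted_keys:
        common = 0
        limit = min(len(prev), len(key))
        while common < limit and prev[common] == key[common]:
            common += 1
        encoded.append((common, key[common:]))
        prev = key
    return encoded

def prefix_decode(encoded):
    """Invert prefix_encode: rebuild each key from the previous one."""
    keys, prev = [], b""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys
```

With rows holding up to ~1M qualifiers of ~100 bytes each, consecutive keys inside a block share the entire row key plus much of the qualifier, so each stored entry shrinks to roughly its distinct suffix.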
Cheers

On Fri, Feb 8, 2013 at 8:25 AM, Asaf Mesika <[email protected]> wrote:

> Can you elaborate more on those features? I thought 4 was just for bug
> fixes.
>
> Sent from my iPhone
>
> On Feb 8, 2013, at 02:34, Ted Yu <[email protected]> wrote:
>
> How many column families are involved?
>
> Have you considered upgrading to 0.94.4, where you would be able to
> benefit from lazy seek, Data Block Encoding, etc.?
>
> Thanks
>
> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery <[email protected]>
> wrote:
>
> > I'm looking for some advice about per-row CQ (column qualifier) count
> > guidelines. Our current schema design means we have a HIGHLY variable
> > CQ count per row -- some rows have one or two CQs and some rows have
> > upwards of 1 million. Each CQ is on the order of 100 bytes (for round
> > numbers) and the cell values are null. We see highly variable and too
> > often unacceptable read performance using this schema. I don't know
> > for a fact that the CQ count variability is the source of our
> > problems, but I am suspicious.
> >
> > I'm curious about others' experience with CQ counts per row -- are
> > there some best practices/guidelines about how to optimally size the
> > number of CQs per row? The other obvious solution will involve
> > breaking this data into finer-grained rows, which means shifting from
> > GETs to SCANs -- are there performance trade-offs in such a change?
> >
> > We are currently using CDH3u4, if that is relevant. All of our loading
> > is done via HFile loading (bulk), so we have not had to tune write
> > performance beyond using bulk loads. Any advice is appreciated,
> > including what metrics we should be looking at to further diagnose our
> > read performance challenges.
> >
> > Thanks,
> > Mike Ellery
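The "finer-grained rows" option Michael raises can be sketched as a row-key redesign: instead of one row holding up to ~1M column qualifiers, each (row, qualifier) pair becomes its own narrow row keyed by row + separator + qualifier, and reads change from a single Get to a prefix Scan over that key range. The separator and helper names below are hypothetical (not from this thread), and the stop-key computation glosses over the 0xFF-last-byte edge case for brevity.

```python
# Sketch: turn one wide row into many narrow rows with composite keys,
# then read them back with a [start, stop) range scan, mirroring how
# HBase orders rows lexicographically by key.

SEP = b"#"  # assumed separator; must not occur inside the original row key

def narrow_key(row: bytes, qualifier: bytes) -> bytes:
    """Compose a scan-friendly key from the old row key and one qualifier."""
    return row + SEP + qualifier

def scan_range(row: bytes) -> tuple[bytes, bytes]:
    """Start/stop keys covering every narrow row derived from `row`.
    Stop key = prefix with its last byte incremented (exclusive bound);
    a real implementation must handle a trailing 0xFF byte."""
    prefix = row + SEP
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    return prefix, stop

# A toy sorted "table" standing in for HBase's lexicographic row order:
table = sorted(narrow_key(b"user42", q) for q in [b"a", b"b", b"zz"])
start, stop = scan_range(b"user42")
hits = [k for k in table if start <= k < stop]
```

On the trade-off Michael asks about: a prefix Scan only touches the region(s) holding that key range, so cost stays roughly proportional to the entries actually read, but it adds per-batch RPC and seek overhead compared to a single Get, and rows derived from one wide row can now split across regions.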
