The full row key, column family name, and column qualifier are stored with
every cell (called a KeyValue).  Using gzip or lzo compression can greatly
reduce the size of the data stored on disk, and prefix compression could
eventually reduce the size of the data stored in memory.
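
To put rough numbers on that, here is a small sketch of the per-cell size
arithmetic.  It assumes the classic uncompressed KeyValue layout (no tags);
the row key, family, and qualifier names are made up for illustration, and
the function is not part of any HBase API.

```python
# Approximate size of one uncompressed KeyValue, assuming this layout:
#   4B key length | 4B value length |
#   2B row length | row | 1B family length | family | qualifier |
#   8B timestamp | 1B key type | value
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # bytes that do not depend on names


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Return the estimated byte size of a single cell."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


# Hypothetical names, chosen only to contrast long and short identifiers.
row = b"user00000042"
value = b"1"
long_cell = keyvalue_size(row, b"user_profile_information", b"last_login_timestamp", value)
short_cell = keyvalue_size(row, b"p", b"ll", value)

print("long names: ", long_cell, "bytes per cell")
print("short names:", short_cell, "bytes per cell")
```

The difference is paid again for every cell, so it adds up on tables with
many small values; block compression on disk hides much of it, but not the
in-memory cost.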

But, to your question about performance, most get/scan operations iterate
through and compare the byte arrays that back the KeyValues, so the more
bytes each comparison has to walk, the slower it will be.  However, we're
talking about sequential memory access, which computers are very good at,
so what *overall* performance increase you can expect from shortening the
names is hard to say.
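
As a toy illustration of the comparison cost (not HBase's actual comparator,
which works on the row, family, and qualifier parts of the key separately),
this sketch counts how many bytes a plain lexicographic compare has to
examine.  The function name and the sample keys are hypothetical.

```python
def compare_counting(a: bytes, b: bytes) -> tuple[int, int]:
    """Lexicographic compare; returns (result, bytes_examined).

    result is -1, 0, or 1, like a typical comparator.
    """
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return (-1 if a[i] < b[i] else 1), i + 1
    # One key is a prefix of the other (or they are equal): length decides.
    return (len(a) > len(b)) - (len(a) < len(b)), n


# Long names that share a prefix force the compare to walk that prefix.
print(compare_counting(b"last_login_timestamp", b"last_login_time"))
# Short names are decided after a byte or two.
print(compare_counting(b"ll", b"lt"))
```

The work grows with the length of the shared prefix, which is why long,
similar-looking names cost more per comparison than short ones.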


On Wed, Jan 26, 2011 at 12:54 PM, Bill Graham <[email protected]> wrote:

> I can't say from experience, but here's a thread that implies that
> shorter column names are better.
>
> http://search-hadoop.com/m/oWZQd161GI22
>
> On Tue, Jan 25, 2011 at 11:14 PM, JinChao Wen <[email protected]>
> wrote:
> > Hi all,
> >
> > If there are lots of very long column family names and column names in my
> > table, is there any performance impact on queries?
> >
> > thx.
> >
> >
> > JinChao
> >
>
