The full row key, column family name, and column qualifier are stored with every cell (called a KeyValue). Using gzip or LZO compression can greatly reduce the size of the data stored on disk, and prefix compression could eventually reduce the size of the data stored in memory.
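To make the per-cell overhead concrete, here is a minimal Java sketch that estimates the serialized size of one KeyValue. It assumes the KeyValue layout documented for HBase of this era, `<keylength:4><valuelength:4><key><value>` with key = `<rowlength:2><row><familylength:1><family><qualifier><timestamp:8><keytype:1>`, and ignores later additions such as cell tags. The class and method names are hypothetical and no HBase API is called.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper that estimates the uncompressed size of one KeyValue.
 * Assumed layout (pre-tags KeyValue format):
 *   <keylength:4><valuelength:4><key><value>
 *   key = <rowlength:2><row><familylength:1><family><qualifier><timestamp:8><keytype:1>
 */
public class KeyValueSizeSketch {
    // Fixed-width fields: key length, value length, row length,
    // family length, timestamp and key type.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Row, family and qualifier are repeated in full for every cell. */
    public static int cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD
                + row.getBytes(StandardCharsets.UTF_8).length
                + family.getBytes(StandardCharsets.UTF_8).length
                + qualifier.getBytes(StandardCharsets.UTF_8).length
                + valueLength;
    }

    public static void main(String[] args) {
        int shortNames = cellSize("user123", "d", "n", 8);
        int longNames = cellSize("user123", "user_profile_data", "full_display_name", 8);
        System.out.println("short names: " + shortNames + " bytes per cell"); // prints "short names: 37 bytes per cell"
        System.out.println("long names:  " + longNames + " bytes per cell");  // prints "long names:  69 bytes per cell"
    }
}
```

With an 8-byte value, the family and qualifier names take 2 of the 37 bytes in the short-name case but 34 of the 69 bytes in the long-name case, and that cost is repeated for every cell before any compression is applied.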
As for your question about performance: most get and scan operations have to iterate through and compare the byte arrays that back the KeyValues, so the more bytes each key contains, the slower those operations will be. However, this is sequential memory access, which computers are very good at, so it's hard to say what *overall* performance increase you can expect from shortening the column family and qualifier names.

On Wed, Jan 26, 2011 at 12:54 PM, Bill Graham <[email protected]> wrote:
> I can't say from experience, but here's a thread that implies that
> shorter column names are better.
>
> http://search-hadoop.com/m/oWZQd161GI22
>
> On Tue, Jan 25, 2011 at 11:14 PM, JinChao Wen <[email protected]>
> wrote:
> > Hi all,
> >
> > If there are lots of very long column family name and column name in my
> > table, is there any performance impact on query?
> >
> > thx.
> >
> > JinChao
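The comparison cost can be illustrated with a simplified Java sketch of unsigned lexicographic byte-array comparison, the kind of comparison performed on KeyValue keys. As a simplifying assumption it compares flat `family:qualifier` strings rather than using HBase's real field-by-field key comparator, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: unsigned lexicographic comparison plus a count of bytes examined. */
public class ByteCompareSketch {
    /** Index of the first differing byte, or the shorter length if one array is a prefix of the other. */
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Returns a negative, zero or positive value, treating bytes as unsigned. */
    public static int compare(byte[] a, byte[] b) {
        int i = firstDifference(a, b);
        if (i < a.length && i < b.length) {
            return (a[i] & 0xff) - (b[i] & 0xff); // mask to compare as unsigned
        }
        return a.length - b.length; // a strict prefix sorts first
    }

    /** Number of bytes the comparison reads before it can decide. */
    public static int bytesExamined(byte[] a, byte[] b) {
        int i = firstDifference(a, b);
        return Math.min(i + 1, Math.min(a.length, b.length));
    }

    public static void main(String[] args) {
        byte[] shortA = "d:n1".getBytes(StandardCharsets.UTF_8);
        byte[] shortB = "d:n2".getBytes(StandardCharsets.UTF_8);
        byte[] longA = "user_profile_data:full_display_name1".getBytes(StandardCharsets.UTF_8);
        byte[] longB = "user_profile_data:full_display_name2".getBytes(StandardCharsets.UTF_8);
        System.out.println("short: " + bytesExamined(shortA, shortB)); // prints "short: 4"
        System.out.println("long:  " + bytesExamined(longA, longB));   // prints "long:  36"
    }
}
```

The byte counts are not timings: as noted above, this is sequential memory access, so the real-world difference depends on the workload.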
