Re: Column family data distribution and performance

Chris Tarnas Fri, 07 Jan 2011 10:22:31 -0800

On Jan 7, 2011, at 10:14 AM, Stack wrote:

> On Fri, Jan 7, 2011 at 10:01 AM, Chris Tarnas <[email protected]> wrote:
>> I was wondering how much impact a column family would have on read and 
>> write performance for rows that contain no data in that family?
>> 
> 
> The index column family would have data, right, just not data for every row?
>


Yes - a row in the INDEX column family would have one key value: the rowkey 
that the index points to.

> If you don't query this index cf, then should be near to no impact.
> 

The index column family would not be requested when getting a "normal" row.

> You'd be querying the index and data independently?
> 

Yes - any single scanner or get would read either from the INDEX column 
family or from the "data" column families, but never from both. Of course, 
an index lookup requires two gets (one to look up the index entry, the other 
to fetch the desired row that the entry points to), but that seems 
unavoidable at this time.
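
As a rough illustration of the two-get lookup described above, here is a minimal sketch in Python. It uses a plain in-memory dict as a stand-in for the table and is not the HBase or Thrift API. The rowkey formats, the "target" qualifier, and the helper names `get` and `lookup_by_index` are all assumptions made up for this example.

```python
# In-memory stand-in for one table: rowkey -> {column family -> {qualifier -> value}}.
# This is NOT the HBase or Thrift API; every name below is hypothetical.
table = {
    # A "normal" row: data only, nothing in the INDEX family.
    "row:0001": {"data": {"name": "alpha"}},
    # An index row: a single value in INDEX, the rowkey it points to.
    "idx:name:alpha": {"INDEX": {"target": "row:0001"}},
}


def get(rowkey, families):
    """Return only the requested column families of a row."""
    row = table.get(rowkey, {})
    return {cf: dict(cols) for cf, cols in row.items() if cf in families}


def lookup_by_index(index_key):
    """Two gets: read the index row, then the data row it points to."""
    index_row = get(index_key, {"INDEX"})      # first get: INDEX family only
    if "INDEX" not in index_row:
        return None                            # no such index entry
    target = index_row["INDEX"]["target"]
    return get(target, {"data"})               # second get: data families only


result = lookup_by_index("idx:name:alpha")
print(result)
```

Neither get asks for both sets of column families, which matches the access pattern described above.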

> 
>> I'm testing out an indexing method where rather than have a separate table 
>> for storing indexes I just keep them in the same table in an INDEX column 
>> family. The construction of the rowkeys guarantees that an index value will 
>> never be the same as a rowkey of a normal row. This allows us to send all 
>> mutations for one row and its indexes in a single thrift call with a batch 
>> mutation rather than two thrift calls. Are there any serious back end 
>> downsides to this methodology?
>> 
> 
> I can't think of any.  It's definitely all upside compared to keeping two 
> tables.
> 
> St.Ack
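
The single-call batch from the quoted question could be sketched roughly as follows. This is an in-memory model in Python, not the Thrift client. The `idx:` rowkey prefix, the "target" qualifier, and the helpers `build_batch` and `mutate_rows` are hypothetical, chosen only to show a row and its index entries travelling in one batch.

```python
from collections import defaultdict

# In-memory stand-in for the table: rowkey -> family -> qualifier -> value.
# Hypothetical model only; it does not use the HBase or Thrift API.
table = defaultdict(lambda: defaultdict(dict))


def mutate_rows(batch):
    """Apply {rowkey: [(family, qualifier, value), ...]} in one call."""
    for rowkey, mutations in batch.items():
        for family, qualifier, value in mutations:
            table[rowkey][family][qualifier] = value
    return len(batch)  # number of rows touched


def build_batch(rowkey, columns, indexed):
    """Collect the data row and its index rows into a single batch."""
    batch = {rowkey: [("data", q, v) for q, v in columns.items()]}
    for q in indexed:
        # Assumed key scheme: the prefix keeps index keys from ever
        # colliding with the rowkey of a normal row.
        index_key = f"idx:{q}:{columns[q]}"
        batch[index_key] = [("INDEX", "target", rowkey)]
    return batch


batch = build_batch("row:0001", {"name": "alpha", "size": "42"}, indexed=["name"])
rows_written = mutate_rows(batch)  # one call instead of one per table
print(rows_written)
```

With a separate index table, the same write would need one call per table. Here both rows go out together because they live in the same table.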


Thanks!
-chris
