ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Erik Holstad
What are the benefits of using multiple ColumnFamilies compared to using a
composite row name?

Example: You have messages that you want to index by sender (from) and by
recipient (to).

So you can either have
ColumnFamilyFrom:userTo:{userFrom-messageid}
ColumnFamilyTo:userFrom:{userTo-messageid}

or something like
ColumnFamily:user_to:{user1_messageId, user2_messageId}
ColumnFamily:user_from:{user1_messageId, user2_messageId}

One advantage I can see of using families is that you can use different
types in the different families. But are there others, such as storage
space or read/write speeds?

-- 
Regards Erik


Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:04, Erik Holstad wrote:
> What are the benefits of using multiple ColumnFamilies compared to using
> a composite row name?

Just for terminology's sake, I'll note that rows have keys, not names.
Only columns and supercolumns have names.

I'm not the top expert here by any means, but I think the choice between
{CF-as-direction, key-as-person} and {key-as-person-and-direction} won't
affect performance substantially if the multiple CFs in the first option
are identically configured. All messages with the same source or
destination still share the same row.
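
To make the comparison concrete, here is a minimal Python sketch that models
both layouts with plain dictionaries instead of a real Cassandra client. The
names (store_by_cf, store_by_key, index_message, MessagesTo, MessagesFrom) and
the "<other party>-<messageid>" column naming are assumptions made for
illustration; none of them come from the thread or from the Cassandra API.

```python
from collections import defaultdict

# Layout A: direction is the column family, the row key is the user.
# cf_name -> row_key -> {column_name: value}
store_by_cf = defaultdict(lambda: defaultdict(dict))

# Layout B: a single column family, direction is folded into the row key.
# row_key -> {column_name: value}
store_by_key = defaultdict(dict)


def index_message(sender, recipient, message_id):
    """Index one message under both hypothetical layouts."""
    # Layout A: one row per user in each direction-specific family.
    store_by_cf["MessagesTo"][recipient][f"{sender}-{message_id}"] = ""
    store_by_cf["MessagesFrom"][sender][f"{recipient}-{message_id}"] = ""
    # Layout B: one row per (direction, user) in a shared family.
    store_by_key[f"to_{recipient}"][f"{sender}-{message_id}"] = ""
    store_by_key[f"from_{sender}"][f"{recipient}-{message_id}"] = ""


index_message("alice", "bob", "m1")
index_message("carol", "bob", "m2")

# Either way, everything addressed to bob lives in exactly one row.
inbox_a = sorted(store_by_cf["MessagesTo"]["bob"])
inbox_b = sorted(store_by_key["to_bob"])
```

In both layouts a single-row read returns the same set of columns, which is
why the choice mainly matters for per-family configuration rather than for
how the data is grouped.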

What *would* make a huge difference is composite row keys like
from_userA_userB and to_userB_userA, where you'd have to pull key ranges
to get all the messages to or from someone. That design would trade
performance for inbox scalability, assuming users distribute their
messages across a wide range of other users.
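
As a rough illustration of what "pulling key ranges" involves, the sketch
below keeps the composite row keys in sorted order and walks a prefix range.
It is a toy model in plain Python, not Cassandra client code: the rows dict,
the scan_prefix helper, and the sample keys are all hypothetical, and whether
a real cluster can serve such a scan in key order depends on how it is
configured, which the sketch ignores.

```python
from bisect import bisect_left

# Hypothetical composite row keys: one row per (direction, user, other party).
rows = {
    "from_alice_bob": ["m1"],
    "to_bob_alice": ["m1"],
    "to_bob_carol": ["m2", "m3"],
    "to_bobby_dave": ["m4"],
    "to_carol_bob": ["m5"],
}

# A range scan is only meaningful when keys are kept in sorted order.
sorted_keys = sorted(rows)


def scan_prefix(prefix):
    """Return (key, messages) pairs for every row key starting with prefix."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        result.append((key, rows[key]))
    return result


# The trailing underscore matters: "to_bob" alone would also match "to_bobby_dave".
bob_inbox = scan_prefix("to_bob_")
```

Reading bob's inbox now touches one row per correspondent instead of one row
in total, which is the performance cost; in exchange, no single row has to
hold every message bob has ever received.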

> Example: You have messages that you want to index on sent and to.
>
> So you can either have
> ColumnFamilyFrom:userTo:{userFrom-messageid}
> ColumnFamilyTo:userFrom:{userTo-messageid}
>
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}

You've changed two different things between the examples:

(1) Whether direction is distinguished by the key or by the CF.
(2) Something about the columns, but this isn't clear or necessary to
support the change in CF/key structure.

What is the second change, and why did you make it?

-- 
David Strauss
   | da...@fourkitchens.com
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]





Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread David Strauss
On 2010-03-05 18:30, David Strauss wrote:
> On 2010-03-05 18:04, Erik Holstad wrote:
>> So you can either have
>> ColumnFamilyFrom:userTo:{userFrom-messageid}
>> ColumnFamilyTo:userFrom:{userTo-messageid}
>>
>> or something like
>> ColumnFamily:user_to:{user1_messageId, user2_messageId}
>> ColumnFamily:user_from:{user1_messageId, user2_messageId}
>
> You've changed two different things between the examples:
>
> (1) Whether direction is distinguished by the key or by the CF.
> (2) Something about the columns, but this isn't clear or necessary to
> support the change in CF/key structure.

Upon further inspection, the first example appears to use the other
party to a message as the column name. That will only allow one
messageid for any unique combination of (direction, userA, userB). That
seems broken to me.
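
A small Python sketch of the collision, modelling a row as a dict from column
name to value and assuming that a write to an existing column name replaces
its value. The variable names and the "alice-m1" style of column name are
illustrative assumptions, not something taken from the thread.

```python
# Hypothetical row in ColumnFamilyFrom with key "bob": column name -> value.
row_broken = {}

# The column name is only the other party, so the second write replaces the first.
row_broken["alice"] = "m1"
row_broken["alice"] = "m2"

# Putting the message id into the column name keeps every message.
row_fixed = {}
row_fixed["alice-m1"] = ""
row_fixed["alice-m2"] = ""
```

After both writes, row_broken holds a single column for alice while row_fixed
holds one column per message.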

-- 
David Strauss


Re: ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Jonathan Ellis
Generally, you want to have different types of data in different CFs
so you can tune them separately (key / row caches).

Mixing different row types in one CF also makes doing get_slice_range
scans difficult.
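
The scanning problem can be sketched in plain Python. This is a toy model
rather than Cassandra client code: scan_rows, the "msgidx_" and "profile_" key
prefixes, and the sample data are all hypothetical, and the scan is assumed to
return rows in key order with a row limit.

```python
# Hypothetical single column family holding two unrelated kinds of rows.
mixed_cf = {
    "msgidx_to_bob": {"alice-m1": ""},
    "profile_bob": {"email": "bob@example.com"},
    "msgidx_to_carol": {"bob-m5": ""},
    "profile_carol": {"email": "carol@example.com"},
}

# The same profile rows kept in a family of their own.
profile_cf = {
    "bob": {"email": "bob@example.com"},
    "carol": {"email": "carol@example.com"},
}


def scan_rows(cf, start_key, end_key, limit):
    """Toy range scan: up to `limit` keys with start_key <= key <= end_key, in key order."""
    keys = [k for k in sorted(cf) if start_key <= k <= end_key]
    return keys[:limit]


# Scanning the mixed family returns both row types, so the caller has to
# filter, and part of each page is spent on rows it does not want.
page = scan_rows(mixed_cf, "", "\uffff", limit=3)
profiles_from_mixed = [k for k in page if k.startswith("profile_")]

# Scanning the dedicated family returns only profile rows.
profiles_from_own_cf = scan_rows(profile_cf, "", "\uffff", limit=3)
```

With the mixed family, a page of three rows yields only one profile; with a
dedicated family, the same scan returns both profiles and needs no filtering.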

On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad erikhols...@gmail.com wrote:
> What are the benefits of using multiple ColumnFamilies compared to using a
> composite row name?
>
> Example: You have messages that you want to index on sent and to.
>
> So you can either have
> ColumnFamilyFrom:userTo:{userFrom-messageid}
> ColumnFamilyTo:userFrom:{userTo-messageid}
>
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}
>
> One thing that I can see the advantage of using families are if you want to
> use different types in the families. But are there others? Like storage
> space, read/write speeds etc.
>
> --
> Regards Erik