ColumnFamilies vs composite rows in one table.
What are the benefits of using multiple ColumnFamilies compared to using a composite row name?

Example: You have messages that you want to index on both sender and recipient. So you can either have

  ColumnFamilyFrom:userTo:{userFrom-messageid}
  ColumnFamilyTo:userFrom:{userTo-messageid}

or something like

  ColumnFamily:user_to:{user1_messageId, user2_messageId}
  ColumnFamily:user_from:{user1_messageId, user2_messageId}

One advantage I can see in using separate families is that you can use different types in each family. But are there others, such as storage space or read/write speed?

--
Regards Erik
Re: ColumnFamilies vs composite rows in one table.
On 2010-03-05 18:04, Erik Holstad wrote:
> What are the benefits of using multiple ColumnFamilies compared to using a composite row name?

Just for terminology's sake, I'll note that rows have keys, not names. Only columns and supercolumns have names.

I'm not the top expert here by any means, but I think the choice between {CF-as-direction, key-as-person} and {key-as-person-and-direction} won't affect performance substantially if the multiple CFs in the first option are identically configured. All messages with the same source or destination still share the same row.

What *would* make a huge difference is composite row keys like from_userA_userB and to_userB_userA, where you'd have to pull key ranges to get all the messages to or from someone. That design would trade performance for inbox scalability, assuming users distribute their messages to a wide breadth of other users.

> Example: You have messages that you want to index on sent and to. So you can either have
> ColumnFamilyFrom:userTo:{userFrom-messageid}
> ColumnFamilyTo:userFrom:{userTo-messageid}
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}

You've changed two different things between the examples:
(1) Whether direction is distinguished by the key or by the CF.
(2) Something about the columns, but this isn't clear or necessary to support the change in CF/key structure.

What is the second change, and why did you make it?

--
David Strauss | da...@fourkitchens.com
Four Kitchens | http://fourkitchens.com | +1 512 454 6659 [office] | +1 512 870 8453 [direct]
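To make the three layouts discussed above concrete, here is a minimal sketch that models a column family as a plain Python dict of row key to columns. It is a toy model, not Cassandra client code: the user names, message ids, and the helper names (inbox_a, inbox_b, inbox_c) are all made up for illustration, and the sorted key list stands in for an assumed order-preserving key layout.

```python
import bisect

# Layout A: direction is carried by the CF, the row key is the person.
cf_to = {"bob": {"alice_m1": "", "carol_m2": ""}}
cf_from = {"alice": {"bob_m1": ""}, "carol": {"bob_m2": ""}}

# Layout B: one CF, direction is folded into the row key.
cf_single = {
    "to_bob": {"alice_m1": "", "carol_m2": ""},
    "from_alice": {"bob_m1": ""},
    "from_carol": {"bob_m2": ""},
}

# Layout C: one row per (direction, user, other user) pair.
cf_pairs = {
    "to_bob_alice": {"m1": ""},
    "to_bob_carol": {"m2": ""},
    "from_alice_bob": {"m1": ""},
    "from_carol_bob": {"m2": ""},
}


def inbox_a(user):
    """Layout A: a single row read in the 'to' CF."""
    return sorted(cf_to.get(user, {}))


def inbox_b(user):
    """Layout B: still a single row read, only the key differs."""
    return sorted(cf_single.get("to_" + user, {}))


def inbox_c(user):
    """Layout C: walk a range of keys sharing the prefix (assumes ordered keys)."""
    prefix = "to_" + user + "_"
    keys = sorted(cf_pairs)  # stand-in for an ordered key space
    start = bisect.bisect_left(keys, prefix)
    columns = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        sender = key[len(prefix):]
        columns.extend(f"{sender}_{msg}" for msg in sorted(cf_pairs[key]))
    return columns
```

In this toy model, layouts A and B both answer "all messages to bob" with one row lookup, while layout C has to touch one row per correspondent, which is the key-range cost described above.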
Re: ColumnFamilies vs composite rows in one table.
On 2010-03-05 18:30, David Strauss wrote:
> On 2010-03-05 18:04, Erik Holstad wrote:
>> So you can either have
>> ColumnFamilyFrom:userTo:{userFrom-messageid}
>> ColumnFamilyTo:userFrom:{userTo-messageid}
>> or something like
>> ColumnFamily:user_to:{user1_messageId, user2_messageId}
>> ColumnFamily:user_from:{user1_messageId, user2_messageId}
>
> You've changed two different things between the examples:
> (1) Whether direction is distinguished by the key or by the CF.
> (2) Something about the columns, but this isn't clear or necessary to support the change in CF/key structure.

Upon further inspection, the first example appears to use the other party to a message as the column name. That will only allow one messageid for any unique combination of (direction, userA, userB). That seems broken to me.

--
David Strauss | da...@fourkitchens.com
Four Kitchens | http://fourkitchens.com | +1 512 454 6659 [office] | +1 512 870 8453 [direct]
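The overwrite problem described above can be sketched with a toy model in which a row is a dict keyed by column name. This is an illustration only, not client code: the insert helper, the user names, and the message ids are hypothetical.

```python
from collections import defaultdict


def new_cf():
    """Toy column family: row key -> {column name -> value}."""
    return defaultdict(dict)


def insert(cf, row_key, column_name, value):
    # A column name is unique within a row, so writing the same
    # name again replaces the earlier value.
    cf[row_key][column_name] = value


# Column name = other party only, value = message id.
by_party = new_cf()
insert(by_party, "bob", "alice", "msg1")
insert(by_party, "bob", "alice", "msg2")  # replaces msg1

# Column name = other party plus message id, value left empty.
by_party_and_id = new_cf()
insert(by_party_and_id, "bob", "alice_msg1", "")
insert(by_party_and_id, "bob", "alice_msg2", "")
```

After these writes, the first row holds a single column for alice, while the second keeps one column per message because the message id is part of the column name.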
Re: ColumnFamilies vs composite rows in one table.
Generally, you want to have different types of data in different CFs so you can tune them separately (key / row caches). Mixing different row types in one CF also makes doing get_slice_range scans difficult.

On Fri, Mar 5, 2010 at 12:04 PM, Erik Holstad erikhols...@gmail.com wrote:
> What are the benefits of using multiple ColumnFamilies compared to using a composite row name?
>
> Example: You have messages that you want to index on sent and to. So you can either have
> ColumnFamilyFrom:userTo:{userFrom-messageid}
> ColumnFamilyTo:userFrom:{userTo-messageid}
> or something like
> ColumnFamily:user_to:{user1_messageId, user2_messageId}
> ColumnFamily:user_from:{user1_messageId, user2_messageId}
>
> One thing that I can see the advantage of using families are if you want to use different types in the families. But are there others? Like storage space, read/write speeds etc.
>
> --
> Regards Erik
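One way to picture the scan problem mentioned above is the following sketch, which again models column families as plain dicts. It is an assumption-laden illustration rather than real client code: scan is a hypothetical stand-in for a full range scan, and the keys and column names are invented.

```python
# Toy rows in two separate CFs, one per direction.
cf_to = {"alice": {"bob_m3": ""}, "bob": {"alice_m1": "", "carol_m2": ""}}
cf_from = {
    "alice": {"bob_m1": ""},
    "bob": {"alice_m3": ""},
    "carol": {"bob_m2": ""},
}

# The same rows mixed into one CF, with the direction folded into the key.
mixed_cf = {}
for user, cols in cf_to.items():
    mixed_cf["to_" + user] = cols
for user, cols in cf_from.items():
    mixed_cf["from_" + user] = cols


def scan(cf):
    """Stand-in for a full range scan: yields every row in key order."""
    for key in sorted(cf):
        yield key, cf[key]


def inbox_owners_separate():
    """Separate CFs: every row returned by the scan is an inbox row."""
    return [key for key, _ in scan(cf_to)]


def inbox_owners_mixed():
    """Mixed CF: the scan also returns 'from' rows, which must be filtered out."""
    owners = []
    rows_read = 0
    for key, _ in scan(mixed_cf):
        rows_read += 1
        if key.startswith("to_"):
            owners.append(key[len("to_"):])
    return owners, rows_read
```

In this toy data the mixed scan reads five rows to find two inboxes, while the scan over the dedicated CF reads only the two rows it needs.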