Re: Data Model - Additional Column Families or one CF?

Hiller, Dean Tue, 26 Feb 2013 07:28:09 -0800

The bottleneck is RAM; each CF uses more RAM.  We tried to go above 15,000 
column families and that hurt big time, so we added a feature to PlayOrm and now 
have 60,000 virtual column families all in one column family.  This turned out 
to be a HUGE benefit though, as cluster settings for those 60,000 tables are now 
easy to modify in one shot.  Right now, we are modifying the bloom filter false 
positive ratio to free up RAM on just the one column family that all 60,000 run in.
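
To give a rough feel for why raising the false positive ratio frees up RAM, here is a minimal sketch. It assumes the ratio mentioned above is the bloom filter setting and uses the textbook sizing formula for an optimally configured bloom filter; it is not Cassandra's actual memory accounting, and the function names and key count are hypothetical.

```python
import math

def bloom_bits_per_key(fp_ratio: float) -> float:
    """Theoretical bits per key for an optimal bloom filter: -ln(p) / (ln 2)^2."""
    if not 0.0 < fp_ratio < 1.0:
        raise ValueError("fp_ratio must be strictly between 0 and 1")
    return -math.log(fp_ratio) / (math.log(2) ** 2)

def bloom_size_gib(num_keys: int, fp_ratio: float) -> float:
    """Approximate filter size in GiB for num_keys keys (illustrative only)."""
    return num_keys * bloom_bits_per_key(fp_ratio) / 8 / 2**30

if __name__ == "__main__":
    # Hypothetical key count, chosen only to show the trend.
    for p in (0.001, 0.01, 0.1):
        print(f"fp_ratio={p}: {bloom_size_gib(10**9, p):.2f} GiB per billion keys")
```

The trade-off is that a higher ratio means more reads hit SSTables that do not contain the requested row, so the RAM saved is paid for with extra disk I/O.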


You can always follow the same pattern PlayOrm uses, which is just to prefix 
every row key, but we preferred to have something else do the heavy lifting for 
us (i.e. PlayOrm); plus, the command line tool isn't too bad for inspecting the 
data with PlayOrm either.  Of course, we now have 7 billion rows and that is a 
new issue for us.  While we could just add more nodes and scale out, they are 
pushing us to make it even more cost effective, so we are trying to scale on a 
single node first and then scale out.
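
For anyone who wants to do the row-key prefixing by hand, a minimal sketch of the idea follows. The helper names and the ":" separator are assumptions made for illustration; this is not PlayOrm's actual implementation or key format.

```python
from typing import Tuple

# Hypothetical separator between the virtual CF name and the original key.
SEPARATOR = ":"

def make_row_key(virtual_cf: str, key: str) -> str:
    """Build the physical row key by prefixing it with the virtual CF name."""
    if SEPARATOR in virtual_cf:
        raise ValueError("virtual CF name must not contain the separator")
    return f"{virtual_cf}{SEPARATOR}{key}"

def split_row_key(row_key: str) -> Tuple[str, str]:
    """Recover (virtual CF name, original key) from a physical row key."""
    virtual_cf, sep, key = row_key.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"row key has no prefix: {row_key!r}")
    return virtual_cf, key

if __name__ == "__main__":
    physical = make_row_key("comments", "12345")
    print(physical)
    print(split_row_key(physical))
```

Only the virtual CF name is barred from containing the separator, so the original key can contain it and still round-trips.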

Later,
Dean

From: Javier Sotelo <javier.a.sot...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, February 26, 2013 12:27 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Data Model - Additional Column Families or one CF?

Aaron,

Would 50 CFs be pushing it? According to 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management,
 "This has been tested to work across hundreds or even thousands of 
ColumnFamilies."

What is the bottleneck, IO?

Thanks,
Javier


On Sun, Feb 24, 2013 at 5:51 PM, Adam Venturella <aventure...@gmail.com> wrote:

Thanks Aaron, this was a big help!




On Thu, Feb 21, 2013 at 9:27 AM, aaron morton <aa...@thelastpickle.com> wrote:

If you have a limited / known number (say < 30)  of types, I would create a CF 
for each of them.

If the number of types is unknown or very large I would have one CF with the 
row key you described.

Generally I avoid data models that require new CFs as the data grows. 
Additionally, having different CFs allows you to use different cache settings, 
compaction settings and even storage media.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 7:43 AM, Adam Venturella <aventure...@gmail.com> wrote:

My data needs only require me to store JSON, and I can handle this in 1 column 
family by prefixing row keys with a type, for example:

comments:{message_id}

Where comments: represents the prefix and {message_id} represents some row key 
to a message object in the same column family.

In this case comments:{message_id} would be a wide row using comment creation 
time and descending clustering order to sort the messages as they are added.
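
To make the layout described above concrete, here is a small in-memory sketch of a wide row keyed as comments:{message_id} with entries kept newest first. It only models the ordering; the class and method names are hypothetical and it does not talk to Cassandra.

```python
import bisect
from collections import defaultdict

class WideRowStore:
    """Toy model of one column family holding wide rows of JSON comments."""

    def __init__(self):
        # row key -> list of (-created_at, json_text), kept sorted so the
        # newest comment (largest timestamp) comes first.
        self._rows = defaultdict(list)

    def add_comment(self, message_id, created_at, json_text):
        row_key = f"comments:{message_id}"
        bisect.insort(self._rows[row_key], (-created_at, json_text))

    def latest(self, message_id, limit=10):
        """Return up to `limit` (created_at, json_text) pairs, newest first."""
        row = self._rows.get(f"comments:{message_id}", [])
        return [(-neg_ts, body) for neg_ts, body in row[:limit]]

if __name__ == "__main__":
    store = WideRowStore()
    store.add_comment("m1", 100, '{"text": "first"}')
    store.add_comment("m1", 300, '{"text": "third"}')
    store.add_comment("m1", 200, '{"text": "second"}')
    print(store.latest("m1", limit=2))
```

Negating the timestamp is just a way to get descending order out of an ascending sort; in Cassandra the equivalent would be the descending ordering on the column name mentioned above.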

My question is, would I be better off splitting comments into their own Column 
Family, or is storing them in with the Messages Column Family sufficient? They 
are all messages, after all.

Or do Column Families really just provide a nice organizational front for data? 
I'm just storing JSON.
