understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
I have a very basic question whose answer I have been unable to find in the online documentation on cassandra. It seems like every node in a cassandra cluster contains all the data ever stored in the cluster (i.e., all nodes are identical). I don't understand how you can scale this on commodity servers

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
there are two numbers to look at, N the number of hosts in the ring (cluster) and R the number of replicas for each data item. R is configurable per column family. Typically for large clusters N >> R. For very small clusters it makes sense for R to be close to N, in which case cassandra is useful so
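
A back-of-the-envelope way to see why this scales (a minimal sketch, not from the thread; the figures are illustrative):

    def per_node_fraction(n_hosts, replicas):
        """Approximate fraction of the total data stored on each node,
        assuming keys are spread evenly around the ring."""
        return min(1.0, replicas / n_hosts)

    # 6 hosts with 2 replicas -> each node holds roughly 1/3 of the data.
    print(per_node_fraction(6, 2))   # ~0.33
    # Only when R approaches N does every node hold (nearly) everything.
    print(per_node_fraction(3, 3))   # 1.0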

unsubscribe

2010-12-09 Thread Massimo Carro
Massimo Carro www.liquida.it - www.liquida.com

Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
Thanks Ran. This helps a little but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? I haven't come across any information on how clustered data is coordinated in cassandra. How does my query get directed to the right node? On

Re: understanding the cassandra storage scaling

2010-12-09 Thread Ran Tavory
So is it not true that each node contains all the data in the cluster? No, not in the general case, in fact rarely is it the case. Usually R < N. In my case I have N=6 and R=2. You configure R per CF under ReplicationFactor (v0.6.*) or replication_factor (v0.7.*).

Re: understanding the cassandra storage scaling

2010-12-09 Thread Sylvain Lebresne
This helps a little but unfortunately it's still a bit fuzzy for me. So is it not true that each node contains all the data in the cluster? Not at all. Basically each node is responsible for only a part of the data (a range really). But for each piece of data you can choose how many nodes it is on; this
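
A minimal sketch of that idea, assuming a simplified token ring where a key hashes to a position and is stored on the owning node plus the next R-1 nodes (illustrative only, not Cassandra's actual implementation):

    import hashlib
    from bisect import bisect_right

    def replicas_for(key, ring, rf):
        """Return the nodes holding `key`: the node owning the key's token
        range plus the next rf-1 nodes walking clockwise around the ring."""
        tokens = sorted(ring)                     # ring maps token -> node name
        token = int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)
        i = bisect_right(tokens, token) % len(tokens)
        return [ring[tokens[(i + k) % len(tokens)]] for k in range(rf)]

    ring = {0: "node1", 2 ** 127 // 3: "node2", 2 * (2 ** 127 // 3): "node3"}
    print(replicas_for("some-row-key", ring, rf=2))   # e.g. ['node2', 'node3']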

Re: understanding the cassandra storage scaling

2010-12-09 Thread Jonathan Colby
awesome! Thank you guys for the really quick answers and the links to the presentations. On Thu, Dec 9, 2010 at 12:06 PM, Sylvain Lebresne sylv...@yakaz.com wrote: This helps a little but unfortunately I'm still a bit fuzzy for me.  So is it not true that each node contains all the data in the

N to N relationships

2010-12-09 Thread Sébastien Druon
Hello, For a specific case, we are thinking about representing a N to N relationship with a NxN Matrix in Cassandra. The relations will be only between a subset of elements, so the Matrix will mostly contain empty elements. We have a set of questions concerning this: - what is the best way to

Re: N to N relationships

2010-12-09 Thread David Boxenhorn
How about a regular CF where keys are n...@n ? Then, getting a matrix row would be the same cost as getting a matrix column (N gets), and it would be very easy to add element N+1. On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.com wrote: Hello, For a specific case, we are
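
A sketch of that single-CF layout using a plain dictionary as a stand-in for the column family (the "row@col" key format is hypothetical, for illustration):

    # Store only the non-empty cells of a sparse N x N relation.
    cells = {}

    def put(row, col, value):
        cells[f"{row}@{col}"] = value

    def get_row(row, n):
        # Fetching row i costs N point lookups (one per possible column);
        # fetching a column is symmetric.
        return {c: cells[f"{row}@{c}"] for c in range(n) if f"{row}@{c}" in cells}

    put(1, 3, "related")
    put(1, 7, "related")
    print(get_row(1, n=10))   # {3: 'related', 7: 'related'}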

Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
It seems to me that secondary indexes (new in 0.7) change everything when it comes to data modeling. - OOP becomes obsolete - primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id Taken together, it's likely that very

Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
- OPP becomes obsolete (OOP is not obsolete!) - primary indexes become obsolete if you ever want to do a range query (which you probably will...), better to assign a random row id Taken together, it's likely that very little will remain of your old database schema... Am I right?

Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
Hi! I've 3 servers running (0.7rc1) with a replication_factor of 2 and use quorum for writes. But when I shut down one of them UnavailableExceptions are thrown. Why is that? Isn't it the point of quorum and a fault-tolerant DB that it continues with the remaining 2 nodes and redistributes

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Thibaut Britz
Hi, The UnavailableExceptions will be thrown because a quorum of size 2 needs at least 2 nodes to be alive (as for a quorum of size 3 as well). The data won't be automatically redistributed to other nodes. Thibaut On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig timo.nent...@toptarif.de wrote: Hi!

Re: unsubscribe

2010-12-09 Thread Eric Evans
On Thu, 2010-12-09 at 11:42 +0100, Massimo Carro wrote: Massimo Carro www.liquida.it - www.liquida.com http://wiki.apache.org/cassandra/FAQ#unsubscribe -- Eric Evans eev...@rackspace.com

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Daniel Lundin
Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available. This means for RF = 2, consistency levels QUORUM and ALL yield the same result. /d On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig timo.nent...@toptarif.de wrote: Hi! I've 3 servers
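
The arithmetic behind that statement, as a quick sketch:

    def quorum(rf):
        """Replicas that must respond for a QUORUM read/write to succeed."""
        return rf // 2 + 1

    for rf in (2, 3):
        q = quorum(rf)
        print(f"RF={rf}: quorum={q}, tolerates {rf - q} down replica(s)")
    # RF=2: quorum=2, tolerates 0 down replica(s)  -> same as ALL
    # RF=3: quorum=2, tolerates 1 down replica(s)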

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 16:50, Daniel Lundin wrote: Quorum is really only useful when RF > 2, since for a quorum to succeed RF/2+1 replicas must be available. 2/2+1==2 and I killed 1 of 3, so... I don't get it. This means for RF = 2, consistency levels QUORUM and ALL yield the same result.

RE: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Viktor Jevdokimov
With 3 nodes and RF=2 you have 3 key ranges: N1+N2, N2+N3 and N3+N1. Killing N1 you've got only 1 fully alive range, N2+N3, and 2/3 of the ranges are down for Quorum (which at RF=2 is effectively the same as ALL), so N1+N2 and N3+N1 fail. -Original Message- From: Timo Nentwig [mailto:timo.nent...@toptarif.de] Sent:
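
A small sketch of that situation, using the replica sets described above (illustrative only):

    replica_sets = {            # which nodes hold each key range (RF=2)
        "range A": {"N1", "N2"},
        "range B": {"N2", "N3"},
        "range C": {"N3", "N1"},
    }
    alive = {"N2", "N3"}        # N1 has been killed
    quorum = 2 // 2 + 1         # RF=2 -> quorum of 2

    for rng, replicas in replica_sets.items():
        ok = len(replicas & alive) >= quorum
        print(rng, "available" if ok else "UnavailableException")
    # range A UnavailableException
    # range B available
    # range C UnavailableException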

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
It's 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since quorum is also 2 with that replication factor, you cannot lose a node, otherwise some query will end up as UnavailableException. Again, this is not related to the total number of nodes. Even

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.) On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne sylv...@yakaz.com wrote: I'ts 2 out of the number of replicas, not the number of nodes. At RF=2, you have 2 replicas. And since

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 17:39, David Boxenhorn wrote: In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.) I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
If that is what you want, use CL=ONE On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig timo.nent...@toptarif.dewrote: On Dec 9, 2010, at 17:39, David Boxenhorn wrote: In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.) I naively

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Nick Bailey
On Thu, Dec 9, 2010 at 10:43 AM, Timo Nentwig timo.nent...@toptarif.dewrote: On Dec 9, 2010, at 17:39, David Boxenhorn wrote: In other words, if you want to use QUORUM, you need to set RF=3. (I know because I had exactly the same problem.) I naively assume that if I kill either node

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail, I actually lose data. But apparently this is not how it works... Sure, the data that N1 holds is also on another node and you won't lose it by only losing

Cassandra and disk space

2010-12-09 Thread Mark
I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure how to prevent this problem in the future. Should I make sure that at all times every node is under 50% of its disk space?

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Timo Nentwig
On Dec 9, 2010, at 17:55, Sylvain Lebresne wrote: I naively assume that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail, I actually lose data. But apparently this is not how it works... Sure, the data that N1 holds is also

Re: N to N relationships

2010-12-09 Thread Sébastien Druon
Thanks a lot for the answer What about the indexing when adding a new element? Is it incremental? Thanks again On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote: How about a regular CF where keys are n...@n ? Then, getting a matrix row would be the same cost as getting a

Re: N to N relationships

2010-12-09 Thread David Boxenhorn
What do you mean by indexing? On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.com wrote: Thanks a lot for the answer What about the indexing when adding a new element? Is it incremental? Thanks again On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote: How

Re: Secondary indexes change everything?

2010-12-09 Thread Tyler Hobbs
OPP is not yet obsolete. The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as name > 'b' and name < 'c'. This is something that an OPP index would be good at. Of course, you can do something similar with one or more rows, so it's not that big of

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Tyler Hobbs
If you switch your writes to CL ONE when a failure occurs, you might as well use ONE for all writes. ONE and QUORUM behave the same when all nodes are working correctly. - Tyler On Thu, Dec 9, 2010 at 11:26 AM, Timo Nentwig timo.nent...@toptarif.dewrote: On Dec 9, 2010, at 17:55, Sylvain

Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread Sylvain Lebresne
And my application would fall back to ONE. Quorum writes will also fail so I would also use ONE so that the app stays up. What would I have to do to make the data redistribute when the broken node is up again? Simply call nodetool repair on it? There are 3 mechanisms for that: - hinted

Re: Cassandra and disk space

2010-12-09 Thread Peter Schuller
I recently ran into a problem during a repair operation where my nodes completely ran out of space and my whole cluster was... well, clusterfucked. I want to make sure how to prevent this problem in the future. Depending on which version you're on, you may be seeing this:

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
If you are on 0.6, repair is particularly dangerous with respect to disk space usage. If your replica is sufficiently out of sync, you can triple your disk usage pretty easily. This has been improved in 0.7, so repairs should use about half as much disk space, on average. In general, yes, keep
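
A rough way to reason about the headroom rule (the worst-case growth factor is an assumption for illustration, not an exact Cassandra figure):

    def has_headroom(disk_gb, data_gb, worst_case_growth=2.0):
        """Check whether a node has room for compaction/repair, assuming the
        on-disk data can temporarily grow by `worst_case_growth`."""
        return data_gb * worst_case_growth <= disk_gb

    print(has_headroom(disk_gb=1000, data_gb=450))   # True  (45% full)
    print(has_headroom(disk_gb=1000, data_gb=600))   # False (60% full, risky)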

Re: N to N relationships

2010-12-09 Thread Sébastien Druon
I mean if I have secondary indexes. Apparently they are calculated in the background... On 9 December 2010 18:33, David Boxenhorn da...@lookin2.com wrote: What do you mean by indexing? On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.comwrote: Thanks a lot for the answer

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Are there any plans to improve this in the future? For big data clusters this could be very expensive. Based on your comment, I will need 200TB of storage for 100TB of data to keep Cassandra running. -- Rustam. On 09/12/2010 17:56, Tyler Hobbs wrote: If you are on 0.6, repair is particularly

Stuck with adding nodes

2010-12-09 Thread Daniel Doubleday
Hi good people. I underestimated load during peak times and now I'm stuck with our production cluster. Right now it's 3 nodes, RF 3, so everything is everywhere. We have ~300GB data load, ~10MB/sec incoming traffic and ~50 (peak) reads/sec to the cluster. The problem derives from our quorum read

Re: Stuck with adding nodes

2010-12-09 Thread Peter Schuller
Currently I am copying all data files (thats all existing data) from one node to the new nodes in hope that I could than manually assign them their new tokenrange (nodetool move) and do cleanup. Unless I'm misunderstanding you I believe you should be setting the initial token. nodetool move
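
For reference, the usual way to pick evenly spaced initial tokens for RandomPartitioner (a standard formula, not something specific to this thread):

    def balanced_tokens(num_nodes):
        """Evenly spaced initial tokens over RandomPartitioner's 0..2**127 space."""
        return [i * (2 ** 127 // num_nodes) for i in range(num_nodes)]

    for i, t in enumerate(balanced_tokens(5)):
        print(f"node {i + 1}: initial_token = {t}")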

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you can push

Re: Cassandra and disk space

2010-12-09 Thread Scott Dworkis
i recently finished a practice expansion of 4 nodes to 5 nodes, a series of nodetool move, nodetool cleanup and jmx gc steps. i found that in some of the steps, disk usage actually grew to 2.5x the base data size on one of the nodes. i'm using 0.6.4. -scott On Thu, 9 Dec 2010, Rustam

Re: N to N relationships

2010-12-09 Thread Aaron Morton
I am assuming you have one matrix and you know the dimensions. Also, as you say, the most important queries are to get an entire column or an entire row. I would consider using a standard CF for the Columns and one for the Rows. The key for each would be the col / row number, each cassandra column name
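
A sketch of that two-CF layout with dictionaries standing in for the "Rows" and "Columns" column families (illustrative only):

    rows_cf, cols_cf = {}, {}

    def put(row, col, value):
        # Write each cell twice so both access paths are a single-key read.
        rows_cf.setdefault(row, {})[col] = value
        cols_cf.setdefault(col, {})[row] = value

    put(1, 3, "related")
    put(2, 3, "related")
    print(rows_cf[1])   # entire matrix row 1:    {3: 'related'}
    print(cols_cf[3])   # entire matrix column 3: {1: 'related', 2: 'related'}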

Re: Secondary indexes change everything?

2010-12-09 Thread Jonathan Ellis
On Thu, Dec 9, 2010 at 12:16 PM, David Boxenhorn da...@lookin2.com wrote: What do you mean by "The included secondary indexes still aren't good at finding keys for ranges of indexed values, such as name > 'b' and name < 'c'"? Do you mean that secondary indexes don't support range queries at

Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Ryan King
Overall, I don't think this is a crazy idea, though I think I'd prefer cassandra to manage this setup. The problem you will run into is that because the storage port is assumed to be the same across the cluster you'll only be able to do this if you can assign multiple IPs to each server (one for

Obscured question about data size in a Column Family

2010-12-09 Thread Joshua Partogi
Hi there, Quoting some information in the wiki about Cassandra limitations ( http://wiki.apache.org/cassandra/CassandraLimitations): ... So all the data from a given columnfamily/key pair had to fit in memory, or 2GB ... Does this mean 1. A ColumnFamily can only be 2GB of data 2. A Column

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
That depends on your scenario. In the worst case of one big CF, there's not much that can be easily done for the disk usage of compaction and cleanup (which is essentially compaction). If, instead, you have several column families and no single CF makes up the majority of your data, you

Re: Cassandra and disk space

2010-12-09 Thread Tyler Hobbs
Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in particular, rebalancing your cluster will be particularly difficult, and repair will also become dangerous. Disk performance also tends to drop when a disk nears capacity. There's no

Re: Cassandra and disk space

2010-12-09 Thread Nick Bailey
Additionally, cleanup will fail to run when the disk is more than 50% full. Another reason to stay below 50%. On Thu, Dec 9, 2010 at 6:03 PM, Tyler Hobbs ty...@riptano.com wrote: Yes, that's correct, but I wouldn't push it too far. You'll become much more sensitive to disk usage changes; in

Re: Cassandra and disk space

2010-12-09 Thread Rustam Aliyev
Thanks Tyler, this is really useful. Also, I noticed that you can specify multiple data file directories located on different disks. Let's say I have a machine with 4 x 500GB drives, what would be the difference between the following 2 setups: 1. each drive mounted separately and has data

Re: Cassandra and disk space

2010-12-09 Thread Robert Coli
On Thu, Dec 9, 2010 at 4:20 PM, Rustam Aliyev rus...@code.az wrote: Thanks Tyler, this is really useful. [ RAID0 vs JBOD question ] In other words, does splitting data folder into smaller ones bring any performance or stability advantages? This is getting to be a FAQ, so here's my stock

Re: Cassandra and disk space

2010-12-09 Thread Brandon Williams
On Thu, Dec 9, 2010 at 6:20 PM, Rustam Aliyev rus...@code.az wrote: Also, I noticed that you can specify multiple data file directories located on different disks. Let's say if I have machine with 4 x 500GB drives, what would be the difference between following 2 setups: 1. each drive

Re: [OT] shout out for riptano training

2010-12-09 Thread Sal Fuentes
I second that as well. I actually found the training to be fun (love the new stuff in 0.7.0) and quite interesting. Now I'm looking forward to the next Cassandra Summit. Thank you Riptano. On Thu, Dec 9, 2010 at 2:48 PM, Dave Viner davevi...@gmail.com wrote: Just wanted to give a shout-out to

Re: N to N relationships

2010-12-09 Thread Nick Bailey
I would also recommend two column families. Storing the key as NxN would require you to hit multiple machines to query for an entire row or column with RandomPartitioner. Even with OPP you would need to pick row or columns to order by and the other would require hitting multiple machines. Two

[RELEASE] 0.7.0 rc2

2010-12-09 Thread Eric Evans
I'd have thought all that turkey and stuffing would have done more damage to momentum, but judging by the number of bug-fixes in the last couple of weeks, that isn't the case. As usual, I'd be remiss if I didn't point out that this is not yet a stable release. It's getting pretty close, but

Re: Obscured question about data size in a Column Family

2010-12-09 Thread Jonathan Ellis
In <= 0.6 (but not 0.7) a row could not be larger than 2GB. 2GB is still the largest possible column value. On Thu, Dec 9, 2010 at 5:38 PM, Joshua Partogi joshua.j...@gmail.com wrote: Hi there, Quoting some information in the wiki about Cassandra limitations

Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Wenjun Che
describe_schema_versions() returns a Map&lt;String, List&lt;String&gt;&gt; with one entry. The key is a UUID and the List&lt;String&gt; has one element, which is the IP of my machine. I think this has something to do with the 'truncate' command in CLI; I can reproduce by: 1. create a CF with column1 as a secondary index 2. add

Re: Cassandra and disk space

2010-12-09 Thread Bill de hÓra
This is true, but for larger installations I end up needing more servers to hold the disks, and more racks to hold the servers, to the point where the overall cost per GB climbs (granted the cost per IOP is probably still good). AIUI, a chunk of that 50% is replicated data such that the truly available

Re: NullPointerException in Beta3 and rc1

2010-12-09 Thread Jonathan Ellis
Can you still reproduce this with rc2, after starting with an empty data and commitlog directory? There used to be a bug w/ truncate + 2ary indexes but that should be fixed now. On Thu, Dec 9, 2010 at 8:53 PM, Wenjun Che wen...@openf.in wrote: describe_schema_versions() returns a Map&lt;String,

Re: Running multiple instances on a single server --micrandra ??

2010-12-09 Thread Bill de hÓra
On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote: The idea behind micrandra is for a 6 disk system run 6 instances of Cassandra, one per disk. Use the RackAwareSnitch to make sure no replicas live on the same node. The downsides 1) we would have to manage 6x the instances of