Re: Multi-DC Deployment

2011-04-21 Thread Peter Schuller
"Cassandra doesn't replicate sstable corruptions. It detects corrupt data and only replicates good data." This is incorrect. Depending on the nature of the corruption it may spread to other nodes. Checksumming (done right) would be a great addition to alleviate this. Yes, there is code that tries
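A minimal illustration of the kind of checksumming being suggested (not Cassandra's actual sstable code): a CRC32 is stored alongside each data block at write time and verified on read, so a corrupted block is rejected rather than served or streamed on to other replicas.

```java
import java.util.zip.CRC32;

// Sketch only: block-level checksumming, with the CRC computed when the block is
// written and re-checked before the bytes are handed to anything downstream.
public final class ChecksummedBlock {
    private final byte[] data;
    private final long storedCrc;

    public ChecksummedBlock(byte[] data) {
        this.data = data.clone();
        this.storedCrc = crcOf(this.data);   // computed at write time
    }

    /** Returns the block contents, or throws if the bytes no longer match the stored CRC. */
    public byte[] read() {
        if (crcOf(data) != storedCrc) {
            throw new IllegalStateException("Checksum mismatch: block is corrupt, do not replicate");
        }
        return data.clone();
    }

    private static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```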

Re: Multi-DC Deployment

2011-04-20 Thread Terje Marthinussen
Assuming that you generally put an API on top of this, delivering to two or more systems then boils down to a message queue issue or some similar mechanism which handles secure delivery of messages. Maybe not trivial, but there are many products that can help you with this, and it is a lot easier
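A rough sketch of that "API on top" idea, using an assumed QueueClient interface rather than any real queueing product: the application writes once, and the delivery layer fans the message out to a durable queue per datacenter, each feeding its own isolated cluster.

```java
import java.util.List;

// QueueClient is an assumed abstraction, not a real product's API.
interface QueueClient {
    void enqueue(byte[] message);   // assumed to provide durable, at-least-once delivery
}

final class DualDcWriter {
    private final List<QueueClient> perDcQueues;

    DualDcWriter(List<QueueClient> perDcQueues) {
        this.perDcQueues = perDcQueues;
    }

    /** Publish the same message to every datacenter's queue; each DC's Cassandra
     *  cluster consumes only its own queue, so the clusters stay fully isolated. */
    void write(byte[] message) {
        for (QueueClient q : perDcQueues) {
            q.enqueue(message);
        }
    }
}
```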

Re: Multi-DC Deployment

2011-04-20 Thread Adrian Cockcroft
Queues replicate bad data just as well as anything else. The biggest source of bad data is broken app code... You will still need to implement a reconciliation/repair checker, as queues have their own failure modes when they get backed up. We have also looked at using queues to bounce data between
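A hedged sketch of what such a reconciliation checker might look like; RowSource is an assumed abstraction for reading a row out of one cluster, not a real client API. The checker compares digests of the same row in both datacenters and flags mismatches for repair.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Assumed abstraction over "read this row from cluster X".
interface RowSource {
    byte[] readRow(String key);
}

final class ReconciliationChecker {
    private final RowSource dc1;
    private final RowSource dc2;

    ReconciliationChecker(RowSource dc1, RowSource dc2) {
        this.dc1 = dc1;
        this.dc2 = dc2;
    }

    /** Returns false when the two datacenters disagree, i.e. the key needs repair. */
    boolean rowsMatch(String key) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] d1 = md.digest(dc1.readRow(key));
        md.reset();
        byte[] d2 = md.digest(dc2.readRow(key));
        return Arrays.equals(d1, d2);
    }
}
```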

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
Hmm... Seems like it could be an idea, in a case like this, to have a mode where a result is always returned (if possible), but with a flag saying whether the consistency level was met, or to what level it was met (the number of nodes answering, for instance)? Terje On Tue, Apr 19, 2011 at 1:13 AM, Jonathan
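One way the proposed mode could look from a client's perspective, sketched with assumed types (Reader, Level) rather than Cassandra's real API: always return whatever could be read, plus a flag recording whether the requested consistency level was actually met.

```java
import java.util.Optional;

// Assumed abstractions for the sketch, not the real client API.
enum Level { QUORUM, ONE }

interface Reader<T> {
    T read(Level level) throws Exception;   // assumed to throw when the level cannot be met
}

final class BestEffortRead {
    static <T> Result<T> read(Reader<T> reader) {
        try {
            return new Result<>(Optional.of(reader.read(Level.QUORUM)), Level.QUORUM, true);
        } catch (Exception quorumFailed) {
            try {
                // Fall back: return whatever a single replica has, but flag the downgrade.
                return new Result<>(Optional.of(reader.read(Level.ONE)), Level.ONE, false);
            } catch (Exception nothingAvailable) {
                return new Result<>(Optional.empty(), Level.ONE, false);
            }
        }
    }

    static final class Result<T> {
        final Optional<T> value;
        final Level achievedLevel;
        final boolean requestedLevelMet;

        Result(Optional<T> value, Level achievedLevel, boolean requestedLevelMet) {
            this.value = value;
            this.achievedLevel = achievedLevel;
            this.requestedLevelMet = requestedLevelMet;
        }
    }
}
```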

Re: Multi-DC Deployment

2011-04-19 Thread Adrian Cockcroft
If you want to use local quorum for a distributed setup, it doesn't make sense to have less than RF=3 local and remote. Three copies at both ends will give you high availability. Only one copy of the data is sent over the wide area link (with recent versions). There is no need to use mirrored or
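The arithmetic behind the RF=3-per-DC advice: LOCAL_QUORUM counts only replicas in the local datacenter, and a quorum of RF replicas is floor(RF/2) + 1, so RF=3 gives a local quorum of 2 and tolerates one down node per DC. A small sketch of the math:

```java
// quorum(RF) = RF/2 + 1 (integer division).
public final class QuorumMath {
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int rfPerDc = 3;
        int q = quorum(rfPerDc);              // 2
        int tolerableFailures = rfPerDc - q;  // 1 node per DC can be down
        System.out.printf("RF=%d per DC -> LOCAL_QUORUM needs %d replicas, tolerates %d local failure(s)%n",
                rfPerDc, q, tolerableFailures);
        // With RF=2 per DC the quorum would also be 2, so a single local failure
        // already blocks LOCAL_QUORUM -- hence the advice not to go below RF=3.
    }
}
```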

Re: Multi-DC Deployment

2011-04-19 Thread Terje Marthinussen
If you have RF=3 in both datacenters, it could be debated whether there is any point in using the built-in replication in Cassandra at all, vs. feeding the data to both datacenters and getting two 100% isolated Cassandra instances that cannot replicate sstable corruptions between each other. My point is

Re: Multi-DC Deployment

2011-04-18 Thread Jonathan Ellis
They will time out until the failure detector realizes the DC1 nodes are down (~10 seconds). After that they will immediately return UnavailableException until DC1 comes back up. On Mon, Apr 18, 2011 at 10:43 AM, Baskar Duraikannu baskar.duraikannu...@gmail.com wrote: We are planning to deploy
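A sketch of how a client might distinguish the two failure modes described here, using hypothetical exception and client types rather than the real Thrift classes: timeouts (replicas still presumed up but not answering) may be worth retrying, while an "unavailable" response means the coordinator already knows the consistency level cannot be met.

```java
// Hypothetical types for the sketch; not Cassandra's actual exception classes.
final class TimedOutException extends Exception {}
final class UnavailableException extends Exception {}

interface CassandraClient {
    String readAtQuorum(String key) throws TimedOutException, UnavailableException;
}

final class ReadWithFailureHandling {
    static String read(CassandraClient client, String key) throws Exception {
        try {
            return client.readAtQuorum(key);
        } catch (TimedOutException e) {
            // Replicas were still considered up but did not answer in time; one retry
            // may succeed once the failure detector has caught up with reality.
            return client.readAtQuorum(key);
        } catch (UnavailableException e) {
            // The coordinator already knows too few replicas are alive to satisfy the
            // consistency level; retrying immediately is pointless. Surface the error
            // (or redirect the request to the surviving datacenter).
            throw e;
        }
    }
}
```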