Does anyone know of any good benchmark data about Cassandra replication across data centers? I'm aware of the articles below.

This Netflix article benchmarks Cassandra scalability on AWS and shows linear scaling up to 288 nodes, but I don't see much data in it about the cost of cross-data-center replication:
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

This one compares AWS, Rackspace and Google cloud performance for Cassandra. AWS does well:
http://www.stackdriver.com/cassandra-aws-gce-rackspace/

These Netflix slides say they run a nightly repair job to make sure everything stays consistent:
http://www.slideshare.net/adrianco/cassandra-performance-on-aws
They also talk about backing up Cassandra. Data size reached only 5GB per node (tiny!). Slide 35 says there's a 100+ ms latency between US and EU datacenters. Slide 36 shows how to add a new datacenter with no downtime (pre-load from a backup), then run repair jobs.

This article compares Cassandra with three other systems. Cassandra performed well:
http://www.odbms.org/blog/2011/05/measuring-the-scalability-of-sql-and-nosql-systems/
They open-sourced the benchmark code, which is available at https://github.com/brianfrankcooper/YCSB. It's complex.

Thanks,
Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com
AudienceScience