I concur with Mark; Rackspace has shown some poor performance with Riak. On Fri, Apr 1, 2011 at 1:00 PM, Mark Steele <[email protected]> wrote: > I've done some rather disappointing tests with Riak using Rackspace cloud > servers. Much better off on dedicated hardware if you can find it. > Mark Steele > Bering Media Inc. > > > On Fri, Apr 1, 2011 at 11:51 AM, David Dawson <[email protected]> > wrote: >> >> Mathias and Alexander, >> >> Thanks for both of your replies; they were very informative and >> have really helped me make up my mind. To summarise: >> >> - If you want good, predictable performance but are happy to >> live with the risk of losing some of your data (in the event of a cluster >> failure where the number of failed nodes exceeds your n_val), then run >> with the local ephemeral storage in RAID 5 or 10 and take snapshots of the >> data periodically, or run in dual-DC mode with replication. >> - If you want 100% assurance that your data is available and you >> are happy with unpredictable performance, then use EBS. >> - If you want 100% assurance that your data is available and also >> want predictable performance, then Amazon EC2 is not the optimal choice. >> >> In our scenario we are doing an equal amount of reads and writes, >> and will need to guarantee about 32K ops/sec from a Riak cluster over a 2-hour >> period with minimal risk of an outage or drop in performance, hence I >> am guessing that maybe EC2 is not the right choice for us. We are going to >> look at Joyent as an alternative. That said, has anyone else used other >> solutions, e.g. Rackspace cloud? >> >> Dave >> >> >> On 1 Apr 2011, at 14:21, Mathias Meyer wrote: >> >> > Hi David, >> > >> > Alexander already gave you a good rundown on EC2 and Riak, but let me >> > add some of my own experiences running databases on EC2 in general. 
>> > >> > The short answer is, Riak is certainly used successfully in production >> > on EC2, so nothing should hold you back from testing a setup there. But >> > there's a whole bunch of things you should keep in mind. >> > >> > First, it's probably a good idea to avoid using ephemeral storage as >> > persistent storage. Even though it rarely happens, instances can crash on >> > EC2 for any number of reasons, most commonly a hardware failure of the underlying >> > host. >> > >> > Cluster compute instances offer especially high CPU power, but what you >> > really want is fast and reliable storage I/O, persisted for eternity >> > if need be. CC instances are certainly a lot better than any other instance >> > in terms of general I/O (see [2] for a comparison), but fall prey to >> > similar limitations in network storage I/O as other instance >> > types, >> > see below. >> > >> > The RAID 0'd ephemeral storage on the cluster compute instances may >> > sound good in theory in terms of performance, but in practice it takes away >> > data durability in case of a single disk failure. One disk fails, and the >> > data on that node is gone. Depending on what kinds of seeks you're doing, an >> > EBS setup may even turn out to be faster. See [6] and [4] for a comparison >> > and some initial and extended measurements, and [7] for another comparison. >> > But certainly, the cluster compute instance's ephemeral storage can achieve >> > a good amount of throughput; see [5] for some pretty graphs comparing both >> > RAID and non-RAID setups. >> > >> > As Alexander pointed out, multiple instance failures can make this >> > scenario a real killer, though you end up with the same risks as running on >> > raw iron servers. Neither ephemeral storage nor EBS makes the problem of >> > proper backups disappear. You could e.g. run off ephemeral storage, relying >> > on both Riak's replication and a good backup, e.g. to an EBS volume or to >> > S3. 
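[Editor's note: a minimal sketch of the "ephemeral storage plus off-instance backup" idea described above, assuming the data directory path, the `s3cmd` tool, and the bucket name, none of which are from the original thread:]

```shell
#!/bin/sh
# Hypothetical paths: Riak's Bitcask data directory varies by install.
DATA_DIR=/var/lib/riak/bitcask
STAMP=$(date +%Y%m%d%H%M)

# For a truly consistent copy, snapshot via LVM or pause writes first;
# Bitcask files are append-only, so a live tar is usually recoverable anyway.
tar czf /tmp/riak-backup-$STAMP.tar.gz -C "$DATA_DIR" .

# Ship the archive off the instance; the S3 bucket name here is made up.
s3cmd put /tmp/riak-backup-$STAMP.tar.gz s3://my-riak-backups/$STAMP.tar.gz
```

Run from cron on each node; Riak's replication covers node loss between backup runs.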
>> > >> > EBS, on the other hand, is prone to a large variance in network latency, >> > making performance at any point unpredictable and unreliable. Every >> > measurement you take is likely to be different an hour later. This may >> > sound >> > extreme, but it turns out to be a very big issue for databases where >> > there's >> > lots of disk I/O involved to read and write data, as is the case with >> > Riak's >> > Bitcask storage. >> > >> > You can increase the performance and reliability of EBS by using a RAID >> > of volumes. Preferably go for a RAID 5 or RAID 10 to add redundancy. >> > There are mixed opinions on whether that's really necessary on EBS, with >> > Amazon keeping the data redundant on their end as well, but in general, >> > it's >> > a good tradeoff between increased performance through striping and >> > increased >> > redundancy through mirroring. [1] has a good summary of when it's better to >> > choose RAID 5 vs. 10. >> > >> > RAID 0 will obviously bring the best performance, and it's certainly a valid >> > setup. We've been running RAID 0 setups with 4 volumes, and got great >> > improvements over a single volume. You're also likely to achieve more >> > throughput on bigger instances with a setup like this. The caveat once >> > again >> > is that one corrupted volume is enough to make a RAID 0 setup unusable. >> > >> > Another crazy thought is to set up RAID striping across a bunch of >> > ephemeral drives and EBS volumes, maximizing throughput on both local and >> > network storage. But know what you're getting yourself into with this kind >> > of setup, especially when your write load is a lot heavier than the >> > available network bandwidth can handle, a scenario where your network >> > volumes will never be able to catch up with the local storage. >> > >> > All that said, EBS I/O sure is reasonably fast, but it depends on your >> > particular use case and performance requirements. 
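[Editor's note: the RAID-over-EBS setup discussed above can be sketched as follows. Device names (`/dev/xvdf`..`/dev/xvdi`), chunk size, and volume/mount names are assumptions for illustration; adjust to what the instance actually exposes:]

```shell
#!/bin/sh
# Stripe four attached EBS volumes into a RAID 0 array.
# Use --level=10 instead (with appropriate device count) for mirrored redundancy.
mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=4 \
    /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi

# Layer LVM on top of the array so snapshots of the whole stripe
# set can be taken consistently (see the snapshotting advice below).
pvcreate /dev/md0
vgcreate riak_vg /dev/md0
lvcreate -l 100%FREE -n riak_lv riak_vg

# Filesystem and mount point are also assumptions.
mkfs.ext3 /dev/riak_vg/riak_lv
mkdir -p /var/lib/riak
mount /dev/riak_vg/riak_lv /var/lib/riak
```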
It's also worth noting >> > that the I/O capabilities of EBS increase with the instance size. The >> > bigger >> > your instance, the more throughput you'll achieve (see [3]). Bigger >> > instances tend to have better network throughput in general, with cluster >> > compute instances obviously having some of the highest bandwidth available. >> > >> > All this turns out to be much less of a problem when data can be held in >> > memory very easily, e.g. with Innostore, where you can read and write >> > to/from cache buffers first and then have InnoDB take care of flushing to >> > disk. >> > >> > Personally, I don't think you're overcomplicating things in regard to >> > multiple availability zones. It's a good idea to use them when the highest >> > availability possible is your goal, as it's usually just a single >> > availability zone that's affected by increased latency or network timeouts. >> > But as Alexander said, you should think about having cross-datacenter >> > replication in that scenario, as availability zones are data centers >> > located >> > in different physical locations. Usually they're not that far apart, but >> > far >> > enough to increase latency considerably. But as always, it depends on your >> > particular use case. >> > >> > Now, after all this real talk, here's the kicker: Riak's way of >> > replicating data can make both scenarios work. When it's ensured that your >> > data is replicated on more than one node, it can work either way. You >> > could use ephemeral storage and be somewhat safe because data will >> > reside on multiple nodes. The same is true for EBS volumes, as potential >> > variances in I/O or even minutes of total unavailability (as seen in the >> > recent Reddit outage) can be recovered from a lot more easily thanks to handoff and >> > read repair. 
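[Editor's note: a sketch of tuning replication per bucket through Riak's HTTP interface of that era; the host, port, and bucket name "sessions" are assumptions, not from the thread:]

```shell
#!/bin/sh
# Raise the replica count (n_val) for one bucket; keep it below the
# number of nodes in the cluster, per the advice in this thread.
curl -X PUT http://127.0.0.1:8098/riak/sessions \
    -H "Content-Type: application/json" \
    -d '{"props": {"n_val": 5}}'

# Read the bucket properties back to confirm the change.
curl "http://127.0.0.1:8098/riak/sessions?keys=false"
```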
You can increase the number of replicas (n_val) to increase >> > your tolerance of instance failure; just make sure that n_val is less than >> > the number of nodes in your cluster. >> > >> > Don't get me wrong, I love EC2 and EBS; being able to spin up servers at >> > any time and to attach more storage to a running instance is extremely >> > powerful, when you can handle the downsides. But if very low latency is >> > what >> > you're looking for, raw iron with lots of memory and SSDs as storage devices >> > thrown on top is hard to beat. >> > >> > When in doubt, start with a RAID 0 setup on EBS with 4 volumes, and >> > compare it with a RAID 5 in terms of performance. They're known to give >> > good enough performance in a lot of cases. If you decide to go with a RAID, >> > be sure to add LVM on top for simpler snapshotting, since it's quite >> > painful, if not impossible, to get consistent snapshots using just EBS >> > snapshots on a bunch of striped volumes. >> > >> > Let us know if you have more questions; there's lots of detail involved >> > when you're going under the hood, but this should cover the most important >> > bases. >> > >> > Mathias Meyer >> > Developer Advocate, Basho Technologies >> > >> > [1] >> > http://en.wikipedia.org/wiki/RAID#RAID_10_versus_RAID_5_in_Relational_Databases >> > [2] >> > http://blog.cloudharmony.com/2010/09/benchmarking-of-ec2s-new-cluster.html >> > [3] >> > http://blog.cloudharmony.com/2010/06/disk-io-benchmarking-in-cloud.html >> > [4] >> > http://blog.bioteam.net/2010/07/boot-ephemeral-ebs-storage-performance-on-amazon-cc1-4xlarge-instance-types/ >> > [5] >> > http://blog.bioteam.net/2010/07/local-storage-performance-of-aws-cluster-compute-instances/ >> > [6] >> > http://blog.bioteam.net/2010/07/preliminary-ebs-performance-tests-on-amazon-compute-cluster-cc1-4xlarge-instance-types/ >> > [7] http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes >> > >> > On Mittwoch, 30. 
März 2011 at 18:29, David Dawson wrote: >> >> I am not sure if this has already been discussed, but I am looking at >> >> the feasibility of running Riak in the EC2 cloud, as we have a requirement >> >> that may require us to scale up and down quite considerably on a month-by-month >> >> basis. After some initial testing and investigation we have come to >> >> the conclusion that there are 2 solutions, although both have their >> >> downsides >> >> in my opinion: >> >> >> >> 1. Run multiple cluster compute (cc1.4xlarge) instances (23 GB RAM, >> >> 10 Gigabit Ethernet, 2 x 845 GB disks running RAID 0). >> >> 2. Same as above but using EBS as the storage instead of the local >> >> disks. >> >> >> >> The problems I see with solution 1: >> >> >> >> - An instance failure results in complete loss of data on that machine, >> >> as the disks are ephemeral storage (i.e. they only exist whilst the >> >> machine >> >> is up). >> >> >> >> The problems I see with solution 2: >> >> >> >> - EBS is slower than the local disks and, from what I have read, is >> >> susceptible to latency depending on factors out of your control. >> >> - There has been a bit of press lately about availability problems with >> >> EBS, so we would have to use multiple availability zones, although there >> >> are >> >> only 4 in total and it just seems as though I am overcomplicating things. >> >> >> >> Has anyone used EC2 and Riak in production, and if so what are their >> >> experiences? >> >> >> >> Otherwise, has anyone used RackSpace or Joyent as alternatives? The >> >> Joyent solution seems very expensive. What are >> >> their experiences? 
>> >> >> >> Dave >> >> _______________________________________________ >> >> riak-users mailing list >> >> [email protected] >> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
