Re: Cassandra on iSCSI?
> So if one is forced to use a SAN, how should you set up Cassandra is the interesting question - to me! Here are some thoughts:
> 1. Ensure that each node gets dedicated - not shared - LUNs
> 2. Ensure that these LUNs do not share spindles, or nodes will cease to be isolatable (this will be tough to get, given how SAN administrators think about this)
> 3. Most SANs deliver performance by striping (RAID 0) - sacrifice striping for isolation if push comes to shove
> 4. Do not share data directories from multiple nodes onto a single location via NFS or CFS, for example. They are cool in shared resource environments, but break the premise behind Cassandra. All data storage should be private to the cassandra node, even when on shared storage
> 5. Do not change any assumption around Replication Factor (RF) or Consistency Level (CL) due to the shared storage - in fact, if anything, increase your replication factor because you now have potential SPOF storage.

That was gold, and led to a direct conversation between provider and developer. Various tests showed IOPS will often be at 5k per node, so the iSCSI solution would need to be tailored to handle that. As mentioned above, our provider simply couldn't give us that much disk per server. But after a good discussion it became obvious (doh!) that the application can actually save a lot of disk by using different keyspaces with different RF. We have raw data that needs to be collected, but can be temporarily unavailable for reading, hence RF=1 makes sense. This raw data is the vast bulk of the data, so this saves lots of disk space. The aggregated data, which is relatively small in comparison, is critical for the application to read, so we can keep it in a separate keyspace with higher RF...

~mck
-- 
“Anyone who lives within their means suffers from a lack of imagination.” - Oscar Wilde | http://semb.wever.org | http://sesat.no | http://finn.no | Java XSS Filter
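As an illustration of the split-keyspace idea above: two keyspaces, one per-RF tier. The keyspace names are hypothetical, and the syntax shown is modern CQL (at the time of this thread, 0.7-era keyspaces were defined via cassandra-cli or configuration instead):

```sql
-- Hypothetical layout: bulky raw data at RF=1, small critical
-- aggregates at a higher RF. Names are illustrative only.
CREATE KEYSPACE raw_data
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE KEYSPACE aggregates
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
```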
Re: Cassandra on iSCSI?
> Of course with a SAN you'd want RF=1 since it's replicating internally.

Isn't this the same case for raid-5 as well? And we want RF=2 if we need to keep reading while doing rolling restarts?

~mck
-- 
“Anyone who lives within their means suffers from a lack of imagination.” - Oscar Wilde | http://semb.wever.org | http://sesat.no | http://finn.no | Java XSS Filter
Re: Cassandra on iSCSI?
[OT] They're quoting roughly the same price for both (claiming that the extra cost goes into having, for each node, a separate disk cabinet to run local raid-5).

> You might not need raid-5 for local attached storage.

Yes, we did ask. But raid-5 is the minimum being offered by our hosting provider... We could go to raid-10, but raid-0 is out of the question...

~mck
-- 
To be young, really young, takes a very long time. - Picasso | http://semb.wever.org | http://sesat.no | http://finn.no | Java XSS Filter
Re: Cassandra on iSCSI?
On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever m...@apache.org wrote:
>> Of course with a SAN you'd want RF=1 since it's replicating internally.
>
> Isn't this the same case for raid-5 as well?

No, because the replication is (mainly) to protect you from machine failures; if the SAN is a SPOF then putting more replicas on it doesn't help.

> And we want RF=2 if we need to keep reading while doing rolling restarts?

Yes.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Cassandra on iSCSI?
On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
> On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever m...@apache.org wrote:
>> Of course with a SAN you'd want RF=1 since it's replicating internally.
>> Isn't this the same case for raid-5 as well?
>
> No, because the replication is (mainly) to protect you from machine
> failures; if the SAN is a SPOF then putting more replicas on it doesn't
> help.
>
>> And we want RF=2 if we need to keep reading while doing rolling restarts?
>
> Yes.

If you are using cassandra with a SAN, RF=1 makes sense because we are making the assumption the SAN is already replicating your data. RF=2 makes good sense so as not to be affected by outages. Another alternative is something like Linux-HA, managing each cassandra instance as a resource. This way, if a head goes down, Linux-HA would detect the failure and bring up that instance on another physical piece of hardware. Using Linux-HA+SAN+Cassandra would actually bring Cassandra closer to the HBase model, in which you have a distributed file system but the front-end Cassandra acts like a region server.
Re: Cassandra on iSCSI?
Sort of - do not agree!! This is the Shared Nothing vs. Shared Disk debate. There are many mainstream RDBMS products that pretend to do horizontal scalability with shared disks. They have the kinds of problems that Cassandra is specifically architected to avoid!

The original question here has 2 aspects to it:
1. Is iSCSI SAN good enough - My take is that it is still the poor man's SAN as compared to FC-based SANs. Having said that, they have found increasing adoption and the performance penalty is really marginal. Couple that with the fact that Cassandra is architected to reduce the need for high-performance storage systems via features like reducing random writes, etc. So net net - a reasonable iSCSI SAN should work.
2. Does it make sense to use a SPOF SAN - again, this militates against the architectural underpinnings of Cassandra, which relies on the shared nothing idea to ensure that problems - say a bad disk - are easily isolated to a particular node. On a SAN, depending on RAID configs, how LUNs are carved out and so on, a few disk outages could affect multiple nodes. A performance problem with the SAN could now affect your entire Cassandra cluster, and so on. Cassandra is not meant to be set up this way!

But but but... in the real world today - large storage volumes are available only with SANs. Rackable machines do not leave a lot of space - typically - for a bunch of HDDs. On top of that, SANs provide all kinds of admin capabilities that supposedly help with uptime and performance guarantees and so on. So a colo DC might not have any other option but shared storage!

So if one is forced to use a SAN, how should you set up Cassandra is the interesting question - to me! Here are some thoughts:
1. Ensure that each node gets dedicated - not shared - LUNs
2. Ensure that these LUNs do not share spindles, or nodes will cease to be isolatable (this will be tough to get, given how SAN administrators think about this)
3. Most SANs deliver performance by striping (RAID 0) - sacrifice striping for isolation if push comes to shove
4. Do not share data directories from multiple nodes onto a single location via NFS or CFS, for example. They are cool in shared resource environments, but break the premise behind Cassandra. All data storage should be private to the cassandra node, even when on shared storage
5. Do not change any assumption around Replication Factor (RF) or Consistency Level (CL) due to the shared storage - in fact, if anything, increase your replication factor because you now have potential SPOF storage.

My two - or maybe more - cents on the issue, HTH,
-JA

On Fri, Jan 21, 2011 at 1:15 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
> If you are using cassandra with a SAN, RF=1 makes sense because we are
> making the assumption the SAN is already replicating your data. RF=2
> makes good sense so as not to be affected by outages. Another
> alternative is something like Linux-HA, managing each cassandra
> instance as a resource. This way, if a head goes down, Linux-HA would
> detect the failure and bring up that instance on another physical
> piece of hardware. Using Linux-HA+SAN+Cassandra would actually bring
> Cassandra closer to the HBase model, in which you have a distributed
> file system but the front-end Cassandra acts like a region server.
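Point 4 above is a configuration detail as much as an architectural one. A hypothetical cassandra.yaml fragment (0.7-era option names) showing what "private to the node" looks like in practice - each node's directories on its own dedicated LUN, mounted locally, never on an NFS/CFS share common to several nodes:

```yaml
# Hypothetical fragment for one node; the /mnt/lun-node1 mount point is
# illustrative. The LUN behind it must be dedicated to this node only.
data_file_directories:
    - /mnt/lun-node1/cassandra/data
commitlog_directory: /mnt/lun-node1/cassandra/commitlog
saved_caches_directory: /mnt/lun-node1/cassandra/saved_caches
```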
Cassandra on iSCSI?
Does anyone have any experiences with Cassandra on iSCSI?

I'm currently testing a (soon-to-be) production server using both local raid-5 and iSCSI disks. Our hosting provider is pushing us hard towards the iSCSI disks because it is easier for them to run (and to meet our needs for increasing disk capacity over time). I'm worried that iSCSI is a non-scalable solution for an otherwise scalable application (all cassandra nodes will have separate partitions on the one iSCSI array).

To go with raid-5 disks our hosting provider requires proof that iSCSI won't work. I tried various things (eg `nodetool cleanup` on 12Gb load giving 5k IOPS) but iSCSI seems to keep up with the performance of the local raid-5 disks...

Should i be worried about using iSCSI? Are there better tests i should be running?

~mck
-- 
"The turtle only makes progress when its neck is stuck out" - Rollo May | http://semb.wever.org | http://sesat.no | http://finn.no | Java XSS Filter
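Not from the thread, but one crude way to sanity-check random-read IOPS on a candidate mount, with no extra tooling, is a script along these lines. This is a rough sketch only - the OS page cache will inflate the numbers, so a purpose-built tool (fio, etc.) or a real Cassandra workload is a far better test:

```python
import os
import random
import tempfile
import time

def rough_read_iops(path, file_mb=16, block=4096, seconds=2):
    """Very rough random-read ops/sec against a file under `path`.
    Page-cache effects make this optimistic; use a file much larger
    than RAM (or drop caches) for honest results."""
    fname = os.path.join(path, "iops_probe.bin")
    with open(fname, "wb") as f:
        f.write(os.urandom(file_mb * 1024 * 1024))
    fd = os.open(fname, os.O_RDONLY)
    max_off = file_mb * 1024 * 1024 - block
    ops, deadline = 0, time.time() + seconds
    try:
        while time.time() < deadline:
            # one random 4K read somewhere in the file
            os.pread(fd, block, random.randrange(0, max_off))
            ops += 1
    finally:
        os.close(fd)
        os.remove(fname)
    return ops / seconds

if __name__ == "__main__":
    print("approx read IOPS:", rough_read_iops(tempfile.gettempdir()))
```

Run it with `path` pointing at the iSCSI mount and again at the local raid-5 mount to compare like for like.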
Re: Cassandra on iSCSI?
On Thu, Jan 20, 2011 at 2:13 PM, Mick Semb Wever m...@apache.org wrote:
> To go with raid-5 disks our hosting provider requires proof that iSCSI
> won't work. I tried various things (eg `nodetool cleanup` on 12Gb load
> giving 5k IOPS) but iSCSI seems to keep up with the performance of the
> local raid-5 disks...
>
> Should i be worried about using iSCSI?

It should work fine; the main reason to go with local storage is the huge cost advantage. Of course with a SAN you'd want RF=1 since it's replicating internally.

> Are there better tests i should be running?

I would test write scalability going from 1 machine, to half your planned cluster size, to your full cluster size, or as close as is feasible, using enough client machines running contrib/stress* (much faster than contrib/py_stress) that you saturate it. Writes should be CPU-bound, so you expect those to scale roughly linearly as you add Cassandra nodes. Reads (once your data set can't be cached in RAM) will be i/o-bound, so I imagine with a SAN you'll be able to max that out at some number of machines, and adding more Cassandra nodes won't help. What that limit is depends on your SAN iops and how much of it is being consumed by other applications.

*I just committed a README for contrib/stress to the 0.7 svn branch

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
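The test plan above boils down to checking that aggregate write throughput grows roughly linearly with node count. A small sketch for eyeballing measured stress results against that expectation - all the throughput figures here are entirely hypothetical:

```python
def scaling_efficiency(results):
    """results: {node_count: aggregate_writes_per_sec}.
    Returns each cluster size's efficiency relative to perfect linear
    scaling extrapolated from the smallest run."""
    base_n = min(results)
    per_node_base = results[base_n] / base_n
    return {n: results[n] / (per_node_base * n) for n in sorted(results)}

# Hypothetical measurements: 1 node, half cluster, full cluster.
measured = {1: 12000, 3: 33000, 6: 60000}
for n, eff in scaling_efficiency(measured).items():
    print(f"{n} nodes: {eff:.0%} of linear")
```

If the efficiency for writes falls off sharply as nodes are added, the shared SAN (rather than CPU) is the likely bottleneck.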
Re: Cassandra on iSCSI?
> It should work fine; the main reason to go with local storage is the
> huge cost advantage.

[OT] They're quoting roughly the same price for both (claiming that the extra cost goes into having, for each node, a separate disk cabinet to run local raid-5).

> *I just committed a README for contrib/stress to the 0.7 svn branch

thanks! i'll check it out.

~mck
-- 
“An invasion of armies can be resisted, but not an idea whose time has come.” - Victor Hugo | www.semb.wever.org | www.sesat.no | www.finn.no | http://xss-http-filter.sf.net