Re: Cassandra on iSCSI?

2011-01-22 Thread Mick Semb Wever
 So if one is forced to use a SAN, how should you set up Cassandra is
 the interesting question - to me! Here are some thoughts:- 
 1. Ensure that each node gets dedicated - not shared - LUNs 
 2. Ensure that these LUNs do not share spindles, or nodes will cease to be
 isolatable (this will be tough to get, given how SAN administrators
 think about this) 
 3. Most SANs deliver performance by striping (RAID 0) - sacrifice
 striping for isolation if push comes to shove 
 4. Do not share data directories from multiple nodes onto a single
 location via NFS or CFS for example. They are cool in shared resource
 environments, but break the premise behind Cassandra. All data
 storage should be private to the cassandra node, even when on shared
 storage 
 5. Do not change any assumption around Replication Factor (RF) or
 Consistency Level (CL) due to the shared storage - in fact if
 anything, increase your replication factor because you now have
 potential SPOF storage.  

That was gold, and led to a direct conversation between provider and
developer. Various tests showed each node will often be at 5k IOPS, so
the iSCSI solution would need to be tailored to handle that.

As mentioned above, our provider simply couldn't give us that much
disk per server. But after a good discussion it became obvious (doh!)
that the application can actually save a lot of disk by using different
keyspaces with different RF. We have raw data that needs to be
collected, but can be temporarily unavailable for reading, hence RF=1
makes sense. This raw data is the vast bulk of the data so this saves
lots of disk space. The aggregated data, which is relatively small in
comparison, is critical for the application to read so we can keep it in a
separate keyspace with higher RF...
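
To put rough numbers on the saving, here is a small Python sketch. The
sizes and the RF=3 baseline are made-up assumptions for illustration,
not figures from our cluster. Only the idea of RF=1 for raw data and a
higher RF for the aggregates comes from the discussion above.

```python
def disk_needed_gb(keyspaces):
    """Total disk across the cluster: each keyspace stores size * RF."""
    return sum(size_gb * rf for size_gb, rf in keyspaces.values())


# Hypothetical sizes (GB of unique data) paired with a replication factor.
single_rf = {"everything": (1000 + 50, 3)}
split_rf = {
    "raw": (1000, 1),        # bulk data, may be temporarily unreadable
    "aggregated": (50, 3),   # small, but critical for reads
}

uniform = disk_needed_gb(single_rf)
split = disk_needed_gb(split_rf)
print(f"one keyspace at RF=3: {uniform} GB")
print(f"split keyspaces:      {split} GB (saves {uniform - split} GB)")
```

The trade-off is that raw data on a node that is down cannot be read
until that node comes back.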

~mck

-- 
“Anyone who lives within their means suffers from a lack of
imagination.” - Oscar Wilde 
| http://semb.wever.org | http://sesat.no
| http://finn.no   | Java XSS Filter





Re: Cassandra on iSCSI?

2011-01-21 Thread Mick Semb Wever

 Of course with a SAN you'd want RF=1 since it's replicating
 internally. 

Isn't this the same case for raid-5 as well?

And we want RF=2 if we need to keep reading while doing rolling
restarts?

~mck

-- 
“Anyone who lives within their means suffers from a lack of
imagination.” - Oscar Wilde 
| http://semb.wever.org | http://sesat.no
| http://finn.no   | Java XSS Filter




Re: Cassandra on iSCSI?

2011-01-21 Thread Mick Semb Wever
 [OT] They're quoting roughly the same price for both (claiming that the
 extra cost goes into having for each node a separate disk cabinet to run
 local raid-5).
 
 You might not need raid-5 for local attached storage. 

Yes we did ask. But raid-5 is the minimum being offered from our hosting
provider... We could go to raid 10, but raid 0 is out of the question...

~mck

-- 
To be young, really young, takes a very long time. Picasso 
| http://semb.wever.org | http://sesat.no
| http://finn.no   | Java XSS Filter




Re: Cassandra on iSCSI?

2011-01-21 Thread Jonathan Ellis
On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever m...@apache.org wrote:

 Of course with a SAN you'd want RF=1 since it's replicating
 internally.

 Isn't this the same case for raid-5 as well?

No, because the replication is (mainly) to protect you from machine
failures; if the SAN is a SPOF then putting more replicas on it
doesn't help.

 And we want RF=2 if we need to keep reading while doing rolling
 restarts?

Yes.
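
A minimal sketch of the reasoning, assuming the worst case where the
restarting node holds a replica of the row being read. The function
names are hypothetical helpers, not part of any Cassandra API.

```python
def quorum(rf):
    """Replicas needed for a QUORUM read: a majority of RF."""
    return rf // 2 + 1


def readable_during_restart(rf, required_replicas, nodes_down=1):
    """Can a row still be read while `nodes_down` of its replicas restart?

    Worst case: every restarting node holds a replica of the row.
    """
    replicas_up = max(rf - nodes_down, 0)
    return replicas_up >= required_replicas


for rf in (1, 2, 3):
    one = readable_during_restart(rf, required_replicas=1)
    qrm = readable_during_restart(rf, required_replicas=quorum(rf))
    print(f"RF={rf}: CL.ONE readable={one}, QUORUM readable={qrm}")
```

Note that RF=2 only keeps reads available at CL.ONE. A QUORUM read with
RF=2 needs both replicas, so it would still fail during the restart.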

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra on iSCSI?

2011-01-21 Thread Edward Capriolo
On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever m...@apache.org wrote:

 Of course with a SAN you'd want RF=1 since it's replicating
 internally.

 Isn't this the same case for raid-5 as well?

 No, because the replication is (mainly) to protect you from machine
 failures; if the SAN is a SPOF then putting more replicas on it
 doesn't help.

 And we want RF=2 if we need to keep reading while doing rolling
 restarts?

 Yes.



If you are using Cassandra with a SAN, RF=1 makes sense because we are
making the assumption that the SAN is already replicating your data. RF=2
makes good sense if you do not want to be affected by outages. Another
alternative is something like Linux-HA, managing each Cassandra instance
as a resource. This way, if a head goes down, Linux-HA on another node
would detect the failure and bring up that instance on another physical
piece of hardware.

Using LinuxHA+SAN+Cassandra would actually bring Cassandra closer to
the HBase model, in which you have a distributed file system and the
front-end Cassandra acts like a region server.


Re: Cassandra on iSCSI?

2011-01-21 Thread Anthony John
Sort of - do not agree!!

This is the Shared nothing vs. Shared Disk debate. There are many mainstream
RDBMS products that pretend to do horizontal scalability with Shared Disks.
They have the kinds of problems that Cassandra is specifically architected
to avoid!

The original question here has 2 aspects to it:-
1. Is iSCSI SAN good enough - My take is that it is still the poor man's SAN
as compared to FC based SANs. Having said that, they have found increasing
adoption and the performance penalty is really marginal. Couple that with
the fact that Cassandra is architected to reduce the need for high
performance storage systems via features such as reducing random writes.
So net net - a reasonable iSCSI SAN should work.
2. Does it make sense to use a SPOF SAN - again this militates against the
architectural underpinnings of Cassandra, which relies on the shared nothing
idea to ensure that problems - say a bad disk - are easily isolated to a
particular node. On a SAN, depending on RAID configs, and how LUNs are
carved out and so on, a few disk outages could affect multiple nodes. A
performance problem with the SAN could now affect your entire Cassandra
cluster, and so on. Cassandra is not meant to be set up this way!

But but but...in the real world today - Large storage volumes are available
only with SANs. Rackable machines do not leave a lot of space - typically -
for a bunch of HDDs. On top of that, SANs provide all kinds of admin
capabilities that supposedly help with uptime and performance guarantees and
so on. So a Colo DC might not have any other option but shared storage!

So if one is forced to use a SAN, how should you set up Cassandra is the
interesting question - to me! Here are some thoughts:-
1. Ensure that each node gets dedicated - not shared - LUNs
2. Ensure that these LUNs do not share spindles, or nodes will cease to be
isolatable (this will be tough to get, given how SAN administrators think
about this)
3. Most SANs deliver performance by striping (RAID 0) - sacrifice striping
for isolation if push comes to shove
4. Do not share data directories from multiple nodes onto a single location
via NFS or CFS for example. They are cool in shared resource environments,
but break the premise behind Cassandra. All data storage should be private
to the cassandra node, even when on shared storage
5. Do not change any assumption around Replication Factor (RF) or
Consistency Level (CL) due to the shared storage - in fact if anything,
increase your replication factor because you now have potential SPOF
storage.
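
Points 1 and 2 can be checked mechanically if the SAN administrator can
tell you which spindles back each LUN. A rough Python sketch follows.
The layout dictionary, the node, LUN and disk names are all
hypothetical. How you obtain the real mapping depends on your SAN.

```python
from collections import defaultdict


def shared_resources(layout):
    """Find LUNs and spindles used by more than one Cassandra node.

    layout: {node: {lun: [spindle, ...]}}
    Returns (shared_luns, shared_spindles), each mapping the resource
    name to the sorted list of nodes that use it.
    """
    lun_users = defaultdict(set)
    spindle_users = defaultdict(set)
    for node, luns in layout.items():
        for lun, spindles in luns.items():
            lun_users[lun].add(node)
            for spindle in spindles:
                spindle_users[spindle].add(node)
    shared_luns = {k: sorted(v) for k, v in lun_users.items() if len(v) > 1}
    shared_spindles = {
        k: sorted(v) for k, v in spindle_users.items() if len(v) > 1
    }
    return shared_luns, shared_spindles


# Hypothetical layout: dedicated LUNs, but disk "d2" backs two of them.
layout = {
    "cass1": {"lun-a": ["d1", "d2"]},
    "cass2": {"lun-b": ["d2", "d3"]},
    "cass3": {"lun-c": ["d4", "d5"]},
}
luns, spindles = shared_resources(layout)
print("shared LUNs:", luns)
print("shared spindles:", spindles)
```

In this made-up layout every node has its own LUN, yet cass1 and cass2
would both be hit by a failure or hot spot on d2.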

My two - or maybe more - cents on the issue,

HTH,

-JA



Cassandra on iSCSI?

2011-01-20 Thread Mick Semb Wever
Does anyone have any experiences with Cassandra on iSCSI?

I'm currently testing a (soon-to-be) production server using both local
raid-5 and iSCSI disks. Our hosting provider is pushing us hard towards
the iSCSI disks because it is easier for them to run (and to meet our
needs for increasing disk capacity over time).

I'm worried that iSCSI is a non-scalable solution for an otherwise
scalable application (all cassandra nodes will have separate partitions
to the one iSCSI).

To go with raid-5 disks our hosting provider requires proof that iSCSI
won't work. I tried various things (eg `nodetool cleanup` on 12Gb load
giving 5k IOPS) but iSCSI seems to keep up with the performance of the
local raid-5 disks...

Should i be worried about using iSCSI?
Are there better tests i should be running? 

~mck

-- 
The turtle only makes progress when its neck is stuck out - Rollo May 
| http://semb.wever.org | http://sesat.no
| http://finn.no   | Java XSS Filter




Re: Cassandra on iSCSI?

2011-01-20 Thread Jonathan Ellis
On Thu, Jan 20, 2011 at 2:13 PM, Mick Semb Wever m...@apache.org wrote:
 To go with raid-5 disks our hosting provider requires proof that iSCSI
 won't work. I tried various things (eg `nodetool cleanup` on 12Gb load
 giving 5k IOPS) but iSCSI seems to keep up with the performance of the
 local raid-5 disks...

 Should i be worried about using iSCSI?

It should work fine; the main reason to go with local storage is the
huge cost advantage.

Of course with a SAN you'd want RF=1 since it's replicating internally.

 Are there better tests i should be running?

I would test write scalability going from 1 machine, to half your
planned cluster size, to your full cluster size, or as close as is
feasible, using enough client machines running contrib/stress* (much
faster than contrib/py_stress) that you saturate it.

Writes should be CPU bound, so you expect those to scale roughly
linearly as you add Cassandra nodes.
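
One way to judge "roughly linearly" is to compare each measurement with
what perfect scaling from the smallest run would give. A small Python
sketch, where the throughput numbers are invented placeholders and not
results from any real test:

```python
def scaling_efficiency(measurements):
    """Compare measured throughput with perfect linear scaling.

    measurements: list of (nodes, writes_per_sec), smallest cluster first.
    Returns a list of (nodes, efficiency) where 1.0 means perfectly linear
    relative to the first measurement.
    """
    base_nodes, base_rate = measurements[0]
    per_node = base_rate / base_nodes
    return [(nodes, rate / (per_node * nodes)) for nodes, rate in measurements]


# Hypothetical stress results for 1 node, half cluster, full cluster.
runs = [(1, 10000), (3, 29000), (6, 54000)]
for nodes, eff in scaling_efficiency(runs):
    print(f"{nodes} nodes: {eff:.0%} of linear")
```

If efficiency drops sharply as nodes are added, something other than CPU
(for example the shared storage) is probably the bottleneck.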

Reads (once your data set can't be cached in RAM) will be i/o bound,
so I imagine with a SAN you'll be able to max that out at some number
of machines and adding more Cassandra nodes won't help.  What that
limit is depends on your SAN iops and how much of it is being consumed
by other applications.
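
As a back-of-the-envelope sketch of that limit in Python: the SAN
capacity and the share used by other applications below are invented
numbers, and the 5k per node is borrowed from the `nodetool cleanup`
figure quoted above purely as a placeholder for per-node read demand.

```python
def read_saturation_nodes(san_iops, other_apps_iops, per_node_read_iops):
    """Number of nodes whose read load the SAN can serve at full speed.

    Beyond this count, extra Cassandra nodes compete for the same IOPS
    and read throughput stops growing.
    """
    available = san_iops - other_apps_iops
    if available <= 0:
        return 0
    return available // per_node_read_iops


# Hypothetical SAN: 40k IOPS total, 10k already used by other tenants.
limit = read_saturation_nodes(
    san_iops=40000, other_apps_iops=10000, per_node_read_iops=5000
)
print(f"reads stop scaling after about {limit} nodes")
```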

*I just committed a README for contrib/stress to the 0.7 svn branch

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra on iSCSI?

2011-01-20 Thread Mick Semb Wever
 It should work fine; the main reason to go with local storage is the
 huge cost advantage.

[OT] They're quoting roughly the same price for both (claiming that the
extra cost goes into having for each node a separate disk cabinet to run
local raid-5).

 *I just committed a README for contrib/stress to the 0.7 svn branch 

thanks! i'll check it out.

~mck

-- 
“An invasion of armies can be resisted, but not an idea whose time has
come.” - Victor Hugo 
| www.semb.wever.org | www.sesat.no 
| www.finn.no | http://xss-http-filter.sf.net

