Re: Effective allocation of multiple disks

2010-03-12 Thread Ryan King
We're going to use software raid. -ryan On Fri, Mar 12, 2010 at 9:24 AM, Eric Rosenberry wrote: > Ryan- > Are you going to use software or hardware based RAID 0? > > Does anyone on the list have any data to compare the performance of hardware > RAID 0 vs. software LVM RAID 0? > I would think soft

Re: Effective allocation of multiple disks

2010-03-12 Thread Ted Zlatanov
On Thu, 11 Mar 2010 12:01:27 -0600 Eric Evans wrote: EE> On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: >> On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro >> wrote: >> > I would almost recommend just keeping things simple and removing >> > multiple data directories from the config a

Re: Effective allocation of multiple disks

2010-03-12 Thread Eric Rosenberry
Ryan- Are you going to use software or hardware based RAID 0? Does anyone on the list have any data to compare the performance of hardware RAID 0 vs. software LVM RAID 0? I would think software RAID 0 would be fine since there is no actual computation being done... Thanks! -Eric On Thu, Mar 1

Re: Effective allocation of multiple disks

2010-03-11 Thread Ryan King
On Thu, Mar 11, 2010 at 10:45 AM, Jonathan Ellis wrote: > Except that for a major compaction the whole thing gets put in one > directory.  That's the problem w/ the JBOD approach. Even without major compaction, you can get significant imbalances in how much data is on each disk which will bottlen

Re: Effective allocation of multiple disks

2010-03-11 Thread Anthony Molinaro
I'm still wondering what happens when you have something like 2 500GB disks, with 2 sstables which use up 250GB, one on each disk, then a major compaction occurs. Will it still compact and probably fill up a disk (especially with the 2x overhead of compaction mentioned either here or on the wiki)?
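
The arithmetic behind this question can be sketched roughly as follows. This is only an illustration, not Cassandra code: the function names are hypothetical, and it assumes (as stated elsewhere in this thread) that a major compaction writes its output to a single data directory, and further assumes the worst case in which the output is as large as all input sstables combined.

```python
# Hypothetical sketch: does a major compaction fit on one disk in a JBOD layout?
# Assumptions (not confirmed beyond this thread): the compacted output goes to a
# single data directory, and in the worst case it is as large as all inputs combined.


def free_space_gb(capacity_gb, used_gb):
    """Free space remaining on one disk, in GB."""
    return capacity_gb - used_gb


def major_compaction_fits(disks, target):
    """Return True if the worst-case compacted output fits on the target disk.

    disks:  list of (capacity_gb, used_gb) tuples, one per data directory
    target: index of the disk that receives the compacted output
    """
    # Worst case: nothing is discarded, so the output equals the sum of the inputs.
    worst_case_output_gb = sum(used for _, used in disks)
    capacity_gb, used_gb = disks[target]
    return worst_case_output_gb <= free_space_gb(capacity_gb, used_gb)


# The scenario from the message: two 500GB disks, one 250GB sstable on each.
disks = [(500, 250), (500, 250)]
print(major_compaction_fits(disks, target=0))  # False: 500GB needed, 250GB free
```

Under these assumptions the output would not fit on either disk, which is the concern raised above; real compactions may produce less output if overwritten or deleted data is dropped.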

Re: Effective allocation of multiple disks

2010-03-11 Thread Jonathan Ellis
Except that for a major compaction the whole thing gets put in one directory. That's the problem w/ the JBOD approach. On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans wrote: > On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: >> On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro >> wrote: >>

Re: Effective allocation of multiple disks

2010-03-11 Thread Eric Evans
On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote: > On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro > wrote: > > I would almost > > recommend just keeping things simple and removing multiple data > directories > > from the config altogether and just documenting that you should plan > on u

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro wrote: > I would almost > recommend just keeping things simple and removing multiple data directories > from the config altogether and just documenting that you should plan on using > OS level mechanisms for growing diskspace and io. I think that

Re: Effective allocation of multiple disks

2010-03-10 Thread Anthony Molinaro
e able to utilize > those disks more thoroughly, and I have some ideas there. > > > -Original Message- > From: "Anthony Molinaro" > Sent: Wednesday, March 10, 2010 3:38pm > To: cassandra-user@incubator.apache.org > Subject: Re: Effective allocation of m

Re: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
ssage- From: "Anthony Molinaro" Sent: Wednesday, March 10, 2010 3:38pm To: cassandra-user@incubator.apache.org Subject: Re: Effective allocation of multiple disks This is incorrect, as discussed a few weeks ago. I have a setup with multiple disks, and as soon as compaction occurs all the

Re: Effective allocation of multiple disks

2010-03-10 Thread Anthony Molinaro
This is incorrect, as discussed a few weeks ago. I have a setup with multiple disks, and as soon as compaction occurs all the data ends up on one disk. If you need the additional io, you will want raid0. But simply listing multiple DataFileDirectories will not work. -Anthony On Wed, Mar 10, 20

Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
Thanks for testing that, added a note to http://wiki.apache.org/cassandra/CassandraHardware on stripe size. On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss wrote: > with the file sizes we're talking about with cassandra and other database > products, the stripe size doesn't seem to matter.  i s

Re: Effective allocation of multiple disks

2010-03-10 Thread B. Todd Burruss
with the file sizes we're talking about with cassandra and other database products, the stripe size doesn't seem to matter. i suppose there may be a modicum of overhead with a small stripe size, but i'm not sure. mine is set to 128k, which produced the same results as 16k and 256k. i will sa

Re: Effective allocation of multiple disks

2010-03-10 Thread Eric Rosenberry
Ahh, thanks! I had read that, but I had assumed the reference to "use one or more devices for DataFileDirectories" was referring to somehow making multiple physical devices into one logical device via some underlying RAID system. So then as far as free space on the disks goes, I have seen reference

RE: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
You can list multiple DataFileDirectories, and Cassandra will scatter files across all of them. Use 1 disk for the commitlog, and 3 disks for data directories. See http://wiki.apache.org/cassandra/CassandraHardware#Disk Thanks, Stu -Original Message- From: "Eric Rosenberry" Sent: Wedn