Re: Effective allocation of multiple disks

2010-03-12 Thread Eric Rosenberry
Ryan-

Are you going to use software or hardware based RAID 0?

Does anyone on the list have any data to compare the performance of hardware
RAID 0 vs. software LVM RAID 0?

I would think software RAID 0 would be fine since there is no actual
computation being done...

Thanks!

-Eric

On Thu, Mar 11, 2010 at 1:16 PM, Ryan King r...@twitter.com wrote:


 Even without major compaction, you can get significant imbalances in
 how much data is on each disk which will bottleneck your IO
 throughput. We're running JBOD right now, but going to switch to RAID
 0 soon.

 -ryan



Re: Effective allocation of multiple disks

2010-03-12 Thread Ted Zlatanov
On Thu, 11 Mar 2010 12:01:27 -0600 Eric Evans eev...@rackspace.com wrote: 

EE On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
 On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
 antho...@alumni.caltech.edu wrote:
  I would almost recommend just keeping things simple and removing
  multiple data directories from the config altogether and just
  documenting that you should plan on using OS level mechanisms for
  growing diskspace and io.
 
 I think that is a pretty sane suggestion actually. 

EE Or maybe leave the code as is and just document the situation more
EE clearly? If you're adding more disks to increase storage capacity
EE and you don't strictly need the extra IO, then multiple data
EE directories might be preferable to other forms of aggregation (it's
EE certainly simpler than say a volume manager).

Could Cassandra use a block device as raw storage?  You avoid the
filesystem overhead and it lets the sysadmin determine the best kind of
device (RAID or not underneath) to allocate.

Ted



Re: Effective allocation of multiple disks

2010-03-12 Thread Ryan King
We're going to use software RAID.

-ryan

On Fri, Mar 12, 2010 at 9:24 AM, Eric Rosenberry epros...@gmail.com wrote:
 Ryan-
 Are you going to use software or hardware based RAID 0?

 Does anyone on the list have any data to compare the performance of hardware
 RAID 0 vs. software LVM RAID 0?
 I would think software RAID 0 would be fine since there is no actual
 computation being done...
 Thanks!

 -Eric

 On Thu, Mar 11, 2010 at 1:16 PM, Ryan King r...@twitter.com wrote:

 Even without major compaction, you can get significant imbalances in
 how much data is on each disk which will bottleneck your IO
 throughput. We're running JBOD right now, but going to switch to RAID
 0 soon.

 -ryan




Re: Effective allocation of multiple disks

2010-03-11 Thread Eric Evans
On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
 On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
 antho...@alumni.caltech.edu wrote:
  I would almost recommend just keeping things simple and removing
  multiple data directories from the config altogether and just
  documenting that you should plan on using OS level mechanisms for
  growing diskspace and io.
 
 I think that is a pretty sane suggestion actually. 

Or maybe leave the code as is and just document the situation more
clearly? If you're adding more disks to increase storage capacity and
you don't strictly need the extra IO, then multiple data directories
might be preferable to other forms of aggregation (it's certainly
simpler than say a volume manager).

-- 
Eric Evans
eev...@rackspace.com



Re: Effective allocation of multiple disks

2010-03-11 Thread Jonathan Ellis
Except that for a major compaction the whole thing gets put in one
directory.  That's the problem w/ the JBOD approach.

On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans eev...@rackspace.com wrote:
 On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
 On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
 antho...@alumni.caltech.edu wrote:
  I would almost recommend just keeping things simple and removing
  multiple data directories from the config altogether and just
  documenting that you should plan on using OS level mechanisms for
  growing diskspace and io.

 I think that is a pretty sane suggestion actually.

 Or maybe leave the code as is and just document the situation more
 clearly? If you're adding more disks to increase storage capacity and
 you don't strictly need the extra IO, then multiple data directories
 might be preferable to other forms of aggregation (it's certainly
 simpler than say a volume manager).

 --
 Eric Evans
 eev...@rackspace.com




Re: Effective allocation of multiple disks

2010-03-11 Thread Anthony Molinaro
I'm still wondering what happens when you have something like 2 500GB disks,
with 2 sstables which use up 250GB, one on each disk, then a major compaction
occurs.  Will it still compact and probably fill up a disk (especially with
the 2x overhead of compaction mentioned either here or on the wiki)?
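
To make that scenario concrete, here is a rough Python sketch. It reads the
example as 250GB per sstable and assumes the worst case, where the merged
sstable is as large as the sum of its inputs and has to be written whole into
a single data directory while the inputs are still on disk. The function name
and the "pick the directory with the most free space" rule are illustrative
assumptions, not Cassandra's actual code.

```python
def major_compaction_fits(disk_capacity_gb, sstables_by_disk_gb):
    """Check whether a worst-case major compaction fits on any single disk.

    Assumptions (not taken from Cassandra's source):
      * the merged sstable is as large as the sum of all inputs,
      * it is written to the one disk with the most free space,
      * the inputs are only deleted after the merge has finished.

    Returns (fits, target_disk_index, merged_size_gb, free_on_target_gb).
    """
    merged_size = sum(sum(tables) for tables in sstables_by_disk_gb)
    best_disk, best_free = None, -1
    for i, (capacity, tables) in enumerate(zip(disk_capacity_gb, sstables_by_disk_gb)):
        free = capacity - sum(tables)
        if free > best_free:  # first disk wins on a tie
            best_disk, best_free = i, free
    return merged_size <= best_free, best_disk, merged_size, best_free


# Two 500GB disks, one 250GB sstable on each.
print(major_compaction_fits([500, 500], [[250], [250]]))
```

Under these assumptions the merged file needs 500GB but the emptiest disk has
only 250GB free, so the compaction would not fit on either disk.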

Seems like you could easily get into a situation where you can't
fix it without something like a volume manager, or a complete shutdown to
move the data to a bigger disk.

I guess one way might be to treat each disk as a separate node (ie, give
it some fraction of the keyspace based on its disk space), then when you
add a directory to the config you would have to load balance but only
within that node.  I'm sure that complicates ring maintenance but maybe
it's a better experience, as the multiple data directories should all fill
uniformly?
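
The "fraction of the keyspace based on its disk space" idea could be sketched
roughly as below. This is only an illustration of the proportional split; the
function name and the integer token range are hypothetical, and nothing here
reflects how Cassandra actually assigns tokens.

```python
def split_token_range(start, end, disk_sizes_gb):
    """Split the half-open range [start, end) into one sub-range per disk,
    sized in proportion to each disk's capacity (integer arithmetic only)."""
    total = sum(disk_sizes_gb)
    span = end - start
    ranges = []
    cursor = start
    used = 0
    for size in disk_sizes_gb:
        used += size
        upper = start + span * used // total  # cumulative share of the range
        ranges.append((cursor, upper))
        cursor = upper
    return ranges


# A node owning tokens [0, 1000) with two 500GB disks and one 1000GB disk.
print(split_token_range(0, 1000, [500, 500, 1000]))
```

Adding a directory would change every boundary, which is the per-node
rebalancing step mentioned above.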

Just some other thoughts.

-Anthony

On Thu, Mar 11, 2010 at 12:45:14PM -0600, Jonathan Ellis wrote:
 Except that for a major compaction the whole thing gets put in one
 directory.  That's the problem w/ the JBOD approach.
 
 On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans eev...@rackspace.com wrote:
  On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
  On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
  antho...@alumni.caltech.edu wrote:
   I would almost recommend just keeping things simple and removing
   multiple data directories from the config altogether and just
   documenting that you should plan on using OS level mechanisms for
   growing diskspace and io.
 
  I think that is a pretty sane suggestion actually.
 
  Or maybe leave the code as is and just document the situation more
  clearly? If you're adding more disks to increase storage capacity and
  you don't strictly need the extra IO, then multiple data directories
  might be preferable to other forms of aggregation (it's certainly
  simpler than say a volume manager).
 
  --
  Eric Evans
  eev...@rackspace.com
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu


Re: Effective allocation of multiple disks

2010-03-11 Thread Ryan King
On Thu, Mar 11, 2010 at 10:45 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Except that for a major compaction the whole thing gets put in one
 directory.  That's the problem w/ the JBOD approach.

Even without major compaction, you can get significant imbalances in
how much data is on each disk which will bottleneck your IO
throughput. We're running JBOD right now, but going to switch to RAID
0 soon.
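
A back-of-the-envelope model of why the imbalance hurts, as a Python sketch.
It assumes reads hit each disk in proportion to the data stored on it and that
every disk sustains the same IOPS; both are simplifying assumptions, and the
150 IOPS figure is made up for illustration.

```python
def aggregate_iops(data_per_disk_gb, per_disk_iops):
    """Estimate total IOPS for a JBOD layout.

    The array saturates when its busiest disk saturates, so the total is
    per_disk_iops divided by the busiest disk's share of the data.
    """
    total = sum(data_per_disk_gb)
    return per_disk_iops * total / max(data_per_disk_gb)


print(aggregate_iops([100, 100, 100], 150))  # evenly spread
print(aggregate_iops([240, 30, 30], 150))    # most data on one disk
```

With the data spread evenly, three disks behave like three disks (450 IOPS in
this model). With 80% of it on one disk, the same hardware delivers 187.5.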

-ryan


RE: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
You can list multiple DataFileDirectories, and Cassandra will scatter files 
across all of them. Use 1 disk for the commitlog, and 3 disks for data 
directories.

See http://wiki.apache.org/cassandra/CassandraHardware#Disk

Thanks,
Stu

-Original Message-
From: Eric Rosenberry epros...@gmail.com
Sent: Wednesday, March 10, 2010 2:00am
To: cassandra-user@incubator.apache.org
Subject: Effective allocation of multiple disks

Based on the documentation, it is clear that with Cassandra you want to have
one disk for commitlog, and one disk for data.

My question is: If you think your workload is going to require more io
performance to the data disks than a single disk can handle, how would you
recommend effectively utilizing additional disks?

It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
 If we use one for commitlog, is there a way to have Cassandra itself
equally split data across the three remaining disks?  Or is this something
that needs to be handled by the hardware level, or operating system/file
system level?

Options include a hardware RAID controller in a RAID 0 stripe (this is more
$$$ and for what gain?), or utilizing a volume manager like LVM.

Along those same lines, if you do implement some type of striping, what RAID
stripe size is recommended?  (I think Todd Burruss asked this earlier but I
did not see a response)

Thanks for any input!

-Eric




Re: Effective allocation of multiple disks

2010-03-10 Thread Eric Rosenberry
Ahh, thanks!  I had read that, but I had assumed the reference to use one
or more devices for DataFileDirectories was referring to somehow making
multiple physical devices into one logical device via some underlying RAID
system.

So then as far as free space on the disks goes, I have seen references to
keeping utilization below 50% to handle compaction.  Would it not be true to
say that you only need as much free space as it takes to hold another copy of
the largest data file you have?  (i.e. perhaps less than 50% of the disk)
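
One way to compare the two estimates is sketched below in Python. It assumes a
compaction writes its output before deleting its inputs, and that the output
can be as large as the sum of the inputs (nothing overwritten or deleted).
The sstable sizes are invented for illustration.

```python
def worst_case_extra_space(inputs_gb):
    """Upper bound on temporary space for one compaction: the output can be
    as large as all of its inputs combined (assumes nothing is discarded)."""
    return sum(inputs_gb)


sstables = [40, 30, 20, 10]  # hypothetical sstable sizes in GB

largest_file = max(sstables)                     # the "largest data file" estimate
minor = worst_case_extra_space(sstables[2:])     # merging only the two smallest
major = worst_case_extra_space(sstables)         # merging everything

data = sum(sstables)
max_safe_utilization = data / (data + major)     # fraction of disk that may hold data

print(largest_file, minor, major, max_safe_utilization)
```

Under these assumptions a minor compaction can need less than the largest
file, but a major compaction needs room for a second copy of everything,
which is where the 50% figure comes from.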

Due to the compaction space requirement, would it be more efficient to do
RAID 0 somewhere under the hood?

Just simply being able to specify multiple DataFileDirectories does
indeed sound appealing...

Thanks.

-Eric

On Wed, Mar 10, 2010 at 12:08 AM, Stu Hood stu.h...@rackspace.com wrote:

 You can list multiple DataFileDirectories, and Cassandra will scatter files
 across all of them. Use 1 disk for the commitlog, and 3 disks for data
 directories.

 See http://wiki.apache.org/cassandra/CassandraHardware#Disk

 Thanks,
 Stu

 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks

 Based on the documentation, it is clear that with Cassandra you want to
 have
 one disk for commitlog, and one disk for data.

 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?

 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?

 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.

 Along those same lines, if you do implement some type of striping, what
 RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)

 Thanks for any input!

 -Eric





Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
Thanks for testing that, added a note to
http://wiki.apache.org/cassandra/CassandraHardware on stripe size.

On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss bburr...@real.com wrote:
 with the file sizes we're talking about with cassandra and other database
 products, the stripe size doesn't seem to matter.  i suppose there may be a
 modicum of overhead with a small stripe size, but i'm not sure.  mine is set
 to 128k, which produced the same results as 16k and 256k.

 i will say the number of drives within the RAID 0 setup does seem to matter.
  the more you have, the more parallelism you can get with a good RAID controller.
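
 A small Python sketch of RAID 0 placement is consistent with that result. It
 only models where the bytes land (round-robin stripes, file starting on a
 stripe boundary at disk 0), not controller caching or seek behaviour, and the
 function name is made up for illustration.

```python
def bytes_per_disk(file_size, stripe_size, n_disks):
    """Bytes of one file that land on each disk of a RAID 0 set.

    Assumes the file starts on a stripe boundary on disk 0 and that stripes
    are dealt out round-robin across the disks.
    """
    full_stripes, remainder = divmod(file_size, stripe_size)
    rounds, extra = divmod(full_stripes, n_disks)
    counts = [(rounds + (1 if disk < extra else 0)) * stripe_size
              for disk in range(n_disks)]
    counts[extra] += remainder  # the partial last stripe goes to the next disk
    return counts


KIB = 1024
file_size = 1024 * 1024 * KIB  # a 1 GiB data file
for stripe in (16 * KIB, 128 * KIB, 256 * KIB):
    print(stripe // KIB, bytes_per_disk(file_size, stripe, 4))
```

 For a file this large, all three stripe sizes give the same even split across
 four disks, and in general the disks differ by at most one stripe. The number
 of disks, on the other hand, directly sets how many can work in parallel.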

 Eric Rosenberry wrote:

 Based on the documentation, it is clear that with Cassandra you want to
 have one disk for commitlog, and one disk for data.

 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?

 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?

 Options include a hardware RAID controller in a RAID 0 stripe (this is
 more $$$ and for what gain?), or utilizing a volume manager like LVM.

 Along those same lines, if you do implement some type of striping, what
 RAID stripe size is recommended?  (I think Todd Burruss asked this earlier
 but I did not see a response)

 Thanks for any input!

 -Eric



Re: Effective allocation of multiple disks

2010-03-10 Thread Anthony Molinaro
This is incorrect, as discussed a few weeks ago.  I have a setup with multiple
disks, and as soon as compaction occurs all the data ends up on one disk.  If
you need the additional io, you will want raid0.  But simply listing multiple
DataFileDirectories will not work.

-Anthony

On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote:
 You can list multiple DataFileDirectories, and Cassandra will scatter files 
 across all of them. Use 1 disk for the commitlog, and 3 disks for data 
 directories.
 
 See http://wiki.apache.org/cassandra/CassandraHardware#Disk
 
 Thanks,
 Stu
 
 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks
 
 Based on the documentation, it is clear that with Cassandra you want to have
 one disk for commitlog, and one disk for data.
 
 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?
 
 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?
 
 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.
 
 Along those same lines, if you do implement some type of striping, what RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)
 
 Thanks for any input!
 
 -Eric
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu


Re: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
Yea, I suppose major compactions are the wildcard here. Nonetheless, the 
situation where you only have 1 SSTable should be very rare.

I'll open a ticket though, because we really ought to be able to utilize those 
disks more thoroughly, and I have some ideas there.


-Original Message-
From: Anthony Molinaro antho...@alumni.caltech.edu
Sent: Wednesday, March 10, 2010 3:38pm
To: cassandra-user@incubator.apache.org
Subject: Re: Effective allocation of multiple disks

This is incorrect, as discussed a few weeks ago.  I have a setup with multiple
disks, and as soon as compaction occurs all the data ends up on one disk.  If
you need the additional io, you will want raid0.  But simply listing multiple
DataFileDirectories will not work.

-Anthony

On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote:
 You can list multiple DataFileDirectories, and Cassandra will scatter files 
 across all of them. Use 1 disk for the commitlog, and 3 disks for data 
 directories.
 
 See http://wiki.apache.org/cassandra/CassandraHardware#Disk
 
 Thanks,
 Stu
 
 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks
 
 Based on the documentation, it is clear that with Cassandra you want to have
 one disk for commitlog, and one disk for data.
 
 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?
 
 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?
 
 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.
 
 Along those same lines, if you do implement some type of striping, what RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)
 
 Thanks for any input!
 
 -Eric
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu




Re: Effective allocation of multiple disks

2010-03-10 Thread Anthony Molinaro
Except major compactions are not that rare if you have a cluster which you
need to add capacity to.  Anytime you add nodes with bootstrap it is recommended
you run cleanup on nodes which you removed data from (and this is useful to
see how much space you are now using).  Cleanup does a major compaction and
if you happen to have one disk larger than others, most of the data ends
up there (this happened to me when I added some EBS volumes to some EC2 nodes, I
distributed the sstables and everything was cool, I had more io, things were
great, then I needed to add another node, did a cleanup, boom everything is
on one disk and io sucks again).  I also don't quite know what happens when
a major compaction occurs which would combine sstables and fill up the
largest disk?
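
The before-and-after layout can be sketched in Python as follows. The rule
that the merged file goes to the directory with the most free space is an
assumption chosen to match what is described above, not something taken from
the Cassandra source. The directory names and sizes are invented, and the
sketch assumes the merged file fits.

```python
def major_compact(layout_gb, capacity_gb):
    """Return the layout after merging every sstable into one file.

    Assumption: the single output file is written to whichever directory
    has the most free space before the merge.
    """
    merged = sum(sum(tables) for tables in layout_gb.values())
    free = {d: capacity_gb[d] - sum(layout_gb[d]) for d in layout_gb}
    target = max(free, key=free.get)
    return {d: ([merged] if d == target else []) for d in layout_gb}


capacity = {"local1": 100, "local2": 100, "ebs": 300}             # GB, hypothetical
before = {"local1": [20, 20], "local2": [20, 20], "ebs": [20, 20]}  # spread by hand
after = major_compact(before, capacity)
print(after)
```

The hand-balanced layout collapses to a single 120GB file on the largest
volume, so all reads go back to one device.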

However after discussion I completely understand why things were done this
way, it's difficult to manage the space and really it should be relegated to
the disk subsystem of the OS (ie, RAID0, JBOD, LVM, etc).  I would almost
recommend just keeping things simple and removing multiple data directories
from the config altogether and just documenting that you should plan on using
OS level mechanisms for growing diskspace and io.

-Anthony

On Wed, Mar 10, 2010 at 04:43:36PM -0600, Stu Hood wrote:
 Yea, I suppose major compactions are the wildcard here. Nonetheless, the 
 situation where you only have 1 SSTable should be very rare.
 
 I'll open a ticket though, because we really ought to be able to utilize 
 those disks more thoroughly, and I have some ideas there.
 
 
 -Original Message-
 From: Anthony Molinaro antho...@alumni.caltech.edu
 Sent: Wednesday, March 10, 2010 3:38pm
 To: cassandra-user@incubator.apache.org
 Subject: Re: Effective allocation of multiple disks
 
 This is incorrect, as discussed a few weeks ago.  I have a setup with multiple
 disks, and as soon as compaction occurs all the data ends up on one disk.  If
 you need the additional io, you will want raid0.  But simply listing multiple
 DataFileDirectories will not work.
 
 -Anthony
 
 On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote:
  You can list multiple DataFileDirectories, and Cassandra will scatter files 
  across all of them. Use 1 disk for the commitlog, and 3 disks for data 
  directories.
  
  See http://wiki.apache.org/cassandra/CassandraHardware#Disk
  
  Thanks,
  Stu
  
  -Original Message-
  From: Eric Rosenberry epros...@gmail.com
  Sent: Wednesday, March 10, 2010 2:00am
  To: cassandra-user@incubator.apache.org
  Subject: Effective allocation of multiple disks
  
  Based on the documentation, it is clear that with Cassandra you want to have
  one disk for commitlog, and one disk for data.
  
  My question is: If you think your workload is going to require more io
  performance to the data disks than a single disk can handle, how would you
  recommend effectively utilizing additional disks?
  
  It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
   If we use one for commitlog, is there a way to have Cassandra itself
  equally split data across the three remaining disks?  Or is this something
  that needs to be handled by the hardware level, or operating system/file
  system level?
  
  Options include a hardware RAID controller in a RAID 0 stripe (this is more
  $$$ and for what gain?), or utilizing a volume manager like LVM.
  
  Along those same lines, if you do implement some type of striping, what RAID
  stripe size is recommended?  (I think Todd Burruss asked this earlier but I
  did not see a response)
  
  Thanks for any input!
  
  -Eric
  
  
 
 -- 
 
 Anthony Molinaro   antho...@alumni.caltech.edu
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu


Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
antho...@alumni.caltech.edu wrote:
 I would almost
 recommend just keeping things simple and removing multiple data directories
 from the config altogether and just documenting that you should plan on using
 OS level mechanisms for growing diskspace and io.

I think that is a pretty sane suggestion actually.

-Jonathan