Re: large RAID volume partition strategy

2007-08-29 Thread Vivek Khera


On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:



fdisk and bsdlabels both have a limit: because of the way they store the
data about the disk space they span, they can't store values that
reference space > 2 TB. In particular, every partition must start at an
offset <= 2 TB, and cannot be larger than 2 TB.


Thanks.  This is good advice (along with your other note about doing
it in the RAID volume manager).  Nearly everyone else decided to jump
on the RAID level instead and spew forth the "RAID10 is better for
databases" party line.  Well, to you folks: once you have 1 GB of cache
and a lot of disks, there is not much difference between RAID10 and
RAID5 or RAID6 in my testing.


I ended up making 6 RAID volumes across all the disks to maximize
spindle counts and stripe the data at 16 kB.  This seems to work well,
and I can assign the other partitions as I need them later on.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-29 Thread Vivek Khera


On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:

fdisk and bsdlabels both have a limit: because of the way they store the
data about the disk space they span, they can't store values that
reference space > 2 TB. In particular, every partition must start at an
offset <= 2 TB, and cannot be larger than 2 TB.


Oh... one more note: if I don't use fdisk or partitions, I *can* newfs
the raw drive at much bigger than 2 TB.  I just don't want to do that
for a production box. :-)


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-29 Thread Darren Pilgrim

Vivek Khera wrote:

On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:

fdisk and bsdlabels both have a limit: because of the way they store the
data about the disk space they span, they can't store values that
reference space > 2 TB. In particular, every partition must start at an
offset <= 2 TB, and cannot be larger than 2 TB.


Oh... one more note: if I don't use fdisk or partitions, I *can* newfs
the raw drive at much bigger than 2 TB.  I just don't want to do that
for a production box. :-)


Or you can use GPT, which uses 64-bit data structures and thus has an 8 
ZB limit.
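
For anyone who wants to sanity-check those figures: a minimal back-of-the-envelope
sketch in C, assuming 512-byte sectors (which is what both limits are usually
quoted against).

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* fdisk/bsdlabel store 32-bit sector counts; GPT stores 64-bit LBAs. */
    uint64_t old_limit = ((uint64_t)1 << 32) * 512;   /* bytes */
    printf("32-bit limit : %llu TiB\n",
        (unsigned long long)(old_limit >> 40));        /* prints 2 */
    /* 2^64 sectors * 512 bytes/sector = 2^73 bytes = 8 * 2^70 bytes = 8 ZiB */
    printf("64-bit (GPT) : 8 ZiB\n");
    return 0;
}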


--
Darren Pilgrim
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-29 Thread Vivek Khera


On Aug 29, 2007, at 2:43 PM, Kirill Ponomarew wrote:


What type of I/O did you test: random reads/writes, sequential writes?
The performance of a RAID group always depends on what software you
run on it.  If it's a database, be prepared for many random
reads/writes, hence dd(1) tests would be useless.


I ran my database on it with a sample workload based on our live  
workload.  Anything else would be a waste of time.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-29 Thread Kirill Ponomarew
On Wed, Aug 29, 2007 at 10:07:19AM -0400, Vivek Khera wrote:
 
 On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:
 
 
 fdisk and bsdlabels both have a limit: because of the way they store the
 data about the disk space they span, they can't store values that
 reference space > 2 TB. In particular, every partition must start at an
 offset <= 2 TB, and cannot be larger than 2 TB.
 
 Thanks.  This is good advice (along with your other note about doing it in
 the RAID volume manager).  Nearly everyone else decided to jump on the RAID
 level instead and spew forth the "RAID10 is better for databases" party
 line.  Well, to you folks: once you have 1 GB of cache and a lot of disks,
 there is not much difference between RAID10 and RAID5 or RAID6 in my testing.
 
What type of I/O did you test: random reads/writes, sequential writes?
The performance of a RAID group always depends on what software you
run on it.  If it's a database, be prepared for many random
reads/writes, hence dd(1) tests would be useless.

 I ended up making 6 RAID volumes across all the disks to maximize spindle
 counts and stripe the data at 16 kB.  This seems to work well, and I can
 assign the other partitions as I need them later on.
 
 ___
 freebsd-stable@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to [EMAIL PROTECTED]

-Kirill
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-20 Thread Claus Guttesen
  If you want to avoid the long fsck-times your remaining options are a
  journaling filesystem or zfs, either of which requires an upgrade from
  freebsd 6.2. I have used zfs and had a server stop due to a power outage
  in our area. Our zfs-samba-server came up fine with no data corruption.
  So I will suggest freebsd 7.0 with zfs.

 But, if I don't go with zfs, which would be a better way to slice the
 space up: RAID volumes exported as individual disks to freebsd, or
 one RAID volume divided into multiple logical partitions with disklabel?

If you want to place data and the transaction-log on different
partitions you want to be sure they reside on different physical
disks, so you probably want option 1.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-18 Thread Thomas Hurst
* Vivek Khera ([EMAIL PROTECTED]) wrote:

 I'll investigate this option.  Does anyone know the stability/reliability
 of the mpt(4) driver on CURRENT?  Is it out of GIANT lock yet?  It was hard
 to tell from the TODO list if it is entirely free of GIANT or not.

Yes, mpt(4) was made MPSAFE in revision 1.41, about 3 months ago:

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/mpt/mpt.c#rev1.41

I've not seen any stability issues with mpt in either of our test
systems, running heavy MySQL load over 20 spindles and a couple of
controllers each.

 My only fear of this is that once this system is in production, that's
 pretty much it.  Maintenance windows are about 1 year apart, usually
 longer.

Best temper your fear with some thorough testing then.  If you are going
to use ZFS in such a situation, though, I might be strongly tempted to
use Solaris instead.

Why the long gaps between maintenance?

-- 
Thomas 'Freaky' Hurst
http://hur.st/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-18 Thread Matthew Seaman

Clayton Milos wrote:

 If you want awesome performance and reliability the real way to go is
 RAID10 (or more correctly RAID 0+1).

RAID10 and RAID0+1 are very different beasts.  RAID10 is the best
choice for a read/write intensive f/s with valuable data, exactly
what you need to support a RDBMS.  It is built by pairing up all of
the drives as RAID1 mirrors[*] and then creating a RAID0 stripe
across all of the mirrors.  It's the least economical RAID setup,
giving you a usable space which is 50% of the total raw disk space,
but it is the most resilient -- potentially being able to survive
half of the drives failing -- and much the best performing of the
RAID types.

RAID0+1 on the other hand is what you give to someone you don't like
very much.  In this case, you divide the disks into two equal sets,
create a RAID0 stripe over each set and then a RAID1 mirror over the
stripes.  It has the /delightful/ feature that failure of any one
drive immediately puts half of the available disks out of action: ie
it is *less* resilient than any other RAID setup (other than a RAID0
stripe over all the drives).  Space economy-wise it's exactly like
RAID10 and performance characteristics are pretty similar to RAID10,
leading to the obvious conclusion: use RAID10 instead.
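
To put a number on that resilience difference, here is a small illustrative
sketch in C; it assumes 16 disks (8 mirror pairs vs. two 8-disk stripes) and
that a second failure is equally likely to hit any surviving disk:

#include <stdio.h>

int main(void)
{
    /* After one disk has already failed, what fraction of possible
     * second failures does each layout survive?
     * RAID10 (N mirror pairs, 2N disks): only the failed disk's partner
     *   is fatal, so it survives (2N-2) of the (2N-1) remaining disks.
     * RAID0+1 (two N-disk stripes, mirrored): the first failure kills a
     *   whole stripe, so any failure among the surviving stripe's N disks
     *   is fatal; it survives only (N-1) of (2N-1). */
    int n = 8;  /* e.g. 16 disks total, as in this thread */
    printf("RAID10 : survives %d/%d second failures (%.0f%%)\n",
        2*n - 2, 2*n - 1, 100.0 * (2*n - 2) / (2*n - 1));
    printf("RAID0+1: survives %d/%d second failures (%.0f%%)\n",
        n - 1, 2*n - 1, 100.0 * (n - 1) / (2*n - 1));
    return 0;
}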

Cheers,

Matthew

[*] The correctly paranoid sysadmin will of course ensure that each
of the disks in those pairs hangs off a different bus, comes from a
different manufacturing batch and is preferably connected to a
different controller and with different, independent power supplies.
 Or, in extreme cases, that each half of the mirrors are in
completely different datacenters.

--
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
  Kent, CT11 9PW
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-18 Thread Torfinn Ingolfsen
On Fri, 17 Aug 2007 21:50:53 -0400
Vivek Khera [EMAIL PROTECTED] wrote:

 My only fear of this is that once this system is in production,  
 that's pretty much it.  Maintenance windows are about 1 year apart,  
 usually longer.

Seems to me you really should want a redundant / clustered system,
allowing you to do maintenance on one server while running full
production on the rest.

Just my 0.2 euros.
-- 
Regards,
Torfinn Ingolfsen

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-18 Thread Vivek Khera


On Aug 18, 2007, at 4:09 AM, Thomas Hurst wrote:

Best temper your fear with some thorough testing then.  If you are going
to use ZFS in such a situation, though, I might be strongly tempted to
use Solaris instead.

Why the long gaps between maintenance?


This is a DB server for a 24x7 service.  Maintenance involves moving  
the DB master server to one of the replicas, and this involves  
downtime, so we like to do it as infrequently as possible.  Also, it  
is not exposed to the internet at large, and runs on a closed private  
network, so remote and local attacks are not a major concern.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Claus Guttesen
 I have a shiny new big RAID array.  16x500GB SATA 300+NCQ drives
 connected to the host via 4Gb fibre channel.  This gives me 6.5Tb of
 raw disk.

 I've come up with three possibilities on organizing this disk.  My
 needs are really for a single 1Tb file system on which I will run
 postgres.  However, in the future I'm not sure what I'll really need.
 I don't plan to ever connect any other servers to this RAID unit.

 The three choices I've come with so far are:

 1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
 configuration), and make one FreeBSD file system on the whole partition.

 2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
 configuration), and make 6 FreeBSD partitions with one file system each.

 3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
 then make one partition + file system on each disk.  Each RAID
 volume would span across all 16 drives, and I could make the volumes
 of differing RAID levels, if needed, but I'd probably stick with RAID6
 +spare.

 I'm not keen on option 1 because of the potentially long fsck times
 after a crash.

If you want to avoid the long fsck-times your remaining options are a
journaling filesystem or zfs, either of which requires an upgrade from
freebsd 6.2. I have used zfs and had a server stop due to a power outage
in our area. Our zfs-samba-server came up fine with no data corruption.
So I will suggest freebsd 7.0 with zfs.

Short fsck-times and ufs2 don't do well together. I know there is
background-fsck but for me that is not an option.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Clayton Milos


- Original Message - 
From: Claus Guttesen [EMAIL PROTECTED]

To: Vivek Khera [EMAIL PROTECTED]
Cc: FreeBSD Stable freebsd-stable@freebsd.org
Sent: Friday, August 17, 2007 11:10 PM
Subject: Re: large RAID volume partition strategy



I have a shiny new big RAID array.  16x500GB SATA 300+NCQ drives
connected to the host via 4Gb fibre channel.  This gives me 6.5Tb of
raw disk.

I've come up with three possibilities on organizing this disk.  My
needs are really for a single 1Tb file system on which I will run
postgres.  However, in the future I'm not sure what I'll really need.
I don't plan to ever connect any other servers to this RAID unit.

The three choices I've come with so far are:

1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
configuration), and make one FreeBSD file system on the whole partition.

2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
configuration), and make 6 FreeBSD partitions with one file system each.

3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
then make one partition + file system on each disk.  Each RAID
volume would span across all 16 drives, and I could make the volumes
of differing RAID levels, if needed, but I'd probably stick with RAID6
+spare.

I'm not keen on option 1 because of the potentially long fsck times
after a crash.


If you want to avoid the long fsck-times your remaining options are a
journaling filesystem or zfs, either of which requires an upgrade from
freebsd 6.2. I have used zfs and had a server stop due to a power outage
in our area. Our zfs-samba-server came up fine with no data corruption.
So I will suggest freebsd 7.0 with zfs.

Short fsck-times and ufs2 don't do well together. I know there is
background-fsck but for me that is not an option.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare



If your goal is speed and obviously as little possibility of a failure
(RAID6+spare), then RAID6 is the wrong way to go...

RAID6's read speeds are great but the write speeds are not.
If you want awesome performance and reliability the real way to go is RAID10 
(or more correctly RAID 0+1).
You will of course lose a lot more space than you will with RAID6 but the 
write speeds will be astronomically higher.
How would you feel with 16 drives in RAID10 with 2 hot spares? This will 
give you 3.5TB and if you're using a good RAID controller you should be 
getting write speeds of around 400MB/s to the array.


I've got an Areca 1120 RAID controller with 4 320G drives in a stripe set
and I'm writing at 280MB/s to that. With 7 500G drives you should be getting
around 400MB/s because the RAID10 doesn't have to calculate reconstruction
data. The theoretical max you're ever going to get from the array is 500MB/s
anyways with a 4Gb fibre channel controller.
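
Rough arithmetic behind those figures, as a sketch in C; the per-disk streaming
write rate used here is an assumed round number, not a measurement of any
particular drive:

#include <stdio.h>

int main(void)
{
    /* 16 drives, 2 kept as hot spares -> 14 in the RAID10 set,
     * i.e. 7 mirrored pairs striped together. */
    int drives = 16, spares = 2;
    int pairs = (drives - spares) / 2;
    double size_gb = 500.0;          /* per-drive capacity */
    double disk_write_mbs = 60.0;    /* assumed per-disk streaming write rate */

    printf("usable space : %.1f TB\n", pairs * size_gb / 1000.0);
    /* Writes hit both halves of each mirror, so streaming write throughput
     * scales roughly with the number of pairs, up to the FC link limit. */
    printf("approx write : %.0f MB/s (before the ~500 MB/s 4Gb FC cap)\n",
        pairs * disk_write_mbs);
    return 0;
}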


What it really boils down to is how much space you are willing to sacrifice
for performance...
Another thing you really have to do is make sure you have a good backup 
system. I've seen more than one customer crying because their RAID system 
with hot spares went on the blink and they lost their data.


-Clay


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Boris Samorodov
On Fri, 17 Aug 2007 17:42:55 -0400 Vivek Khera wrote:

 I have a shiny new big RAID array.  16x500GB SATA 300+NCQ drives
 connected to the host via 4Gb fibre channel.  This gives me 6.5Tb of
 raw disk.

 I've come up with three possibilities on organizing this disk.  My
 needs are really for a single 1Tb file system on which I will run
 postgres.  However, in the future I'm not sure what I'll really need.
 I don't plan to ever connect any other servers to this RAID unit.

 The three choices I've come with so far are:

 1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
 configuration), and make one FreeBSD file system on the whole
 partition.

 2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
 configuration), and make 6 FreeBSD partitions with one file system
 each.

 3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
 then make one partition + file system on each disk.  Each RAID
 volume would span across all 16 drives, and I could make the volumes
 of differing RAID levels, if needed, but I'd probably stick with RAID6
 +spare.

 I'm not keen on option 1 because of the potentially long fsck times
 after a crash.

 What advantage/disadvantage would I have between 2 and 3?  The only
 thing I can come up with is that the disk scheduling algorithm in
 FreeBSD might not be optimal if the drives really are not truly
 independent as they are really backed by the same 16 drives, so
 option 2 might be better.  However, with option 3, if I do ever end
 up connecting another host to the array, I can assign some of the
 volumes to the other host(s).

 My goal is speed, speed, speed.

Seems that RAID[56] may be too slow. I'd suggest RAID10.

I have 6 SATA-II 300MB/s disks on a 3ware adapter. My (very!) simple
tests gave about 170MB/s for dd. BTW, I tested (OK, very quickly)
RAID5, RAID6, gmirror+gstripe and none of them got close to RAID10.
(Well, as expected, I suppose).

 I'm running FreeBSD 6.2/amd64 and
 using an LSI fibre card.

If you have time you may do your own tests... And in case RAID0 you
shouldn't have problems with long fsck. Leave a couple of your disks
for hot-swapping and you'll get 7Tb. ;-)

 Thanks for any opinions and recommendations.


WBR
-- 
bsam
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Boris Samorodov
On Sat, 18 Aug 2007 02:26:04 +0400 Boris Samorodov wrote:
 On Fri, 17 Aug 2007 17:42:55 -0400 Vivek Khera wrote:

  I have a shiny new big RAID array.  16x500GB SATA 300+NCQ drives
  connected to the host via 4Gb fibre channel.  This gives me 6.5Tb of
  raw disk.

  I've come up with three possibilities on organizing this disk.  My
  needs are really for a single 1Tb file system on which I will run
  postgres.  However, in the future I'm not sure what I'll really need.
  I don't plan to ever connect any other servers to this RAID unit.

  The three choices I've come with so far are:

  1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
  configuration), and make one FreeBSD file system on the whole
  partition.

  2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare
  configuration), and make 6 FreeBSD partitions with one file system
  each.

  3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
  then make one partition + file system on each disk.  Each RAID
  volume would span across all 16 drives, and I could make the volumes
  of differing RAID levels, if needed, but I'd probably stick with RAID6
  +spare.

  I'm not keen on option 1 because of the potentially long fsck times
  after a crash.

  What advantage/disadvantage would I have between 2 and 3?  The only
  thing I can come up with is that the disk scheduling algorithm in
  FreeBSD might not be optimal if the drives really are not truly
  independent as they are really backed by the same 16 drives, so
  option 2 might be better.  However, with option 3, if I do ever end
  up connecting another host to the array, I can assign some of the
  volumes to the other host(s).

  My goal is speed, speed, speed.

 Seems that RAID[56] may be too slow. I'd suggest RAID10.

 I have 6 SATA-II 300MB/s disks on a 3ware adapter. My (very!) simple
 tests gave about 170MB/s for dd. BTW, I tested (OK, very quickly)
 RAID5, RAID6, gmirror+gstripe and none of them got close to RAID10.
 (Well, as expected, I suppose).

  I'm running FreeBSD 6.2/amd64 and
  using an LSI fibre card.

 If you have time you may do your own tests... And in case RAID0 you
                                                           ^^^^^ RAID10
 shouldn't have problems with long fsck. Leave a couple of your disks
 for hot-swapping and you'll get 7Tb. ;-)
                                 ^^^ 3.5TB

  Thanks for any opinions and recommendations.

sorry, not my night...


WBR
-- 
bsam
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Ivan Voras

Vivek Khera wrote:

 I'm not keen on option 1 because of the potentially long fsck times
 after a crash.

Depending on your allowable downtime after a crash, fscking even a 1 TB
UFS file system can be a long time. For large file systems there's
really no alternative to using -CURRENT / 7.0, and either gjournal or ZFS.

When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
from which to boot (and probably use it for root) and use the rest for
whatever your choice is (doesn't really matter at this point). This is
because you can't have fdisk or bsdlabel partitions larger than 2 TB and
you can't boot from GPT.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Tom Samplonius

- Clayton Milos [EMAIL PROTECTED] wrote:

 If your goal is speed and obviously as little possibility of a failure
 (RAID6+spare), then RAID6 is the wrong way to go...
 RAID6's read speeds are great but the write speeds are not.
 If you want awesome performance and reliability the real way to go is
 RAID10 (or more correctly RAID 0+1).

  RAID6 has better reliability than RAID10, because in RAID6 you can survive
the failure of _any_ two disks.  In RAID10, double disk failures are only
survivable if specific disks fail (alternates).  In RAID10 sets this is a
problem.  So:

* Reliability and space:  RAID6
* Performance:  RAID10

And the write-performance penalty on RAID5/6 depends directly on the quality
of your controller.  Good controllers can do partial stripe updates and other
optimizations to avoid having to read data back before writing anything.  With
a simple-minded RAID controller that has to read the entire stripe back, writes
are about 1/3 slower than reads.  Any good controller should be at about 75% of
the read speed.  And RAID5+0 and RAID6+1 are also good options.
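
For reference, the textbook read-modify-write accounting behind that penalty,
as a simplified C sketch (it ignores caching and full-stripe writes, which is
exactly what good controllers use to hide the cost):

#include <stdio.h>

int main(void)
{
    /* I/Os needed to update a single data block (a "small write"):
     * RAID1/10: write the block to both mirror halves          -> 2 writes
     * RAID5   : read old data + old parity, write new data
     *           + new parity                                   -> 2 reads + 2 writes
     * RAID6   : same, but with two parity blocks               -> 3 reads + 3 writes */
    printf("RAID10 small write: %d I/Os\n", 2);
    printf("RAID5  small write: %d I/Os\n", 4);
    printf("RAID6  small write: %d I/Os\n", 6);
    return 0;
}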

  And make sure your controller can do RAID scrubbing.  The chances of a fatal
failure on an array can be greatly minimized with RAID scrubbing.  None of the
cheap controllers can do this.  ZFS can do it though.  ZFS software RAID is
almost always better than a cheaper hardware RAID, though maybe not fully
mature in FreeBSD 7.  RAID6 also minimizes the risk of double disk failures.

  Big huge disks are almost always the wrong choice for databases though.  You
will never be able to fill the disk up before you hit the IOPS limit of each
spindle.  A 15K SAS disk has at least twice the IOPS of a 7K SATA disk.  And
while adding another bank to your RAID0 array doubles your IOPS in theory, it
isn't exactly a linear increase.  If you need the IOPS, it is better to start
with faster disks.
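
A back-of-the-envelope sketch in C of where that per-spindle IOPS gap comes
from; the seek and rotational figures are typical published numbers, not
measurements of any specific drive:

#include <stdio.h>

int main(void)
{
    /* Random IOPS per spindle is roughly 1 / (average seek time +
     * half-rotation latency). */
    struct { const char *name; double seek_ms, rpm; } d[] = {
        { "7.2K SATA", 8.5, 7200.0 },
        { "15K SAS",   3.5, 15000.0 },
    };
    for (int i = 0; i < 2; i++) {
        double half_rot_ms = 0.5 * 60000.0 / d[i].rpm;
        double iops = 1000.0 / (d[i].seek_ms + half_rot_ms);
        printf("%-9s: ~%.0f random IOPS per spindle\n", d[i].name, iops);
    }
    return 0;
}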

Tom


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Vivek Khera


On Aug 17, 2007, at 6:26 PM, Boris Samorodov wrote:


I have 6 SATA-II 300MB/s disks at 3WARE adapter. My (very!) simple
tests gave about 170MB/s for dd. BTW, I tested (OK, very fast)
RAID5, RAID6, gmirror+gstripe and noone get close to RAID10. (Well, as
expected, I suppose).


Whichever RAID level I choose, I still need to decide how to split  
the 6.5Tb into smaller hunks.


In any case, my testing with RAID10, RAID5, and RAID6 showed marginal  
differences with my workload.


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Vivek Khera


On Aug 17, 2007, at 6:10 PM, Claus Guttesen wrote:


If you want to avoid the long fsck-times your remaining options are a
journaling filesystem or zfs, either of which requires an upgrade from
freebsd 6.2. I have used zfs and had a server stop due to a power outage
in our area. Our zfs-samba-server came up fine with no data corruption.
So I will suggest freebsd 7.0 with zfs.


Interesting idea...

But, if I don't go with zfs, which would be a better way to slice the  
space up: RAID volumes exported as individual disks to freebsd, or  
one RAID volume divided into multiple logical partitions with disklabel?


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Vivek Khera


On Aug 17, 2007, at 7:31 PM, Ivan Voras wrote:

Depending on your allowable downtime after a crash, fscking even a 1 TB
UFS file system can be a long time. For large file systems there's
really no alternative to using -CURRENT / 7.0, and either gjournal or ZFS.


I'll investigate this option.  Does anyone know the stability/reliability
of the mpt(4) driver on CURRENT?  Is it out of GIANT lock yet?  It was
hard to tell from the TODO list if it is entirely free of GIANT or not.


My only fear of this is that once this system is in production,  
that's pretty much it.  Maintenance windows are about 1 year apart,  
usually longer.




When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
from which to boot (and probably use it for root) and use the rest for
whatever your choice is (doesn't really matter at this point). This is
because you can't have fdisk or bsdlabel partitions larger than 2 TB and
you can't boot from GPT.


So what you're saying here is that I can't do either my option 1 or 2,
but have to create smaller volumes exported as individual drives?  Or
just that I can't do 1, because in my case 2 I could make three 2 TB
fdisk slices which bsdlabel can then partition?


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: large RAID volume partition strategy

2007-08-17 Thread Ivan Voras
Vivek Khera wrote:

 My only fear of this is that once this system is in production, that's
 pretty much it.  Maintenance windows are about 1 year apart, usually
 longer.

Others will have to comment about that. I have only one 7-CURRENT in
production (because of ZFS) and I had only one panic (in ZFS). But this
machine is not heavily utilized.

 When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
 from which to boot (and probably use it for root) and use the rest for
 whatever your choice is (doesn't really matter at this point). This is
 because you can't have fdisk or bsdlabel partitions larger than 2 TB and
 you can't boot from GPT.
 
 So what you're saying here is that I can't do either my option 1 or 2, but
 have to create smaller volumes exported as individual drives?  Or just
 that I can't do 1, because in my case 2 I could make three 2 TB fdisk slices
 which bsdlabel can then partition?

fdisk and bsdlabels both have a limit: because of the way they store the
data about the disk space they span, they can't store values that
reference space > 2 TB. In particular, every partition must start at an
offset <= 2 TB, and cannot be larger than 2 TB.

In theory, the maximum you could do in normal (read on) circumstances
is have a 4 TB volume partitioned into two 2 TB slices/partitions, and
that's it. In practice, you can't usefully partition drives larger than
2 TB at all.

There's one (also theoretical... I doubt anyone has tried it) way out of
it: simulate a device with a larger sector size through gnop(8). For
example, if you use a 1 KB sector size you'll double all the limits to
4 TB (at least for bsdlabel; I think fdisk is stuck with 512-byte
sectors); with 4 KB sectors, to 16 TB. I know from experience that UFS
can handle sectors up to 8 KB, other file systems might not.

(ref: sys/disklabel.h:

struct partition {  /* the partition table */
u_int32_t p_size;   /* number of sectors in partition */
u_int32_t p_offset; /* starting sector */
u_int32_t p_fsize;  /* filesystem basic fragment size */
u_int8_t p_fstype;  /* filesystem type, see below */
u_int8_t p_frag;/* filesystem fragments per block */
u_int16_t p_cpg;/* filesystem cylinders per group */
} d_partitions[MAXPARTITIONS];  /* actually may be more */
)
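
A quick C sketch of the arithmetic those 32-bit fields imply, including the
simulated-sector-size workaround described above (numbers only; whether
gnop-backed labels behave well in practice is a separate question):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* p_size/p_offset are 32-bit sector counts, so the addressable range
     * is 2^32 sectors times whatever the sector size happens to be. */
    unsigned sector_sizes[] = { 512, 1024, 4096, 8192 };
    for (unsigned i = 0; i < sizeof(sector_sizes)/sizeof(sector_sizes[0]); i++) {
        uint64_t limit = ((uint64_t)1 << 32) * sector_sizes[i];
        printf("%4u-byte sectors: limit %llu TiB\n",
            sector_sizes[i], (unsigned long long)(limit >> 40));
    }
    return 0;   /* prints 2, 4, 16 and 32 TiB respectively */
}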





Re: large RAID volume partition strategy

2007-08-17 Thread Ivan Voras
Vivek Khera wrote:

 But, if I don't go with zfs, which would be a better way to slice the
 space up: RAID volumes exported as individual disks to freebsd, or one
 RAID volume divided into multiple logical partitions with disklabel?

In general, it's almost always better to do the partitioning in the
storage manager (RAID controller, etc.) - this way there's less chance
of file system / stripe misalignment. If you really want to use
disklabel, read my other post :)
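
As a concrete illustration of the misalignment issue, here is a minimal C
sketch; the 63-sector start is the classic fdisk default, and the 16 kB stripe
size matches the figure used earlier in this thread:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t sector = 512;               /* bytes per sector (assumed) */
    uint64_t stripe = 16 * 1024;         /* RAID stripe size */
    uint64_t part_start_sectors = 63;    /* classic fdisk starting offset */

    uint64_t start_bytes = part_start_sectors * sector;
    if (start_bytes % stripe != 0)
        printf("misaligned: partition starts %llu bytes into a stripe\n",
            (unsigned long long)(start_bytes % stripe));
    else
        printf("aligned to the stripe size\n");
    return 0;
}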



