Re: large RAID volume partition strategy
On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:

> fdisk and bsdlabel both have a limit: because of the way they store the
> data about the disk space they span, they can't store values that
> reference space > 2 TB. In particular, every partition must start at an
> offset <= 2 TB, and cannot be larger than 2 TB.

Thanks. This is good advice (along with your other note about doing the
partitioning in the RAID volume manager). Nearly everyone else decided to
jump on the RAID level instead and spew forth the "RAID10 is better for
databases" party line. Well, to you folks: once you have 1 GB cache and a
lot of disks, there is not much difference between RAID10 and RAID5 or
RAID6 in my testing.

I ended up making 6 RAID volumes across all the disks to maximize spindle
counts and stripe the data at 16 kB. This seems to work well, and I can
assign the other partitions as I need them later on.
Re: large RAID volume partition strategy
On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:

> fdisk and bsdlabel both have a limit: because of the way they store the
> data about the disk space they span, they can't store values that
> reference space > 2 TB. In particular, every partition must start at an
> offset <= 2 TB, and cannot be larger than 2 TB.

Oh... one more note: if I don't use fdisk or partitions, I *can* newfs the
raw drive much bigger than 2 TB. I just don't want to do that for a
production box. :-)
Re: large RAID volume partition strategy
Vivek Khera wrote:

> On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:
>
>> fdisk and bsdlabel both have a limit: because of the way they store the
>> data about the disk space they span, they can't store values that
>> reference space > 2 TB. In particular, every partition must start at an
>> offset <= 2 TB, and cannot be larger than 2 TB.
>
> Oh... one more note: if I don't use fdisk or partitions, I *can* newfs
> the raw drive much bigger than 2 TB. I just don't want to do that for a
> production box. :-)

Or you can use GPT, which uses 64-bit data structures and thus has an 8 ZB
limit.

--
Darren Pilgrim
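For a rough sense of the gap being described here, a back-of-the-envelope
sketch (my own addition, assuming 512-byte sectors and 32-bit versus
64-bit sector counts):

    # Sketch: addressable space with 32-bit (fdisk/bsdlabel-style) versus
    # 64-bit (GPT) sector counts, assuming 512-byte sectors.
    SECTOR = 512                       # bytes; assumed traditional sector size

    limit_32bit = (2**32) * SECTOR     # 32-bit LBA fields
    limit_64bit = (2**64) * SECTOR     # GPT's 64-bit LBA fields

    print(f"32-bit limit: {limit_32bit / 2**40:.0f} TiB")   # -> 2 TiB
    print(f"64-bit limit: {limit_64bit / 2**70:.0f} ZiB")   # -> 8 ZiB

That lines up with the 2 TB and 8 ZB figures quoted in the thread (give or
take decimal versus binary units).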
Re: large RAID volume partition strategy
On Aug 29, 2007, at 2:43 PM, Kirill Ponomarew wrote:

> What type of I/O did you test: random reads/writes, sequential writes?
> The performance of a RAID group always depends on what software you run
> on it. If it's a database, be prepared for many random reads/writes,
> hence dd(1) tests would be useless.

I ran my database on it with a sample workload based on our live workload.
Anything else would be a waste of time.
Re: large RAID volume partition strategy
On Wed, Aug 29, 2007 at 10:07:19AM -0400, Vivek Khera wrote:

> On Aug 17, 2007, at 10:44 PM, Ivan Voras wrote:
>
>> fdisk and bsdlabel both have a limit: because of the way they store the
>> data about the disk space they span, they can't store values that
>> reference space > 2 TB. In particular, every partition must start at an
>> offset <= 2 TB, and cannot be larger than 2 TB.
>
> Thanks. This is good advice (along with your other note about doing the
> partitioning in the RAID volume manager). Nearly everyone else decided to
> jump on the RAID level instead and spew forth the "RAID10 is better for
> databases" party line. Well, to you folks: once you have 1 GB cache and a
> lot of disks, there is not much difference between RAID10 and RAID5 or
> RAID6 in my testing.

What type of I/O did you test: random reads/writes, sequential writes?
The performance of a RAID group always depends on what software you run on
it. If it's a database, be prepared for many random reads/writes, hence
dd(1) tests would be useless.

> I ended up making 6 RAID volumes across all the disks to maximize spindle
> counts and stripe the data at 16 kB. This seems to work well, and I can
> assign the other partitions as I need them later on.

-Kirill
Re: large RAID volume partition strategy
>> If you want to avoid the long fsck times your remaining options are a
>> journaling filesystem or zfs; either requires an upgrade from FreeBSD
>> 6.2. I have used zfs and had a server stop due to a power outage in our
>> area. Our zfs samba server came up fine with no data corruption. So I
>> will suggest FreeBSD 7.0 with zfs.
>
> But, if I don't go with zfs, which would be a better way to slice the
> space up: RAID volumes exported as individual disks to FreeBSD, or one
> RAID volume divided into multiple logical partitions with disklabel?

If you want to place data and the transaction log on different partitions
you want to be sure they reside on different physical disks, so you
probably want option 1.

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentlest gamester is the
soonest winner.

Shakespeare
Re: large RAID volume partition strategy
* Vivek Khera ([EMAIL PROTECTED]) wrote:

> I'll investigate this option. Does anyone know the stability/reliability
> of the mpt(4) driver on CURRENT? Is it out of the Giant lock yet? It was
> hard to tell from the TODO list whether it is entirely free of Giant or
> not.

Yes, mpt(4) was made MPSAFE in revision 1.41, about 3 months ago:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/mpt/mpt.c#rev1.41

I've not seen any stability issues with mpt in either of our test systems,
running a heavy MySQL load over 20 spindles and a couple of controllers
each.

> My only fear of this is that once this system is in production, that's
> pretty much it. Maintenance windows are about 1 year apart, usually
> longer.

Best temper your fear with some thorough testing then. If you are going to
use ZFS in such a situation, though, I might be strongly tempted to use
Solaris instead.

Why the long gaps between maintenance?

--
Thomas 'Freaky' Hurst
http://hur.st/
Re: large RAID volume partition strategy
Clayton Milos wrote:

> If you want awesome performance and reliability the real way to go is
> RAID10 (or more correctly RAID 0+1).

RAID10 and RAID0+1 are very different beasts. RAID10 is the best choice
for a read/write-intensive filesystem with valuable data, exactly what you
need to support an RDBMS. It is built by pairing up all of the drives as
RAID1 mirrors[*] and then creating a RAID0 stripe across all of the
mirrors. It's the least economical RAID setup, giving you usable space
which is 50% of the total raw disk space, but it is the most resilient --
potentially being able to survive half of the drives failing -- and much
the best performing of the RAID types.

RAID0+1, on the other hand, is what you give to someone you don't like
very much. In this case, you divide the disks into two equal sets, create
a RAID0 stripe over each set and then a RAID1 mirror over the stripes. It
has the /delightful/ feature that failure of any one drive immediately
puts half of the available disks out of action: i.e. it is *less*
resilient than any other RAID setup (other than a RAID0 stripe over all
the drives). Space-economy-wise it's exactly like RAID10, and its
performance characteristics are pretty similar, leading to the obvious
conclusion: use RAID10 instead.

Cheers,

Matthew

[*] The correctly paranoid sysadmin will of course ensure that each of the
disks in those pairs hangs off a different bus, comes from a different
manufacturing batch and is preferably connected to a different controller
with different, independent power supplies. Or, in extreme cases, that
each half of the mirrors is in a completely different datacenter.

--
Dr Matthew J Seaman MA, D.Phil.
PGP: http://www.infracaninophile.co.uk/pgpkey
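To make the resilience difference concrete, a small enumeration (my own
sketch; the eight-disk count and the particular pairing/set assignments
are illustrative choices, not anything from the thread):

    # Sketch: count which two-disk failures an 8-disk array survives.
    # RAID10: mirrors (0,1) (2,3) (4,5) (6,7) striped together; data is lost
    #         only if both disks of the same mirror fail.
    # RAID0+1: stripe sets {0..3} and {4..7} mirrored against each other;
    #          data is lost once at least one disk in *each* set has failed.
    from itertools import combinations

    disks = range(8)
    mirrors = [(0, 1), (2, 3), (4, 5), (6, 7)]
    set_a, set_b = {0, 1, 2, 3}, {4, 5, 6, 7}

    def raid10_survives(failed):
        return not any(set(pair) <= set(failed) for pair in mirrors)

    def raid01_survives(failed):
        failed = set(failed)
        return not (failed & set_a and failed & set_b)

    pairs = list(combinations(disks, 2))
    print("RAID10  survives", sum(map(raid10_survives, pairs)), "of", len(pairs))
    print("RAID0+1 survives", sum(map(raid01_survives, pairs)), "of", len(pairs))

With eight disks, RAID10 survives 24 of the 28 possible two-disk failures,
while RAID0+1 survives only 12, which illustrates the point being made
above.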
Re: large RAID volume partition strategy
On Fri, 17 Aug 2007 21:50:53 -0400 Vivek Khera <[EMAIL PROTECTED]> wrote:

> My only fear of this is that once this system is in production, that's
> pretty much it. Maintenance windows are about 1 year apart, usually
> longer.

Seems to me you really should want a redundant / clustered system,
allowing you to do maintenance on one server while running full production
on the rest. Just my 0.2 euros.

--
Regards,
Torfinn Ingolfsen
Re: large RAID volume partition strategy
On Aug 18, 2007, at 4:09 AM, Thomas Hurst wrote:

> Best temper your fear with some thorough testing then. If you are going
> to use ZFS in such a situation, though, I might be strongly tempted to
> use Solaris instead.
>
> Why the long gaps between maintenance?

This is a DB server for a 24x7 service. Maintenance involves moving the DB
master server to one of the replicas, and this involves downtime, so we
like to do it as infrequently as possible.

Also, it is not exposed to the internet at large, and runs on a closed
private network, so remote and local attacks are not a major concern.
Re: large RAID volume partition strategy
> I have a shiny new big RAID array. 16x500GB SATA 300+NCQ drives connected
> to the host via 4Gb fibre channel. This gives me 6.5Tb of raw disk.
>
> I've come up with three possibilities for organizing this disk. My needs
> are really for a single 1Tb file system on which I will run postgres.
> However, in the future I'm not sure what I'll really need. I don't plan
> to ever connect any other servers to this RAID unit.
>
> The three choices I've come up with so far are:
>
> 1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
> and make one FreeBSD file system on the whole partition.
>
> 2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
> and make 6 FreeBSD partitions with one file system each.
>
> 3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
> then make one partition + file system on each disk. Each RAID volume
> would span across all 16 drives, and I could make the volumes of
> differing RAID levels if needed, but I'd probably stick with RAID6 +
> spare.
>
> I'm not keen on option 1 because of the potentially long fsck times after
> a crash.

If you want to avoid the long fsck times your remaining options are a
journaling filesystem or zfs; either requires an upgrade from FreeBSD 6.2.
I have used zfs and had a server stop due to a power outage in our area.
Our zfs samba server came up fine with no data corruption. So I will
suggest FreeBSD 7.0 with zfs.

Short fsck times and ufs2 don't go well together. I know there is
background fsck, but for me that is not an option.

--
regards
Claus

When lenity and cruelty play for a kingdom, the gentlest gamester is the
soonest winner.

Shakespeare
Re: large RAID volume partition strategy
----- Original Message -----
From: "Claus Guttesen" <[EMAIL PROTECTED]>
To: "Vivek Khera" <[EMAIL PROTECTED]>
Cc: "FreeBSD Stable" <freebsd-stable@freebsd.org>
Sent: Friday, August 17, 2007 11:10 PM
Subject: Re: large RAID volume partition strategy

>> I have a shiny new big RAID array. 16x500GB SATA 300+NCQ drives
>> connected to the host via 4Gb fibre channel. This gives me 6.5Tb of raw
>> disk.
>>
>> I've come up with three possibilities for organizing this disk. My
>> needs are really for a single 1Tb file system on which I will run
>> postgres. However, in the future I'm not sure what I'll really need. I
>> don't plan to ever connect any other servers to this RAID unit.
>>
>> The three choices I've come up with so far are:
>>
>> 1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
>> and make one FreeBSD file system on the whole partition.
>>
>> 2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
>> and make 6 FreeBSD partitions with one file system each.
>>
>> 3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
>> then make one partition + file system on each disk. Each RAID volume
>> would span across all 16 drives, and I could make the volumes of
>> differing RAID levels if needed, but I'd probably stick with RAID6 +
>> spare.
>>
>> I'm not keen on option 1 because of the potentially long fsck times
>> after a crash.
>
> If you want to avoid the long fsck times your remaining options are a
> journaling filesystem or zfs; either requires an upgrade from FreeBSD
> 6.2. I have used zfs and had a server stop due to a power outage in our
> area. Our zfs samba server came up fine with no data corruption. So I
> will suggest FreeBSD 7.0 with zfs.
>
> Short fsck times and ufs2 don't go well together. I know there is
> background fsck, but for me that is not an option.

If your goal is speed, and obviously as little chance of failure as
possible (RAID6 + spare), then RAID6 is the wrong way to go... RAID6's
read speeds are great but the write speeds are not.

If you want awesome performance and reliability the real way to go is
RAID10 (or more correctly RAID 0+1). You will of course lose a lot more
space than you will with RAID6, but the write speeds will be
astronomically higher.

How would you feel about 16 drives in RAID10 with 2 hot spares? This will
give you 3.5TB, and if you're using a good RAID controller you should be
getting write speeds of around 400MB/s to the array. I've got an Areca
1120 RAID controller with 4 320G drives in a stripe set and I'm writing at
280MB/s to that. With 7 500G drives you should be getting around 400MB/s
because RAID10 doesn't have to calculate reconstruction data. The
theoretical max you're ever going to get from the array is 500MB/s anyway
with a 4Gb fibre channel controller.

What it really boils down to is how much space you are willing to
sacrifice for performance...

Another thing you really have to do is make sure you have a good backup
system. I've seen more than one customer crying because their RAID system
with hot spares went on the blink and they lost their data.

-Clay
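A quick check of the arithmetic behind this suggestion (a sketch only; it
takes the nominal 500 GB per drive at face value and treats 4 Gb/s fibre
channel as a raw 500 MB/s ceiling, ignoring protocol overhead):

    # Sketch: usable capacity of 16 x 500 GB with 2 hot spares in RAID10,
    # and the rough throughput ceiling of a 4 Gb/s fibre channel link.
    drives, spares, size_gb = 16, 2, 500

    data_drives = drives - spares            # 14 drives in the RAID10 set
    usable_tb = (data_drives // 2) * size_gb / 1000
    print(f"RAID10 usable: {usable_tb} TB")          # -> 3.5 TB

    fc_gbit = 4
    fc_mb_s = fc_gbit * 1000 / 8                     # raw line rate, no overhead
    print(f"4Gb FC ceiling: ~{fc_mb_s:.0f} MB/s")    # -> ~500 MB/s

Both results match the 3.5 TB and 500 MB/s figures given in the message.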
Re: large RAID volume partition strategy
On Fri, 17 Aug 2007 17:42:55 -0400 Vivek Khera wrote:

> I have a shiny new big RAID array. 16x500GB SATA 300+NCQ drives connected
> to the host via 4Gb fibre channel. This gives me 6.5Tb of raw disk.
>
> I've come up with three possibilities for organizing this disk. My needs
> are really for a single 1Tb file system on which I will run postgres.
> However, in the future I'm not sure what I'll really need. I don't plan
> to ever connect any other servers to this RAID unit.
>
> The three choices I've come up with so far are:
>
> 1) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
> and make one FreeBSD file system on the whole partition.
>
> 2) Make one RAID volume of 6.5Tb (in a RAID6 + hot spare configuration),
> and make 6 FreeBSD partitions with one file system each.
>
> 3) Make 6 RAID volumes and expose them to FreeBSD as multiple drives,
> then make one partition + file system on each disk. Each RAID volume
> would span across all 16 drives, and I could make the volumes of
> differing RAID levels if needed, but I'd probably stick with RAID6 +
> spare.
>
> I'm not keen on option 1 because of the potentially long fsck times after
> a crash.
>
> What advantage/disadvantage would I have between 2 and 3? The only thing
> I can come up with is that the disk scheduling algorithm in FreeBSD might
> not be optimal if the drives really are not truly independent, as they
> are really backed by the same 16 drives, so option 2 might be better.
> However, with option 3, if I do ever end up connecting another host to
> the array, I can assign some of the volumes to the other host(s).
>
> My goal is speed, speed, speed.

Seems that RAID[56] may be too slow. I'd suggest RAID10.

I have 6 SATA-II 300MB/s disks on a 3ware adapter. My (very!) simple tests
gave about 170MB/s for dd. BTW, I tested (OK, very quickly) RAID5, RAID6
and gmirror+gstripe, and none got close to RAID10. (Well, as expected, I
suppose.)

> I'm running FreeBSD 6.2/amd64 and using an LSI fibre card.

If you have time you may do your own tests... And in case RAID0 you
shouldn't have problems with long fsck. Leave a couple of your disks for
hot-swapping and you'll get 7Tb. ;-)

> Thanks for any opinions and recommendations.

WBR
--
bsam
Re: large RAID volume partition strategy
On Sat, 18 Aug 2007 02:26:04 +0400 Boris Samorodov wrote:

> Seems that RAID[56] may be too slow. I'd suggest RAID10.
>
> I have 6 SATA-II 300MB/s disks on a 3ware adapter. My (very!) simple
> tests gave about 170MB/s for dd. BTW, I tested (OK, very quickly) RAID5,
> RAID6 and gmirror+gstripe, and none got close to RAID10. (Well, as
> expected, I suppose.)
>
> If you have time you may do your own tests... And in case RAID0
                                                            ^ RAID10
> you shouldn't have problems with long fsck. Leave a couple of your disks
> for hot-swapping and you'll get 7Tb. ;-)
                                   ^^^ 3.5TB

sorry, not my night...

WBR
--
bsam
Re: large RAID volume partition strategy
Vivek Khera wrote:

> I'm not keen on option 1 because of the potentially long fsck times after
> a crash.

Depending on your allowable downtime after a crash, fscking even a 1 TB
UFS file system can take a long time. For large file systems there's
really no alternative to using -CURRENT / 7.0, and either gjournal or ZFS.

When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
from which to boot (and probably use it for root) and use the rest for
whatever your choice is (doesn't really matter at this point). This is
because you can't have fdisk or bsdlabel partitions larger than 2 TB and
you can't boot from GPT.
Re: large RAID volume partition strategy
----- Clayton Milos <[EMAIL PROTECTED]> wrote:

> If your goal is speed, and obviously as little chance of failure as
> possible (RAID6 + spare), then RAID6 is the wrong way to go... RAID6's
> read speeds are great but the write speeds are not.
>
> If you want awesome performance and reliability the real way to go is
> RAID10 (or more correctly RAID 0+1).

RAID6 has better reliability than RAID10, because with RAID6 you can
survive the failure of _any_ two disks. In RAID10, double disk failures
are only survivable if specific disks fail (alternates). In RAID10 sets
this is a problem. So:

* Reliability and space: RAID6
* Performance: RAID10

And the performance hit on RAID5/6 writes is directly proportional to the
quality of your controller. Good controllers can do partial stripe updates
and other optimizations to avoid having to read data back before writing
anything. On a simple-minded RAID controller which has to read the entire
stripe back, writes are 1/3 slower than reads. Any good controller should
be at about 75% of the read speed.

RAID5+0 and RAID6+1 are also good options.

And make sure your controller can do RAID scrubbing. The chance of a fatal
failure on an array can be greatly minimized with RAID scrubbing. None of
the cheap controllers can do this. ZFS can do it, though. ZFS software
RAID is almost always better than a cheaper hardware RAID, though it is
maybe not fully mature in FreeBSD 7. RAID6 also minimizes the risk of
double disk failures.

Big huge disks are almost always the wrong choice for databases, though.
You will never be able to fill the disks up before you hit the IOPS limit
of each spindle. A 15K SAS disk has at least twice the IOPS of a 7K SATA
disk. And while adding another bank to your RAID0 array doubles your IOPS
in theory, it isn't exactly a linear increase. If you need the IOPS, it is
better to start with faster disks.

Tom
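The IOPS point can be roughed out from rotation speed alone. The sketch
below uses assumed average seek times (the 8.5 ms and 3.5 ms figures are
my own ballpark assumptions, not numbers from the thread):

    # Sketch: rough per-spindle random IOPS from average seek time plus
    # rotational latency. Seek times below are assumed, not measured.
    def iops(rpm, avg_seek_ms):
        rotational_latency_ms = (60_000 / rpm) / 2   # half a rotation on average
        return 1000 / (avg_seek_ms + rotational_latency_ms)

    print(f"7.2K SATA: ~{iops(7200, 8.5):.0f} IOPS")   # -> ~79
    print(f"15K SAS:   ~{iops(15000, 3.5):.0f} IOPS")  # -> ~182

Under those assumptions the 15K drive delivers a bit more than twice the
random IOPS of the 7.2K drive, consistent with the claim above.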
Re: large RAID volume partition strategy
On Aug 17, 2007, at 6:26 PM, Boris Samorodov wrote:

> I have 6 SATA-II 300MB/s disks on a 3ware adapter. My (very!) simple
> tests gave about 170MB/s for dd. BTW, I tested (OK, very quickly) RAID5,
> RAID6 and gmirror+gstripe, and none got close to RAID10. (Well, as
> expected, I suppose.)

Whichever RAID level I choose, I still need to decide how to split the
6.5Tb into smaller hunks.

In any case, my testing with RAID10, RAID5, and RAID6 showed marginal
differences with my workload.
Re: large RAID volume partition strategy
On Aug 17, 2007, at 6:10 PM, Claus Guttesen wrote:

> If you want to avoid the long fsck times your remaining options are a
> journaling filesystem or zfs; either requires an upgrade from FreeBSD
> 6.2. I have used zfs and had a server stop due to a power outage in our
> area. Our zfs samba server came up fine with no data corruption. So I
> will suggest FreeBSD 7.0 with zfs.

Interesting idea...

But, if I don't go with zfs, which would be a better way to slice the
space up: RAID volumes exported as individual disks to FreeBSD, or one
RAID volume divided into multiple logical partitions with disklabel?
Re: large RAID volume partition strategy
On Aug 17, 2007, at 7:31 PM, Ivan Voras wrote:

> Depending on your allowable downtime after a crash, fscking even a 1 TB
> UFS file system can take a long time. For large file systems there's
> really no alternative to using -CURRENT / 7.0, and either gjournal or
> ZFS.

I'll investigate this option. Does anyone know the stability/reliability
of the mpt(4) driver on CURRENT? Is it out of the Giant lock yet? It was
hard to tell from the TODO list whether it is entirely free of Giant or
not.

My only fear of this is that once this system is in production, that's
pretty much it. Maintenance windows are about 1 year apart, usually
longer.

> When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
> from which to boot (and probably use it for root) and use the rest for
> whatever your choice is (doesn't really matter at this point). This is
> because you can't have fdisk or bsdlabel partitions larger than 2 TB and
> you can't boot from GPT.

So what you're saying here is that I can't do either my option 1 or 2, but
have to create smaller volumes exported as individual drives? Or just that
I can't do 1, because in my case 2 I could make three 2Tb fdisk slices
which bsdlabel can then partition?
Re: large RAID volume partition strategy
Vivek Khera wrote:

> My only fear of this is that once this system is in production, that's
> pretty much it. Maintenance windows are about 1 year apart, usually
> longer.

Others will have to comment about that. I have only one 7-CURRENT machine
in production (because of ZFS) and I've had only one panic (in ZFS). But
this machine is not heavily utilized.

>> When you get there, you'll need to create 1 small RAID volume (<= 1 GB)
>> from which to boot (and probably use it for root) and use the rest for
>> whatever your choice is (doesn't really matter at this point). This is
>> because you can't have fdisk or bsdlabel partitions larger than 2 TB and
>> you can't boot from GPT.
>
> So what you're saying here is that I can't do either my option 1 or 2,
> but have to create smaller volumes exported as individual drives? Or just
> that I can't do 1, because in my case 2 I could make three 2Tb fdisk
> slices which bsdlabel can then partition?

fdisk and bsdlabel both have a limit: because of the way they store the
data about the disk space they span, they can't store values that
reference space > 2 TB. In particular, every partition must start at an
offset <= 2 TB, and cannot be larger than 2 TB. In theory, the maximum you
could do in normal (read on) circumstances is have a 4 TB volume
partitioned into two 2 TB slices/partitions, and that's it. In practice,
you can't usefully partition drives larger than 2 TB at all.

There's one (also theoretical... I doubt anyone has tried it) way out of
it: simulate a device with a larger sector size through gnop(8). For
example, if you use a 1 KB sector size you'll double all the limits (at
least for bsdlabel; I think fdisk is stuck with 512-byte sectors) to 4 TB,
and with 4 KB sectors, to 16 TB. I know from experience that UFS can
handle sectors up to 8 KB; other file systems might not.

(ref: sys/disklabel.h:

        struct partition {              /* the partition table */
                u_int32_t p_size;       /* number of sectors in partition */
                u_int32_t p_offset;     /* starting sector */
                u_int32_t p_fsize;      /* filesystem basic fragment size */
                u_int8_t p_fstype;      /* filesystem type, see below */
                u_int8_t p_frag;        /* filesystem fragments per block */
                u_int16_t p_cpg;        /* filesystem cylinders per group */
        } d_partitions[MAXPARTITIONS];  /* actually may be more */
)
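As a worked check of the limit implied by those u_int32_t fields, here is
a small sketch (my own addition) covering the 512-byte default and the
larger gnop-simulated sector sizes mentioned above:

    # Sketch: maximum space addressable by the 32-bit p_size/p_offset fields
    # above, for the sector sizes mentioned (512 B default, 1 KB, 4 KB, 8 KB).
    MAX_SECTORS = 2**32  # range of a u_int32_t sector count

    for sector in (512, 1024, 4096, 8192):
        limit_tib = MAX_SECTORS * sector / 2**40
        print(f"{sector:5d}-byte sectors -> {limit_tib:.0f} TiB per partition")
    # 512 -> 2 TiB, 1024 -> 4 TiB, 4096 -> 16 TiB, 8192 -> 32 TiB

The scaling matches the doubling described for 1 KB sectors and the 16 TB
figure for 4 KB sectors.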
Re: large RAID volume partition strategy
Vivek Khera wrote:

> But, if I don't go with zfs, which would be a better way to slice the
> space up: RAID volumes exported as individual disks to FreeBSD, or one
> RAID volume divided into multiple logical partitions with disklabel?

In general, it's almost always better to do the partitioning in the
storage manager (RAID controller, etc.) - this way there's less chance of
file system / stripe misalignment. If you really want to use disklabel,
read my other post :)
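For anyone wondering what "file system / stripe misalignment" means in
practice, a small illustrative sketch (the 16 kB stripe size echoes the
figure mentioned earlier in the thread; the partition offsets are made-up
examples):

    # Sketch: check whether a partition's starting offset is aligned to the
    # array's stripe size. Offsets here are illustrative, not recommendations.
    STRIPE = 16 * 1024          # bytes; matches the 16 kB stripe cited earlier
    SECTOR = 512                # bytes, assumed

    for start_sector in (63, 32, 2048):     # 63 is the classic fdisk default
        offset = start_sector * SECTOR
        aligned = offset % STRIPE == 0
        print(f"partition at sector {start_sector}: "
              f"{'aligned' if aligned else 'misaligned'} to {STRIPE}-byte stripes")
    # sector 63 -> misaligned; sectors 32 and 2048 -> aligned

A partition that starts mid-stripe can turn single-stripe file system I/O
into two-stripe I/O, which is the overhead that partitioning inside the
storage manager avoids.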