Re: gptmbr.bin vs RAIDframe
On Sun 21 Jun 2015 at 13:56:15 +0200, Rhialto wrote:
> On Tue 16 Jun 2015 at 14:32:52 +0100, David Brownlee wrote:
> > The script I used is below in case anyone finds it of interest
> A few notes:

Ah, never mind. I wrote this before seeing your followup, which addresses both points.

-Olaf.
--
___ Olaf 'Rhialto' Seibert -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl   -- 'this bath is too hot.'
Re: gptmbr.bin vs RAIDframe
On Fri 19 Jun 2015 at 01:27:09 +0700, Robert Elz wrote:
> Of course, if a drive develops bad spots, the single raidframe approach
> will fail that drive, and none of the filesystems will be mirrored until
> the drive is replaced or the bad spots corrected - the multiple
> raidframe approach will only fail the arrays where the bad spots occur;
> other filesystems would remain mirrored.

Which leads to the question: has this principle never been used in single-large-RAID setups? Just as there is now some memory of which parts of the disk have parity that still needs to be rebuilt (right?), one could re-use the same zones and remember in which of those there was a read error.

-Olaf.
Re: gptmbr.bin vs RAIDframe
On Tue 16 Jun 2015 at 14:32:52 +0100, David Brownlee wrote:
> The script I used is below in case anyone finds it of interest

A few notes:

> cat > $RAID.conf << END
> START array
> 1 2 0
> START disks
> /dev/dk1
> /dev/dk3
> START layout
> 64 1 1 1
> START queue
> fifo 100
> END
>
> cat > $RAID.conf << END
> START array
> 1 2 0
> START disks
> /dev/dk1
> /dev/dk3
> START layout
> 64 1 1 1
> START queue
> fifo 100
> END

Why does it have this part twice?

> ...
> gpt add -a 64k -l $RAIDa -t ffs -s $ROOTSIZE $RAID

I suppose you mean ${RAID}a, since the variable RAIDa doesn't seem to exist. $RAIDa occurs a few more times.

-Olaf.
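To make the two points concrete, here is a minimal sketch of what the quoted script fragment presumably does; the variable names ($RAID) and the config contents come from the quote, while the surrounding script structure is an assumption, not the attachment's actual code:

```shell
#!/bin/sh
# Illustrative value; the real script derives this elsewhere.
RAID=raid0

# Write the RAIDframe config: 1 row, 2 columns, 0 spares (a RAID1 mirror
# of /dev/dk1 and /dev/dk3), 64 sectors per stripe unit, fifo queue.
cat > "$RAID.conf" << END
START array
1 2 0
START disks
/dev/dk1
/dev/dk3
START layout
64 1 1 1
START queue
fifo 100
END

# The ${RAID}a vs $RAIDa point: the latter expands the (unset)
# variable RAIDa, not $RAID followed by a literal 'a'.
echo "root partition would be: ${RAID}a"
```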
Re: gptmbr.bin vs RAIDframe
Rhialto <rhia...@falu.nl> writes:
> On Fri 19 Jun 2015 at 01:27:09 +0700, Robert Elz wrote:
> > Of course, if a drive develops bad spots, the single raidframe
> > approach will fail that drive, and none of the filesystems will be
> > mirrored until the drive is replaced or the bad spots corrected - the
> > multiple raidframe approach will only fail the arrays where the bad
> > spots occur, other filesystems would remain mirrored.
> Which leads to the question: has this principle never been used in
> single-large-RAID setups? Like there is now some memory of which parts
> of the disk have parity that still needs to be rebuilt (right?), one
> could re-use the same zones and remember in which one of those there
> was a read error.

Three unrelated thoughts:

1) raidframe, at least in RAID1, already keeps track of what's dirty by zone, so that parity rebuild after a crash is very fast. It could be extended to keep track of failure by zone as well.

2) In my experience with raidframe, I have seen three failure modes. One is transient errors, perhaps SATA cabling or controller flakes, that are cleared on reboot. Another is complete failure of the disk. The most common is some bad sectors; in these cases, rewriting parity restores the raid set. So raidframe could, again at least for RAID1, write the data from the other disk back to the disk with the read error. Beyond that, it would be nice to have a background process to read/compare both halves and, on read failure on one half, write the good value to the non-readable sector. I more or less do this manually by running dd over my disks to /dev/null every few months.

3) I think zfs does part of this, but I'm not clear on how it handles a disk with some blocks that can't be read. It would be really nice to get modern OpenZFS running well on NetBSD.
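The manual scrub described in point 2 (reading every block so the drive surfaces latent errors before a rebuild trips over them) can be sketched as a small helper; the device paths in the comment are illustrative, and the function works on any readable file or raw device:

```shell
#!/bin/sh
# Read a device (or any file) end to end, discarding the data.
# A hard read error makes dd exit non-zero, flagging the bad
# component before RAIDframe hits it during reconstruction.
scrub_read() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "OK: $1"
    else
        echo "READ ERROR: $1" >&2
        return 1
    fi
}

# Typical use on NetBSD would be the raw devices, e.g.:
#   scrub_read /dev/rwd0d && scrub_read /dev/rwd1d
```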
Re: gptmbr.bin vs RAIDframe
mar...@duskware.de (Martin Husemann) writes:
> On Thu, Jun 18, 2015 at 08:09:20AM +0100, David Brownlee wrote:
> > Hmm, how about if raidframe were extended to have an option to
> > specify a wedge label type, similar to a gpt slot?
> Completely independent from the topic at hand, I really love this idea!
> I use most raids w/o further partitioning them, and this would map them
> to named wedges straightforwardly.

You could create autodiscover code that just creates a wedge spanning the whole disk. It would be of least priority, so it is only used when no other disk label is recognized. Fine if you just want to access a raw disk. But it a) wouldn't have a name and b) wouldn't have a type. And why create a wedge when it is as useful as the RAW_PART of the parent device?

With raidframe-magic-code you could put such information into the raid label and somehow pass that to the wedge. But it is much simpler to just use a GPT on the raid device. No magic needed.

--
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
Re: gptmbr.bin vs RAIDframe
On Thu, Jun 18, 2015 at 01:14:52PM +0000, Michael van Elst wrote:
> You could create autodiscover code that just creates a wedge spanning
> the whole disk. It would be of least priority, so it is only used when
> no other disk label is recognized. Fine if you just want to access a
> raw disk. But it a) wouldn't have a name and b) wouldn't have a type.

Adding (a) and (b) was the idea, I think.

> But why create a wedge when it is as useful as the RAW_PART of the
> parent device?

Because you can mount it via name.

> With raidframe-magic-code you could put such information into the raid
> label and somehow pass that to the wedge. But it is much simpler to
> just use a GPT on the raid device. No magic needed.

No *other* magic needed - but if the raid label could provide both name and type, and the autodiscover code would deal with it, the setup is even simpler. But yes, it is just replacing some magic with other magic.

Martin
Re: gptmbr.bin vs RAIDframe
Date: Wed, 17 Jun 2015 14:36:24 -0700
From: John Nemeth <jnem...@cue.bc.ca>
Message-ID: <201506172136.t5hlaogg020...@server.cornerstoneservice.ca>

  | Given that a GPT typically has a minimum of 128 slots, would
  | you do gpt-raid-gpt?  Why not just create a sufficient number of
  | RAID partitions?

Recovery workload. Consider 2 drives, on which there are to be 30-40 filesystems, all mirrored. One raid, with 40 partitions (wedges) on it, is one possibility; 40 partitions (wedges), each containing a raidframe, each containing a single filesystem, is another.

When everything is working (and to some extent, at initial config) there is little real difference between those two approaches (the gpt-raidframe way means more raidframe overhead, but that is not really significant). But when one of the drives dies and is replaced, the raidframe-gpt way requires the operator to arrange reconstruction of a single raid array (issuing raidctl to add in the replacement) just once; the gpt-raidframe way requires doing that once for each of the raid arrays. For me, that's enough to prefer fewer raid arrays.

Of course, if a drive develops bad spots, the single raidframe approach will fail that drive, and none of the filesystems will be mirrored until the drive is replaced or the bad spots corrected - the multiple raidframe approach will only fail the arrays where the bad spots occur; other filesystems would remain mirrored.

kre
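The recovery-workload point can be made concrete with a dry-run sketch: one big raid means one rebuild after a disk swap, one raid per filesystem means one rebuild per array. The script below only prints the raidctl invocations it would need; the raid and component names are illustrative:

```shell
#!/bin/sh
# Print (rather than run) the raidctl steps needed after swapping in a
# replacement disk.  raidctl -R fails the named component and
# reconstructs onto it in place; raidctl -S reports rebuild progress.
rebuild_plan() {
    for raid in "$@"; do
        echo "raidctl -R /dev/<new-component> $raid"
        echo "raidctl -S $raid"
    done
}

echo "# single big raid: one rebuild"
rebuild_plan raid0
echo "# one raid per filesystem: one rebuild each (three shown)"
rebuild_plan raid0 raid1 raid2
```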
Re: gptmbr.bin vs RAIDframe
On 17 June 2015 at 22:36, John Nemeth <jnem...@cue.bc.ca> wrote:
> On Jun 17, 9:42pm, David Brownlee wrote:
> } The issue is less encoding which is the root partition, more how the
> } (very space limited) initial boot blocks can find it.
> }
> } Absent the workaround suggested by Stephen (of which I am now a huge
> } fan and currently have 5.3TB of data copying across to :), the basic
> } options to allow booting from gpt-raid-gpt root might be:
>
> Given that a GPT typically has a minimum of 128 slots, would
> you do gpt-raid-gpt?  Why not just create a sufficient number of
> RAID partitions?

Is there some way of creating a typed wedge from raw raid partitions? For <2TB raid partitions we can just create a single active partition disklabel with ffs, swap or whatever, but for >2TB that falls down.

Hmm, how about if raidframe were extended to have an option to specify a wedge label type, similar to a gpt slot? If set then the raw raid device just becomes a named wedge. Would allow both dk? and raid? numbers to change dynamically between boots without affecting fstab...
Re: gptmbr.bin vs RAIDframe
On Thu, Jun 18, 2015 at 08:09:20AM +0100, David Brownlee wrote:
> Hmm, how about if raidframe were extended to have an option to specify
> a wedge label type, similar to a gpt slot?

Completely independent from the topic at hand, I really love this idea! I use most raids w/o further partitioning them, and this would map them to named wedges straightforwardly.

Martin
Re: gptmbr.bin vs RAIDframe
In article <pine.neb.4.64.1506171043190@ugly.internal.precedence.co.uk>, Stephen Borrill <net...@precedence.co.uk> wrote:
> On Tue, 16 Jun 2015, David Brownlee wrote:
> > [... gpt-on-raid layout and boot block offset discussion trimmed ...]
>
> 2) is what I ended up doing, so instead of dk -> raid -> dk it was just
> dk -> raid (just remember the installboot step from my earlier mail).
>
> Even with a separate boot partition (or with a fixed bootloader),
> dk -> raid -> dk doesn't allow you to set -A root on the RAID (well it
> sets it, but it doesn't work), so there's still a missing piece of the
> puzzle. Yes, you need to know the device name for root in your fstab,
> perhaps the name lookup code could learn to recognise %ROOT% or such
> like to mean /dev/kern.root_device'a'+kern.root_partition

As the comment hints, it should probably be done using an attribute...

	/*
	 * XXX: The following code assumes that the root raid
	 * is the first ('a') partition. This is about the best
	 * we can do with a BSD disklabel, but we might be able
	 * to do better with a GPT label, by setting a specified
	 * attribute to indicate the root partition. We can then
	 * stash the partition number in the r->root_partition
	 * high bits (the bottom 2 bits are already used). For
	 * now we just set booted_partition to 0 when we override
	 * root.
	 */

It is pretty simple to implement.

christos
Re: gptmbr.bin vs RAIDframe
On 17 June 2015 at 19:24, Christos Zoulas <chris...@astron.com> wrote:
> In article <pine.neb.4.64.1506171043190@ugly.internal.precedence.co.uk>,
> Stephen Borrill <net...@precedence.co.uk> wrote:
> > [... layout discussion trimmed ...]
>
> As the comment hints, it should probably be done using an attribute...
> [... XXX comment trimmed ...]
> It is pretty simple to implement.

The issue is less encoding which is the root partition, more how the (very space limited) initial boot blocks can find it.

Absent the workaround suggested by Stephen (of which I am now a huge fan and currently have 5.3TB of data copying across to :), the basic options to allow booting from gpt-raid-gpt root might be:

a) Chain the bootloaders so the initial (gpt) one loads a later one
b) Plug the offset to the actual root filesystem into the initial boot blocks
c) Squeeze the logic for gpt-raid-gpt and similar into the first level boot blocks - possibly by having a custom boot block just for that
Re: gptmbr.bin vs RAIDframe
Date: Wed, 17 Jun 2015 12:01:10 +0100 (BST)
From: Stephen Borrill <net...@precedence.co.uk>
Message-ID: <pine.neb.4.64.1506171043190@ugly.internal.precedence.co.uk>

  | dk -> raid -> dk doesn't allow you to set -A root on the RAID
  | (well it sets it, but it doesn't work),

Can you tell me what you believe doesn't work exactly? I do that, and it seems to work just fine for me.

I do use your #1 method for booting (separate non-raid boot wedge - or if you like, manual raid1 ... that is, each drive has a wedge containing a root filesys, including all the boot stuff, that are kept synchronised manually ... the only thing that ever changes on them is /netbsd, and that changes very rarely for me. Those filesystems are not normally mounted.)

But beyond that, I have a large wedge (on each drive) that contains a raid1 (mirrors), and in that, a GPT with wedges, one of which is root. The raid is -A root and it works just fine.

  | so there's still a missing piece of the puzzle.

You do need to know that the root wedge needs to be called raidNa (in my case, raid7a, as it is raid7 that contains the root).

  | Yes, you need to know the device name for root in your fstab,

fstab contains ...

NAME=raid7a  /     ffs    rw,log                    1 1
tmpfs        /tmp  tmpfs  rw,-m=1777,-s=537255936
NAME=SWAP_0  none  swap   sw,dp                     0 0
NAME=USR_7   /usr  ffs    ro                        1 2

(etc).

  | perhaps the name lookup code could learn to recognise %ROOT% or such
  | like to mean /dev/kern.root_device'a'+kern.root_partition

disklabels are dying; adding more hacks to deal with them seems counter-productive. kern.root_partition is a disklabel artifact, it makes no sense when using wedges.

For booting, what we need in the fullness of time is proper UEFI boot code, rather than a legacy style MBR boot half-block and the tiny boot loader that it can handle.

kre
Re: gptmbr.bin vs RAIDframe
On Tue, 16 Jun 2015, David Brownlee wrote:
> OK, I've identified the problem (if not the solution :)
>
> I'm trying to setup
> - gpt with a wedge (at offset +64) that covers the entire disk, containing:
>   - a raid1 partition, which offsets its contents by +64, containing:
>     - gpt with a wedge (at offset +64) that contains the root filesystem
>
> By my count that means /boot will be in a filesystem at 64+64+64 = 192,
> while bootxx_ will only try for filesystems at the partition start,
> plus another attempt at +64 (so 64 and 128). If I manually add another
> attempt at an additional +64, bootxx will find /boot. At which point
> /boot will fail to find /netbsd (as I haven't mangled it as well).
>
> The first +64 comes from the initial gpt partition, so that's fine - if
> the initial gpt had a wedge starting at 2048 then the gpt biosboot
> would plug things in appropriately. The second +64 is looking for the
> fixed offset of raidframe, which is also ~fine (it's either there or
> not, and if it's there, it's 64). The final +64 is a kludge which just
> happens to match my 'gpt-on-raid' layout, and is clearly not a
> solution. The problem is there not being enough space in the bootxx
> blocks to parse the disk layout for the gpt-on-raid.
>
> As I see it my options are
> 1) Separate boot partition, simple but not elegant
> 2) An initial 'root' wedge which has a RAID1 with a disklabel for
> booting, then another wedge for everything else. Also simple, no less
> inelegant, and avoids the annoying extra boot partition(s), but means
> you cannot have root on a named wedge (minor point)

2) is what I ended up doing, so instead of dk -> raid -> dk it was just dk -> raid (just remember the installboot step from my earlier mail).

Even with a separate boot partition (or with a fixed bootloader), dk -> raid -> dk doesn't allow you to set -A root on the RAID (well it sets it, but it doesn't work), so there's still a missing piece of the puzzle. Yes, you need to know the device name for root in your fstab; perhaps the name lookup code could learn to recognise %ROOT% or such like to mean /dev/kern.root_device'a'+kern.root_partition

--
Stephen
Re: gptmbr.bin vs RAIDframe
On Tue, Jun 16, 2015 at 02:32:52PM +0100, David Brownlee wrote:
> The only missing part is trying to make the system directly bootable.
> I tried gpt biosboot -i 1 wd0 which didn't give any errors, but equally
> didn't work. At boot time gptmbr prints "Missing OS" which appears to
> be because it cannot locate the 0xaa55 signature. I've also not been
> able to make a raid-on-wedge partition bootable. I think the bootloader
> needs to be taught another variant of 'skipping raidframe header'...

I currently have (-current/amd64):

# sysctl kern.root_device
kern.root_device = raid7
# dmesg | grep dk0
dk0 at wd0: 80706d87-e1f8-11e3-9080-10bf48bd3389
dk0: 14680192 blocks at 64, type: raidframe
raid7: Components: /dev/dk7 /dev/dk0
dk0 at wd0: 80706d87-e1f8-11e3-9080-10bf48bd3389
dk0: 14680192 blocks at 64, type: raidframe
raid7: Components: /dev/dk7 /dev/dk0

Is that the way around you are trying?

Cheers,
Patrick
Re: gptmbr.bin vs RAIDframe
On Tue, 16 Jun 2015, David Brownlee wrote:
> On 16 June 2015 at 14:14, Stephen Borrill <net...@precedence.co.uk> wrote:
> > I've been testing out wedges combined with RAIDframe on HDDs >2TB.
> > [... gpt show output trimmed ...]
> I'd be wary of starting a partition at 34 (not 4K aligned) for
> performance reasons

Sure, in production I'd do exactly the same. This was just to test out the concepts.

> > [... RAID1-on-wedges setup and fstab trimmed ...]
> > The only missing part is trying to make the system directly bootable.
> > I tried gpt biosboot -i 1 wd0 which didn't give any errors, but
> > equally didn't work.

jakllsch@ pointed out off-list that in addition to gpt biosboot -i 1 wd0, I also needed to run installboot on the wedge containing raid0 (i.e. root). Therefore I ran installboot /dev/rdk0 /usr/mdec/bootxx_ffsv2 (and the same on dk3) and it now boots.

--
Stephen
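Stephen's resolution amounts to two commands per mirror member: gpt biosboot on the disk, then installboot into the wedge backing the root raid. A dry-run sketch that just prints those steps; the disk/wedge pairings (wd0/dk0, wd1/dk3) follow the mail but are illustrative:

```shell
#!/bin/sh
# Dry run: print the commands that make a gpt+raidframe root bootable.
make_bootable_plan() {
    disk=$1 wedge=$2
    # Install the GPT-aware MBR bootcode on the disk...
    echo "gpt biosboot -i 1 $disk"
    # ...and the stage-1 bootxx into the wedge holding the root raid,
    # so it can find /boot inside the RAID1.
    echo "installboot /dev/r$wedge /usr/mdec/bootxx_ffsv2"
}

make_bootable_plan wd0 dk0
make_bootable_plan wd1 dk3
```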
Re: gptmbr.bin vs RAIDframe
On 16 June 2015 at 14:14, Stephen Borrill <net...@precedence.co.uk> wrote:
> I've been testing out wedges combined with RAIDframe on HDDs >2TB. I have:
>
> # gpt show wd1
>        start        size  index  contents
>            0           1         PMBR
>            1           1         Pri GPT header
>            2          32         Pri GPT table
>           34    20972448      1  GPT part - NetBSD RAIDFrame component
>     20972482    20972448      2  GPT part - NetBSD RAIDFrame component
>     41944930  5818588205      3  GPT part - NetBSD RAIDFrame component
>   5860533135          32         Sec GPT table
>   5860533167           1         Sec GPT header

I'd be wary of starting a partition at 34 (not 4K aligned) for performance reasons

> (wd2 is same)
>
> I've then set up 3x RAIDframe RAID1 on the 3 wedges from each disk.
> raid0 and raid1 (from the wedges with indices 1 and 2) are used
> directly (i.e. /dev/raid0a is root). There's another GPT on raid2:
>
> # gpt show raid2
>        start        size  index  contents
>            0           1         PMBR
>            1           1         Pri GPT header
>            2          32         Pri GPT table
>           34  5818587965      1  GPT part - NetBSD FFSv1/FFSv2
>   5818587999          32         Sec GPT table
>   5818588031           1         Sec GPT header
>
> and I've named this "usr". My fstab contains the following and all is well:
>
> /dev/raid0a  /     ffs   rw     1 1
> /dev/raid1a  none  swap  sw,dp
> NAME=usr     /usr  ffs   rw     1 2
>
> This copes with missing components and wedges being renumbered. The
> only missing part is trying to make the system directly bootable. I
> tried gpt biosboot -i 1 wd0 which didn't give any errors, but equally
> didn't work. At boot time gptmbr prints "Missing OS" which appears to
> be because it cannot locate the 0xaa55 signature. I've also not been
> able to make a raid-on-wedge partition bootable. I think the bootloader
> needs to be taught another variant of 'skipping raidframe header'...

In case it's of any interest, I've just recently set up a 'fully wedged' 2*6TB RAIDframe system on netbsd-7, and apart from RAIDframe and installboot still needing to learn NAME= syntax, plus the need for boot partitions, everything seemed to go well. The script I used is below in case anyone finds it of interest.

[attachment: wedgeraidsetup.sh - Bourne shell script]
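The attached script did not survive the archive. Under the assumptions visible in the thread (two disks, GPT wedges typed as RAIDframe components, RAID1 on top), its skeleton is roughly as follows. This is a dry-run sketch that only prints commands; the labels, sizes, GPT type alias and serial number are assumptions, not the attachment's contents:

```shell
#!/bin/sh
# NOT the lost attachment: a printed skeleton of the setup it described.
setup_disk_plan() {
    disk=$1
    echo "gpt create $disk"
    # -a 64k keeps wedges 4K-aligned (the thread's complaint about
    # starting at sector 34); the type alias for a 'NetBSD RAIDFrame
    # component' wedge is assumed to be 'raid' here.
    echo "gpt add -a 64k -t raid -l ${disk}root -s 16g $disk"
    echo "gpt add -a 64k -t raid -l ${disk}data $disk"
}

for disk in wd0 wd1; do
    setup_disk_plan "$disk"
done

# Then configure the mirror over the resulting dk wedges and enable
# autoconfiguration (dk numbering depends on discovery order):
echo "raidctl -C raid0.conf raid0"
echo "raidctl -I 20150616 raid0"
echo "raidctl -iv raid0"
echo "raidctl -A yes raid0"
```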
Re: gptmbr.bin vs RAIDframe
On 16 June 2015 at 20:08, Michael van Elst <mlel...@serpens.de> wrote:
> a...@absd.org (David Brownlee) writes:
> > In case it's of any interest I've just recently set up a 'fully
> > wedged' 2*6TB RAIDframe system on netbsd-7, and apart from RAIDframe
> > and installboot still needing to learn NAME= syntax, plus the need
> > for boot partitions everything seemed to go well.
>
> installboot now (not netbsd-7) understands the NAME syntax.

Ah wonderful - would it be slated for a netbsd-7 pullup :) or did it miss the cut? :/