Re: [CFC/CFT] large changes in the loader(8) code
On Mon, Jul 16, 2012 at 04:00:49PM +0400, Andrey V. Elsukov wrote:
> On 16.07.2012 15:31, Andriy Gapon wrote:
> >> Yes. It should work as before.
> >
> > Well, but it's obvious that zfs_probe_dev would be attempting to do some
> > unneeded stuff (trying to treat partitions as disks) for that case. To me
> > this is a clear indication zfs_probe_dev is not optimal for
> > arch-independent implementation. So I still think that arch_zfs_probe
> > should decide what disks and partitions to probe, and zfs_probe_dev
> > should only probe what it's given and not try to be any smarter.
> > But I've repeated myself three times already :-)
>
> And we will have the same - several copies of the same code in each
> architecture, which i have deleted...
>
> Sparc doesn't support DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls,
> so it will not check each partition, only fd that is passed to the
> zfs_probe_dev.
>
> Currently there is only one problem with ZFS tasting, that can affect
> users - now we taste each disk and partition, but in the my branch ZFS
> tastes only disks and partitions with type "freebsd" and "freebsd-zfs".
> So if you have created ZFS on top of MBR partition with type "ntfs",
> then loader will be unable to detect it.
>

Sorry, I'm missing the big picture of ZFS support in the loader and
currently unfortunately don't have the time to look into it or your
patches. I don't think there's a way to determine the media and sector
sizes without actually looking at the Sun and/or VTOC8 labels though.

As for zfs_probe_dev, some user recently indicated that on sparc64 we
should rather look at the disk devices listed in the "boot-device"
environment variable in order to mimic what Solaris does, rather than
trying to probe anything that might be a disk device, mimicking what
the FreeBSD/i386 ZFS loader does. Maybe that's a hint as to whether an
arch_zfs_probe should exist. I can test patches once you guys have
figured out how things should work though.

Marius ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [CFC/CFT] large changes in the loader(8) code
On 16.07.2012 15:31, Andriy Gapon wrote:
>> Yes. It should work as before.
>
> Well, but it's obvious that zfs_probe_dev would be attempting to do some
> unneeded stuff (trying to treat partitions as disks) for that case. To me
> this is a clear indication zfs_probe_dev is not optimal for
> arch-independent implementation. So I still think that arch_zfs_probe
> should decide what disks and partitions to probe, and zfs_probe_dev
> should only probe what it's given and not try to be any smarter.
> But I've repeated myself three times already :-)

And we will have the same: several copies of the same code in each
architecture, which I have deleted...

Sparc doesn't support the DIOCGMEDIASIZE and DIOCGSECTORSIZE ioctls,
so it will not check each partition, only the fd that is passed to
zfs_probe_dev.

Currently there is only one problem with ZFS tasting that can affect
users: now we taste each disk and partition, but in my branch ZFS tastes
only disks and partitions with type "freebsd" and "freebsd-zfs". So if
you have created ZFS on top of an MBR partition with type "ntfs", then
the loader will be unable to detect it.

-- 
WBR, Andrey V. Elsukov
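To make the behaviour change described above concrete, here is a minimal sketch in Python (not loader code) comparing the two tasting policies. The `Partition` class, the function names and the device names are hypothetical; only the type labels "freebsd", "freebsd-zfs" and "ntfs" come from the message.

```python
from dataclasses import dataclass

# Partition types the branch is described as tasting (taken from the message).
TASTED_TYPES = {"freebsd", "freebsd-zfs"}


@dataclass
class Partition:
    name: str   # illustrative provider name, e.g. "ada0s1"
    ptype: str  # partition type label, e.g. "freebsd-zfs" or "ntfs"


def candidates_old(disk, partitions):
    """Old policy as described: taste the disk and every partition."""
    return [disk] + [p.name for p in partitions]


def candidates_new(disk, partitions):
    """Branch policy as described: taste the disk and matching types only."""
    return [disk] + [p.name for p in partitions if p.ptype in TASTED_TYPES]


# Hypothetical disk layout used only for illustration.
parts = [
    Partition("ada0s1", "freebsd"),
    Partition("ada0s2", "ntfs"),
    Partition("ada0s3", "freebsd-zfs"),
]

print(candidates_old("ada0", parts))
print(candidates_new("ada0", parts))
```

Under the second policy a pool living on the "ntfs"-typed slice is never tasted, which is the user-visible regression mentioned in the message.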
Re: [CFC/CFT] large changes in the loader(8) code
on 16/07/2012 14:14 Andrey V. Elsukov said the following:
> On 16.07.2012 15:05, Andriy Gapon wrote:
>>>> 2. I am not sure if I like the approach of moving partition tasting
>>>> code into common ZFS code (zfs.c). On one hand, it now makes sense
>>>> because the new partition iteration code is machine-independent. On
>>>> the other hand, the reason that I added arch_zfs_probe method was to
>>>> give platforms full control over which partitions and in what order
>>>> are probed. It seems to be important for some of them.
>>>> So, I like how your new partition interface makes it much easier to
>>>> ZFS-probe partitions, but I would prefer to have that code in
>>>> arch_zfs_probe implementations rather than in zfs_probe_dev.
>>>
>>> From the other point of view, ZFS is not a just file system and it works
>>> directly with disks and partitions. And it seems to me this code will be
>>> common for other architectures.
>>
>> Well, it seems that you haven't yet touched sparc64_zfs_probe.
>
> Yes. It should work as before.

Well, but it's obvious that zfs_probe_dev would be attempting to do some
unneeded stuff (trying to treat partitions as disks) for that case. To me
this is a clear indication zfs_probe_dev is not optimal for
arch-independent implementation. So I still think that arch_zfs_probe
should decide what disks and partitions to probe, and zfs_probe_dev should
only probe what it's given and not try to be any smarter.
But I've repeated myself three times already :-)

> But if Marius can suggest how to change ofw_disk.c to get disk size and
> sector size, then i will be able to break something here :)
>
>> If you'll find that you don't have to use any ugly hacks there, then good.
>> But my impression is that it would be easier to stick to the previous
>> approach.
>

-- 
Andriy Gapon
Re: [CFC/CFT] large changes in the loader(8) code
On 16.07.2012 15:05, Andriy Gapon wrote:
>>> 2. I am not sure if I like the approach of moving partition tasting
>>> code into common ZFS code (zfs.c). On one hand, it now makes sense
>>> because the new partition iteration code is machine-independent. On
>>> the other hand, the reason that I added arch_zfs_probe method was to
>>> give platforms full control over which partitions and in what order
>>> are probed. It seems to be important for some of them.
>>> So, I like how your new partition interface makes it much easier to
>>> ZFS-probe partitions, but I would prefer to have that code in
>>> arch_zfs_probe implementations rather than in zfs_probe_dev.
>>
>> From the other point of view, ZFS is not a just file system and it works
>> directly with disks and partitions. And it seems to me this code will be
>> common for other architectures.
>
> Well, it seems that you haven't yet touched sparc64_zfs_probe.

Yes. It should work as before. But if Marius can suggest how to change
ofw_disk.c to get the disk size and sector size, then I will be able to
break something here :)

> If you'll find that you don't have to use any ugly hacks there, then good.
> But my impression is that it would be easier to stick to the previous
> approach.

-- 
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
on 16/07/2012 13:57 Andrey V. Elsukov said the following:
> On 16.07.2012 14:23, Andriy Gapon wrote:
>> on 26/06/2012 15:50 Andrey V. Elsukov said the following:
>>> 3. ZFS code now uses new API and probing on the systems with many disks
>>> should be greatly increased:
>>> zfs/zfs.c
>>> i386/loader/main.c
>>
>> First of all, it's hard to parse the above sentence. "probing ... should
>> be greatly increased". Probing what? :-) If probing time, then we don't
>> want that ;-)
>>
>> I looked through the ZFS-related part and here are a few comments:
>
> Thanks for that.
>
>> 1. I think that the predominant indentation style of i386/loader/main.c
>> should be preserved for consistency.
>>
>> 2. I am not sure if I like the approach of moving partition tasting code
>> into common ZFS code (zfs.c). On one hand, it now makes sense because
>> the new partition iteration code is machine-independent. On the other
>> hand, the reason that I added arch_zfs_probe method was to give
>> platforms full control over which partitions and in what order are
>> probed. It seems to be important for some of them.
>> So, I like how your new partition interface makes it much easier to
>> ZFS-probe partitions, but I would prefer to have that code in
>> arch_zfs_probe implementations rather than in zfs_probe_dev.
>
> From the other point of view, ZFS is not a just file system and it works
> directly with disks and partitions. And it seems to me this code will be
> common for other architectures.

Well, it seems that you haven't yet touched sparc64_zfs_probe.
If you'll find that you don't have to use any ugly hacks there, then good.
But my impression is that it would be easier to stick to the previous
approach.

>> 3. Related to the above. In what shape is sparc64 ZFS support in your
>> branch? Have you tried to adapt it to the new model too?
>> It's the platform that has special requirements for disk/partition
>> probing order.
>> Marius can help with additional information and testing here.
>
> Currently i have not received any feedback reports from the users who can
> test patches on the other architectures. I added VTOC8 support to the
> part.c, but it seems it is not needed and ofw can work without this.
>

-- 
Andriy Gapon
Re: [CFC/CFT] large changes in the loader(8) code
On 16.07.2012 14:23, Andriy Gapon wrote:
> on 26/06/2012 15:50 Andrey V. Elsukov said the following:
>> 3. ZFS code now uses new API and probing on the systems with many disks
>> should be greatly increased:
>> zfs/zfs.c
>> i386/loader/main.c
>
> First of all, it's hard to parse the above sentence. "probing ... should
> be greatly increased". Probing what? :-) If probing time, then we don't
> want that ;-)
>
> I looked through the ZFS-related part and here are a few comments:

Thanks for that.

> 1. I think that the predominant indentation style of i386/loader/main.c
> should be preserved for consistency.
>
> 2. I am not sure if I like the approach of moving partition tasting code
> into common ZFS code (zfs.c). On one hand, it now makes sense because
> the new partition iteration code is machine-independent. On the other
> hand, the reason that I added arch_zfs_probe method was to give
> platforms full control over which partitions and in what order are
> probed. It seems to be important for some of them.
> So, I like how your new partition interface makes it much easier to
> ZFS-probe partitions, but I would prefer to have that code in
> arch_zfs_probe implementations rather than in zfs_probe_dev.

From the other point of view, ZFS is not just a file system: it works
directly with disks and partitions. And it seems to me this code will be
common for other architectures.

> 3. Related to the above. In what shape is sparc64 ZFS support in your
> branch? Have you tried to adapt it to the new model too?
> It's the platform that has special requirements for disk/partition
> probing order.
> Marius can help with additional information and testing here.

Currently I have not received any feedback from users who can test the
patches on other architectures. I added VTOC8 support to part.c, but it
seems it is not needed and ofw can work without it.

-- 
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
on 26/06/2012 15:50 Andrey V. Elsukov said the following:
> 3. ZFS code now uses new API and probing on the systems with many disks
> should be greatly increased:
> zfs/zfs.c
> i386/loader/main.c

First of all, it's hard to parse the above sentence. "probing ... should be
greatly increased". Probing what? :-) If probing time, then we don't want
that ;-)

I looked through the ZFS-related part and here are a few comments:

1. I think that the predominant indentation style of i386/loader/main.c
should be preserved for consistency.

2. I am not sure if I like the approach of moving partition tasting code
into common ZFS code (zfs.c). On one hand, it now makes sense because the
new partition iteration code is machine-independent. On the other hand,
the reason that I added the arch_zfs_probe method was to give platforms
full control over which partitions are probed and in what order. It seems
to be important for some of them.
So, I like how your new partition interface makes it much easier to
ZFS-probe partitions, but I would prefer to have that code in
arch_zfs_probe implementations rather than in zfs_probe_dev.

3. Related to the above. In what shape is sparc64 ZFS support in your
branch? Have you tried to adapt it to the new model too?
It's the platform that has special requirements for disk/partition probing
order.
Marius can help with additional information and testing here.

Overall, thank you very much for this work! I believe that it moves us in
the correct direction.

-- 
Andriy Gapon
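The split of responsibilities argued for in point 2 can be sketched roughly as follows. This is Python pseudostructure, not the loader's C code: the names `arch_zfs_probe` and `zfs_probe_dev` come from the thread, but the signatures, the `read_label` callback and the device names are assumptions made purely for illustration.

```python
def zfs_probe_dev(devname, read_label):
    """Probe exactly the one device it is given; no partition enumeration.

    read_label is a hypothetical callback returning a pool name or None.
    """
    return read_label(devname) is not None


def arch_zfs_probe(ordered_devices, read_label):
    """Platform hook: decides which devices are probed and in what order."""
    return [dev for dev in ordered_devices if zfs_probe_dev(dev, read_label)]


# Hypothetical platform: it chooses its own device list and probing order.
labels = {"disk0a": "tank"}
found = arch_zfs_probe(["disk1a", "disk0a"], labels.get)
print(found)
```

With this shape, a platform with special ordering requirements only changes the list it passes in, and the generic probe stays free of per-architecture logic.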
Re: [CFC/CFT] large changes in the loader(8) code
"Andrey V. Elsukov" writes:
> On 29.06.2012 15:01, Jan Beich wrote:
>>>> So, i have created the branch and committed the changes:
>>>> http://svnweb.freebsd.org/base/user/ae/bootcode/
>>>> The patch is here:
>>>> http://people.freebsd.org/~ae/boot.diff
>>>
>>> FWIW, I verified it compiles OK with clang, and especially boot2's size
>>> isn't increased at all.
>> Does it boot for you? Same if I build zfs.c with gcc -O0:
>>
>>   FreeBSD/x86 ZFS enabled bootstrap loader, Revision 1.1
>>   (foo@bar, Tue Jun 26 18:52:52 UTC 2012)
>>   ZFS: can't find pool by guid
>>   ZFS: can't find pool by guid
>>
>>   can't load 'kernel'
>>
>
> Does zfsloader without patches compiled with CLANG work for you?

It does. I did test before using

  $ cd /usr/src/sys/boot
  $ env -i __MAKE_CONF= PATH=/bin:/sbin:/usr/bin:/usr/sbin make CC=clang
  $ make install
  $ sudo qemu-system-x86_64 -curses -drive file=/dev/ada0 -drive file=/dev/ada1

In gcc -O0 case

  $ touch zfs/zfs.c
  $ rm i386/zfsloader/zfsloader*
  $ echo CFLAGS+=-O0 >>zfs/Makefile
  $ env -i ... make CC=gcc

And for gcc47

  $ touch zfs/zfs.c
  $ rm i386/zfsloader/zfsloader*
  $ env -i ... make CC=/usr/local/bin/gcc47 -C zfs

I haven't tried to further track down which flag(s) within -O1 make
zfsloader from your branch work when compiled with base gcc.
Re: [CFC/CFT] large changes in the loader(8) code
On 29.06.2012 15:01, Jan Beich wrote:
>>> So, i have created the branch and committed the changes:
>>> http://svnweb.freebsd.org/base/user/ae/bootcode/
>>> The patch is here:
>>> http://people.freebsd.org/~ae/boot.diff
>>
>> FWIW, I verified it compiles OK with clang, and especially boot2's size
>> isn't increased at all.
> Does it boot for you? Same if I build zfs.c with gcc -O0:
>
>   FreeBSD/x86 ZFS enabled bootstrap loader, Revision 1.1
>   (foo@bar, Tue Jun 26 18:52:52 UTC 2012)
>   ZFS: can't find pool by guid
>   ZFS: can't find pool by guid
>
>   can't load 'kernel'
>

Does zfsloader without patches compiled with CLANG work for you?

-- 
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
Dimitry Andric writes:
> On 2012-06-26 14:50, Andrey V. Elsukov wrote:
>
>> Some time ago i have started reading the code in the sys/boot.
>> Especially i'm interested in the partition tables handling.
>> I found several problems:
>> 1. There are several copies of the same code in the libi386/biosdisk.c
>> and common/disk.c, and partially libpc98/biosdisk.c.
>> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
>> disks and partitions the system has:
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
>> 3. The GPT support doesn't check CRC and even doesn't know anything
>> about the secondary GPT header/table.
>>
>> So, i have created the branch and committed the changes:
>> http://svnweb.freebsd.org/base/user/ae/bootcode/
>> The patch is here:
>> http://people.freebsd.org/~ae/boot.diff
>
> FWIW, I verified it compiles OK with clang, and especially boot2's size
> isn't increased at all.

Does it boot for you? Same if I build zfs.c with gcc -O0:

  FreeBSD/x86 ZFS enabled bootstrap loader, Revision 1.1
  (foo@bar, Tue Jun 26 18:52:52 UTC 2012)
  ZFS: can't find pool by guid
  ZFS: can't find pool by guid

  can't load 'kernel'

>
> It would be nice if you could check it with clang now and again, before
> you finally merge this project into head.
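For readers unfamiliar with item 3 in the quoted list, the following is a minimal Python sketch of what "checking the CRC" of a GPT header involves. It is not taken from the branch. The field offsets follow the published GPT header layout (signature at 0, header size at 12, header CRC32 at 16, with the CRC computed over the header with that field zeroed); `make_header` is a hypothetical helper that only builds a test header.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
MIN_HEADER_SIZE = 92  # size of the fixed part of a GPT header


def gpt_header_crc_ok(sector: bytes) -> bool:
    """Return True if the sector holds a GPT header with a valid CRC32."""
    if sector[:8] != GPT_SIGNATURE:
        return False
    (hdr_size,) = struct.unpack_from("<I", sector, 12)
    if hdr_size < MIN_HEADER_SIZE or hdr_size > len(sector):
        return False
    (stored_crc,) = struct.unpack_from("<I", sector, 16)
    hdr = bytearray(sector[:hdr_size])
    hdr[16:20] = b"\x00\x00\x00\x00"  # CRC field is zeroed for the calculation
    return (zlib.crc32(bytes(hdr)) & 0xFFFFFFFF) == stored_crc


def make_header(my_lba: int, alternate_lba: int) -> bytes:
    """Hypothetical helper: build a 512-byte sector with a valid header CRC."""
    hdr = bytearray(512)
    hdr[0:8] = GPT_SIGNATURE
    struct.pack_into("<I", hdr, 8, 0x00010000)           # revision 1.0
    struct.pack_into("<I", hdr, 12, MIN_HEADER_SIZE)      # header size
    struct.pack_into("<QQ", hdr, 24, my_lba, alternate_lba)
    crc = zlib.crc32(bytes(hdr[:MIN_HEADER_SIZE])) & 0xFFFFFFFF
    struct.pack_into("<I", hdr, 16, crc)
    return bytes(hdr)


good = make_header(1, 999)
corrupted = bytearray(good)
corrupted[40] ^= 0xFF  # flip bits inside the header
print(gpt_header_crc_ok(good), gpt_header_crc_ok(bytes(corrupted)))
```

A reader that skips this check cannot tell a damaged primary header from a good one, which is why the check matters for falling back to the secondary header.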
Re: [CFC/CFT] large changes in the loader(8) code
Pawel Jakub Dawidek wrote
  in <20120628230725.gb1...@garage.freebsd.pl>:

pj> PS. We are discussing two totally different things here:
pj> 1. Is placing GPT on anything but raw disk violates the spec? I can
pj>    agree that it does and I'm happy with gpart(8) growing a warning.

I agree that there is a sort of violation, but in practice most
implementations which use GPT can recognize the backup header by using
the alternate LBA field, as long as the primary one is not corrupted.

One thing we have to consider is what happens when the primary header
becomes broken. In that case, if GEOM metadata is placed at the end of
the raw disk, the GPT will be lost and cannot be recovered by
non-GEOM-aware software, including the BIOS and other OSes. Even for
FreeBSD it causes a boot failure. The modification which ae@ proposes
mitigates this case. Of course, maybe the BIOS or EFI will not recognize
the corrupted header because the backup header is not located at the end.
In that case none of the partitions are recognized and FreeBSD does not
boot.

This is the trade-off when we use GPT in a logical volume provided by
GEOM. In short, the risk is that the backup header does not work as a
backup when the primary is broken. I agree that putting a warning about
that is good and enough. Whether this risk is acceptable or not depends
on the sysadmin. Also, we can describe the pros and cons in detail in a
section of the handbook, because wblock@ and I are working on updating it.

pj> 2. How to do software mirroring. Besides trying really hard I'm not sure
pj>    what alternative are you proposing. Could you be more specific and
pj>    describe how gmirror should be implemented in your opinion?

I do not think this topic is related to ae@'s change, and this should be
discussed in a separate thread. His change aims to support a non-standard
GPT header location in a quite limited situation, not to actively promote
such a configuration. The issue of GPT+GEOM is not limited to gmirror.
Just putting GEOM::LABEL metadata causes the same issue.

-- Hiroki
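The trade-off described above comes down to simple sector arithmetic, sketched here in Python. It assumes, as stated in this thread, that the GEOM class keeps its metadata in the last sector of the raw disk and that the GPT backup header is written to the last sector of the resulting (one sector smaller) provider. The function names and the 1000-sector disk are hypothetical.

```python
def geom_on_disk_layout(disk_sectors):
    """Return (last_lba_of_raw_disk, lba_of_gpt_backup_header).

    Assumes one sector of GEOM metadata at the very end of the raw disk,
    with GPT created on the provider that sits on top of it.
    """
    last_lba = disk_sectors - 1           # holds the GEOM metadata
    provider_sectors = disk_sectors - 1   # size seen by gpart
    backup_lba = provider_sectors - 1     # last sector of the provider
    return last_lba, backup_lba


def where_reader_looks(primary_ok, alternate_lba, last_lba):
    """LBA a GEOM-unaware reader checks for the backup header.

    With an intact primary header it can follow the alternate LBA field;
    with a broken primary it can only try the last sector of the disk.
    """
    return alternate_lba if primary_ok else last_lba


last_lba, backup_lba = geom_on_disk_layout(1000)
print(where_reader_looks(True, backup_lba, last_lba) == backup_lba)
print(where_reader_looks(False, backup_lba, last_lba) == backup_lba)
```

The second comparison is false: once the primary header is gone, a reader that only knows the spec location finds GEOM metadata rather than the backup header.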
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 28, 2012, at 4:07 PM, Pawel Jakub Dawidek wrote:
>>
>> I would be having less problems if the mirroring didn't force the backup
>> GPT header in anything but the last sector. [...]
>
> GPT backup header is placed in the last sector of the mirror device,
> just like the user asked. Gmirror doesn't force anything. User decides
> to put GPT partitioning on the mirror device instead of raw disk.
> Gmirror doesn't even know and doesn't have to know how the user uses
> data area on the mirror device.

This really is a cop-out paragraph.

>> [...] If the metadata was somewhere
>> else, then we wouldn't need to kluge various places to deal with the
>> ambiguity and visible interoperability problems of the various tools and
>> OSes. [...]
>
> Where is "somewhere else", exactly?

I already suggested a few things in this thread. Go read it.

I'm bored now, so I'll just wait for UEFI booting to be forced
upon those who mirror the whole disk with gmirror. I think that's
when we will have a more substantial and meaningful continuation
of this thread.

-- 
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote:
> On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
> > Or are you suggesting to
> > convince all BIOS vendors to include the ability to boot from some kind
> > of FreeBSD private partitioning scheme (not MBR as it is not
> > suitable, not GPT as you are not OK to use it on a gmirror)?
>
> I would be having less problems if the mirroring didn't force the backup
> GPT header in anything but the last sector. [...]

The GPT backup header is placed in the last sector of the mirror device,
just like the user asked. Gmirror doesn't force anything. The user decides
to put GPT partitioning on the mirror device instead of the raw disk.
Gmirror doesn't even know and doesn't have to know how the user uses
the data area on the mirror device.

> [...] If the metadata was somewhere
> else, then we wouldn't need to kluge various places to deal with the
> ambiguity and visible interoperability problems of the various tools and
> OSes. [...]

Where is "somewhere else", exactly?

If somewhere else on this disk, then where? At the beginning of the disk?
Then you would complain that it keeps metadata where the primary header
should be located, and also MBR metadata, BSDlabel metadata, etc.
Somewhere in the middle of the disk? Some future GPTng may want to use
the same spot, but also a gmirror-unaware boot loader will see corrupted
data (shifted by one sector). Come on...

If somewhere else is not on this disk, then I'm sorry, but this is
totally impractical. Disks are the place you store stuff. In 99% of the
cases there is no other place to store it but the disk itself. Should we
ask users to use an additional disk to keep the mirror's metadata?

> [...] Thus, it's not that I object to the mirroring per se, just to the
> mirroring as it is currently implemented with gmirror.

Do you know of a software RAID (>=1) or volume manager that doesn't keep
metadata on component disks?

PS. We are discussing two totally different things here:
1. Does placing GPT on anything but a raw disk violate the spec? I can
   agree that it does and I'm happy with gpart(8) growing a warning.
2. How to do software mirroring. Besides trying really hard, I'm not sure
   what alternative you are proposing. Could you be more specific and
   describe how gmirror should be implemented in your opinion?

> > What about multipathing? In case the disk is attached via two paths but
> > multipath is not enabled, the OS sees the same disk (and the same
> > identical unique disk identifier) multiple times. Is this a violation
> > of the spec too?
>
> It's the same disk, isn't it? The OS can actually use the property
> of the ID to infer that it has already seen this disk and not create
> multiple device nodes.

You cannot trust some ID that is found on disk to be unique, as all your
assumptions break when the user decides to dd(1)-copy the content of this
disk to another disk, for example.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
> On Thu, 28 Jun 2012 08:33:17 -0700 Marcel Moolenaar
> wrote:
>
>> My advise is to leave disk mirroring to H/W or firmware solutions and
>> use FreeBSD mirroring for FreeBSD partitions only. If you want to
>> mirror the whole disk, don't partition the disk with non-FreeBSD
>> partitioning schemes and partition only with FreeBSD-specific schemes
>> or put a FreeBSD file system on the whole disk. In other words: make
>> the whole disk private to FreeBSD.
>
> If I gmirror the entire disk, I already expressed my interest to make
> the whole disk private to FreeBSD, haven't I?

No. All you've done is type some commands. There's no inherent value
in it that relays that you know what you're doing. I have no problem
accepting that you do in fact know what you're doing, but that doesn't
mean that anyone who types the same sequence of commands is as skilled
as you are -- that would be a silly inference. What you need to do is
not have it be about you, but about some random user.

> Or are you suggesting to
> convince all BIOS vendors to include the ability to boot from some kind
> of FreeBSD private partitioning scheme (not MBR as it is not
> suitable, not GPT as you are not OK to use it on a gmirror)?

I would be having less problems if the mirroring didn't force the backup
GPT header in anything but the last sector. If the metadata was somewhere
else, then we wouldn't need to kluge various places to deal with the
ambiguity and visible interoperability problems of the various tools and
OSes. Thus, it's not that I object to the mirroring per se, just to the
mirroring as it is currently implemented with gmirror.

> What about multipathing? In case the disk is attached via two paths but
> multipath is not enabled, the OS sees the same disk (and the same
> identical unique disk identifier) multiple times. Is this a violation
> of the spec too?

It's the same disk, isn't it? The OS can actually use the property
of the ID to infer that it has already seen this disk and not create
multiple device nodes.

-- 
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
Pawel Jakub Dawidek wrote:
> On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
>>
>> On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
>>>
>>> All of the above is ugly, U'm afraid :(
>>
>> Indeed. The only sane way is to put the metadata in a partition of its
>> own. Every compliant OS will respect that and consequently will not
>> scribble over the data unintentionally. Any other scheme that puts
>> valuable data in some undocumented or unregistered location is violating
>> the GPT spec right away and is susceptible to being clobbered
>> unintentionally.
>
> If the user runs:
>
>     # gpart create -s GPT /dev/mirror/foo
>
> for me it is obvious that he wants to partition the mirror device and
> not individual disks. Because the mirror was configured earlier, do you
> expect gmirror to somehow detect that someone is writting GPT metadata
> later and magically place GPT metadata on the raw disk and move mirror's
> metadata to some magic partition? Not to mention that the mirror itself
> doesn't have to be configured on top of raw disks. And not to mention
> that the mirror may never be partitioned.
>
> If GPT in your opinion is limited only to raw disks then I guess the
> best way to fix that is to refuse to configure GPT on anything except
> raw disks (which was already proposed by Andrey?). In my opinion this is
> unacceptable, but I think this is what you are suggesting.
>
> One of the GEOM design goals was to be flexible. Let the user decide in
> what order he wants to configure various layers. How do you know that in
> every possible scenerio software mirroring should come after
> partitioning and encryption after mirroring? Why can't we provide
> flexible tools to the user and let him decide? Maybe GPT nesting
> violates standards, but why can't we support it as an extention, really?
>
> I recognize the need to warn users if they use FreeBSD-specific
> features. We do that with non-standard APIs. So how about this.
>
> Let's modify gpart(8) to print a warning if GPT is configured on
> something else than raw disk. Let's the warning say that such
> configuration is non-standard and problems are expected if the disk is
> shared between other OSes.
>
> In my opinion that's fair. With such a warning in place, I think we can
> allow users to decide on their own if they really want that or not.
> Then, we can also improve FreeBSD boot loader to play nice with
> FreeBSD-specific extensions.

I think this is a valid point of view. FreeBSD already does things not
supported by other OSes and I am completely fine with it. I am running
FreeBSD on servers, not sharing anything with other OSes, so I prefer
extended FreeBSD-specific features over 100% standard-compliant behaviour
that cripples SW mirroring etc.

I think that our tools should support / provide all standard-compliant
(compatible) features, but let the user choose any other extended,
non-compatible features if the user wants them, even if somebody else
sees that as foot shooting.

And maybe one day our solution will be widespread and taken as a
standard.

Miroslav Lachman
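The warning proposed above could look roughly like the following. This is only an illustration of the idea in Python, not gpart(8) code and not an existing gpart feature: the function name, the message wording and the provider names are hypothetical. The one assumption about FreeBSD is that raw disks are exported by the GEOM class named DISK.

```python
# Assumption: raw disks are providers of the GEOM class "DISK".
RAW_DISK_CLASS = "DISK"


def gpt_create_warning(provider, geom_class):
    """Return a warning string for non-raw-disk providers, else None.

    Hypothetical policy: creation is never refused, only a warning is
    printed, which matches the proposal in the quoted message.
    """
    if geom_class == RAW_DISK_CLASS:
        return None
    return (
        f"warning: {provider}: GPT on a {geom_class} provider is "
        "non-standard; expect problems if the disk is shared with other OSes"
    )


for provider, geom_class in [("ada0", "DISK"), ("mirror/foo", "MIRROR")]:
    message = gpt_create_warning(provider, geom_class)
    if message is not None:
        print(message)
```

Keeping it as a warning rather than an error preserves the flexibility argued for above while still telling the user that the layout is FreeBSD-specific.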
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 28, 2012, at 10:25 AM, Pawel Jakub Dawidek wrote:
> On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
>>
>> On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
>>>
>>> All of the above is ugly, U'm afraid :(
>>
>> Indeed. The only sane way is to put the metadata in a partition of its
>> own. Every compliant OS will respect that and consequently will not
>> scribble over the data unintentionally. Any other scheme that puts
>> valuable data in some undocumented or unregistered location is violating
>> the GPT spec right away and is susceptible to being clobbered
>> unintentionally.
>
> If the user runs:
>
>     # gpart create -s GPT /dev/mirror/foo
>
> for me it is obvious that he wants to partition the mirror device and
> not individual disks.

It could definitely be interpreted as the user knowing what he/she
wants, and an infrastructure could be designed around this assumption.
If users were at least as knowledgeable as developers, my concerns
wouldn't be as big. But we all know how knowledgeable users can be and,
like it or not, even developers aren't gurus in everything. We may
think we know stuff, but in practice we're just as clueless in some cases
as users -- more clueless even, sometimes. So you may think the
intent is obvious, but you should know better.

> Let's modify gpart(8) to print a warning if GPT is configured on
> something else than raw disk. Let's the warning say that such
> configuration is non-standard and problems are expected if the disk is
> shared between other OSes.

Yes. I think we finally reached the point we should have reached years
ago. With the proper tooling, our flexible infrastructure can be used in
a safe and compliant way while still giving the freedom to those who
unwisely think they know better.

Build it and I'll concur.

-- 
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, 28 Jun 2012 14:49:02 -0500, Alexander Leidinger wrote:
> What about multipathing? In case the disk is attached via two paths but
> multipath is not enabled, the OS sees the same disk (and the same
> identical unique disk identifier) multiple times. Is this a violation
> of the spec too?

Good point; does gmirror and gmultipath play together nicely?
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, 28 Jun 2012 08:33:17 -0700 Marcel Moolenaar wrote:

> My advise is to leave disk mirroring to H/W or firmware solutions and
> use FreeBSD mirroring for FreeBSD partitions only. If you want to
> mirror the whole disk, don't partition the disk with non-FreeBSD
> partitioning schemes and partition only with FreeBSD-specific schemes
> or put a FreeBSD file system on the whole disk. In other words: make
> the whole disk private to FreeBSD.

If I gmirror the entire disk, I already expressed my interest to make
the whole disk private to FreeBSD, haven't I? Or are you suggesting to
convince all BIOS vendors to include the ability to boot from some kind
of FreeBSD private partitioning scheme (not MBR as it is not
suitable, not GPT as you are not OK to use it on a gmirror)?

> Whether or not people agree with this is besides the point. All I'm
> saying is that unique disk identifiers such as the
> "UniqueMBRSignature" (a 4 byte ID written at offset 440 in the MBR)
> or the "DiskGUID" (an UUID written to offset 56 in the GPT header)
> cannot, in general, be mirrored across disks if OSes can see the
> mirrored disks as independent entities. One violates the spec on
> grounds of making the *unique* disk identifier non-unique by
> presenting OSes with multiple disks that have identical IDs.

What about multipathing? In case the disk is attached via two paths but
multipath is not enabled, the OS sees the same disk (and the same
identical unique disk identifier) multiple times. Is this a violation
of the spec too?

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
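The two identifiers named in the quoted text can be read with a few lines of Python, sketched below. The offsets (440 in the MBR, 56 in the GPT header) are the ones given in the message; the helper names, the sample values and the disk names are hypothetical. The last function shows the situation under discussion: several disks, or several paths to one disk, presenting the same "unique" ID.

```python
import struct
import uuid


def mbr_signature(mbr: bytes) -> int:
    """4-byte little-endian disk signature at offset 440 of the MBR."""
    return struct.unpack_from("<I", mbr, 440)[0]


def gpt_disk_guid(header: bytes) -> uuid.UUID:
    """16-byte DiskGUID at offset 56 of the GPT header (mixed-endian)."""
    return uuid.UUID(bytes_le=header[56:72])


def duplicate_ids(disks):
    """Map each ID seen on more than one disk to the disks carrying it."""
    seen = {}
    for name, ident in disks.items():
        seen.setdefault(ident, []).append(name)
    return {ident: names for ident, names in seen.items() if len(names) > 1}


# Illustrative data only: two mirror components sharing one DiskGUID.
guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
header = bytearray(92)
header[56:72] = guid.bytes_le
mbr = bytearray(512)
struct.pack_into("<I", mbr, 440, 0xDEADBEEF)

print(hex(mbr_signature(bytes(mbr))), gpt_disk_guid(bytes(header)))
print(duplicate_ids({"ada0": guid, "ada1": guid, "ada2": uuid.UUID(int=1)}))
```

Whether a duplicate means "the same disk seen twice" or "two disks with copied metadata" cannot be decided from the ID alone, which is the disagreement in this part of the thread.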
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
>
> On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
> >
> > All of the above is ugly, I'm afraid :(
>
> Indeed. The only sane way is to put the metadata in a partition of its own.
> Every compliant OS will respect that and consequently will not scribble over
> the data unintentionally. Any other scheme that puts valuable data in some
> undocumented or unregistered location is violating the GPT spec right away
> and is susceptible to being clobbered unintentionally.

If the user runs:

    # gpart create -s GPT /dev/mirror/foo

it is obvious to me that he wants to partition the mirror device and not the individual disks. Because the mirror was configured earlier, do you expect gmirror to somehow detect that someone is writing GPT metadata later, and magically place the GPT metadata on the raw disk and move the mirror's metadata to some magic partition? Not to mention that the mirror itself doesn't have to be configured on top of raw disks. And not to mention that the mirror may never be partitioned.

If GPT in your opinion is limited to raw disks only, then I guess the best way to fix that is to refuse to configure GPT on anything except raw disks (which was already proposed by Andrey?). In my opinion this is unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was to be flexible. Let the user decide in what order he wants to configure the various layers. How do you know that in every possible scenario software mirroring should come after partitioning, and encryption after mirroring? Why can't we provide flexible tools to the user and let him decide?

Maybe GPT nesting violates standards, but why can't we support it as an extension, really? I recognize the need to warn users if they use FreeBSD-specific features. We do that with non-standard APIs.

So how about this. Let's modify gpart(8) to print a warning if GPT is configured on something other than a raw disk. Let the warning say that such a configuration is non-standard and that problems are to be expected if the disk is shared with other OSes. In my opinion that's fair. With such a warning in place, I think we can allow users to decide on their own whether they really want that or not.

Then we can also improve the FreeBSD boot loader to play nicely with FreeBSD-specific extensions.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
>
> All of the above is ugly, I'm afraid :(

Indeed. The only sane way is to put the metadata in a partition of its own. Every compliant OS will respect that and consequently will not scribble over the data unintentionally. Any other scheme that puts valuable data in some undocumented or unregistered location is violating the GPT spec right away and is susceptible to being clobbered unintentionally.

If the metadata is in its own partition, one can document the metadata layout and provide a reference implementation. That way one increases the chance that someone, somewhere may port support for it to some other OS. Lacking widespread support for the mirroring scheme, I think that the notion that one can safely and reliably mirror entire disks (read: mirror data not owned or controlled by FreeBSD) is a very questionable one -- all one has to do is boot some other OS and start modifying one of its partitions and you've failed to achieve the objective.

My advice is to leave disk mirroring to H/W or firmware solutions and use FreeBSD mirroring for FreeBSD partitions only. If you want to mirror the whole disk, don't partition the disk with non-FreeBSD partitioning schemes and partition only with FreeBSD-specific schemes or put a FreeBSD file system on the whole disk. In other words: make the whole disk private to FreeBSD.

Whether or not people agree with this is beside the point. All I'm saying is that unique disk identifiers such as the "UniqueMBRSignature" (a 4-byte ID written at offset 440 in the MBR) or the "DiskGUID" (a UUID written to offset 56 in the GPT header) cannot, in general, be mirrored across disks if OSes can see the mirrored disks as independent entities. One violates the spec on grounds of making the *unique* disk identifier non-unique by presenting OSes with multiple disks that have identical IDs.

--
Marcel Moolenaar
mar...@xcllnt.net
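Both identifiers named in the message above sit at fixed offsets, so reading them back is simple. The following is a minimal sketch in Python, assuming the caller has already read the raw MBR sector and the raw GPT header sector into memory; the function names are made up for illustration and do not come from any FreeBSD tool.

```python
import struct
import uuid

MBR_SIGNATURE_OFFSET = 440  # 4-byte UniqueMBRSignature, offset as quoted above
GPT_DISK_GUID_OFFSET = 56   # 16-byte DiskGUID, offset as quoted above


def mbr_disk_signature(mbr_sector: bytes) -> int:
    """Return the 4-byte little-endian disk signature stored in the MBR."""
    (signature,) = struct.unpack_from("<I", mbr_sector, MBR_SIGNATURE_OFFSET)
    return signature


def gpt_disk_guid(gpt_header: bytes) -> uuid.UUID:
    """Return the DiskGUID from a GPT header (GUIDs are stored mixed-endian)."""
    raw = gpt_header[GPT_DISK_GUID_OFFSET:GPT_DISK_GUID_OFFSET + 16]
    return uuid.UUID(bytes_le=raw)


def same_mbr_identity(sector_a: bytes, sector_b: bytes) -> bool:
    """True when two MBR sectors carry the same disk signature."""
    return mbr_disk_signature(sector_a) == mbr_disk_signature(sector_b)
```

A sector-for-sector mirror copies both fields unchanged, so an OS that sees the member disks independently would get the same answer from each of them, which is the uniqueness problem being described.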
Re: [CFC/CFT] large changes in the loader(8) code
On 28.06.2012 15:36, Boris Samorodov wrote:
> 28.06.2012 14:10, Stefan Esser wrote:
>
>> All of the above is ugly, I'm afraid :(
>
> One more try to overcome it. :-)
>
> We already have freebsd-boot partition at GPT scheme. Right?
> Then why not use it (dedicated file/part/etc.) to store
> geom FreeBSD information?

I recently ported LDM support to FreeBSD. LDM uses 1 MByte to store its database. Every disk used by LDM carries this database. When the disk is partitioned with MBR, the LDM database is stored in the last 1 MByte of the disk. When the disk is partitioned with GPT, one partition is dedicated to this database. LDM is not just a partitioning scheme; it is also a volume manager (LVM) and can do RAID0-5. This is how Microsoft and Veritas have resolved this problem.

--
WBR, Andrey V. Elsukov
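For the MBR case, the placement described above comes down to simple arithmetic. Here is a small sketch in Python, assuming that "1 MByte" means 2^20 bytes and that the sector size divides it evenly; the function name is hypothetical and is not taken from the LDM code.

```python
LDM_DB_SIZE = 1024 * 1024  # 1 MByte database, as described above (assumed 2**20 bytes)


def ldm_db_first_lba(disk_sectors: int, sector_size: int = 512) -> int:
    """First LBA of the LDM database on an MBR disk (the last 1 MByte)."""
    db_sectors = LDM_DB_SIZE // sector_size
    if disk_sectors < db_sectors:
        raise ValueError("disk is smaller than the LDM database")
    return disk_sectors - db_sectors
```

On a GPT disk no such calculation is needed, since the database lives in a partition of its own and is found through the partition table.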
Re: [CFC/CFT] large changes in the loader(8) code
> Just modify GEOM classes that keep state at the end of a partition to leave some spare area *behind* the GEOM data. I.e.:

What is really the problem at all?

Just leave it as is. If someone wants to use gpart and mirror, then mirroring every partition is simpler. Usually not every partition needs to be mirrored.

Or mirror the whole disk and create the gpart inside it; it should still boot fine.

Even better: update bsdlabel to work with >2TB devices. MUCH better.
Re: [CFC/CFT] large changes in the loader(8) code
On 28.06.2012 14:35, Wojciech Puchar wrote:
>> Just modify GEOM classes that keep state at the end of a partition to
>> leave some spare area *behind* the GEOM data. I.e.:
>>
>
> what is really a problem at all?
>
> just leave as is. If someone wants use gpart and mirror then mirroring every
> partition is simpler.
> usually not every partition needs to be mirrored.
>
> or mirror a whole and make gpart in it, it should still boot fine.

I have already reverted the changes related to GPT and GEOM metadata detection.

> even better - update bsdlabel to work with >2TB devices.
> MUCH better.

DragonFlyBSD has the disklabel64 partitioning scheme. Porting it is a simple task.

--
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
28.06.2012 14:10, Stefan Esser wrote:
> All of the above is ugly, I'm afraid :(

One more try to overcome it. :-)

We already have a freebsd-boot partition in the GPT scheme, right? Then why not use it (a dedicated file, partition, etc.) to store the FreeBSD GEOM information?

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
Re: [CFC/CFT] large changes in the loader(8) code
28.06.2012 13:41, Andrey V. Elsukov wrote:
> On 28.06.2012 13:19, Boris Samorodov wrote:
>> 27.06.2012 23:27, Andrey V. Elsukov wrote:
>>
>>> 1. You are against from:
>>> Our loader detects that primary GPT header is damaged. It tries to read
>>> backup GPT header from the last LBA and it detects that there is
>>> "GEOM::" signature. It tries to read one previous sector and there is
>>> *valid* GPT header.
>>
>> Can we do the other way round? I.e. the GPT header is at the last sector. And
>> if GEOM signature is not found at last sector of the disk
>> and this sector is a GPT header then look at the previous sector?
>
> Then this sector contains GPT table.

OK, then place the GEOM sector before the GPT table.

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
Re: [CFC/CFT] large changes in the loader(8) code
Sorry for following up to self, but ...

I just noticed somebody else suggesting the same method (put the GMIRROR configuration below the Secondary GPT header), but I think there is a problem:

If GMIRROR is used to mirror whole GPT-partitioned drives, then you want the GPT sectors to be considered part of the mirror (to keep them identical when GPT partitions are created or modified on the mirrored disks). But the GMIRROR configuration must not be assigned to any GPT partition. Therefore it must be protected, either by hiding it (e.g. create a special partition to hold GEOM config data, just to reserve the space within GPT, since the configuration data will still be located by looking at specific sectors of the provider), or by skipping the sectors assigned to GEOM config data in the GEOM provider that interprets them (e.g. GMIRROR).

The former only works if a GMIRROR (or GELI or whatever) is created on a disk that already has GPT headers (since these lead to the GEOM config data being put before the Secondary GPT header and allow the GEOM config to be marked as a special partition in that header).

The latter only works on disks without GPT headers, since the size of the provider will be smaller than the physical disk. Even with the last physical sectors available for GPT, the GPT headers will probably not conform to the standard, since remapping of the sectors to hide the GMIRROR config will lead to different logical sector numbers for the secondary GPT header when looked at with or without GMIRROR loaded.

I still think it is possible to find a layout that does not violate the GPT standard (use the last LBAs on disk, have self-referential information like the header's own LBA address consistent with physical block numbers and with the block numbers presented to users of GMIRROR et al.).

Perhaps GMIRROR could treat its configuration sector (placed at the sector just below the secondary GPT header) as read-only. (Requests may go to all sectors below it, and also to the area above the GMIRROR config sector that is used for the GPT header, so that the header is written to all mirrored devices.) But this is also ugly, since GPT must know not to assign the GMIRROR config sector to any partition (it is read-only for user requests, but writable on each individual drive in case of GMIRROR configuration or state changes, just as it is now). The reservation would best be achieved by using a specific GPT partition for the configuration data, for which GPT headers must exist before the GMIRROR is created (or both must be created at the same time, but that would mix GPT knowledge into GMIRROR).

All of the above is ugly, I'm afraid :(

Regards, Stefan
Re: [CFC/CFT] large changes in the loader(8) code
On 28.06.2012 13:19, Boris Samorodov wrote:
> 27.06.2012 23:27, Andrey V. Elsukov wrote:
>
>> 1. You are against from:
>> Our loader detects that primary GPT header is damaged. It tries to read
>> backup GPT header from the last LBA and it detects that there is
>> "GEOM::" signature. It tries to read one previous sector and there is
>> *valid* GPT header.
>
> Can we do the other way round? I.e. the GPT header is at the last sector. And
> if GEOM signature is
> not found at last sector of the disk
> and this sector is a GPT header then look at the previous sector?

In that case, that sector contains the GPT table.

--
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
On 27.06.2012 21:14, Marcel Moolenaar wrote:
>
> On Jun 27, 2012, at 12:08 PM, Christian Laursen wrote:
>
>> On 06/27/12 16:28, John Baldwin wrote:
>>> On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
>>>
>>>> When we are in the FreeBSD, our loader can detect that device size is lower than it see and it will work. When primary header is OK, then other OSes should work with this GPT. When it isn't OK, you just can't load other OS :)
>>>
>>> Ah, yes. The solution to violating standards is to make sure you never
>>> use standards-compliant software. That's a great argument. :)
>>>
>>> (Although not entirely uncommon. Standards aren't always perfect, but if
>>> we had a way to not gratuitously violate them it would be nice to avoid
>>> doing so.)
>>
>> To be standards compliant and allow whole-disk based mirroring to work at
>> the same time wouldn't nested GPT work like this?
>
> GPTs don't nest.

It is not strictly necessary to use nested GPT to have GMIRROR et al. and GPT co-exist. And I think this is possible without violating any standard.

Just modify GEOM classes that keep state at the end of a partition to leave some spare area *behind* the GEOM data. I.e.:

MBR or Primary GPT header
<>
GMIRROR Configuration and State
<>
(Spare area for Sec. GPT header)

If creating a GMIRROR (or other GEOM that keeps state at the end of the provider) left at least the last 32 KByte untouched (33 GPT sectors rounded up to a power of 2), GPT could use this spare space to store its Secondary Header.

These sectors could be treated as part of the User Data area, i.e. logical addresses would be translated by GMIRROR to skip the GMIRROR configuration sector (which I'd enlarge to at least 4 KB for alignment of "User Data 2").

This implies that the GMIRROR specification covers the whole provider (including the spare space but without the sectors holding the GMIRROR config, which are "mapped out"), since updates to the Secondary GPT must be performed on all mirrored devices.
This is a complication of the current GMIRROR code, but could be added without impact on existing disk layouts. (I have not checked whether backwards compatibility mandates introduction of a new GMIRROR class that supports such spare space after the GMIRROR config data, but I assume that there is enough spare space pre-initialized to 0 that can be used to add a flag that declares the 32 KByte beyond the end of the config data to be part of the mirror.)

The only modifications required are:

- If a GMIRROR is created, place the configuration sector 32 KByte before the end of the provider and mark it as "GPT compatible". (It is unknown at this point whether GPT is to be used on the mirror at a later time.)

- Tasting a provider should support looking for a valid GMIRROR (or GRAID) config sector not only at the end of the provider but, if that fails, also 32 KByte before the end of the provider. The GMIRROR is considered to be the provider for the GPT (i.e. the GMIRROR extends to 32 KByte beyond its config sector).

- Creating partitions with MBR or GPT within a GMIRROR is possible without modification. The only difference is that the protected GMIRROR configuration sector is physically within the range of sectors used for the partition, but logically mapped out. The space available for partitioning is the provider size minus the size of the GMIRROR configuration, just as it used to be.

- Reading and writing the mirror is allowed for all sectors in the User Data area, as in a "normal" GMIRROR. The only difference is the test for logical sectors in the last 32 KByte, for which the request is modified to be offset by a few sectors to skip the GMIRROR configuration sector. Requests that cover physical sectors before and behind these GMIRROR config sectors must be split.
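The address translation in the last item can be sketched as follows. This is only an illustration in Python, not GMIRROR code: it assumes 512-byte sectors, a 32 KByte spare area (64 sectors) and a 4 KB configuration area (8 sectors), and the function name and its parameters are invented for the example.

```python
from typing import List, Tuple


def map_request(lba: int, count: int, disk_sectors: int,
                spare_sectors: int = 64, config_sectors: int = 8
                ) -> List[Tuple[int, int]]:
    """Translate a logical (lba, count) request into physical extents.

    Physical layout assumed here:
      [0, cfg_start)            user data 1
      [cfg_start, cfg_end)      mirror configuration (mapped out)
      [cfg_end, disk_sectors)   spare area / user data 2
    """
    cfg_end = disk_sectors - spare_sectors
    cfg_start = cfg_end - config_sectors
    logical_size = disk_sectors - config_sectors
    if lba < 0 or count <= 0 or lba + count > logical_size:
        raise ValueError("request outside the provider")

    extents = []
    # Part of the request that lies before the configuration sectors.
    if lba < cfg_start:
        n = min(count, cfg_start - lba)
        extents.append((lba, n))
        lba, count = lba + n, count - n
    # The remainder lands behind the configuration sectors: shift it.
    if count > 0:
        extents.append((lba + config_sectors, count))
    return extents
```

A request that straddles the configuration area comes back as two extents, which corresponds to the split mentioned above; any other request maps to a single extent, shifted or not.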
If, instead of splitting off the final 32 KByte as "User Data 2", just the 33 sectors (of 512 Byte) required for GPT were assigned to that area, then there would never be requests that extend beyond the GMIRROR config sectors on GPT partitioned disks. But since such requests would still be possible if MBR partitions were used, code to handle them would still be required in GMIRROR.

There is one caveat, though: Creating a GMIRROR and then using an OS that does not know about FreeBSD to partition the disk would result in the GMIRROR configuration space being ignored.

Another problem could be that the available space in the GPT is the size of the disk minus the GMIRROR configuration sectors, i.e. there is a difference between the number of physical sectors on the disk and the number of sectors to be assigned to partitions by GPT.

>> Nothing but FreeBSD would understand the freebsd-geom partition
>> type, so the inner GPT device should be valid and standards
>> compliant.
>
> If it were standards compliant, it would be discoverable by non-FreeBSD.
> That clearly isn't the case -- hence it's not standards complia
Re: [CFC/CFT] large changes in the loader(8) code
27.06.2012 23:27, Andrey V. Elsukov wrote:
> 1. You are against from:
> Our loader detects that primary GPT header is damaged. It tries to read
> backup GPT header from the last LBA and it detects that there is
> "GEOM::" signature. It tries to read one previous sector and there is
> *valid* GPT header.

Can we do it the other way round? I.e. the GPT header is at the last sector. And if the GEOM signature is not found at the last sector of the disk and this sector is a GPT header, then look at the previous sector?

> It is valid, because its CRC is valid, its self_LBA is valid. For the
> *FreeBSD* users it is better to don't use this GPT and just complain
> "i'm sorry, can't boot". The other OSes can't, and we shouldn't.

--
WBR, Boris Samorodov (bsam)
FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 2:39 PM, Poul-Henning Kamp wrote:
>
> I would like to point out that all other operating system which has
> had this precise problem, have solved it by adding a bootfs partition
> to hold the kernel+modules required to truly understand the disk-layout ?

I have seen some form of this solution suggested three times (once by me), and now by someone who, I think I can safely state, is pretty familiar with geom. So far I have seen no direct response and only a passing comment by jhb that it might be difficult.

Sometimes standards need to be broken. Sometimes they suck so badly that the entire industry ignores them. But unless there is a good reason to ignore them, one should fully justify doing so, all the more so when there are obvious ways that non-compliance can lead to disaster. (Think of a geli disk where some other software steps on the last block.)

Moreover, I think I can see a legitimate case, though I have not tried it. Say I have a FreeBSD system with a large, unused space on the disk and it uses gmirror. I decide that I need to have the ability to occasionally boot Linux on this system (or even Windows 8). For some reason, and I can think of several, I can't use a virtual system. I create a new partition for the second OS and install it. It knows nothing about the gmirror, so it just uses the disk it is installed on and never touches the metadata. Is this possible? Looks reasonable to me.

I really, really feel uncomfortable about all of this, all the more so when people start claiming, by a very strained interpretation of what appears on the surface to be a clear specification, that they are not violating the standard.

--
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com
Re: [CFC/CFT] large changes in the loader(8) code
I would like to point out that all other operating systems which have had this precise problem have solved it by adding a bootfs partition to hold the kernel+modules required to truly understand the disk-layout ?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 27, 2012, at 1:48 PM, Andrey V. Elsukov wrote:

> On 28.06.2012 00:14, Marcel Moolenaar wrote:
>>> Our loader detects that primary GPT header is damaged. It tries to read
>>> backup GPT header from the last LBA and it detects that there is
>>> "GEOM::" signature. It tries to read one previous sector and there is
>>> *valid* GPT header.
>>
>> How do you know it's valid? It's in a location that is not valid
>> to begin with. Validity is based on rules and you're violating
>> the rules without defining exactly what we call valid given the
>> new rules. This may seem nitpicking, but having gone through the
>> hassle of dealing with the broken way we created the dangerously
>> dedicated disk, I appreciate the importance of being anal when it
>> comes to something that lives on non-volatile storage and gets to
>> be exposed to a world much larger than FreeBSD.
>
> So why do you not prevent to attach GEOM_PART_GPT to any providers that
> are not the disk drive? This will be the right solution to all our
> problems. Just don't create invalid GPT.

It's not even the right solution, as it prevents legit nesting of gpart GEOMs *and* is fundamentally based on the flawed assumption that any non-disk GEOM underneath gpart yields an invalid GPT. Think gnop.

--
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On 28.06.2012 00:14, Marcel Moolenaar wrote:
>> Our loader detects that primary GPT header is damaged. It tries to read
>> backup GPT header from the last LBA and it detects that there is
>> "GEOM::" signature. It tries to read one previous sector and there is
>> *valid* GPT header.
>
> How do you know it's valid? It's in a location that is not valid
> to begin with. Validity is based on rules and you're violating
> the rules without defining exactly what we call valid given the
> new rules. This may seem nitpicking, but having gone through the
> hassle of dealing with the broken way we created the dangerously
> dedicated disk, I appreciate the importance of being anal when it
> comes to something that lives on non-volatile storage and gets to
> be exposed to a world much larger than FreeBSD.

So why not prevent GEOM_PART_GPT from attaching to any provider that is not a disk drive? That would be the right solution to all our problems. Just don't create an invalid GPT.

--
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 27, 2012, at 12:27 PM, Andrey V. Elsukov wrote:

> On 27.06.2012 21:55, Marcel Moolenaar wrote:
>> You can't just re-interpret standards to match a context you know very well
>> isn't applicable and consequently redefine what the word "device" means.
>> You're on a slippery slope and while you may not see it as a problem, you
>> do make it a problem for FreeBSD users. It's our users we should be keeping
>> in mind when we solve problems.
>>
>>> If a user wants modify GPT in the disk editor from the another OS,
>>> he can do it, and it should work. The result depends only from the
>>> partition editor,
>>> it might overwrite the last sector and might don't.
>>
>> Right. Another happy user that sees his/her FreeBSD installation destroyed
>> or degraded (no mirroring, warning messages about corrupted GPT, etc) for
>> no apparent reason and without any kind of warning that what he/she is doing
>> is potentially harmful... That's the spirit!
>
> Ok. Let's return back to my patches. They don't add any new methods to
> shoot in the foot. We are talking about the *FreeBSD loader*.
> This is the program that starts FreeBSD kernel. It doesn't start other
> OS. We already have many users who uses FreeBSD as a single system on
> the machine. Many of them use GPT inside of some GEOM provider.

Your patches continue down a path that, as this discussion shows, isn't necessarily the path we should be on. While you don't make things worse from a compliance perspective, you make it worse by adding the non-compliant behaviour to more components.

> As i understand there two parts where we haven't a consensus:
>
> 1. You are against from:
> Our loader detects that primary GPT header is damaged. It tries to read
> backup GPT header from the last LBA and it detects that there is
> "GEOM::" signature. It tries to read one previous sector and there is
> *valid* GPT header.

How do you know it's valid? It's in a location that is not valid to begin with. Validity is based on rules and you're violating the rules without defining exactly what we call valid given the new rules. This may seem nitpicking, but having gone through the hassle of dealing with the broken way we created the dangerously dedicated disk, I appreciate the importance of being anal when it comes to something that lives on non-volatile storage and gets to be exposed to a world much larger than FreeBSD.

> 2. You are against from having one fake PMBR entry by default in the
> /boot/pmbr image.

I don't understand what you're saying or what I'm being accused of being against.

--
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On 27.06.2012 21:55, Marcel Moolenaar wrote:
> You can't just re-interpret standards to match a context you know very well
> isn't applicable and consequently redefine what the word "device" means.
> You're on a slippery slope and while you may not see it as a problem, you
> do make it a problem for FreeBSD users. It's our users we should be keeping
> in mind when we solve problems.
>
>> If a user wants modify GPT in the disk editor from the another OS,
>> he can do it, and it should work. The result depends only from the partition
>> editor,
>> it might overwrite the last sector and might don't.
>
> Right. Another happy user that sees his/her FreeBSD installation destroyed
> or degraded (no mirroring, warning messages about corrupted GPT, etc) for
> no apparent reason and without any kind of warning that what he/she is doing
> is potentially harmful... That's the spirit!

OK. Let's get back to my patches. They don't add any new ways to shoot yourself in the foot. We are talking about the *FreeBSD loader*. This is the program that starts the FreeBSD kernel. It doesn't start other OSes. We already have many users who use FreeBSD as the only system on the machine. Many of them use GPT inside some GEOM provider. You can just read the lists, articles about installing FreeBSD, forums, etc. We already have these users and I hope they will keep using FreeBSD as before. So why can't we add a simple quirk to make their systems a bit more reliable?

As I understand it, there are two points on which we don't have consensus:

1. You are against the following: our loader detects that the primary GPT header is damaged. It tries to read the backup GPT header from the last LBA and detects a "GEOM::" signature there. It tries to read the previous sector and finds a *valid* GPT header. It is valid because its CRC is valid and its self_LBA is valid. For *FreeBSD* users it is better not to use this GPT and just complain "I'm sorry, can't boot". The other OSes can't, and we shouldn't.

2. You are against having one fake PMBR entry by default in the /boot/pmbr image. OK, I can propose several ways to resolve this:
   * remove from the loader's GPT probing code the requirement that the MBR must contain a PMBR partition record;
   * teach the boot0cfg command to write the PMBR properly;
   * add a new condition that marks the GPT as corrupt when it has an invalid PMBR. Then, when you write a PMBR with an empty partition table using dd(1), the kernel will complain and you will be forced to run `gpart recover`.

--
WBR, Andrey V. Elsukov
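The fallback described in point 1 can be sketched roughly as below. This is a Python illustration under stated assumptions, not the loader's C code: `read_sector` is a caller-supplied callback, the GEOM metadata is assumed to begin with the bytes "GEOM::", and the function names are invented. The header field offsets (size at 12, CRC32 at 16, self LBA at 24) follow the standard GPT header layout.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
GEOM_MAGIC = b"GEOM::"  # assumption: metadata sector starts with this magic


def gpt_header_valid(sector: bytes, lba: int) -> bool:
    """Check signature, header CRC32 and self-LBA of a GPT header."""
    if len(sector) < 92 or sector[:8] != GPT_SIGNATURE:
        return False
    hdr_size, hdr_crc = struct.unpack_from("<II", sector, 12)
    if hdr_size < 92 or hdr_size > len(sector):
        return False
    (self_lba,) = struct.unpack_from("<Q", sector, 24)
    # The CRC is computed with the CRC field itself set to zero.
    body = sector[:16] + b"\0\0\0\0" + sector[20:hdr_size]
    return (zlib.crc32(body) & 0xFFFFFFFF) == hdr_crc and self_lba == lba


def find_backup_header(read_sector, last_lba: int):
    """Return the LBA of a usable backup GPT header, or None.

    read_sector(lba) -> bytes is supplied by the caller.
    """
    last = read_sector(last_lba)
    if gpt_header_valid(last, last_lba):
        return last_lba
    # The quirk under discussion: GEOM metadata occupies the last sector,
    # so a GPT created inside the provider ends one sector earlier.
    if last.startswith(GEOM_MAGIC):
        prev = read_sector(last_lba - 1)
        if gpt_header_valid(prev, last_lba - 1):
            return last_lba - 1
    return None
```

Note that the check only establishes internal consistency (CRC and self-LBA); whether a header found at that location should count as valid at all is exactly what is disputed in this thread.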
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 27, 2012, at 12:08 PM, Christian Laursen wrote:

> On 06/27/12 16:28, John Baldwin wrote:
>> On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
>>
>>> When we are in the FreeBSD, our loader can detect that device size
>>> is lower than it see and it will work. When primary header is OK, then
>>> other OSes should work with this GPT. When it isn't OK, you just can't
>>> load other OS :)
>>
>> Ah, yes. The solution to violating standards is to make sure you never
>> use standards-compliant software. That's a great argument. :)
>>
>> (Although not entirely uncommon. Standards aren't always perfect, but if
>> we had a way to not gratuitously violate them it would be nice to avoid
>> doing so.)
>
> To be standards compliant and allow whole-disk based mirroring to work at the
> same time wouldn't nested GPT work like this?

GPTs don't nest.

> Nothing but FreeBSD would understand the freebsd-geom partition type, so the
> inner GPT device should be valid and standards compliant.

If it were standards compliant, it would be discoverable by non-FreeBSD. That clearly isn't the case -- hence it's not standards compliant. What if, for example, someone wanted to share the swap partition between Linux and FreeBSD?

--
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 27, 2012, at 11:20 AM, Pawel Jakub Dawidek wrote:

> On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
>>
>> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
>>>
>>> GPT really wants the backup header at the last LBA. I know you can set it,
>>> but I've interpreted that as a way to see if the primary header is correct
>>> or
>>> not. It seems to me that GPT tables created in this fashion (inside a GEOM
>>> provider) will not work properly with partition editors for other OS's.
>>> I'm
>>> hesitant to encourage the use of this as I do think putting GPT inside of a
>>> gmirror violates the GPT spec.
>>
>> Agreed.
>
> Guys. This doesn't violate the GPT spec in any way. The spec is
> narrow-minded if it talks only about raw disks, but you should think
> about gmirror as pseudo-hardware RAID.

I'm sorry, but this is a contradiction. If it doesn't violate the spec, then the spec is not narrow-minded on the grounds of what we're discussing. If the spec *is* narrow-minded then obviously it doesn't capture our scenario, which means that we're violating the spec.

Clearly we're not discussing anything that falls well within the spec, or is undebatable. This makes the whole topic dangerous anyway. When you're in the grey area (this is only for argument's sake -- we're in violation for sure) you're opening yourself up to compatibility problems. Should we deliberately go there?

--
Marcel Moolenaar
mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Wednesday, June 27, 2012 1:45:35 pm Marcel Moolenaar wrote: > > On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote: > > > > As for sharing disk with other OS. If you share the disk with OS that > > doesn't support gmirror, you shouldn't use gmirror in the first place. > > You probably want to use only formats that are recognized by all your > > OSes. > > This statement is ridicuous by virtue of not being in touch with > reality and by making gmirror useless for such wide range of cases > that one can question why we have it at all. > > Put differently: a mirroring class is a fairly basic and useful thing > to have. Limiting it's use is nothing but artificial and follows from > having to use the underlying provider to store metadata. This then > changes the view of the underlying providing to consumers above gmirror > in a way that makes the presence or absence of gmirror visible. > Solving the visibility problem makes gmirror useful all the time. > I see that as a better way of looking at it than simply blurting out > that you shouldn't use gmirror when certain awkward and artifical > conditions apply. I'm not sure we can force gmirror to be anything except FreeBSD-specific, but it would be nice to not make non-standard GPT tables while we are at it. The reason the metadata for things like Intel's onboard SATA RAID does work ok is because the metadata format is enforced by the vendor, so it is reasonable to assume that metadata format will work across other OS's. Anyway, I've said my piece and will let the matter drop from my end at this point. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [CFC/CFT] large changes in the loader(8) code
On 2012-06-26 14:50, Andrey V. Elsukov wrote: > Some time ago i have started reading the code in the sys/boot. > Especially i'm interested in the partition tables handling. > I found several problems: > 1. There are several copies of the same code in the libi386/biosdisk.c > and common/disk.c, and partially libpc98/biosdisk.c. > 2. ZFS probing is very slow, because the ZFS code doesn't know how many > disks and partitions the system has: > http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 > http://www.freebsd.org/cgi/query-pr.cgi?pr=161897 > 3. The GPT support doesn't check CRC and even doesn't know anything > about the secondary GPT header/table. > > So, i have created the branch and committed the changes: > http://svnweb.freebsd.org/base/user/ae/bootcode/ > The patch is here: > http://people.freebsd.org/~ae/boot.diff FWIW, I verified it compiles OK with clang, and especially boot2's size isn't increased at all. It would be nice if you could check it with clang now and again, before you finally merge this project into head. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [CFC/CFT] large changes in the loader(8) code
On 06/27/12 16:28, John Baldwin wrote:
> On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote:
>
>> When we are in the FreeBSD, our loader can detect that device size
>> is lower than it see and it will work. When primary header is OK, then
>> other OSes should work with this GPT. When it isn't OK, you just can't
>> load other OS :)
>
> Ah, yes. The solution to violating standards is to make sure you never
> use standards-compliant software. That's a great argument. :)
>
> (Although not entirely uncommon. Standards aren't always perfect, but if
> we had a way to not gratuitously violate them it would be nice to avoid
> doing so.)

To be standards compliant and allow whole-disk based mirroring to work at the same time wouldn't nested GPT work like this?

Whole disk (start)
| GPT header
| GPT partition of type freebsd-geom (start)
| | gmirror device (start)
| | | GPT header
| | | | freebsd-boot
| | | | freebsd-ufs
| | | | freebsd-swap
| | | GPT backup header
| | gmirror metadata
| | gmirror device (end)
| GPT partition of type freebsd-geom (end)
| GPT backup header
Whole disk (end)

Nothing but FreeBSD would understand the freebsd-geom partition type, so the inner GPT device should be valid and standards compliant.

The boot loader would of course need to understand this setup but that shouldn't be impossible.

Just a thought. It might be too complicated compared to the non-standards compliant way it works now which works quite well in practice though.

-- Christian Laursen
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 27, 2012, at 11:34 AM, Pawel Jakub Dawidek wrote: > > I'm sorry, Marcel, but what you describe here has nothing to do with > reality. To be able to implement reliable mirroring you have to use > on-disk metadata. There is no way around that. You can implement > non-redundant GEOM classes without using on-disk metadata, but > out-of-band configuration in case of mirroring is simply naive. How do > you detect that components are out of sync, for example? GEOM configuration and per-class runtime state are not to be treated the same. Out-of-band configuration is trivial. Per-class runtime state, like whether elements in a mirrored configuration are in sync or not, is more difficult, but does not a priori require on-disk metadata as it's implemented now. You can have the configuration tell the GEOM where that state is being kept, so that you can put it in a partition on the disks involved, or even keep it independent from the disks, which then requires disks to be uniquely identifiable, for sure. But that's what GPT gives you anyway. But even without identification, you can invert the question from "how do I detect that components are out of sync" to "how do I prove they are in fact in sync". That question has a very simple O(n) answer. So, if time isn't a concern or your storage is small, you can always scan all sectors and as such prove that the disks are in sync. The point being: the current implementation isn't the only one. Granted, it can easily be the simplest one or even the best one in some cases, but that's beside the point you were making. -- Marcel Moolenaar mar...@xcllnt.net
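The O(n) "prove they are in sync" scan mentioned above could look roughly like the sketch below. It is only an illustration: the two mirror components are stood in for by in-memory byte streams, the 512-byte sector size is an assumption, and `first_mismatch` is a hypothetical helper, not anything from gmirror.

```python
import io

SECTOR_SIZE = 512  # assumed sector size; real code would query the device


def first_mismatch(a, b, sector_size=SECTOR_SIZE):
    """Return the index of the first differing sector, or None if in sync.

    `a` and `b` are binary file-like objects standing in for the two
    mirror components. Every sector is read once, so the scan is O(n).
    """
    index = 0
    while True:
        sector_a = a.read(sector_size)
        sector_b = b.read(sector_size)
        if sector_a != sector_b:
            return index          # contents (or lengths) differ here
        if not sector_a:
            return None           # both streams ended without a difference
        index += 1


# Two four-sector "disks": identical, then with one byte changed in sector 2.
good = bytes(4 * SECTOR_SIZE)
bad = bytearray(good)
bad[2 * SECTOR_SIZE + 7] = 0xFF

print(first_mismatch(io.BytesIO(good), io.BytesIO(good)))
print(first_mismatch(io.BytesIO(good), io.BytesIO(bytes(bad))))
```

The scan proves equality only at the moment it runs; it says nothing about which component holds the newer data, which is the part that runtime state would still have to record somewhere.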
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote: > > On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote: > > > > As for sharing disk with other OS. If you share the disk with OS that > > doesn't support gmirror, you shouldn't use gmirror in the first place. > > You probably want to use only formats that are recognized by all your > > OSes. > > This statement is ridiculous by virtue of not being in touch with > reality and by making gmirror useless for such a wide range of cases > that one can question why we have it at all. > > Put differently: a mirroring class is a fairly basic and useful thing > to have. Limiting its use is nothing but artificial and follows from > having to use the underlying provider to store metadata. This then > changes the view of the underlying provider to consumers above gmirror > in a way that makes the presence or absence of gmirror visible. > Solving the visibility problem makes gmirror useful all the time. > I see that as a better way of looking at it than simply blurting out > that you shouldn't use gmirror when certain awkward and artificial > conditions apply. I'm sorry, Marcel, but what you describe here has nothing to do with reality. To be able to implement reliable mirroring you have to use on-disk metadata. There is no way around that. You can implement non-redundant GEOM classes without using on-disk metadata, but out-of-band configuration in case of mirroring is simply naive. How do you detect that components are out of sync, for example? And when it comes to visibility: are you suggesting that gmirror should present the entire underlying provider to upper layers? Including its metadata? I hope not, because we went through that hell already (remember UFS skipping the first 16 sectors, as BSDlabel metadata might be there? The same for swap?). I think I did a pretty good job by making the metadata as simple as possible - I use exactly one sector at the end of the target device.
I'm really having a hard time thinking of a simpler format. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote: > > On Jun 26, 2012, at 10:37 AM, John Baldwin wrote: > > > > GPT really wants the backup header at the last LBA. I know you can set it, > > but I've interpreted that as a way to see if the primary header is correct > > or > > not. It seems to me that GPT tables created in this fashion (inside a GEOM > > provider) will not work properly with partition editors for other OS's. > > I'm > > hesitant to encourage the use of this as I do think putting GPT inside of a > > gmirror violates the GPT spec. > > Agreed. Guys. This doesn't violate the GPT spec in any way. The spec is narrow-minded if it talks only about raw disks, but you should think about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top of RAID array is spec violation, then I guess we just have to live with it. > While it is a nice trick to use the last sector for meta data, it does > create 2 problems. 1 is mentioned above. [...] It doesn't really matter where gmirror puts its metadata. If gmirror would keep its metadata in the first sector, gpart/gpt will find its metadata in the last sector and will complain about missing primary header. > [...] The second is that when there's > different metadata in the first *and* the last sector, you can't decide > which is to take precedence without also looking at the other and know > how to interpret it. We have not solved this second problem at all. We > do get reports about the problems though. At best we're handwaving or > kluging. This is different kind of problem. It took me a while to realize that, but now I know:) The real problem is that not all metadata formats are suitable for autodetection. That's all. The metadata I use in my GEOM classes play nice with autodetection. The solution is very easy - keep size of the disk device within metadata. This allows gmirror to figure out if it is configured on raw disk, last slice or last partition within last slice, etc. 
If GPT would keep disk size in its metadata the second problem you mentioned would not exist. And to be honest GPT kinda does that by having backup header's LBA stored in the primary header. And this is fine as long the primary header is valid. The same problem is with things like UFS labels. There is no way to properly support them using GEOM autodetection, because there is no provider size in UFS superblock. UFS superblock contains file system size, but it is not the same, as one can create smaller file system than the underlying disk device. > I think it's unwise to depend on FreeBSD-specific extensions or features > in industry-standard partitioning schemes and as such make the use of > "foreign" tools hard if not impossible. If you plan to use the given disk with FreeBSD only, what's the problem? Partitioning is not the end of the world. Even if you use "industry-standard partitioning schemes" what file system are you going to use to actually access your data? FAT? Of course if you do share your disk between various OSes then probably your best bet is to use MBR or GPT on raw disk and FAT file system. But if you use your disk with FreeBSD only, then I see no reason to not to leverage FreeBSD-specific features (be it gmirror, geli or zfs). > A much more flexible approach is to support out-of-band configuration > data. This allows us to mirror GPT disks without having to become non- > standard as it removes the need to use the last sector for meta-data. > The ability to construct GEOM hierarchies unambiguously is very > important and our current approach has proven to not deliver on that. > This is actually impacting existing FreeBSD consumers already, like > Juniper. So, se should not go deeper into this rabbit hole. We should > finally solve this problem for real... Marcel, nothing stops anyone from implementing GEOM mirror class that uses no on-disk metadata. GEOM is not a limiting factor here. 
GEOM does provide a mechanism for autoconfiguration, but it is totally optional and a GEOM class might choose not to use it. As an example you can take a look at two other GEOM classes of mine: gconcat(8) and gstripe(8). You can use the 'label' subcommand to store metadata on component disks, which will take advantage of GEOM autodetection and autoconfiguration. You can also use the 'create' subcommand to create an ad hoc provider that stores no metadata and makes use of entire disks, which also means it won't be automatically created on next boot. For Juniper it might be more handy to use out-of-band configuration as you know the hardware you are running on, so you know where the disks are exactly, etc. My company builds appliances too, so I have been there. For most of our users automatic configuration is simply better, as they can shuffle disks around and not wonder if the system will boot or not. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes
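The autodetection argument in this message (keep the size of the disk device within the metadata) can be illustrated with a small sketch. The provider names and sizes are made up, and `providers_matching` is a hypothetical helper; this is not gmirror's actual tasting code.

```python
def providers_matching(providers, stored_size):
    """Return the providers whose size equals the size recorded in the metadata."""
    return [name for name, size in providers.items() if size == stored_size]


# Hypothetical providers (sizes in sectors) that all end at the same disk
# sector: the whole disk, its last slice, and the last partition in that
# slice. Metadata kept in that shared last sector is visible through all three.
providers = {"ada0": 1000, "ada0s2": 600, "ada0s2d": 300}

# Size recorded in the metadata when the mirror was labeled (assumed value).
stored_size = 600

print(providers_matching(providers, stored_size))
```

Only the provider whose size matches the recorded value is treated as the labeled one, which is how the stored size disambiguates nested providers that share their last sector.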
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 26, 2012, at 9:50 PM, Andrey V. Elsukov wrote: > If the primary GPT is corrupt, software must check the last LBA of the device > to see if it has a > valid GPT Header and point to a valid GPT Partition Entry Array." > > For the FreeBSD an each GEOM provider can be treated as disk device. > So, i don't see anything criminal if we will add some quirks in the our loader > for the better supporting of our technologies. You can't just re-interpret standards to match a context you know very well isn't applicable and consequently redefine what the word "device" means. You're on a slippery slope and while you may not see it as a problem, you do make it a problem for FreeBSD users. It's our users we should be keeping in mind when we solve problems. > If a user wants modify GPT in the disk editor from the another OS, > he can do it, and it should work. The result depends only from the partition > editor, > it might overwrite the last sector and might don't. Right. Another happy user that sees his/her FreeBSD installation destroyed or degraded (no mirroring, warning messages about corrupted GPT, etc) for no apparent reason and without any kind of warning that what he/she is doing is potentially harmful... That's the spirit! -- Marcel Moolenaar mar...@xcllnt.net ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
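To make the disputed layout concrete, here is a minimal arithmetic sketch. It assumes, as stated elsewhere in this thread, that gmirror keeps its metadata in exactly one sector at the end of the underlying device, and that a GPT created on top of the mirror places its backup header at the last LBA of the mirror provider. The function and key names are illustrative only.

```python
def gpt_on_gmirror_layout(disk_sectors):
    """Return the raw-disk LBAs involved when a GPT sits inside a gmirror."""
    disk_last_lba = disk_sectors - 1
    # Assumption from the thread: gmirror uses one sector at the very end.
    gmirror_metadata_lba = disk_last_lba
    # The mirror provider is therefore one sector shorter than the disk.
    mirror_sectors = disk_sectors - 1
    # A GPT written to the mirror puts its backup header at the mirror's last LBA.
    backup_header_lba = mirror_sectors - 1
    return {
        "where_spec_reader_looks": disk_last_lba,
        "gmirror_metadata": gmirror_metadata_lba,
        "gpt_backup_header": backup_header_lba,
    }


layout = gpt_on_gmirror_layout(1000)
print(layout)
```

A tool that sees only the raw disk and cannot trust the primary header looks at the last LBA, finds gmirror metadata there, and has no reason to look one sector earlier for the backup header.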
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote: > > As for sharing disk with other OS. If you share the disk with OS that > doesn't support gmirror, you shouldn't use gmirror in the first place. > You probably want to use only formats that are recognized by all your > OSes. This statement is ridiculous by virtue of not being in touch with reality and by making gmirror useless for such a wide range of cases that one can question why we have it at all. Put differently: a mirroring class is a fairly basic and useful thing to have. Limiting its use is nothing but artificial and follows from having to use the underlying provider to store metadata. This then changes the view of the underlying provider to consumers above gmirror in a way that makes the presence or absence of gmirror visible. Solving the visibility problem makes gmirror useful all the time. I see that as a better way of looking at it than simply blurting out that you shouldn't use gmirror when certain awkward and artificial conditions apply. -- Marcel Moolenaar mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Jun 26, 2012, at 10:37 AM, John Baldwin wrote: > > GPT really wants the backup header at the last LBA. I know you can set it, > but I've interpreted that as a way to see if the primary header is correct or > not. It seems to me that GPT tables created in this fashion (inside a GEOM > provider) will not work properly with partition editors for other OS's. I'm > hesitant to encourage the use of this as I do think putting GPT inside of a > gmirror violates the GPT spec. Agreed. While it is a nice trick to use the last sector for metadata, it does create two problems. The first is mentioned above. The second is that when there's different metadata in the first *and* the last sector, you can't decide which is to take precedence without also looking at the other and knowing how to interpret it. We have not solved this second problem at all. We do get reports about the problems though. At best we're handwaving or kluging. I think it's unwise to depend on FreeBSD-specific extensions or features in industry-standard partitioning schemes and as such make the use of "foreign" tools hard if not impossible. A much more flexible approach is to support out-of-band configuration data. This allows us to mirror GPT disks without having to become non-standard as it removes the need to use the last sector for metadata. The ability to construct GEOM hierarchies unambiguously is very important and our current approach has proven to not deliver on that. This is actually impacting existing FreeBSD consumers already, like Juniper. So, we should not go deeper into this rabbit hole. We should finally solve this problem for real... -- Marcel Moolenaar mar...@xcllnt.net
Re: [CFC/CFT] large changes in the loader(8) code
On Wednesday, June 27, 2012 8:45:45 am Andrey V. Elsukov wrote: > On 27.06.2012 16:07, John Baldwin wrote: > >> • Check the Signature > >> • Check the Header CRC > >> • Check that the MyLBA entry points to the LBA that contains the GUID > >> Partition Table > >> • Check the CRC of the GUID Partition Entry Array > >> If the GPT is the primary table, stored at LBA 1: > >> • Check the AlternateLBA to see if it is a valid GPT > >> If the primary GPT is corrupt, software must check the last LBA of the > >> device to see if it has a > >> valid GPT Header and point to a valid GPT Partition Entry Array." > > > > Right, we break the last rule. If you want to use a partition editor > > that doesn't grok gmirror (because you are using another OS's editor), > > to repair a GPT, it will do the wrong thing. > > When we are in the FreeBSD, our loader can detect that device size > is lower than it see and it will work. When primary header is OK, then > other OSes should work with this GPT. When it isn't OK, you just can't > load other OS :) Ah, yes. The solution to violating standards is to make sure you never use standards-compliant software. That's a great argument. :) (Although not entirely uncommon. Standards aren't always perfect, but if we had a way to not gratuitously violate them it would be nice to avoid doing so.) > > We can't write bootcode with gpart? What do you think the 'bootcode' > > command > > does? > > `gpart bootcode -b` reads file, creates ioctl request and sends this data to > the GEOM_PART class. GEOM_PART receives the control request, checks the data > and writes it to the provider. > `gpart bootcode -p` works like dd(1) and writes bootcode to the given > partition. > gpart(8) haven't any knowledge about specific partitioning scheme. Correct, but in both cases it writes "bootcode". > > Also, there is no reason we can't have a 'recover' command that attempts to > > recover a corrupted table including repairing the PMBR. 
gpart(8) already > > generates a full PMBR when you use 'gpart create' to create a GPT even > > though > > there isn't a GPT object yet. > > `gpart create` creates only ioctl control request to the GEOM_PART class. > GEOM_PART class creates new GPT geom object and this objects writes PMBR and > its > metadata to the provider. You can't add a new ioctl? -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: [CFC/CFT] large changes in the loader(8) code
On Wednesday, June 27, 2012 10:08:17 am Pawel Jakub Dawidek wrote: > On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote: > > > I don't think so. Most common case is to configure partitions on top of > > > a mirror. Mirroring partitions is less common. Mostly because of > > > hardware RAIDs being popular. You don't expect hardware RAID vendor to > > > mirror partitions. Partition editors for other OS's won't work, but only > > > because they don't support gmirror. If they wouldn't recognize and > > > support some hardware (or pseudo-hardware) RAIDs there will be the same > > > problem. > > > > Hardware RAIDs hide the metadata from the disk that the BIOS (and disk > > editors) see. Thus, putting a GPT on a hardware RAID volume works fine > > as the logical volume is always seen by all OS's consistently. [...] > > Only if you won't connect this disk to a different controller. Yes, but people do not expect to be able to yank a hardware RAID drive out and hook it up to a "raw" disk controller and have it work. > > [...] The same > > is even true of the "software" RAID that graid supports since the metadata > > is defined by the vendor and thus the logical volume is always seen other > > OS's consistently. > > But is it seen without metadata by the boot loader? Yes. The logical volume shows up as a BIOS disk device. > What I'm trying to say is that it is fair to expect from the user to not > use gmirror-configured disk on different OS. If the user wants to use > this disk in different OS then he has to use format that is recognized > by both. > > Because gmirror is supported by FreeBSD we should improve the support by > teaching boot loader about it. Pretending gmirror is special and > recommending to mirror partitions with it instead of raw disks is not > the solution. > > I really can't see how gmirror is different in this regard from any > other software RAID or volume manager. 
If you try to use disk that > contains unrecognized metadata the behaviour is undefined (but hopefully > not a panic). It is not gmirror I am complaining about, it is the non-standard use of GPT. Note that gmirror + MBR works fine without violating what little standard there is for the MBR. Using a dedicated GPT partition to hold the gmirror metadata would work with GPT (but be a good bit harder to work with in terms of GEOM, I realize). But as I said, I won't object to these patches. -- John Baldwin
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote: > > I don't think so. Most common case is to configure partitions on top of > > a mirror. Mirroring partitions is less common. Mostly because of > > hardware RAIDs being popular. You don't expect hardware RAID vendor to > > mirror partitions. Partition editors for other OS's won't work, but only > > because they don't support gmirror. If they wouldn't recognize and > > support some hardware (or pseudo-hardware) RAIDs there will be the same > > problem. > > Hardware RAIDs hide the metadata from the disk that the BIOS (and disk > editors) see. Thus, putting a GPT on a hardware RAID volume works fine > as the logical volume is always seen by all OS's consistently. [...] Only if you don't connect this disk to a different controller. > [...] The same > is even true of the "software" RAID that graid supports since the metadata > is defined by the vendor and thus the logical volume is always seen by other > OS's consistently. But is it seen without metadata by the boot loader? What I'm trying to say is that it is fair to expect the user not to use a gmirror-configured disk on a different OS. If the user wants to use this disk in a different OS then he has to use a format that is recognized by both. Because gmirror is supported by FreeBSD we should improve the support by teaching the boot loader about it. Pretending gmirror is special and recommending to mirror partitions with it instead of raw disks is not the solution. I really can't see how gmirror is different in this regard from any other software RAID or volume manager. If you try to use a disk that contains unrecognized metadata the behaviour is undefined (but hopefully not a panic). -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On 27.06.2012 16:07, John Baldwin wrote: >> • Check the Signature >> • Check the Header CRC >> • Check that the MyLBA entry points to the LBA that contains the GUID >> Partition Table >> • Check the CRC of the GUID Partition Entry Array >> If the GPT is the primary table, stored at LBA 1: >> • Check the AlternateLBA to see if it is a valid GPT >> If the primary GPT is corrupt, software must check the last LBA of the >> device to see if it has a >> valid GPT Header and point to a valid GPT Partition Entry Array." > > Right, we break the last rule. If you want to use a partition editor > that doesn't grok gmirror (because you are using another OS's editor), > to repair a GPT, it will do the wrong thing. When we are in the FreeBSD, our loader can detect that device size is lower than it see and it will work. When primary header is OK, then other OSes should work with this GPT. When it isn't OK, you just can't load other OS :) >>> As I said earlier, I do not think this is appropriate and that instead >>> gpart should have an appropriate 'recover' command to install just the pmbr >>> on >>> a disk and also create a correct entry in the MBR if needed while doing so. >> >> gpart(8) is only one of several geom(8)' tools to manage objects of a GEOM >> class. >> It only sends control requests to the kernel. If GPT is not detected, >> there is no geom objects to manage. And we can't write bootcode with >> gpart(8). >> I think that adding such functions to the gpart(8) is not good. Maybe, >> the boot0cfg is the better tool for that. Also we still haven't any tool to >> install zfsboot. > > We can't write bootcode with gpart? What do you think the 'bootcode' command > does? `gpart bootcode -b` reads file, creates ioctl request and sends this data to the GEOM_PART class. GEOM_PART receives the control request, checks the data and writes it to the provider. `gpart bootcode -p` works like dd(1) and writes bootcode to the given partition. 
gpart(8) has no knowledge of any specific partitioning scheme. > Also, there is no reason we can't have a 'recover' command that attempts to > recover a corrupted table including repairing the PMBR. gpart(8) already > generates a full PMBR when you use 'gpart create' to create a GPT even though > there isn't a GPT object yet. `gpart create` only creates an ioctl control request to the GEOM_PART class. The GEOM_PART class creates a new GPT geom object, and this object writes the PMBR and its metadata to the provider. -- WBR, Andrey V. Elsukov
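The validity tests quoted from the standard in this thread can be sketched as follows. This is a minimal illustration, assuming the standard 92-byte little-endian GPT header layout and that both checksums are the common CRC-32 that `zlib.crc32` computes; `check_gpt_header` and `make_header` are hypothetical helpers, not code from the loader or from gpart(8).

```python
import struct
import zlib

# signature, revision, header size, header CRC, reserved, MyLBA,
# AlternateLBA, first usable LBA, last usable LBA, disk GUID,
# entry array LBA, entry count, entry size, entry array CRC
HEADER_FMT = "<8sIIIIQQQQ16sQIII"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 92 bytes


def check_gpt_header(sector, lba, entries):
    """Return None if the header read from `lba` passes the tests, else a reason."""
    (sig, _rev, hdr_size, hdr_crc, _res, my_lba, _alt, _first, _last,
     _guid, _ent_lba, n_ent, ent_size, ent_crc) = struct.unpack_from(HEADER_FMT, sector)
    if sig != b"EFI PART":
        return "bad signature"
    if not HEADER_LEN <= hdr_size <= len(sector):
        return "bad header size"
    # The header CRC is computed with its own field set to zero.
    zeroed = sector[:16] + b"\0\0\0\0" + sector[20:hdr_size]
    if zlib.crc32(zeroed) != hdr_crc:
        return "bad header CRC"
    if my_lba != lba:
        return "MyLBA mismatch"
    if zlib.crc32(entries[:n_ent * ent_size]) != ent_crc:
        return "bad entry array CRC"
    return None


def make_header(my_lba, alt_lba, entries, ent_size=128):
    """Build a syntactically valid header sector for the demo below."""
    fields = [b"EFI PART", 0x00010000, HEADER_LEN, 0, 0, my_lba, alt_lba,
              34, 0, bytes(16), 2, len(entries) // ent_size, ent_size,
              zlib.crc32(entries)]
    fields[3] = zlib.crc32(struct.pack(HEADER_FMT, *fields))  # CRC field was 0
    return struct.pack(HEADER_FMT, *fields).ljust(512, b"\0")


entries = bytes(4 * 128)                       # four empty partition entries
primary = make_header(my_lba=1, alt_lba=999, entries=entries)
damaged = primary[:40] + b"\xff" + primary[41:]  # corrupt one header byte

print(check_gpt_header(primary, 1, entries))
print(check_gpt_header(primary, 999, entries))
print(check_gpt_header(damaged, 1, entries))
```

The remaining test from the list, checking that AlternateLBA holds a valid GPT, would amount to running the same function on the sector read from that LBA. The disagreement in the thread is about the fallback step after these checks fail, not about the checks themselves.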
Re: [CFC/CFT] large changes in the loader(8) code
On Tuesday, June 26, 2012 5:23:08 pm Pawel Jakub Dawidek wrote: > On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote: > > > 4. The gptboot now searches the backup GPT header in the previous sectors, > > > when it finds the "GEOM::" signature in the last sector. PMBR code also > > > tries to do the same: > > > common/gpt.c > > > i386/pmbr/pmbr.s > > > > GPT really wants the backup header at the last LBA. I know you can set it, > > but I've interpreted that as a way to see if the primary header is correct > > or > > not. [...] > > My interpretation is different: The way to verify if the header is valid > is to check its checksum, not to check if the backup header location in > the primary header points at the last LBA. > > Of course if primary header's checksum is incorrect it is hard to trust > that the backup header location is correct. And we need the backup > header when the primary header is invalid... Right, which is why this fails. > > [...] It seems to me that GPT tables created in this fashion (inside a GEOM > > provider) will not work properly with partition editors for other OS's. > > I'm > > hesitant to encourage the use of this as I do think putting GPT inside of a > > gmirror violates the GPT spec. > > I don't think so. Most common case is to configure partitions on top of > a mirror. Mirroring partitions is less common. Mostly because of > hardware RAIDs being popular. You don't expect hardware RAID vendor to > mirror partitions. Partition editors for other OS's won't work, but only > because they don't support gmirror. If they wouldn't recognize and > support some hardware (or pseudo-hardware) RAIDs there will be the same > problem. Hardware RAIDs hide the metadata from the disk that the BIOS (and disk editors) see. Thus, putting a GPT on a hardware RAID volume works fine as the logical volume is always seen by all OS's consistently. 
The same is even true of the "software" RAID that graid supports since the metadata is defined by the vendor and thus the logical volume is always seen by other OS's consistently. My approach has been to only use gmirror with MBR so far, though I realize that doesn't work above 2TB (until recently one had to have a hardware RAID to get above 2TB anyway, which made this last a moot point). I won't object to patching our tools to handle this, but I think it is a really bad idea that users will have a hard time recovering from when they are bitten by it. -- John Baldwin
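The gptboot change quoted in this message (item 4 of the original proposal) could be sketched roughly as follows. This is not the code from the branch: `read_sector` is a hypothetical callback, the 512-byte sector size is assumed, and the sketch checks only the single sector immediately before the GEOM metadata, whereas the proposal says "previous sectors" without giving a count.

```python
SECTOR_SIZE = 512            # assumed; real code would ask the device
GPT_SIGNATURE = b"EFI PART"  # signature at the start of a GPT header
GEOM_PREFIX = b"GEOM::"      # prefix named in the proposal


def find_backup_header(read_sector, last_lba):
    """Return the LBA of the backup GPT header, or None if not found."""
    last = read_sector(last_lba)
    if last.startswith(GPT_SIGNATURE):
        return last_lba                      # standard placement
    if last.startswith(GEOM_PREFIX) and last_lba > 0:
        if read_sector(last_lba - 1).startswith(GPT_SIGNATURE):
            return last_lba - 1              # displaced by GEOM metadata
    return None


def pad(data):
    """Pad a byte string to one full sector."""
    return data.ljust(SECTOR_SIZE, b"\0")


# Two ten-sector toy disks: a plain GPT disk, and one where the last sector
# holds illustrative GEOM metadata and the backup header sits just before it.
plain_disk = [pad(b"")] * 9 + [pad(GPT_SIGNATURE)]
mirrored_disk = [pad(b"")] * 8 + [pad(GPT_SIGNATURE), pad(b"GEOM::MIRROR")]

print(find_backup_header(plain_disk.__getitem__, 9))
print(find_backup_header(mirrored_disk.__getitem__, 9))
```

A real implementation would also run the header validity tests on whatever it finds; the signature match here only locates a candidate.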
Re: [CFC/CFT] large changes in the loader(8) code
On Wednesday, June 27, 2012 12:50:20 am Andrey V. Elsukov wrote:
> On 26.06.2012 21:37, John Baldwin wrote:
> >> 4. The gptboot now searches the backup GPT header in the previous sectors,
> >> when it finds the "GEOM::" signature in the last sector. PMBR code also
> >> tries to do the same:
> >> common/gpt.c
> >> i386/pmbr/pmbr.s
> >
> > GPT really wants the backup header at the last LBA. I know you can set it,
> > but I've interpreted that as a way to see if the primary header is correct or
> > not. It seems to me that GPT tables created in this fashion (inside a GEOM
> > provider) will not work properly with partition editors for other OS's. I'm
> > hesitant to encourage the use of this as I do think putting GPT inside of a
> > gmirror violates the GPT spec.
>
> The standard says:
> "The following test must be performed to determine if a GPT is valid:
> • Check the Signature
> • Check the Header CRC
> • Check that the MyLBA entry points to the LBA that contains the GUID Partition Table
> • Check the CRC of the GUID Partition Entry Array
> If the GPT is the primary table, stored at LBA 1:
> • Check the AlternateLBA to see if it is a valid GPT
> If the primary GPT is corrupt, software must check the last LBA of the device
> to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array."

Right, we break the last rule. If you want to use a partition editor that doesn't grok gmirror (because you are using another OS's editor) to repair a GPT, it will do the wrong thing.

> If a user wants modify GPT in the disk editor from the another OS,
> he can do it, and it should work. The result depends only from the partition editor,
> it might overwrite the last sector and might don't.

I would not assume it would work at all. If it can't trust the primary GPT, it has to assume the alternate is at the last LBA.

> >> 5. Also the pmbr image now contains one fake partition record.
> >> When several first sectors are damaged the kernel can't detect GPT
> >> (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
> >> command, but the old pmbr image has an empty partition table and
> >> loader doesn't able to boot from GPT, when there is no partition record
> >> in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
> >> command, the kernel correctly modifies this partition record. So, this is only
> >> for the first rescue step.
> >
> > As I said earlier, I do not think this is appropriate and that instead
> > gpart should have an appropriate 'recover' command to install just the pmbr on
> > a disk and also create a correct entry in the MBR if needed while doing so.
>
> gpart(8) is only one of several geom(8)' tools to manage objects of a GEOM class.
> It only sends control requests to the kernel. If GPT is not detected,
> there is no geom objects to manage. And we can't write bootcode with gpart(8).
> I think that adding such functions to the gpart(8) is not good. Maybe,
> the boot0cfg is the better tool for that. Also we still haven't any tool to
> install zfsboot.

We can't write bootcode with gpart? What do you think the 'bootcode' command does? Also, there is no reason we can't have a 'recover' command that attempts to recover a corrupted table including repairing the PMBR. gpart(8) already generates a full PMBR when you use 'gpart create' to create a GPT even though there isn't a GPT object yet.

--
John Baldwin
Re: [CFC/CFT] large changes in the loader(8) code
on 27/06/2012 07:50 Andrey V. Elsukov said the following:
> Also we still haven't any tool to install zfsboot.

Yeah, I think it would be nice if ZFS provided some interface (ioctl?) to properly write stuff to its special areas.

--
Andriy Gapon
Re: [CFC/CFT] large changes in the loader(8) code
On 27.06.2012 1:41, Kevin Oberman wrote:
> Long ago I saw a proposal to create a dedicated partition on GPT to
> hold the metadata. With the large number of partitions available on
> GPT, tying up one just for GEOM seems like a low price and it moves
> the device GEOM out of the realm of FreeBSD unique and subject to
> serious issues when/if a disk is shared with some other OS. I have
> seen little comment on this and have never seen any argument that
> it could not work.

When you share a disk with another OS, it seems that a much more serious issue arises when the other OS makes changes to your mirror without your knowing. I know of disks being shared successfully between Windows and FreeBSD via graid on the Intel pseudo-RAID. Just use compatible technologies.

--
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
On 26.06.2012 21:37, John Baldwin wrote:
>> 4. The gptboot now searches the backup GPT header in the previous sectors,
>> when it finds the "GEOM::" signature in the last sector. PMBR code also
>> tries to do the same:
>> common/gpt.c
>> i386/pmbr/pmbr.s
>
> GPT really wants the backup header at the last LBA. I know you can set it,
> but I've interpreted that as a way to see if the primary header is correct or
> not. It seems to me that GPT tables created in this fashion (inside a GEOM
> provider) will not work properly with partition editors for other OS's. I'm
> hesitant to encourage the use of this as I do think putting GPT inside of a
> gmirror violates the GPT spec.

The standard says:
"The following test must be performed to determine if a GPT is valid:
• Check the Signature
• Check the Header CRC
• Check that the MyLBA entry points to the LBA that contains the GUID Partition Table
• Check the CRC of the GUID Partition Entry Array
If the GPT is the primary table, stored at LBA 1:
• Check the AlternateLBA to see if it is a valid GPT
If the primary GPT is corrupt, software must check the last LBA of the device to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array."

For FreeBSD, each GEOM provider can be treated as a disk device. So I don't see anything criminal in adding some quirks to our loader to better support our technologies. If a user wants to modify the GPT in a disk editor from another OS, he can do it, and it should work. The result depends only on the partition editor: it might overwrite the last sector, or it might not.

>> 5. Also the pmbr image now contains one fake partition record.
>> When several first sectors are damaged the kernel can't detect GPT
>> (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
>> command, but the old pmbr image has an empty partition table and
>> loader doesn't able to boot from GPT, when there is no partition record
>> in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
>> command, the kernel correctly modifies this partition record. So, this is only
>> for the first rescue step.
>
> As I said earlier, I do not think this is appropriate and that instead
> gpart should have an appropriate 'recover' command to install just the pmbr on
> a disk and also create a correct entry in the MBR if needed while doing so.

gpart(8) is only one of several geom(8) tools for managing objects of a GEOM class. It only sends control requests to the kernel. If the GPT is not detected, there are no geom objects to manage. And we can't write bootcode with gpart(8). I think that adding such functions to gpart(8) is not good. Maybe boot0cfg is the better tool for that. Also, we still don't have any tool to install zfsboot.

--
WBR, Andrey V. Elsukov
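[Archive note] The first four checks in the quoted passage of the standard can be sketched as follows. This is an illustration only, not the code in common/gpt.c: it assumes the 92-byte little-endian GPT header layout and the standard CRC-32 that `zlib.crc32` computes, and the names `check_gpt_header` and `build_gpt_header` are hypothetical.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
# Little-endian header fields, up to and including the entry-array CRC.
HEADER_FMT = "<8sIIIIQQQQ16sQIII"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 92 bytes
CRC_OFFSET = 16  # offset of the header CRC32 field


def check_gpt_header(sector, lba, entries=None):
    """Return a dict of header fields if the checks pass, else None."""
    if len(sector) < HEADER_LEN:
        return None
    (sig, _rev, hdr_size, hdr_crc, _reserved, my_lba, alt_lba,
     _first, _last, _guid, ent_lba, n_ent, ent_size, ent_crc) = \
        struct.unpack_from(HEADER_FMT, sector)
    # 1. Check the signature.
    if sig != GPT_SIGNATURE:
        return None
    # 2. Check the header CRC, computed with the CRC field itself zeroed.
    if not HEADER_LEN <= hdr_size <= len(sector):
        return None
    raw = bytearray(sector[:hdr_size])
    raw[CRC_OFFSET:CRC_OFFSET + 4] = bytes(4)
    if zlib.crc32(bytes(raw)) != hdr_crc:
        return None
    # 3. MyLBA must match the LBA this sector was read from.
    if my_lba != lba:
        return None
    # 4. Check the CRC of the partition entry array, if the caller read it.
    if entries is not None and zlib.crc32(entries[:n_ent * ent_size]) != ent_crc:
        return None
    return {"my_lba": my_lba, "alternate_lba": alt_lba,
            "entries_lba": ent_lba, "entry_count": n_ent,
            "entry_size": ent_size}


def build_gpt_header(my_lba, alt_lba, entries, n_ent=128, ent_size=128):
    """Hypothetical helper: build a well-formed header sector for the demo."""
    hdr = struct.pack(HEADER_FMT, GPT_SIGNATURE, 0x00010000, HEADER_LEN, 0, 0,
                      my_lba, alt_lba, 34, 34, bytes(16), 2, n_ent, ent_size,
                      zlib.crc32(entries))
    crc = struct.pack("<I", zlib.crc32(hdr))
    return (hdr[:CRC_OFFSET] + crc + hdr[CRC_OFFSET + 4:]).ljust(512, b"\0")


if __name__ == "__main__":
    entries = bytes(128 * 128)
    sector = build_gpt_header(1, 999, entries)
    print(check_gpt_header(sector, 1, entries))
```

The returned `alternate_lba` is what the fifth check would follow; where a reader should look when the primary header fails these checks is exactly the point under dispute in this thread.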
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote:
> Long ago I saw a proposal to create a dedicated partition on GPT to
> hold the metadata. With the large number of partitions available on
> GPT, tying up one just for GEOM seems like a low price and it moves
> the device GEOM out of the realm of FreeBSD unique and subject to
> serious issues when/if a disk is shared with some other OS. I have
> seen little comment on this and have never seen any argument that
> it could not work.
>
> I think this is an issue that will continue to bite users unless it is fixed.

I don't really see how dedicating a partition for metadata can work or is a good idea, sorry.

As for sharing a disk with another OS: if you share the disk with an OS that doesn't support gmirror, you shouldn't use gmirror in the first place. You probably want to use only formats that are recognized by all your OSes.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 2:23 PM, Pawel Jakub Dawidek wrote:
> On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
>> > 4. The gptboot now searches the backup GPT header in the previous sectors,
>> > when it finds the "GEOM::" signature in the last sector. PMBR code also
>> > tries to do the same:
>> > common/gpt.c
>> > i386/pmbr/pmbr.s
>>
>> GPT really wants the backup header at the last LBA. I know you can set it,
>> but I've interpreted that as a way to see if the primary header is correct or
>> not. [...]
>
> My interpretation is different: The way to verify if the header is valid
> is to check its checksum, not to check if the backup header location in
> the primary header points at the last LBA.
>
> Of course if primary header's checksum is incorrect it is hard to trust
> that the backup header location is correct. And we need the backup
> header when the primary header is invalid...
>
>> [...] It seems to me that GPT tables created in this fashion (inside a GEOM
>> provider) will not work properly with partition editors for other OS's. I'm
>> hesitant to encourage the use of this as I do think putting GPT inside of a
>> gmirror violates the GPT spec.
>
> I don't think so. Most common case is to configure partitions on top of
> a mirror. Mirroring partitions is less common. Mostly because of
> hardware RAIDs being popular. You don't expect hardware RAID vendor to
> mirror partitions. Partition editors for other OS's won't work, but only
> because they don't support gmirror. If they wouldn't recognize and
> support some hardware (or pseudo-hardware) RAIDs there will be the same
> problem.
>
> In other words, IMHO, our problem is that FreeBSD's boot code doesn't
> recognize/support gmirror's metadata. What Andrey is proposing is to
> recognize the metadata and act accordingly - in case of a gmirror we
> simply need to skip it.
>
> In the future we will have the same problem with graid - until we add
> support for it to the boot code, we won't be able to boot from it.

Long ago I saw a proposal to create a dedicated partition on GPT to hold the metadata. With the large number of partitions available on GPT, tying up one just for GEOM seems like a low price and it moves the device GEOM out of the realm of FreeBSD unique and subject to serious issues when/if a disk is shared with some other OS. I have seen little comment on this and have never seen any argument that it could not work.

I think this is an issue that will continue to bite users unless it is fixed.

--
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
> > 4. The gptboot now searches the backup GPT header in the previous sectors,
> > when it finds the "GEOM::" signature in the last sector. PMBR code also
> > tries to do the same:
> > common/gpt.c
> > i386/pmbr/pmbr.s
>
> GPT really wants the backup header at the last LBA. I know you can set it,
> but I've interpreted that as a way to see if the primary header is correct or
> not. [...]

My interpretation is different: the way to verify that the header is valid is to check its checksum, not to check whether the backup header location in the primary header points at the last LBA.

Of course, if the primary header's checksum is incorrect, it is hard to trust that the backup header location is correct. And we need the backup header when the primary header is invalid...

> [...] It seems to me that GPT tables created in this fashion (inside a GEOM
> provider) will not work properly with partition editors for other OS's. I'm
> hesitant to encourage the use of this as I do think putting GPT inside of a
> gmirror violates the GPT spec.

I don't think so. The most common case is to configure partitions on top of a mirror. Mirroring partitions is less common, mostly because of hardware RAIDs being popular. You don't expect a hardware RAID vendor to mirror partitions. Partition editors for other OS's won't work, but only because they don't support gmirror. If they didn't recognize and support some hardware (or pseudo-hardware) RAID, there would be the same problem.

In other words, IMHO, our problem is that FreeBSD's boot code doesn't recognize/support gmirror's metadata. What Andrey is proposing is to recognize the metadata and act accordingly - in the case of a gmirror we simply need to skip it.

In the future we will have the same problem with graid - until we add support for it to the boot code, we won't be able to boot from it.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
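[Archive note] The "recognize the metadata and skip it" idea discussed above can be sketched roughly as below. This is not the code from the patch: `read_sector` is a hypothetical callable mapping an LBA to one sector of bytes, and the sketch steps back only a single sector, whereas the thread says gptboot searches "the previous sectors" without saying how many.

```python
GEOM_MAGIC = b"GEOM::"        # signature prefix named in the thread
GPT_SIGNATURE = b"EFI PART"   # signature at the start of a GPT header


def find_backup_gpt(read_sector, last_lba):
    """Return the LBA that holds a backup GPT header candidate, or None.

    read_sector is a hypothetical callable: LBA -> bytes of one sector.
    """
    lba = last_lba
    sector = read_sector(lba)
    if sector.startswith(GEOM_MAGIC) and lba > 0:
        # The last sector belongs to GEOM metadata, so the provider that
        # carries the GPT ends one sector earlier (assumption: one sector).
        lba -= 1
        sector = read_sector(lba)
    return lba if sector.startswith(GPT_SIGNATURE) else None


if __name__ == "__main__":
    # Tiny in-memory "disk": 10 sectors, GPT backup header at LBA 8,
    # GEOM metadata in the last sector (LBA 9).
    disk = [bytes(512) for _ in range(10)]
    disk[8] = GPT_SIGNATURE.ljust(512, b"\0")
    disk[9] = b"GEOM::MIRROR".ljust(512, b"\0")
    print(find_backup_gpt(lambda lba: disk[lba], 9))
```

A candidate found this way would still need the usual signature, CRC, and MyLBA checks before being trusted; the sketch covers only the search step.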
Re: [CFC/CFT] large changes in the loader(8) code
On Tuesday, June 26, 2012 8:50:36 am Andrey V. Elsukov wrote:
> Hi All,
>
> Some time ago i have started reading the code in the sys/boot.
> Especially i'm interested in the partition tables handling.
> I found several problems:
> 1. There are several copies of the same code in the libi386/biosdisk.c
> and common/disk.c, and partially libpc98/biosdisk.c.
> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
> disks and partitions the system has:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
> http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
> 3. The GPT support doesn't check CRC and even doesn't know anything
> about the secondary GPT header/table.
>
> So, i have created the branch and committed the changes:
> http://svnweb.freebsd.org/base/user/ae/bootcode/
> The patch is here:
> http://people.freebsd.org/~ae/boot.diff
>
> What i already did:
> 1. The partition tables handling now is machine independent,
> and it is compatible with the kernel's GEOM_PART implementation.
> There is new API for disk drivers in the loader to get information
> about partitions and tables:
> common/Makefile.inc
> common/part.c
> common/part.h
>
> 2. The similar and general code from the disk drivers merged in the
> disk.c:
> common/disk.c
> common/disk.h
> i386/libi386/libi386.h
> i386/libi386/biosdisk.c
> userboot/test/test.c
> userboot/userboot/userboot_disk.c
> userboot/userboot.h
> 3. ZFS code now uses new API and probing on the systems with many disks
> should be greatly increased:
> zfs/zfs.c
> i386/loader/main.c
> 4. The gptboot now searches the backup GPT header in the previous sectors,
> when it finds the "GEOM::" signature in the last sector. PMBR code also
> tries to do the same:
> common/gpt.c
> i386/pmbr/pmbr.s

GPT really wants the backup header at the last LBA. I know you can set it, but I've interpreted that as a way to see if the primary header is correct or not. It seems to me that GPT tables created in this fashion (inside a GEOM provider) will not work properly with partition editors for other OS's. I'm hesitant to encourage the use of this as I do think putting GPT inside of a gmirror violates the GPT spec.

> 5. Also the pmbr image now contains one fake partition record.
> When several first sectors are damaged the kernel can't detect GPT
> (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
> command, but the old pmbr image has an empty partition table and
> loader doesn't able to boot from GPT, when there is no partition record
> in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
> command, the kernel correctly modifies this partition record. So, this is only
> for the first rescue step.

As I said earlier, I do not think this is appropriate and that instead gpart should have an appropriate 'recover' command to install just the pmbr on a disk and also create a correct entry in the MBR if needed while doing so.

> 6. I have changed userboot interface. I guess there is none consumers except
> the one test program. But if it isn't that, i can make it compatible.

One other consumer is in the bhyve branch. I think the 'kload' patches also use it. However, they can probably be adapted easily.

[ Note, I haven't done a detailed review of the patch at all yet. ]

--
John Baldwin
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote:
> On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
> > On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
> >> Hi All,
> >>
> >> Some time ago i have started reading the code in the sys/boot.
> >> Especially i'm interested in the partition tables handling.
> >> I found several problems:
> >> 1. There are several copies of the same code in the libi386/biosdisk.c
> >> and common/disk.c, and partially libpc98/biosdisk.c.
> >> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
> >> disks and partitions the system has:
> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
> >> http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
> >> 3. The GPT support doesn't check CRC and even doesn't know anything
> >> about the secondary GPT header/table.
> >
> > Just a quick note here. At some point when I was adding GPT attributes
> > to allow for test starts I greatly improved, at least parts of, the GPT
> > implementation. I did implement support for both CRC checksum
> > verification and fallback to backup GPT header when primary is broken.
> > And the code is still in sys/boot/common/gpt.c. So my question would be
> > what do you mean by this sentence?
>
> Yes, gptboot does that, but the loader/zfsloader doesn't. So there might
> be a situation when gptboot does boot, but loader(8) can't.

I see. I don't know if I'll find time for a proper review, but it is really great that you are working on cleaning up this huge mess.

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
> On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
>> Hi All,
>>
>> Some time ago i have started reading the code in the sys/boot.
>> Especially i'm interested in the partition tables handling.
>> I found several problems:
>> 1. There are several copies of the same code in the libi386/biosdisk.c
>> and common/disk.c, and partially libpc98/biosdisk.c.
>> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
>> disks and partitions the system has:
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
>> 3. The GPT support doesn't check CRC and even doesn't know anything
>> about the secondary GPT header/table.
>
> Just a quick note here. At some point when I was adding GPT attributes
> to allow for test starts I greatly improved, at least parts of, the GPT
> implementation. I did implement support for both CRC checksum
> verification and fallback to backup GPT header when primary is broken.
> And the code is still in sys/boot/common/gpt.c. So my question would be
> what do you mean by this sentence?

Yes, gptboot does that, but the loader/zfsloader doesn't. So there might be a situation when gptboot does boot, but loader(8) can't.

--
WBR, Andrey V. Elsukov
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
> Hi All,
>
> Some time ago i have started reading the code in the sys/boot.
> Especially i'm interested in the partition tables handling.
> I found several problems:
> 1. There are several copies of the same code in the libi386/biosdisk.c
> and common/disk.c, and partially libpc98/biosdisk.c.
> 2. ZFS probing is very slow, because the ZFS code doesn't know how many
> disks and partitions the system has:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
> http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
> 3. The GPT support doesn't check CRC and even doesn't know anything
> about the secondary GPT header/table.

Just a quick note here. At some point when I was adding GPT attributes to allow for test starts I greatly improved, at least parts of, the GPT implementation. I did implement support for both CRC checksum verification and fallback to backup GPT header when primary is broken. And the code is still in sys/boot/common/gpt.c. So my question would be what do you mean by this sentence?

> So, i have created the branch and committed the changes:
> http://svnweb.freebsd.org/base/user/ae/bootcode/
> The patch is here:
> http://people.freebsd.org/~ae/boot.diff
>
> What i already did:
> 1. The partition tables handling now is machine independent,
> and it is compatible with the kernel's GEOM_PART implementation.
> There is new API for disk drivers in the loader to get information
> about partitions and tables:
> common/Makefile.inc
> common/part.c
> common/part.h
>
> 2. The similar and general code from the disk drivers merged in the
> disk.c:
> common/disk.c
> common/disk.h
> i386/libi386/libi386.h
> i386/libi386/biosdisk.c
> userboot/test/test.c
> userboot/userboot/userboot_disk.c
> userboot/userboot.h
> 3. ZFS code now uses new API and probing on the systems with many disks
> should be greatly increased:
> zfs/zfs.c
> i386/loader/main.c
> 4. The gptboot now searches the backup GPT header in the previous sectors,
> when it finds the "GEOM::" signature in the last sector. PMBR code also
> tries to do the same:
> common/gpt.c
> i386/pmbr/pmbr.s
>
> 5. Also the pmbr image now contains one fake partition record.
> When several first sectors are damaged the kernel can't detect GPT
> (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
> command, but the old pmbr image has an empty partition table and
> loader doesn't able to boot from GPT, when there is no partition record
> in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
> command, the kernel correctly modifies this partition record. So, this is only
> for the first rescue step.
>
> 6. I have changed userboot interface. I guess there is none consumers except
> the one test program. But if it isn't that, i can make it compatible.
>
> Any comments are welcome.
>
> --
> WBR, Andrey V. Elsukov

--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
[CFC/CFT] large changes in the loader(8) code
Hi All,

Some time ago I started reading the code in sys/boot. I'm especially interested in the partition table handling. I found several problems:
1. There are several copies of the same code in libi386/biosdisk.c and common/disk.c, and partially in libpc98/biosdisk.c.
2. ZFS probing is very slow, because the ZFS code doesn't know how many disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
3. The GPT support doesn't check the CRC and doesn't even know anything about the secondary GPT header/table.

So I have created a branch and committed the changes:
http://svnweb.freebsd.org/base/user/ae/bootcode/
The patch is here:
http://people.freebsd.org/~ae/boot.diff

What I have already done:
1. The partition table handling is now machine independent, and it is compatible with the kernel's GEOM_PART implementation. There is a new API for disk drivers in the loader to get information about partitions and tables:
   common/Makefile.inc
   common/part.c
   common/part.h
2. The similar and general code from the disk drivers has been merged into disk.c:
   common/disk.c
   common/disk.h
   i386/libi386/libi386.h
   i386/libi386/biosdisk.c
   userboot/test/test.c
   userboot/userboot/userboot_disk.c
   userboot/userboot.h
3. The ZFS code now uses the new API, and probing on systems with many disks should be much faster:
   zfs/zfs.c
   i386/loader/main.c
4. gptboot now searches for the backup GPT header in the previous sectors when it finds the "GEOM::" signature in the last sector. The PMBR code also tries to do the same:
   common/gpt.c
   i386/pmbr/pmbr.s
5. The pmbr image now also contains one fake partition record. When the first several sectors are damaged, the kernel can't detect the GPT (see the RECOVERING section in gpart(8)). We can restore the PMBR with the dd(1) command, but the old pmbr image has an empty partition table, and the loader is not able to boot from GPT when there is no partition record in the PMBR. Now it will be able to. When pmbr is installed via the 'gpart bootcode' command, the kernel correctly modifies this partition record. So this is only for the first rescue step.
6. I have changed the userboot interface. I guess there are no consumers except the one test program. But if that is not the case, I can make it compatible.

Any comments are welcome.

--
WBR, Andrey V. Elsukov
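[Archive note] To illustrate item 5, here is a minimal sketch of a protective MBR sector that carries one partition record. The thread does not give the values stored in the patch's "fake" record, so the entry below simply follows the usual protective-MBR convention (type 0xEE, starting at LBA 1, size capped at 32 bits); `build_pmbr` and its parameters are hypothetical names.

```python
import struct

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446         # four 16-byte entries start here
BOOT_SIGNATURE = b"\x55\xaa"    # last two bytes of the sector


def build_pmbr(bootcode: bytes, disk_sectors: int) -> bytes:
    """Return a 512-byte MBR with a single protective (0xEE) entry."""
    if len(bootcode) > PART_TABLE_OFFSET:
        raise ValueError("boot code does not fit before the partition table")
    if disk_sectors < 2:
        raise ValueError("disk too small")
    # The size field is 32 bits wide; larger disks are clamped to 0xFFFFFFFF.
    size = min(disk_sectors - 1, 0xFFFFFFFF)
    entry = struct.pack(
        "<B3sB3sII",
        0x00,                # boot indicator: not active
        b"\x00\x02\x00",     # starting CHS, corresponds to LBA 1
        0xEE,                # partition type: GPT protective
        b"\xff\xff\xff",     # ending CHS: maximum value
        1,                   # starting LBA
        size,                # size in sectors
    )
    table = entry + bytes(3 * 16)   # remaining three entries stay empty
    return bootcode.ljust(PART_TABLE_OFFSET, b"\0") + table + BOOT_SIGNATURE


if __name__ == "__main__":
    pmbr = build_pmbr(b"", 1000)
    print(len(pmbr), hex(pmbr[PART_TABLE_OFFSET + 4]))
```

An image with an all-zero partition table would differ only in the `table` bytes, which is the difference item 5 describes between the old and new pmbr images.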