Re: RFC: Project geom-events
Lev Serebryakov l...@freebsd.org wrote:

> GPT must have backup copy in last sector by standard ...

In that case, shouldn't it refuse to install on any provider that is not in fact a disk, so as not to create configurations that cannot work properly?

> MBR doesn't have any additional metadata. How will adding one help it?

It would add robustness, for cases like the one that started this thread. If MBR put a GEOM metadata block at the end of its provider, it would fix the tasting race when an MBR is installed on a glabelled (or gmirrored) drive.

___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: RFC: Project geom-events
Lev Serebryakov l...@freebsd.org wrote:

>>> GPT _must_ be placed twice -- at first and last sectors (really, more than one sector). By standard. Secondary copy must be at end of disk. Period.
>>
>> Then, by standard GPT cannot coexist with GLABEL. Such setup should be disallowed, or at least a big nasty message that you have just shot yourself in the leg should be output. (period)
>
> Ok, maybe adding a check to geom_part, that it is used on a rank-1 provider (whole disk), is not such a bad idea. But it then raises the question of how to install FreeBSD on a software mirror, which is useful.

To install FreeBSD on a gmirrored disk, use MBR (or dangerously dedicated BSD label) instead of GPT. (This is one reason why BSD label and MBR should not be considered obsolete.)

If you want to use gmirror and *have* to use GPT, e.g. if you have a (hypothetical) BIOS which will not boot from MBR, mirror the individual partitions instead of the whole disk. Granted, that is more trouble, both to set up initially and to replace a failed drive.
Re: RFC: Project geom-events
John j...@freebsd.org wrote:

> ...
>> gpart should show warning message if user is trying to put GPT on non real disk devices.
> ...
> This also seems to prevent something useful like:
>
> # camcontrol inquiry da0
> pass2: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device
> pass2: Serial Number 3TB1BKGX9036W9EN
> pass2: 600.000MB/s transfers, Command Queueing Enabled
>
> # camcontrol inquiry da25
> pass27: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device
> pass27: Serial Number 3TB1BKGX9036W9EN
> pass27: 600.000MB/s transfers, Command Queueing Enabled
>
> # gmultipath label ZFS0 da0 da25
>
> # gpart create -s gpt $device
> # gpart add -s 128 -t freebsd-boot $device       # Create 64K boot partition
> # gpart add -s 4m -t freebsd-ufs -l mb$dev $device  # small partition
> # gpart add -t freebsd-zfs -l $dev $device       # Remaining space for zfs
>
> It seems like protecting your partitions with multiple paths would be a good thing.
>
> I've been experimenting with this and end up with corrupt partitions.

The setting of $device is not shown, but I suppose it is the name of the multipath provider.

I'm not familiar with gmultipath, but it would not surprise me if (like most GEOMs) it were putting its metadata in the last block(s) of its providers and therefore encountering the same issues as gmirror and glabel. In that case, the best fix may be to define the multipathing per-partition instead of per-device (if that is possible), or to use MBR/BSD instead of GPT for partitioning.
Re: RFC: Project geom-events
Lev Serebryakov wrote:

> Hello, Miroslav.
> You wrote on October 6, 2011, 16:59:19:

[...]

>> The current state is simply wrong, because the user can do something that cannot work and is not documented anywhere.
>
> It is Ok in UNIX way, in general. You should be able to shoot your leg, it is good :)

I am sorry for my late reply. Foot shooting is OK, if somebody wants to shoot his foot, but I don't want to shoot my foot if I am aiming at my head :)

> But if geom_label doesn't reduce its provider to count its own metadata, it looks like a bug!

As Ivan Voras explained, it is not a bug, it is just a matter of mixing two things that can't coexist. So the problem is that it is not mentioned anywhere in the FreeBSD docs. (Thank you Ivan for your explanation!)

And as somebody else already mentioned in this thread, it should be documented in manpages and the Handbook, and gpart should show a warning message if the user is trying to put GPT on non-real disk devices.

As is mentioned in the thread "Memstick image differences between 8.x and 9.x", GPT brings more problems through its requirement of a second table at the end of the device (so a disk image cannot easily be written by dd onto a bigger disk).
Re: RFC: Project geom-events
- Miroslav Lachman's Original Message -

> Lev Serebryakov wrote:
>> Hello, Miroslav.
>> You wrote on October 6, 2011, 16:59:19:
>
> [...]
>
>>> The current state is simply wrong, because the user can do something that cannot work and is not documented anywhere.
>>
>> It is Ok in UNIX way, in general. You should be able to shoot your leg, it is good :)
>
> I am sorry for my late reply. Foot shooting is OK, if somebody wants to shoot his foot, but I don't want to shoot my foot if I am aiming at my head :)
>
>> But if geom_label doesn't reduce its provider to count its own metadata, it looks like a bug!
>
> As Ivan Voras explained, it is not a bug, it is just a matter of mixing two things that can't coexist. So the problem is that it is not mentioned anywhere in the FreeBSD docs. (Thank you Ivan for your explanation!)
>
> And as somebody else already mentioned in this thread, it should be documented in manpages and the Handbook, and gpart should show a warning message if the user is trying to put GPT on non-real disk devices.
>
> As is mentioned in the thread "Memstick image differences between 8.x and 9.x", GPT brings more problems through its requirement of a second table at the end of the device (so a disk image cannot easily be written by dd onto a bigger disk).

This also seems to prevent something useful like:

# camcontrol inquiry da0
pass2: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device
pass2: Serial Number 3TB1BKGX9036W9EN
pass2: 600.000MB/s transfers, Command Queueing Enabled

# camcontrol inquiry da25
pass27: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device
pass27: Serial Number 3TB1BKGX9036W9EN
pass27: 600.000MB/s transfers, Command Queueing Enabled

# gmultipath label ZFS0 da0 da25

# gpart create -s gpt $device
# gpart add -s 128 -t freebsd-boot $device       # Create 64K boot partition
# gpart add -s 4m -t freebsd-ufs -l mb$dev $device  # small partition
# gpart add -t freebsd-zfs -l $dev $device       # Remaining space for zfs

It seems like protecting your partitions with multiple paths would be a good thing.

I've been experimenting with this and end up with corrupt partitions.

Am I missing something?

-john
Re: RFC: Project geom-events
Hello, Daniel.
You wrote on October 8, 2011, 0:13:54:

>>>> GPT (and MBR) metadata placement is dictated from the outside world, where there is no GEOM and geom_label. They are INTENDED to be used on DISKS. BIOSes should be able to find it :)
>>>
>>> Certainly GPT and MBR must place an instance of the partition table where the BIOS expects it, but there's no immediately obvious reason why they must regard that instance as their GEOM metadata. GPT puts a second copy in the provider's last block, and AFAICT it could just as well use _that_ instance -- or even a differently-formatted block that included the same data -- as the primary. MBR could do likewise.
>>
>> I have deja-vu, that I answered this. Please, read the standard. GPT _must_ be placed twice -- at first and last sectors (really, more than one sector). By standard. Secondary copy must be at end of disk. Period.
>
> Then, by standard GPT cannot coexist with GLABEL. Such setup should be disallowed, or at least a big nasty message that you have just shot yourself in the leg should be output. (period)

Ok, maybe adding a check to geom_part, that it is used on a rank-1 provider (whole disk), is not such a bad idea. But it then raises the question of how to install FreeBSD on a software mirror, which is useful. But could bite you sometimes... Hm...

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Lev.
You wrote on October 8, 2011, 13:52:21:

>> GPT must have backup copy in last sector by standard ...
>
> In that case, shouldn't it refuse to install on any provider that is not in fact a disk, so as not to create configurations that cannot work properly?

Installation of FreeBSD on software mirror?..

>> MBR doesn't have any additional metadata. How will adding one help it?
>
> It would add robustness, for cases like the one that started this thread. If MBR put a GEOM metadata block at the end of its provider, it would fix the tasting race when an MBR is installed on a glabelled (or gmirrored) drive.

And how should it work with an MBR created by non-FreeBSD tools?

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Ivan.
You wrote on October 8, 2011, 0:23:14:

> If you think this should be explicitly handled, please file a PR which requests the modification of gpart so that it detects that a GPT is being created in anything other than a raw drive, and warns the user.

It should be mentioned in documentation, at least. But how will people create a bootable gmirror installation in such a case? Make (many) mirrors from parts? I don't like this idea...

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On Oct 8, 2011, at 12:05, Lev Serebryakov wrote:

> Hello, Ivan.
> You wrote on October 8, 2011, 0:23:14:
>
>> If you think this should be explicitly handled, please file a PR which requests the modification of gpart so that it detects that a GPT is being created in anything other than a raw drive, and warns the user.
>
> It should be mentioned in documentation, at least. But how will people create a bootable gmirror installation in such a case? Make (many) mirrors from parts? I don't like this idea...

Good example of what I would call laziness -- others would call it hacking, I guess.

Either way, the solution we have now permits some exotic setups, but is fragile and not consistent. Most of the useful features are actually side effects of the hack. If it should remain this way, a warning in the documentation and at runtime would be very helpful.

Daniel
Re: RFC: Project geom-events
Hello, Perryh.
You wrote on October 7, 2011, 18:06:38:

>> GPT (and MBR) metadata placement is dictated from the outside world, where there is no GEOM and geom_label. They are INTENDED to be used on DISKS. BIOSes should be able to find it :)
>
> Certainly GPT and MBR must place an instance of the partition table where the BIOS expects it, but there's no immediately obvious reason why they must regard that instance as their GEOM metadata. GPT puts a second copy in the provider's last block, and AFAICT it could just as well use _that_ instance -- or even a differently-formatted block

GPT must have a backup copy in the last sector by standard; it is not a caprice of the GEOM class author. BIOSes could refuse to boot from it if they don't find the second copy. And it could occupy not only one sector, but up to 34 of them.

> that included the same data -- as the primary. MBR could do likewise.

MBR doesn't have any additional metadata. How will adding one help it?

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Lev Serebryakov l...@freebsd.org wrote:

> GPT (and MBR) metadata placement is dictated from outside world, where is no GEOM and geom_label. They INTENDED to be used on DISKS. BIOSes should be able to find it :)

Certainly GPT and MBR must place an instance of the partition table where the BIOS expects it, but there's no immediately obvious reason why they must regard that instance as their GEOM metadata. GPT puts a second copy in the provider's last block, and AFAICT it could just as well use _that_ instance -- or even a differently-formatted block that included the same data -- as the primary. MBR could do likewise.
Re: RFC: Project geom-events
Hello, Perryh.
You wrote on October 7, 2011, 18:06:38:

>> GPT (and MBR) metadata placement is dictated from the outside world, where there is no GEOM and geom_label. They are INTENDED to be used on DISKS. BIOSes should be able to find it :)
>
> Certainly GPT and MBR must place an instance of the partition table where the BIOS expects it, but there's no immediately obvious reason why they must regard that instance as their GEOM metadata. GPT puts a second copy in the provider's last block, and AFAICT it could just as well use _that_ instance -- or even a differently-formatted block that included the same data -- as the primary. MBR could do likewise.

I have deja-vu, that I answered this. Please, read the standard. GPT _must_ be placed twice -- at first and last sectors (really, more than one sector). By standard. Secondary copy must be at end of disk. Period.

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On 07.10.11 22:44, Lev Serebryakov wrote:

> Hello, Perryh.
> You wrote on October 7, 2011, 18:06:38:
>
>>> GPT (and MBR) metadata placement is dictated from the outside world, where there is no GEOM and geom_label. They are INTENDED to be used on DISKS. BIOSes should be able to find it :)
>>
>> Certainly GPT and MBR must place an instance of the partition table where the BIOS expects it, but there's no immediately obvious reason why they must regard that instance as their GEOM metadata. GPT puts a second copy in the provider's last block, and AFAICT it could just as well use _that_ instance -- or even a differently-formatted block that included the same data -- as the primary. MBR could do likewise.
>
> I have deja-vu, that I answered this. Please, read the standard. GPT _must_ be placed twice -- at first and last sectors (really, more than one sector). By standard. Secondary copy must be at end of disk. Period.

Then, by standard GPT cannot coexist with GLABEL. Such setup should be disallowed, or at least a big nasty message that you have just shot yourself in the leg should be output. (period)

Daniel
Re: RFC: Project geom-events
2011/10/7 Daniel Kalchev dan...@digsys.bg:

> Then, by standard GPT cannot coexist with GLABEL. Such setup should be disallowed, or at least a big nasty message that you have just shot yourself in the leg should be output. (period)

GPT cannot coexist with ANY GEOM CLASS which writes metadata to the last sector.

If you think this should be explicitly handled, please file a PR which requests the modification of gpart so that it detects that a GPT is being created in anything other than a raw drive, and warns the user.
Re: RFC: Project geom-events
Hello, John-Mark.
You wrote on October 6, 2011, 2:53:53:

>> gmirror0
>>   gstripe0
>>     ada0
>>     ada1
>>   gstripe1
>>     ada2
>>     ada3
>>
>> and administrator kills gstripe0, for example, geom_mirror will send an event, because from its point of view it is not an administrative action... But such situations, IMHO, are not very frequent.
>
> Won't gmirror still report COMPLETE after a gmirror remove? So the

I say "kill gstripe0", not "remove gstripe0 from gmirror0"; these are different situations. gmirror0 will be DEGRADED after this action, but will send a DISCONNECT message with "fixable" state, and that is the state in which geom-events(8) tries to find a replacement (spare). Exactly as when it loses a component due to an accident.

If you say "gmirror remove", yes, it will be COMPLETE after it.

> script can look at the gmirror device, and see that it is still complete even though one of the providers were dropped and assume it was an administrative command that did it..

Here is one problem: there is no STANDARD way to understand the state of a provider from userland, as it is GEOM-specific. "g${class} status ${geom}" prints almost free-form information. Not all classes name it COMPLETE, and some classes (geom_raid, for example) could have many providers and many states, which adds complexity.

So, to make it work this way I would need to add knowledge about all classes and their output formats to geom-events(8). I don't think that is good design -- it is a bad idea to put knowledge about GEOM classes in two places, the class itself and some script. It will be hard to keep synchronized, etc.

So, I think, the GEOM class itself should decide and report its state in a standard way.

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Alexander.
You wrote on October 6, 2011, 1:34:33:

>>> That works perfectly for the case when the class (geom_raid) is known to work on a raw device. Other RAID classes can be used over partitions, so some care should be taken to avoid false positives.
>>
>> Oh, yes... I see... I'm not sure here.
>
> In that case it is helpful to include media size into the metadata. Comparing that value with provider size during taste allows to avoid these false positives. geom_mirror metadata include/check provider size since version 3. Pity that MBR and probably others don't.

Yep.

> And what if the class is not loaded/supported? There should be a way to manage/clear that label.

Most classes have a "clear" command which doesn't need the module loaded and works completely from userland now (as "label" works from userland too for these classes). And, if such changes are made, a generic command for geom itself, without any class at all, could be added -- of course, it will refuse to clear anything that doesn't start with the common signature.

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
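The media-size check described above can be sketched as a toy model. This is not the geom_mirror code: the `Metadata` record, the `taste` function, the "TOY::MIRROR" signature and the sector counts are all made up for illustration. The only idea taken from the thread is that the metadata records the size of the provider it was written for, and tasting compares that against the provider being looked at.

```python
from dataclasses import dataclass

SECTOR = 512  # bytes per sector (assumed for the example)


@dataclass
class Metadata:
    magic: str     # class signature found in the provider's last sector
    provsize: int  # size in bytes of the provider the metadata was written for


def taste(provider_size: int, md: Metadata, magic: str = "TOY::MIRROR") -> bool:
    """Claim the provider only if the signature matches AND the recorded
    size equals the size of the provider being tasted."""
    return md.magic == magic and md.provsize == provider_size


# A hypothetical 1000-sector disk whose last partition (900 sectors) ends on
# the disk's last sector, so both providers expose the same final sector.
disk_size = 1000 * SECTOR
part_size = 900 * SECTOR
md = Metadata("TOY::MIRROR", part_size)  # metadata was written onto the partition

on_partition = taste(part_size, md)   # sizes match: accepted
on_whole_disk = taste(disk_size, md)  # sizes differ: false positive avoided
print(on_partition, on_whole_disk)
```

Without the size comparison, both providers would be accepted, because they share the same last sector.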
Re: RFC: Project geom-events
On 06/10/2011 00:12, Miroslav Lachman wrote:

> Scot Hetzel wrote:
>> 2011/10/5 Miroslav Lachman 000.f...@quip.cz:
>>> I have been waiting years for the moment when these GEOM problems will be fixed, so I am really glad to see your interest! It will be a move in the right direction even if the changes are not backward compatible. The current state is too fragile to be used in production. Gmirror alone can be used, glabel alone can be used, GPT alone can be used... but mixing it all stacked together is the way to hell. e.g. Using GPT on a glabeled provider always ends with an error message about a corrupted secondary GPT table. (But how can I use iSCSI in a reliable way if I cannot use glabel on devices and an iSCSI device can have a different number on each reboot? I wrote about it almost 2 years ago)
>>
>> You don't need to use glabel on GPT disks, as gpart has its own way to label GPT disks:
>
> [...]
>
> The point was that glabel on a disk device is successful, gpartitioning on a glabeled device is successful, but metadata handling / device tasting is wrong after reboot, and this should be fixed, not worked around. Otherwise thank you for the example with GPT labels, it can be useful in some cases.

Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.

The only way to get this sorted out is to make a label class (or adapt glabel) which does NOT store metadata anywhere on the devices. Maybe they can store it in the file system (a file in /etc - though you then lose bootability, and have to somehow connect devices and labels), or the device hardware ID can be used as a label (but not all devices have it, and in case of software constructs like iSCSI the labels can be changed).
Re: RFC: Project geom-events
On 06.10.11 14:07, Ivan Voras wrote:

> Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.

The proper way for this is to have these things store their metadata in the first/last sector of the provider, not the underlying device. This means that, if you have GPT within GLABEL, for example -- you will only see the GPT label if you first see the GLABEL.

I guess the present situation was created out of laziness ;)
Re: RFC: Project geom-events
On 06/10/2011 13:29, Daniel Kalchev wrote:

> On 06.10.11 14:07, Ivan Voras wrote:
>> Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.
>
> The proper way for this is to have these things store their metadata in the first/last sector of the provider, not the underlying device. This means that, if you have GPT within GLABEL, for example -- you will only see the GPT label if you first see the GLABEL.
>
> I guess the present situation was created out of laziness ;)

No, I don't think you understand. The layering *is* correct and you *can* create a GPT inside a glabel label, but then

1) you get device names like /dev/label/somethingp1, /dev/label/somethingp2, etc.

2) this makes the device unbootable as the GPT partition is per definition not valid. It still stores the primary partition table on the first sector (and the following sectors...), but its secondary table is stored at one sector short of device's last sector (which is used by glabel). Any utilities and BIOSes which test for GPT will find the first table but not the last and depending on how sensitive / broken they are, they will either recognize a broken GPT (and/or try to fix it, destroying the glabel label), or not work at all.

You could argue that the GPT design is broken, but it was always, per design, only made to work on whole drives. There is no way to use it with any other scheme which uses either the first or the last sectors of a drive.

Luckily, GPT also provides its own labels (per design) and instead of labeling the provider, you could just as easily label the individual partitions and skip glabel in this case.
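The "one sector short" point can be illustrated with a small calculation. This is only a sketch: the 204800-sector disk is an arbitrary example, the helper `last_lba` is made up, and real GPT tools validate far more than the position of the backup header.

```python
def last_lba(total_sectors: int) -> int:
    """LBA of the last sector of a device with the given number of sectors."""
    return total_sectors - 1


disk_sectors = 204800             # example raw disk size (assumption)
label_sectors = disk_sectors - 1  # glabel keeps the disk's last sector for itself

glabel_lba = last_lba(disk_sectors)          # where the glabel metadata sits
backup_written_at = last_lba(label_sectors)  # where gpart puts the backup header
                                             # when given the label provider
backup_expected_at = last_lba(disk_sectors)  # where a tool reading the raw disk
                                             # expects the backup header

# The raw-disk view finds glabel metadata where it expects the backup GPT header.
misplaced = backup_written_at != backup_expected_at
collides = backup_expected_at == glabel_lba
print(backup_written_at, backup_expected_at, misplaced, collides)
```

From inside the label provider everything is consistent; the mismatch only appears for software that looks at the whole disk, which is what a BIOS or a non-GEOM partitioning tool does.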
Re: RFC: Project geom-events
Ivan Voras wrote:

>> The point was that glabel on a disk device is successful, gpartitioning on a glabeled device is successful, but metadata handling / device tasting is wrong after reboot, and this should be fixed, not worked around. Otherwise thank you for the example with GPT labels, it can be useful in some cases.
>
> Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.
>
> The only way to get this sorted out is to make a label class (or adapt glabel) which does NOT store metadata anywhere on the devices. Maybe they can store it in the file system (a file in /etc - though you then lose bootability, and have to somehow connect devices and labels), or the device hardware ID can be used as a label (but not all devices have it, and in case of software constructs like iSCSI the labels can be changed).

Then there should be a warning in the documentation, or an error message printed by the command at the time of writing metadata.

I am not a GEOM expert, but isn't it a wrong concept that glabel writes its metadata and publishes the original device size? If some GEOM writes metadata at the last sector (or first), then it should shrink the published size (or offset). Or is the problem in geom_part, that it is writing metadata past the advertised end of the device?

e.g. If I have a disk device with a size of 100 sectors and glabel metadata is stored at the last sector, then glabel should shrink the advertised size to 99 sectors - then the GPT secondary table will be at sector 99 instead of 100.

I know there is a problem if somebody accesses the device by its normal device node (e.g. /dev/ada0): then the secondary GPT table will be at a different place, not in the last sector. But this is a mistake in the glabel concept, and if it cannot be solved in any other way, then glabel should not be allowed to place labels on the disk device at all (if we cannot be sure it is non-conflicting).

The current state is simply wrong, because the user can do something that cannot work and is not documented anywhere.

Miroslav Lachman
Re: RFC: Project geom-events
On 06.10.11 15:36, Ivan Voras wrote:

> On 06/10/2011 13:29, Daniel Kalchev wrote:
>> On 06.10.11 14:07, Ivan Voras wrote:
>>> Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.
>>
>> The proper way for this is to have these things store their metadata in the first/last sector of the provider, not the underlying device. This means that, if you have GPT within GLABEL, for example -- you will only see the GPT label if you first see the GLABEL.
>>
>> I guess the present situation was created out of laziness ;)
>
> No, I don't think you understand. The layering *is* correct and you *can* create a GPT inside a glabel label, but then
>
> 1) you get device names like /dev/label/somethingp1, /dev/label/somethingp2, etc.

.. and, you overwrite the last sector of the device, not of the provider. This is incorrect layering -- GPT should see only the provider it was given and nothing at different layers.

> 2) this makes the device unbootable as the GPT partition is per definition not valid. It still stores the primary partition table on the first sector (and the following sectors...), but its secondary table is stored at one sector short of device's last sector (which is used by glabel). Any utilities and BIOSes which test for GPT will find the first table but not the last and depending on how sensitive / broken they are, they will either recognize a broken GPT (and/or try to fix it, destroying the glabel label), or not work at all.

This is why I said, lazy. If you use proper layering, the GPT within another GEOM provider should be unbootable. No BIOS should ever see it. The rationale of using GPT within another GEOM is questionable, but apparently has applications.

> You could argue that the GPT design is broken, but it was always, per design, only made to work on whole drives. There is no way to use it with any other scheme which uses either the first or the last sectors of a drive.

I am not arguing this. But suppose you use GPT on HAST providers. These cannot boot, so whether GPT is bootable in that case is irrelevant.

> Luckily, GPT also provides its own labels (per design) and instead of labeling the provider, you could just as easily label the individual partitions and skip glabel in this case.

That is another option, yes. But this does not mean we do not have an issue. Probably, it should simply be disallowed to use GPT within GLABEL. Or, in cases where this might be beneficial, use a sort of 'boot-less' GPT.

Probably, the problem here is with GLABEL, which does not use the first sector as well.
Re: RFC: Project geom-events
I am not a GEOM expert, but isn't it wrong concept, that glabel writes its metadata and publish original device size? It does not. # diskinfo -v /dev/md0 /dev/md0 512 # sectorsize 104857600 # mediasize in bytes (100M) 204800 # mediasize in sectors 0 # stripesize 0 # stripeoffset # glabel create /dev/label/blah md0 # diskinfo -v /dev/label/blah /dev/label/blah 512 # sectorsize 104857088 # mediasize in bytes (100M) 204799 # mediasize in sectors 0 # stripesize 0 # stripeoffset (i.e. one sector is used for glabel). # gpart create -s GPT /dev/label/blah # gpart add -t freebsd -l gptpart label/blah # diskinfo -v /dev/label/blahs1 /dev/label/blahs1 512 # sectorsize 104822784 # mediasize in bytes (100M) 204732 # mediasize in sectors 0 # stripesize 17408 # stripeoffset 710 # Cylinders according to firmware. 32 # Heads according to firmware. 9 # Sectors according to firmware. (i.e. 67 sectors are used for: protective MBR (1), GPT header (1), GPT table (32), the backup GPT header (1) and the backup GPT table (32)). If some GEOM write metadata at last sector (or first), then it should shrink the published size (or offset). It does that. Or is the problem at geom_part, that it is writing metadata past the advertised end of the device? There is no problem as far as I can see. The only possible problem is that you are trying to do something outside the specification. This is not the fault of FreeBSD, GEOM, or glabel, and it's even probably not the fault of the specification as it cannot avoid real-world problems such as this. See this: http://www.uefi.org/specs/ If the primary GPT is invalid, the backup GPT is used instead and it is located on the last logical block on the disk. If the backup GPT is valid it must be used to restore the primary GPT. If the primary GPT is valid and the backup GPT is invalid software must restore the backup GPT. If both the primary and backup GPTs are corrupted this block device is defined as not having a valid GUID Partition Header. 
Even though the primary GPT header contains the LBA of the backup GPT header, this paragraph says that it must be the last sector of the device.

> e.g. If I have disk device with size of 100 sectors and glabel metadata is stored at the last sector, then glabel should shrink the advertised size to 99 sectors - then GPT secondary table will be at sector 99 instead of 100.

Maybe an illustration will help. In a scenario like the above, this is what you have on the drive:

    [ PMBR | GPT 1 | --- random file system data --- | GPT 2 | glabel ]

The $glabeled_size is $device_sectors - 1. The size of the GPT available space is $glabeled_size - 67.

Since the PMBR and primary GPT headers are located at the start of the drive, they are detected there. A tool which is written to the specification paragraph above will detect that the second GPT table (GPT 2) is invalid in this configuration and will try to fix it by destroying the glabel metadata.

It doesn't matter that from GEOM's point of view, the layering is correct and the provider sizes are calculated correctly. If you want to do this, you must do it with something which isn't GPT.
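The sector arithmetic in the message above can be checked with a short sketch. It assumes 512-byte sectors, one sector of glabel metadata, and a 32-sector GPT entry table, as in the diskinfo output quoted above; the function name and dictionary keys are made up for illustration.

```python
def gpt_on_glabel(device_sectors, glabel_sectors=1, gpt_table_sectors=32):
    """Return key sector counts and LBAs (0-based) for GPT stacked on a glabel provider."""
    # glabel publishes a provider that is smaller by its own metadata sector
    provider_sectors = device_sectors - glabel_sectors
    # PMBR (1) + primary header (1) + primary table + backup table + backup header (1)
    gpt_overhead = 1 + 1 + gpt_table_sectors + gpt_table_sectors + 1
    return {
        "provider_sectors": provider_sectors,
        "gpt_overhead": gpt_overhead,
        "usable_sectors": provider_sectors - gpt_overhead,
        # geom_part places the backup header at the end of the provider it was given
        "backup_header_lba": provider_sectors - 1,
        # a tool that follows the spec literally looks at the end of the raw disk
        "device_last_lba": device_sectors - 1,
    }


layout = gpt_on_glabel(204800)  # the 100M md0 device from the example
print(layout)
print("backup header on last LBA of disk:",
      layout["backup_header_lba"] == layout["device_last_lba"])
```

For the 204800-sector device this reproduces the 204732 usable sectors reported by diskinfo, and shows the backup header sitting one sector short of the disk's last LBA, which is exactly the slot glabel occupies.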
Re: RFC: Project geom-events
Hello, Daniel.

You wrote on 6 October 2011, 15:29:58:

> The proper way for this is to have these things store their metadata in the first/last sector of the provider, not the underlying device. This means that, if you have GPT within GLABEL, for example -- you will only see the GPT label if you first see the GLABEL. I guess the present situation was created out of laziness ;)

No. GPT (and MBR) metadata placement is dictated by the outside world, where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS. BIOSes should be able to find them :)

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Miroslav.

You wrote on 6 October 2011, 16:59:19:

> I am not a GEOM expert, but isn't it wrong concept, that glabel writes its metadata and publish original device size? If some GEOM write metadata at last sector (or first), then it should shrink the published size (or offset). Or is the problem at geom_part, that it is writing metadata past the advertised end of the device?

Good point.

> e.g. If I have disk device with size of 100 sectors and glabel metadata is stored at the last sector, then glabel should shrink the advertised size to 99 sectors - then GPT secondary table will be at sector 99 instead of 100. The current state is simply wrong, because user can do something what cannot work and is not documented anywhere.

That is OK in the UNIX way, in general. You should be able to shoot yourself in the leg, it is good :) But if geom_label doesn't reduce its provider to account for its own metadata, it looks like a bug!

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On 06.10.11 17:04, Pieter de Goeje wrote:

>>> The layering *is* correct and you *can* create a GPT inside a glabel label, but then 1) you get device names like /dev/label/somethingp1, /dev/label/somethingp2, etc.

>> .. and, you overwrite the last sector of the device, not of the provider. This is incorrect layering -- GPT should see only the provider it was given and nothing at different layers.

> If you stack GPT on top of glabel, then your statement is not true. GPT will overwrite the last sector of the (glabel) provider, not the underlying device. There is no layering violation.

I stand corrected. Sorry for creating confusion with this statement. Most of the time I was blaming GPT, I was actually blaming GLABEL (see below).

> Because physically the first sector of the device is still GPT data the BIOS will still try to boot from it, hence it would probably be wise to disallow GPT on anything other than raw devices.

Yes, but.. what is a raw device? Probably disallow GPT on devices that are not bootable, but how can this be indicated? GPT is very useful for its ability to create labeled partitions.

> This problem wouldn't exist if geom classes would write their metadata to the first sector, but then you could no longer boot from for example gmirrored/glabeled devices with a MBR.

We seem to blame GPT here, but GLABEL is really the culprit. If GLABEL wrote to the first sector of the provider and that made the disk non-bootable, then there would be little chance that somebody would try to use first GLABEL, then GPT etc. and create the current situation.

Unfortunately, the GLABEL + GMIRROR setup is so common..
Re: RFC: Project geom-events
On Thursday, October 06, 2011 02:43:03 PM Daniel Kalchev wrote:
> On 06.10.11 15:36, Ivan Voras wrote:
>> On 06/10/2011 13:29, Daniel Kalchev wrote:
>>> On 06.10.11 14:07, Ivan Voras wrote:

>>>> Um, you do realize this is a physical problem with metadata location and cannot be solved in any meaningful way? Geom_label stores its label in the last sector of the device, and GPT stores the secondary / backup table also at the end of the device. The two can NEVER work together. The same goes for any other GEOM class which stores metadata and GPT.

>>> The proper way for this is to have these things store their metadata in the first/last sector of the provider, not the underlying device. This means that, if you have GPT within GLABEL, for example -- you will only see the GPT label if you first see the GLABEL. I guess the present situation was created out of laziness ;)

>> No, I don't think you understand. The layering *is* correct and you *can* create a GPT inside a glabel label, but then 1) you get device names like /dev/label/somethingp1, /dev/label/somethingp2, etc.

> .. and, you overwrite the last sector of the device, not of the provider. This is incorrect layering -- GPT should see only the provider it was given and nothing at different layers.

If you stack GPT on top of glabel, then your statement is not true. GPT will overwrite the last sector of the (glabel) provider, not the underlying device. There is no layering violation.

So you get this physical layout:

    Primary GPT header, data, Secondary GPT header, glabel metadata

Because physically the first sector of the device is still GPT data the BIOS will still try to boot from it, hence it would probably be wise to disallow GPT on anything other than raw devices.

This problem wouldn't exist if geom classes wrote their metadata to the first sector, but then you could no longer boot from, for example, gmirrored/glabeled devices with an MBR.
- Pieter
Re: RFC: Project geom-events
On 06.10.2011 16:36, Ivan Voras wrote:

> 2) this makes the device unbootable as the GPT partition is per definition not valid. It still stores the primary partition table on the first sector (and the following sectors...), but its secondary table is stored at one sector short of device's last sector (which is used by glabel). Any utilities and BIOSes which test for GPT will find the first table but not the last and depending on how sensitive / broken they are, they will either recognize a broken GPT (and/or try to fix it, destroying the glabel label), or not work at all.

Actually we support booting from GPT when the secondary GPT header is not in the last LBA. Our bootcode will complain in this situation, but it works.

--
WBR, Andrey V. Elsukov
Re: RFC: Project geom-events
Hello, Miroslav.

You wrote on 5 October 2011, 1:27:03:

> I am still missing one thing - dropped provider is not marked as failed RAID provider and is accessible for anything like normal disk device. So in some edge cases, the system can boot from failed RAID component instead of degraded RAID. This can cause data loss or damage.

What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid? Something else?

If a GEOM class drops an underlying provider due to errors, it has no chance to update the metadata on it. But most classes, if the dropped provider is attached again, will rebuild themselves, as they track which components are current and which ones are old.

Do you want GEOM classes to track dropped components somewhere else and not even try to attach them automatically when they re-appear?

> Is it possible to fix it by something like your geom-events, or should it be done in each GEOM RAID class separately?

geom-events only processes events from GEOM classes in userland. Each class should decide for itself what happens to it, as only the class itself knows whether a particular error is fatal or not.

geom-events could help if it replaces the dropped component with a spare drive, as in that case most classes prefer the latest drive, not the old one. Without spares, everything will be exactly as it is now, plus e-mails to the administrator :)

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Andrey.

You wrote on 5 October 2011, 9:07:16:

> It seems that you could change only geom_dev.c to get most of what you want. Actually, the part of your changes related to the DISCONNECT events, and maybe DESTROY events could be implemented in the geom_dev.

Does geom_dev know all the needed bits of information to report? It seems to me that it doesn't. I mean:

(1) Class and name of the GEOM which is affected.
(2) Name of the provider which is affected.
(3) Name of the underlying provider which is lost (the consumer from the reporting GEOM's point of view).
(4) Resulting state of the affected provider (fixable, alive, dead).

Yes, geom_dev knows the name of the FAILED provider, but does it know all the others? I'm afraid not, or I don't understand how a generic mechanism could know that geom_stripe cannot lose components and still be fixable, while geom_mirror can.

Additionally, some GEOM classes could throw away faulty consumers before they disappear from geom_dev's point of view.

Actually, DESTROY can be observed without my changes at all -- there is a message from DEVFS about removing the entry :) But, again, this notification will not contain the name and class of the GEOM, only the provider's name (devfs entry).

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Andrey.

You wrote on 5 October 2011, 10:27:10:

>> It seems that you could change only geom_dev.c to get most of what you want. Actually, the part of your changes related to the DISCONNECT events, and maybe DESTROY events could be implemented in the geom_dev.

> Does geom_dev know all the needed bits of information to report? It seems to me that it doesn't. I mean:
> (1) Class and name of the GEOM which is affected.
> (2) Name of the provider which is affected.
> (3) Name of the underlying provider which is lost (the consumer from the reporting GEOM's point of view).
> (4) Resulting state of the affected provider (fixable, alive, dead).

And, I'm afraid, geom_dev could not distinguish manual operations on a geom (performed from userland by the administrator) from real accidents. I don't want geoms to post DISCONNECTED or DESTROYED events when the administrator knows what he is doing -- it could lead to race conditions, for example when the administrator rebuilds an array and forgets to disable the spare drives.

Another example: geom_label creates and destroys about 10 labels on boot (on my test VM), and if DESTROYED is reported by a very generic mechanism, it will end up with 10 e-mails to the administrator on every boot. I got exactly this when I put the notifications in too generic a place on my first try.

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Stephane.

You wrote on 5 October 2011, 10:25:51:

> On 10/05/2011 03:19 PM, Lev Serebryakov wrote:
> A bit unrelated, but are there plans to integrate hardware RAID (mps/mfi/mpt/amr) failure notification in the same way as this would be done for GEOM ? As in, one framework and way to manage both hard and soft RAIDs.

I don't have such plans, as I think only the driver authors can identify the proper places to add event sending. Drivers are much more complicated than RAID classes (I was unable to find the proper places in geom_vinum, for example, and hardware drivers don't look any simpler than that).

But from userland's point of view, there is nothing special about hardware RAIDs -- geom-events(8) needs two commands to be configured: one to remove a failed drive from the array and one to add a new one, that's all. Of course, the GEOM system name in events will look odd for hardware controllers, but it could be renamed to something more generic. And hardware controllers have the same bits of information as software ones -- type of controller, name of the failed drive, name of the affected volume and resulting state; everything is the same.

So, if there is interest from the authors of hardware RAID drivers, it could be integrated, of course.

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On 05.10.2011 10:39, Lev Serebryakov wrote:

> (1) Class and name of GEOM which is affected.
> (2) Name of provider which is affected.
> (3) Name of underlying provider which is lost (consumer from reporting GEOM's point of view).
> (4) Resulting state of affected provider (fixable, alive, dead).

All except the last could be obtained from the consumer in the orphan method.

> And, I'm afraid, that geom_dev could not distinguish manual operations with geom (performed from userland by administrator) and real accidents. I don't want geoms to post DISCONNECTED or DESTROYED events when administrator knows what he does -- and it could lead to race conditions, when administrator rebuild array and forgot to disable spare drives, for example.
>
> Other example -- geom_label creates and destroys about 10 labels on boot (on my test VM) and, if DESTROYED will be reported by very generic mechanism, it will end up with 10 e-mails to administrator on every boot -- I've got this, when put notifications in too generic place for first try.

Ok, good point. Can you explain how your script will distinguish which actions are performed by the administrator? A change made by the administrator could trigger the disappearance of several child geoms.

--
WBR, Andrey V. Elsukov
Re: RFC: Project geom-events
Lev Serebryakov wrote:
> Hello, Miroslav.
> You wrote on 5 October 2011, 1:27:03:

>> I am still missing one thing - dropped provider is not marked as failed RAID provider and is accessible for anything like normal disk device. So in some edge cases, the system can boot from failed RAID component instead of degraded RAID. This can cause data loss or damage.

> What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid? Something else?

I am mostly using geom_mirror.

> If GEOM class drops underlying provider due to errors, it doesn't have chances to update metadata for it.

I understand this, but if there is (stale) metadata on the provider, the system can read this metadata and should disallow normal operations (for example propagating slices, partitions and labels).

> But most of classes, if dropped provider attached again, will rebuild itself, as they track which components are actual and which ones are old.

I have often seen a provider (for example ada1) dropped because of some DMA timeout (bad cables and so on); sometimes the provider (disk ada1) detached from the ATA channel and reattached after a reboot. In both cases, the provider has stale metadata and is marked as broken by geom_mirror, and the auto rebuild did not start. In this case, I see gm0 with all of its slices, partitions and labels, and ada1 with the same slices, partitions and labels - this is the problem. There are two devices providing the same labels and the winner is the first one tasted... even though the system (geom_mirror) knows that ada1 is a broken disk.

I think that GEOM should be more robust in this case and, if metadata is found, should not publish slices, partitions, labels and so on...

> Do you want GEOM classes to track dropped components somewhere else and didn't even try to attach them automatically when they re-appear?

If some disk is removed, reinserted and synchronisation starts, then everything is OK.
But a situation where a component is marked as broken and the system and user can still operate on it like on a normal, good and clean drive is wrong. The drive's content should be inaccessible until the operator takes some action (for example gmirror clear on the broken disk device).

Miroslav Lachman
Re: RFC: Project geom-events
Hello, Andrey.

You wrote on 5 October 2011, 11:51:36:

> On 05.10.2011 10:39, Lev Serebryakov wrote:
>> (1) Class and name of GEOM which is affected.
>> (2) Name of provider which is affected.
>> (3) Name of underlying provider which is lost (consumer from reporting GEOM's point of view).
>> (4) Resulting state of affected provider (fixable, alive, dead).

> All except last could be get from the consumer in the orphan method.

I'm afraid that (2) cannot be known in a generic way either, as a GEOM could have several providers, and only some of them could be affected by the disconnection. The consumer contains the geom (with its class) and the underlying provider, which are items (1) and (3)...

>> Other example -- geom_label creates and destroys about 10 labels on boot (on my test VM) and, if DESTROYED will be reported by very generic mechanism, it will end up with 10 e-mails to administrator on every boot -- I've got this, when put notifications in too generic place for first try.

> Ok, good point. Can you explain how your script will distinguish which actions are performed by administrator? Since change made by administrator could trigger disappearing of several child geoms.

Not the script, but the GEOMs themselves. They know why a disk disappears. Of course, this works only one level deep -- if the administrator calls

    gmirror remove gm0 ada4

geom_mirror knows that ada4 has not failed. Yes, I understand that if there is a configuration like this:

    gmirror0
      gstripe0
        ada0
        ada1
      gstripe1
        ada2
        ada3

and the administrator kills gstripe0, for example, geom_mirror will send an event, because from its point of view it is not an administrative action... But such situations, IMHO, are not very common.

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
Hello, Miroslav.

You wrote on 5 October 2011, 12:24:06:

>> What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid? Something else?

> I am mostly using geom_mirror.

[SKIPPED]

Oh, I see. Unfortunately, there is no GEOM metadata infrastructure; GEOMs are too generic for this. I could design some meta-meta framework and unify all RAID classes with internal metadata (geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external geom_raid5) to use it. In that case it will work -- the kernel will not pass providers with dirty metadata to any GEOMs but their owners for tasting. Of course, classes like geom_part and geom_raid could not be changed in this way -- they are forced to use pre-defined metadata formats.

It is a good idea, but it should be a separate project. And, yes, it will change the metadata format for these GEOMs, so it will not be backward-compatible. And, yes, it seems to be a much more intrusive change to the GEOM subsystem (because it will change the tasting sequence), and should be supervised by other developers from the very beginning.

I could write a proposal in the near future, with some design notes.

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On 05.10.2011 11:58, Lev Serebryakov wrote:
> Hello, Miroslav.
> You wrote on 5 October 2011, 12:24:06:

>>> What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid? Something else?

>> I am mostly using geom_mirror.

> [SKIPPED]
> Oh, I see. Unfortunately, there is no GEOM metadata infrastructure, GEOMs are too generic for this. I could design some meta-meta framework, and unify all RAID classes with internal metadata (geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external geom_raid5) to use it. In such case it will work -- kernel will not pass providers with dirty metadata to any GEOMs, but owners, for tasting. Of course, classes like geom_part and geom_raid could not be changed in such way -- they are forced to use pre-defined metadata formats.

geom_raid addresses this problem in its own way. Since RAID BIOSes expect RAIDs to be built on raw physical devices and probe order is not up for discussion, geom_raid exclusively opens the underlying providers immediately after detecting supported metadata. So even if the volume is broken or incomplete, or the disk is marked failed, or in any other case, the disk won't be accessible to other GEOM classes. If the administrator wishes to reuse the disk for any other purpose, he should explicitly erase the on-disk metadata using the graid tool, or with dd after unloading geom_raid.

Until recently the geom tools didn't report geoms without providers. Now there is a special -a argument to report all of them. There is also -g to report geoms instead of providers, which is useful in such cases.

> It is good idea, but it should be separate project. And, yes, it will change metadata format for these GEOMs, so it will not be backward-compatible. And, yes, it seems to be much more intrusive change in GEOM subsystem (because it will change tasting sequence), and should be supervised by other developers from very beginning.
>
> I could write proposal in near future, with some design notes.
-- Alexander Motin
Re: RFC: Project geom-events
Hello, Alexander.

You wrote on 5 October 2011, 13:18:34:

> geom_raid addresses this problem in own way. As soon as RAID BIOSes expect RAIDs to be built on raw physical devices and probe order is not discussed, geom_raid exclusively opens underlying providers immediately after detecting supported metadata. So even if volume is broken or

But it might not be the first to taste a component of the mirror, am I right? If geom_part is first, will it take the component away from geom_raid? Or can it not?

If it works in any case (the exclusive open spoils geom_part), it could be used in all other classes without any metadata infrastructure, but it seems that geom_mirror, for example, could pick up metadata from the last partition instead of the raw device... I'm not sure here.

But, in any case, maybe a standard first 16 bytes of metadata in pure-GEOM classes plus a filter in the GEOM infrastructure itself (do not pass the provider for tasting to anything but the class named in the first 16 bytes of the last sector) looks like a good idea, IMHO.

--
// Black Lion AKA Lev Serebryakov l...@freebsd.org
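The filter proposed in the last paragraph above could look roughly like the following sketch. Everything in it is hypothetical: the thread does not define the field layout, the magic strings are only modelled on a "GEOM::<CLASS>" naming style, and the class list is made up for the example.

```python
SECTOR_SIZE = 512
MAGIC_LEN = 16  # "standard first 16 bytes of metadata", as proposed above

# Hypothetical mapping from a magic string to the class that owns it.
OWNERS = {
    b"GEOM::MIRROR": "MIRROR",
    b"GEOM::STRIPE": "STRIPE",
    b"GEOM::LABEL": "LABEL",
}
ALL_CLASSES = ["PART", "LABEL", "MIRROR", "STRIPE"]


def classes_allowed_to_taste(last_sector):
    """Return the classes that should be offered this provider for tasting."""
    magic = last_sector[:MAGIC_LEN].rstrip(b"\0")
    owner = OWNERS.get(magic)
    if owner is None:
        # No recognised owner: fall back to offering the provider to everyone.
        return list(ALL_CLASSES)
    # A recognised owner gets the provider exclusively.
    return [owner]


def make_sector(magic):
    """Build a fake last sector that starts with a NUL-padded magic string."""
    return magic.ljust(MAGIC_LEN, b"\0") + bytes(SECTOR_SIZE - MAGIC_LEN)


print(classes_allowed_to_taste(make_sector(b"GEOM::MIRROR")))
print(classes_allowed_to_taste(bytes(SECTOR_SIZE)))
```

With this shape, a stale mirror component would never be offered to a partitioning or labelling class, because the owner is decided before any class-specific tasting runs. The sketch does not say what should happen when the owning class is not loaded.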
Re: RFC: Project geom-events
Lev Serebryakov wrote:
> Hello, Miroslav.
> You wrote on 5 October 2011, 12:24:06:

>>> What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid? Something else?

>> I am mostly using geom_mirror.

> [SKIPPED]
> Oh, I see. Unfortunately, there is no GEOM metadata infrastructure, GEOMs are too generic for this. I could design some meta-meta framework, and unify all RAID classes with internal metadata (geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external geom_raid5) to use it. In such case it will work -- kernel will not pass providers with dirty metadata to any GEOMs, but owners, for tasting. Of course, classes like geom_part and geom_raid could not be changed in such way -- they are forced to use pre-defined metadata formats.
>
> It is good idea, but it should be separate project. And, yes, it will change metadata format for these GEOMs, so it will not be backward-compatible. And, yes, it seems to be much more intrusive change in GEOM subsystem (because it will change tasting sequence), and should be supervised by other developers from very beginning.
>
> I could write proposal in near future, with some design notes.

I have been waiting for years for the moment when these GEOM problems get fixed, so I am really glad to see your interest! It will be a move in the right direction even if the changes are not backward compatible.

The current state is too fragile to be used in production. Gmirror alone can be used, glabel alone can be used, GPT alone can be used... but mixing them all stacked together is the way to hell. E.g. using GPT on a glabeled provider always ends with an error message about a corrupted secondary GPT table. (But how can I use iSCSI in a reliable way if I cannot use glabel on devices and an iSCSI device can have a different number on each reboot? I wrote about it almost 2 years ago.)

GEOM's layering possibilities are really amazing, but metadata, tasting and robustness in edge cases are not well done.
If you are able to come up with some fixes to the GEOM metadata implementation / handling, I see a better future :)

Unfortunately, I am not a C programmer, so I cannot write patches, but I can test whatever you need in this area.

You are right, it should be a separate project. I am looking forward to your proposal / wiki page.

Thank you again for your work on GEOM improvements!

Miroslav Lachman
Re: RFC: Project geom-events
2011/10/5 Miroslav Lachman 000.f...@quip.cz:

> I am waiting years for the moment, when these GEOM problems will be fixed, so I am really glad to see your interest! It will be move to right direction even if changes will not be backward compatible.
>
> The current state is too fragile to be used in production. Gmirror alone can be used, glabel alone can be used, GPT alone can be used... but mix it all stacked together is way to hell. e.g. Using GPT on glabeled provider always ends with error message about corrupted secondary GPT table. (But how can I use iSCSI in reliable way if I cannot use glabel on devices and iSCSI device can have different number on each reboot? I wrote about it almost 2 years ago)

You don't need to use glabel on GPT disks, as gpart has its own way to label GPT disks:

    Fixit# gpart create -s gpt ad0
    Fixit# gpart add -s 4G -t freebsd-swap -l swap0 ad0
    Fixit# gpart add -t freebsd-zfs -l disk0 ad0

This creates the following in /dev:

    /dev/gpt/swap0
    /dev/gpt/disk0

Glabel is not needed for GPT partitioned disks. What should happen is that glabel should fail when attempting to label a GPT disk.

If you wish to add a GPT label after the fact use:

    gpart show geom
    gpart modify -i index -l label geom

(i.e. geom = ad0)

Scot
Re: RFC: Project geom-events
On 05.10.2011 12:29, Lev Serebryakov wrote:
> You wrote on 5 October 2011, 13:18:34:

>> geom_raid addresses this problem in own way. As soon as RAID BIOSes expect RAIDs to be built on raw physical devices and probe order is not discussed, geom_raid exclusively opens underlying providers immediately after detecting supported metadata. So even if volume is broken or

> But it could be not first, who taste component of mirror, am I right? If geom_part will be first, will it take away component from geom_raid? Or it could not?

Most GEOM classes are less aggressive. So geom_raid will taste the device eventually anyway, and geom_part should be automatically spoiled as soon as geom_raid opens the device.

> If it works in any case (exclusive open spoils geom_part), it could be used in all other classes without any metadata infrastructure,

That works perfectly for the case when a class (geom_raid) is known to work on a raw device. Other RAID classes can be used over partitions, so some care should be taken to avoid false positives.

> but it seems, that geom_mirror, for example, could pickup metadata from last partition instead of raw device... I'm not sure here.

In that case it is helpful to include the media size in the metadata. Comparing that value with the provider size during taste makes it possible to avoid these false positives. geom_mirror metadata has included and checked the provider size since version 3. It is a pity that MBR and probably others don't.

> But, in any case, maybe standard first 16 bytes of metadata in pure-GEOM classes and filter in GEOM infrastructure itself (not pass provider for tasting to anything but class, written in first 16 bytes of last sector) looks good idea, IMHO.

And what if the class is not loaded/supported? There should be a way to manage/clear that label.

--
Alexander Motin
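The size check described above can be illustrated with a small model. This is only a sketch of the idea, not geom_mirror's actual taste code: the class names, fields, and the 1,000,000-sector disk with a partition starting at sector 63 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

SECTOR = 512


@dataclass
class Metadata:
    magic: str
    provider_size: int  # bytes, recorded when the metadata was written


@dataclass
class Provider:
    name: str
    size: int                        # bytes
    last_sector: Optional[Metadata]  # what a taste would read from the last sector


def mirror_taste(provider):
    """Accept the provider only if the recorded size matches its real size."""
    md = provider.last_sector
    if md is None or md.magic != "GEOM::MIRROR":
        return False
    return md.provider_size == provider.size


# A partition that runs to the end of the disk shares its last sector with the disk,
# so both providers expose the same metadata block.
disk_sectors = 1_000_000
part_start = 63
md = Metadata("GEOM::MIRROR", (disk_sectors - part_start) * SECTOR)

ada0 = Provider("ada0", disk_sectors * SECTOR, md)
ada0s1 = Provider("ada0s1", (disk_sectors - part_start) * SECTOR, md)

print(mirror_taste(ada0), mirror_taste(ada0s1))  # False True
```

Without the size comparison, whichever provider was tasted first would win. With it, only the provider the mirror was actually created on matches.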
Re: RFC: Project geom-events
Scot Hetzel wrote:
> 2011/10/5 Miroslav Lachman 000.f...@quip.cz:

>> I am waiting years for the moment, when these GEOM problems will be fixed, so I am really glad to see your interest! It will be move to right direction even if changes will not be backward compatible.
>>
>> The current state is too fragile to be used in production. Gmirror alone can be used, glabel alone can be used, GPT alone can be used... but mix it all stacked together is way to hell. e.g. Using GPT on glabeled provider always ends with error message about corrupted secondary GPT table. (But how can I use iSCSI in reliable way if I cannot use glabel on devices and iSCSI device can have different number on each reboot? I wrote about it almost 2 years ago)

> You don't need to use glabel on GPT disks, as gpart has its own way to label GPT disks:

[...]

The point was that glabel on a disk device is successful, gpartitioning on the glabeled device is successful, but metadata handling / device tasting is wrong after reboot, and this should be fixed, not worked around.

Otherwise, thank you for the example with GPT labels; it can be useful in some cases.

Miroslav Lachman
Re: RFC: Project geom-events
Lev Serebryakov wrote this message on Wed, Oct 05, 2011 at 12:51 +0400:
> Hello, Andrey.
> You wrote on 5 October 2011, 11:51:36:
>> On 05.10.2011 10:39, Lev Serebryakov wrote:

>>> (1) Class and name of GEOM which is affected.
>>> (2) Name of provider which is affected.
>>> (3) Name of underlying provider which is lost (consumer from reporting GEOM's point of view).
>>> (4) Resulting state of affected provider (fixable, alive, dead).

>> All except last could be get from the consumer in the orphan method.

> I'm afraid, that (2) could not be known too in generic way, as GEOM could have several providers, and only part of them could be affected by disconnection. Consumer contains geom (with class) and underlying provider, it is items (1) and (3)...

>>> Other example -- geom_label creates and destroys about 10 labels on boot (on my test VM) and, if DESTROYED will be reported by very generic mechanism, it will end up with 10 e-mails to administrator on every boot -- I've got this, when put notifications in too generic place for first try.

>> Ok, good point. Can you explain how your script will distinguish which actions are performed by administrator? Since change made by administrator could trigger disappearing of several child geoms.

> Not the script, but GEOMs themselves. They know why disk disappears. Of course, it work only one-level -- if administrator calls
>
>     gmirror remove gm0 ada4
>
> geom_mirror knows, that ada4 has not failed. Yes, I understand, that if here is configuration like this:
>
>     gmirror0
>       gstripe0
>         ada0
>         ada1
>       gstripe1
>         ada2
>         ada3
>
> and administrator kills gstripe0, for example, geom_mirror will send event, because from its point of view it is not administrative action... But such situations, IMHO, are not very often ones.

Won't gmirror still report COMPLETE after a gmirror remove? So the script can look at the gmirror device, see that it is still complete even though one of the providers was dropped, and assume it was an administrative command that did it..
-- John-Mark Gurney Voice: +1 415 225 5579 All that I will do, has been done, All that I have, has not.
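The check suggested above (treat a dropped component as an administrative removal if the mirror still reports COMPLETE) could be sketched roughly as follows. This is only an illustration, not part of the geom-events patch: the sample text assumes the usual three-column layout of `gmirror status`, and the function names `parse_gmirror_status` and `looks_administrative` are made up for this example.

```python
# Assumed sample of `gmirror status` output; real output may differ in spacing.
SAMPLE_STATUS = """\
      Name    Status  Components
mirror/gm0  COMPLETE  ada0 (ACTIVE)
                      ada1 (ACTIVE)
mirror/gm1  DEGRADED  ada2 (ACTIVE)
"""


def parse_gmirror_status(text):
    """Return {geom name: status} from `gmirror status`-style text."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # A geom line has an all-caps status word as its second field.
        # The header ("Status") and continuation lines ("(ACTIVE)") do not.
        if len(fields) >= 2 and fields[1].isalpha() and fields[1].isupper():
            states[fields[0]] = fields[1]
    return states


def looks_administrative(geom, status_text):
    """Heuristic only: a mirror that is still COMPLETE after losing a
    component was probably shrunk on purpose (e.g. `gmirror remove`)."""
    return parse_gmirror_status(status_text).get(geom) == "COMPLETE"


if __name__ == "__main__":
    for name in ("mirror/gm0", "mirror/gm1"):
        print(name, looks_administrative(name, SAMPLE_STATUS))
```

In a real handler the text would come from running `gmirror status` rather than from a constant. The heuristic shares the one-level limitation discussed in the message: it says nothing about geoms stacked above or below the mirror.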
Re: RFC: Project geom-events
Hello, Lev. You wrote on October 4, 2011, 22:05:07:

Patch against CURRENT is attached.

Oh, sorry, it seems that the patch is too big for the list. http://lev.serebryakov.spb.ru/download/geom-events-1.0.head.patch.gz

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
Re: RFC: Project geom-events
On 04.10.2011 21:12, Freddie Cash wrote:

2011/10/4 Lev Serebryakov l...@freebsd.org

One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

Sounds impressive! Will be very useful for those using GEOM-based RAID (gmirror, gstripe, graid3, graid5, etc). Just curious: would the geom-events framework, and in particular the geom-events script, be useful for ZFS setups, for initiating replacements and providing hot-spare support?

There is now a projects/zfsd branch that does similar things (disk auto-insertion and hot spares) specifically for the case of ZFS. It also uses the devctl interface to receive events, but the user-level part (zfsd itself) is tightly hardcoded to talk to ZFS, fetching statuses and taking control actions. I am not sure whether this functionality could be scripted within geom-events, but having a single mechanism would indeed be nice.

-- Alexander Motin
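As an illustration of what a user-level consumer of devctl events has to do first, here is a minimal sketch that splits one notification line into key/value pairs. It assumes the `!key=value key=value ...` form that devctl notifications take. The field names in the sample line (`geom`, `provider`) and the `DISCONNECT` type are hypothetical stand-ins, not taken from the geom-events patch or from zfsd.

```python
import shlex


def parse_devctl_event(line):
    """Parse a '!key=value ...' notification line into a dict.

    Returns None for lines that are not notifications (no leading '!').
    """
    line = line.strip()
    if not line.startswith("!"):
        return None
    event = {}
    # shlex.split keeps quoted values such as name="foo bar" together.
    for token in shlex.split(line[1:]):
        key, sep, value = token.partition("=")
        if sep:
            event[key] = value
    return event


if __name__ == "__main__":
    # Hypothetical event; the real payload depends on the kernel code posting it.
    sample = "!system=GEOM subsystem=mirror type=DISCONNECT geom=gm0 provider=ada4\n"
    print(parse_devctl_event(sample))
```

A script-based handler and a dedicated daemon such as zfsd would differ mainly in what they do with the resulting dictionary, not in this parsing step.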
Re: RFC: Project geom-events
2011/10/4 Lev Serebryakov l...@freebsd.org

One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

Sounds impressive! Will be very useful for those using GEOM-based RAID (gmirror, gstripe, graid3, graid5, etc). Just curious: would the geom-events framework, and in particular the geom-events script, be useful for ZFS setups, for initiating replacements and providing hot-spare support?

-- Freddie Cash fjwc...@gmail.com
Re: RFC: Project geom-events
On Oct 4, 2011, at 11:12 AM, Freddie Cash wrote:

2011/10/4 Lev Serebryakov l...@freebsd.org

One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

Sounds impressive! Will be very useful for those using GEOM-based RAID (gmirror, gstripe, graid3, graid5, etc). Just curious: would the geom-events framework, and in particular the geom-events script, be useful for ZFS setups, for initiating replacements and providing hot-spare support?

Work in the zfsd project branch already seems to do this properly. Please note that some HBAs (like mps) don't play well with hotswap on some branches, whereas others (mfi) might, depending on how things are coded up and chipset support.

Thanks, -Garrett
Re: RFC: Project geom-events
Hello, Freddie. You wrote on October 4, 2011, 22:12:32:

Sounds impressive! Will be very useful for those using GEOM-based RAID (gmirror, gstripe, graid3, graid5, etc). Just curious: would the geom-events framework, and in particular the geom-events script, be useful for ZFS setups, for initiating replacements and providing hot-spare support?

The script is configurable enough to be adapted to any component removal and insertion command. ZFS would need to send a proper event (with the name of the pool in question instead of a GEOM name, for example), and some commands would have to be added to the config file for the zfs type of GEOM. But I think that, as ZFS has an extensive userland library, the zfsd solution could be better for ZFS. In any case, I don't want to touch any ZFS sources, as they are very complicated. But I am open to suggestions from the ZFS team, of course.

-- // Black Lion AKA Lev Serebryakov l...@freebsd.org
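To make the idea of per-class removal and insertion commands concrete, here is a rough sketch of such a mapping. The table layout, the class keys and the function `build_commands` are invented for this example and do not reflect the actual geom-events config file format. The commands themselves (`gmirror forget`, `gmirror insert`, `zpool replace`) are ordinary administrative commands. Nothing is executed; the function only builds argument lists.

```python
# Hypothetical per-class command templates; not the real geom-events config.
COMMANDS = {
    "MIRROR": [
        ["gmirror", "forget", "{geom}"],            # drop the missing component
        ["gmirror", "insert", "{geom}", "{spare}"],  # add the spare in its place
    ],
    "ZFS": [
        # For ZFS the "geom" slot would carry the pool name instead.
        ["zpool", "replace", "{geom}", "{failed}", "{spare}"],
    ],
}


def build_commands(geom_class, geom, failed, spare):
    """Return the argv lists to run for a failed component, or [] if the
    class has no configured commands."""
    templates = COMMANDS.get(geom_class, [])
    return [
        [part.format(geom=geom, failed=failed, spare=spare) for part in argv]
        for argv in templates
    ]


if __name__ == "__main__":
    for argv in build_commands("MIRROR", "gm0", "ada4", "ada5"):
        print(" ".join(argv))
```

Keeping the commands as data rather than code is what would let one script serve several GEOM classes. As noted above, ZFS may still be better served by zfsd and its userland library.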
Re: RFC: Project geom-events
On Tue, Oct 4, 2011 at 12:15 PM, Garrett Cooper yaneg...@gmail.com wrote:

On Oct 4, 2011, at 11:12 AM, Freddie Cash wrote:

2011/10/4 Lev Serebryakov l...@freebsd.org

One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

Sounds impressive! Will be very useful for those using GEOM-based RAID (gmirror, gstripe, graid3, graid5, etc). Just curious: would the geom-events framework, and in particular the geom-events script, be useful for ZFS setups, for initiating replacements and providing hot-spare support?

Work in the zfsd project branch already seems to do this properly. Please note that some HBAs (like mps) don't play well with hotswap on some branches, whereas others (mfi) might, depending on how things are coded up and chipset support.

Cool! Sounds like we're just around the corner from having a top-notch software RAID stack via GEOM/ZFS with all the automatic goodies one expects/hopes for. :) Keep up the good work, people!!

-- Freddie Cash fjwc...@gmail.com
Re: RFC: Project geom-events
Miroslav Lachman wrote:

Lev Serebryakov wrote:

One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

I am still missing one thing - a dropped provider is not marked as a failed RAID provider and is accessible to anything like a normal disk device. So in some edge cases, the system can boot from a failed RAID component instead of the degraded RAID. This can cause data loss or damage.

To reliably boot from a RAID array, you need help from some RAID BIOS. While booting from a correctly working gmirror is possible, it may not be reliable when the array is degraded. That is one of the main benefits of graid compared to gmirror -- cooperation with the RAID BIOS. The ability to track failed devices also depends on the specific metadata format. For example, for the Intel RAID BIOS, the metadata stored on each disk includes information about all other disks currently in use. As a result, if some disk fails and the system is unable to update its metadata any more, that information can still be stored on the other devices to prevent disk resurrection in most cases.

-- Alexander Motin
Re: RFC: Project geom-events
Lev Serebryakov wrote:

[...] One thing missing from software RAIDs is spare drives and state monitoring (yes, I know that geom_raid supports spare drives for metadata formats which support them, but it is not a universal solution).

I am still missing one thing - a dropped provider is not marked as a failed RAID provider and is accessible to anything like a normal disk device. So in some edge cases, the system can boot from a failed RAID component instead of the degraded RAID. This can cause data loss or damage. Is it possible to fix this with something like your geom-events, or should it be done in each GEOM RAID class separately?

But after all, I really appreciate your work in this area! I hope I will have time to test it soon. Thank you!

Miroslav Lachman
Re: RFC: Project geom-events
On 04.10.2011 22:05, Lev Serebryakov wrote:

So, here it is. GEOM Events. The project consists of several parts (all are ready and committed to the project branch!):

Hi, Lev

(5) Changes in all geom classes to post these events.

It seems that you could change only geom_dev.c to get most of what you want. Actually, the part of your changes related to the DISCONNECT events, and maybe the DESTROY events, could be implemented in geom_dev.

-- WBR, Andrey V. Elsukov