Re: RFC: Project geom-events

2011-10-10 Thread perryh
Lev Serebryakov l...@freebsd.org wrote:

 GPT must have backup copy in last sector by standard ...

In that case, shouldn't it refuse to install on any provider that is
not in fact a disk, so as not to create configurations that cannot
work properly?

 MBR doesn't have any additional metadata. How will adding one help it?

It would add robustness, for cases like the one that started
this thread.  If MBR put a GEOM metadata block at the end of
its provider, it would fix the tasting race when an MBR is
installed on a glabelled (or gmirrored) drive.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: RFC: Project geom-events

2011-10-10 Thread perryh
Lev Serebryakov l...@freebsd.org wrote:

  GPT _must_ be placed twice -- at the first and last sectors
  (really, more than one sector). By standard. The secondary
  copy must be at the end of the disk. Period.
  Then, by the standard, GPT cannot coexist with GLABEL. Such a setup
  should be disallowed, or at least a big nasty message that you
  have just shot yourself in the leg should be output. (period)
 OK, maybe adding a check to geom_part that it is used on a rank-1
 provider (whole disk) is not such a bad idea. But then it raises the
 question of how to install FreeBSD on a software mirror, which is
 useful.

To install FreeBSD on a gmirrored disk, use MBR (or dangerously
dedicated BSD label) instead of GPT.  (This is one reason why
BSD label and MBR should not be considered obsolete.)

If you want to use gmirror and *have* to use GPT, e.g. if you
have a (hypothetical) BIOS which will not boot from MBR, mirror
the individual partitions instead of the whole disk.  Granted
that is more trouble, both to set up initially and to replace
a failed drive.


Re: RFC: Project geom-events

2011-10-10 Thread perryh
John j...@freebsd.org wrote:
  ... gpart should show a warning message if the user tries to put
  GPT on a device that is not a real disk.
...
This also seems to prevent something useful like:

 # camcontrol inquiry da0
 pass2: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device 
 pass2: Serial Number 3TB1BKGX9036W9EN
 pass2: 600.000MB/s transfers, Command Queueing Enabled
 # camcontrol inquiry da25
 pass27: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device 
 pass27: Serial Number 3TB1BKGX9036W9EN
 pass27: 600.000MB/s transfers, Command Queueing Enabled

 # gmultipath label ZFS0 da0 da25
 # gpart create -s gpt $device
 # gpart add -s 128 -t freebsd-boot $device # Create 64K boot partition
 # gpart add -s 4m -t freebsd-ufs -l mb$dev $device # small partition
 # gpart add -t freebsd-zfs -l $dev $device # Remaining space for zfs

It seems like protecting your partitions with multiple
 paths would be a good thing.  I've been experimenting with
 this and end up with corrupt partitions.

The setting of $device is not shown, but I suppose it is the name
of the multipath provider.

I'm not familiar with gmultipath, but it would not surprise me if
(like most GEOMs) it were putting its metadata in the last block(s)
of its providers and therefore encountering the same issues as
gmirror and glabel.  In that case, the best fix may be to define
the multipathing per-partition instead of per-device (if that is
possible), or to use MBR/BSD instead of GPT for partitioning.


Re: RFC: Project geom-events

2011-10-09 Thread Miroslav Lachman

Lev Serebryakov wrote:

Hello, Miroslav.
You wrote on 6 October 2011, 16:59:19:


[...]


The current state is simply wrong, because the user can do something that
cannot work and is not documented anywhere.

   It is OK in the UNIX way, in general. You should be able to shoot
  yourself in the leg; it is good :)


I am sorry for my late reply.
Foot shooting is OK, if somebody wants to shoot his foot, but I don't 
want to shoot my foot if I am aiming at my head :)



   But if geom_label doesn't shrink its provider to account for its own
  metadata, it looks like a bug!


As Ivan Voras explained, it is not a bug, it is just a matter of mixing 
two things that can't coexist. So the problem is that it is 
not mentioned anywhere in the FreeBSD docs. (Thank you Ivan for your 
explanation!)
And as somebody else already mentioned in this thread, it should be 
documented in the manpages and the Handbook, and gpart should show a warning 
message if the user tries to put GPT on a device that is not a real disk.


As mentioned in the thread "Memstick image differences between 8.x 
and 9.x", GPT brings more problems through its requirement of a second table at 
the end of the device (so a disk image cannot easily be written with dd to a 
bigger disk).
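The dd problem can be put in numbers. A minimal sh sketch, assuming 512-byte sectors and an image whose backup GPT header was written to the image's own last sector; the function name and the image and stick sizes are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not a FreeBSD tool): where does the backup GPT
# header of a dd-copied image land, compared with where the spec wants it?
# Both arguments are sizes in 512-byte sectors.
backup_gpt_check() {
    image_sectors=$1
    disk_sectors=$2
    # The backup header sits in the image's own last LBA, and dd copies
    # it to the same LBA on the target disk.
    written_lba=$((image_sectors - 1))
    # The specification expects the backup header in the disk's last LBA.
    expected_lba=$((disk_sectors - 1))
    if [ "$written_lba" -eq "$expected_lba" ]; then
        echo "ok: backup header at LBA $written_lba"
    else
        echo "mismatch: backup header at LBA $written_lba, expected at LBA $expected_lba"
    fi
}

# A 1 GiB image (2097152 sectors) written onto a 4 GiB stick (8388608 sectors).
backup_gpt_check 2097152 8388608
# prints: mismatch: backup header at LBA 2097151, expected at LBA 8388607
```

Only when the target is exactly as large as the image does the copied backup header end up where a GPT-aware BIOS or tool looks for it.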



Re: RFC: Project geom-events

2011-10-09 Thread John
- Miroslav Lachman's Original Message -
 Lev Serebryakov wrote:
 Hello, Miroslav.
 You wrote on 6 October 2011, 16:59:19:
 
 [...]
 
 The current state is simply wrong, because the user can do something that
 cannot work and is not documented anywhere.
It is OK in the UNIX way, in general. You should be able to shoot yourself
   in the leg; it is good :)
 
 I am sorry for my late reply.
 Foot shooting is OK, if somebody wants to shoot his foot, but I don't 
 want to shoot my foot if I am aiming at my head :)
 
But if geom_label doesn't shrink its provider to account for its own
   metadata, it looks like a bug!
 
 As Ivan Voras explained, it is not a bug, it is just a matter of mixing 
 two things that can't coexist. So the problem is that it is 
 not mentioned anywhere in the FreeBSD docs. (Thank you Ivan for your 
 explanation!)
 And as somebody else already mentioned in this thread, it should be 
 documented in the manpages and the Handbook, and gpart should show a warning 
 message if the user tries to put GPT on a device that is not a real disk.
 
 As mentioned in the thread "Memstick image differences between 8.x 
 and 9.x", GPT brings more problems through its requirement of a second table at 
 the end of the device (so a disk image cannot easily be written with dd to a 
 bigger disk).

   This also seems to prevent something useful like:

# camcontrol inquiry da0
pass2: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device 
pass2: Serial Number 3TB1BKGX9036W9EN
pass2: 600.000MB/s transfers, Command Queueing Enabled
# camcontrol inquiry da25
pass27: HP EH0146FAWJB HPDD Fixed Direct Access SCSI-5 device 
pass27: Serial Number 3TB1BKGX9036W9EN
pass27: 600.000MB/s transfers, Command Queueing Enabled

# gmultipath label ZFS0 da0 da25
# gpart create -s gpt $device
# gpart add -s 128 -t freebsd-boot $device # Create 64K boot partition
# gpart add -s 4m -t freebsd-ufs -l mb$dev $device # small partition
# gpart add -t freebsd-zfs -l $dev $device # Remaining space for zfs

   It seems like protecting your partitions with multiple
paths would be a good thing.  I've been experimenting with this
and end up with corrupt partitions.

   Am I missing something?

-john


Re: RFC: Project geom-events

2011-10-08 Thread Lev Serebryakov
Hello, Daniel.
You wrote on 8 October 2011, 0:13:54:

GPT (and MBR) metadata placement is dictated by the outside world,
 where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS.
 BIOSes should be able to find it :)
 Certainly GPT and MBR must place an instance of the partition table
 where the BIOS expects it, but there's no immediately obvious reason
 why they must regard that instance as their GEOM metadata.  GPT puts
 a second copy in the provider's last block, and AFAICT it could just
 as well use _that_ instance -- or even a differently-formatted block
 that included the same data -- as the primary.  MBR could do likewise.
I have déjà vu that I have already answered this. Please read the standard. GPT
   _must_ be placed twice -- at the first and last sectors (really, more
   than one sector). By standard. The secondary copy must be at the end of the
   disk. Period.
 Then, by the standard, GPT cannot coexist with GLABEL. Such a setup should be
 disallowed, or at least a big nasty message that you have just shot 
 yourself in the leg should be output. (period)
  OK, maybe adding a check to geom_part that it is used on a rank-1
 provider (whole disk) is not such a bad idea. But then it raises the question
 of how to install FreeBSD on a software mirror, which is useful. But it could
 bite you sometimes... Hm...

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-08 Thread Lev Serebryakov
Hello, Lev.
You wrote on 8 October 2011, 13:52:21:

 GPT must have backup copy in last sector by standard ...
 In that case, shouldn't it refuse to install on any provider that is
 not in fact a disk, so as not to create configurations that cannot
 work properly?
  What about installing FreeBSD on a software mirror?..

 MBR doesn't have any additional metadata. How will adding one help it?
 It would add robustness, for cases like the one that started
 this thread.  If MBR put a GEOM metadata block at the end of
 its provider, it would fix the tasting race when an MBR is
 installed on a glabelled (or gmirrored) drive.
  And how should it work with an MBR created by non-FreeBSD tools?

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-08 Thread Lev Serebryakov
Hello, Ivan.
You wrote on 8 October 2011, 0:23:14:

 If you think this should be explicitly handled, please file a PR
 which requests the modification of gpart so that it detects that a GPT
 is being created in anything other than a raw drive, and warns the
 user.
  It should be mentioned in the documentation, at least.
  But how will people create a bootable gmirror installation in that
 case? Make (many) mirrors from partitions? I don't like this idea...

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-08 Thread Daniel Kalchev

On Oct 8, 2011, at 12:05 , Lev Serebryakov wrote:

 Hello, Ivan.
 You wrote on 8 October 2011, 0:23:14:
 
 If you think this should be explicitly handled, please file a PR
 which requests the modification of gpart so that it detects that a GPT
 is being created in anything other than a raw drive, and warns the
 user.
  It should be mentioned in the documentation, at least.
  But how will people create a bootable gmirror installation in that
 case? Make (many) mirrors from partitions? I don't like this idea...

A good example of what I would call laziness -- others would call it hacking, I 
guess. Either way, the solution we have now permits some exotic setups, 
but is fragile and inconsistent. Most of the useful features are actually 
side effects of the hack.

If it should remain this way, a warning in the documentation and at runtime would be 
very helpful.

Daniel


Re: RFC: Project geom-events

2011-10-07 Thread Lev Serebryakov
Hello, Perryh.
You wrote on 7 October 2011, 18:06:38:

   GPT (and MBR) metadata placement is dictated by the outside world,
 where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS.
 BIOSes should be able to find it :)

 Certainly GPT and MBR must place an instance of the partition table
 where the BIOS expects it, but there's no immediately obvious reason
 why they must regard that instance as their GEOM metadata.  GPT puts
 a second copy in the provider's last block, and AFAICT it could just
 as well use _that_ instance -- or even a differently-formatted block
 that included the same data -- as the primary.  MBR could do likewise.
  GPT must have a backup copy in the last sector by standard; it is not
a caprice of the GEOM class author. BIOSes could refuse to boot from it if
they don't find the second copy. And it can occupy not only one sector,
but up to 34 of them.

  MBR doesn't have any additional metadata. How will adding one help
 it?

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-07 Thread perryh
Lev Serebryakov l...@freebsd.org wrote:

   GPT (and MBR) metadata placement is dictated by the outside world,
 where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS.
 BIOSes should be able to find it :)

Certainly GPT and MBR must place an instance of the partition table
where the BIOS expects it, but there's no immediately obvious reason
why they must regard that instance as their GEOM metadata.  GPT puts
a second copy in the provider's last block, and AFAICT it could just
as well use _that_ instance -- or even a differently-formatted block
that included the same data -- as the primary.  MBR could do likewise.


Re: RFC: Project geom-events

2011-10-07 Thread Lev Serebryakov
Hello, Perryh.
You wrote on 7 October 2011, 18:06:38:

   GPT (and MBR) metadata placement is dictated by the outside world,
 where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS.
 BIOSes should be able to find it :)
 Certainly GPT and MBR must place an instance of the partition table
 where the BIOS expects it, but there's no immediately obvious reason
 why they must regard that instance as their GEOM metadata.  GPT puts
 a second copy in the provider's last block, and AFAICT it could just
 as well use _that_ instance -- or even a differently-formatted block
 that included the same data -- as the primary.  MBR could do likewise.
  I have déjà vu that I have already answered this. Please read the standard. GPT
 _must_ be placed twice -- at the first and last sectors (really, more
 than one sector). By standard. The secondary copy must be at the end of the
 disk. Period.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-07 Thread Daniel Kalchev



On 07.10.11 22:44, Lev Serebryakov wrote:

Hello, Perryh.
You wrote on 7 October 2011, 18:06:38:


   GPT (and MBR) metadata placement is dictated by the outside world,
where there is no GEOM and no geom_label. They are INTENDED to be used on DISKS.
BIOSes should be able to find it :)

Certainly GPT and MBR must place an instance of the partition table
where the BIOS expects it, but there's no immediately obvious reason
why they must regard that instance as their GEOM metadata.  GPT puts
a second copy in the provider's last block, and AFAICT it could just
as well use _that_ instance -- or even a differently-formatted block
that included the same data -- as the primary.  MBR could do likewise.

   I have déjà vu that I have already answered this. Please read the standard. GPT
  _must_ be placed twice -- at the first and last sectors (really, more
  than one sector). By standard. The secondary copy must be at the end of the
  disk. Period.



Then, by the standard, GPT cannot coexist with GLABEL. Such a setup should be 
disallowed, or at least a big nasty message that you have just shot 
yourself in the leg should be output. (period)


Daniel



Re: RFC: Project geom-events

2011-10-07 Thread Ivan Voras
2011/10/7 Daniel Kalchev dan...@digsys.bg:

 Then, by the standard, GPT cannot coexist with GLABEL. Such a setup should be
 disallowed, or at least a big nasty message that you have just shot yourself
 in the leg should be output. (period)

GPT cannot coexist with ANY GEOM CLASS which writes metadata to the last sector.

If you think this should be explicitly handled, please file a PR
which requests the modification of gpart so that it detects that a GPT
is being created in anything other than a raw drive, and warns the
user.


Re: RFC: Project geom-events

2011-10-06 Thread Lev Serebryakov
Hello, John-Mark.
You wrote on 6 October 2011, 2:53:53:

gmirror0
  gstripe0
    ada0
    ada1
  gstripe1
    ada2
    ada3
 
   and the administrator kills gstripe0, for example, geom_mirror will send an
  event, because from its point of view it is not an administrative
  action...
   But such situations, IMHO, are not very common.
 Won't gmirror still report COMPLETE after a gmirror remove?
  I said kill gstripe0, not remove gstripe0 from gmirror0; these are
different situations. gmirror0 will be DEGRADED after this action, and
will send a DISCONNECT message with a fixable state, which is the state in
which geom-events(8) tries to find a replacement (spare), exactly as when it
loses a component due to an accident. If you say gmirror remove, yes, it
will be COMPLETE after it.

 So the script can look at the gmirror device, and see that it is still
 complete even though one of the providers was dropped, and assume
 it was an administrative command that did it.
   Here is one problem: there is no STANDARD way to learn the state of a
 provider from userland, as it is GEOM-specific. g${class} status ${geom}
 prints almost free-form information. Not all classes call it COMPLETE, and
 some classes (geom_raid, for example) could have many providers and many
 states, which adds complexity. So, to make it work this way I would need to
 add knowledge about all classes and their output formats to
 geom-events(8). I don't think that is good design -- it is a bad
 idea to put knowledge about GEOM classes in two places -- the class
 itself and some script. It will be hard to keep them synchronized, etc. So, I
 think, the GEOM class itself should decide on and report its state in a
 standard way.
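To illustrate the design argued for above: if every class reported its state in one common form, a geom-events(8)-style handler could act without knowing any class's status output. A minimal sh sketch; the one-line key=value event format, its field names, and the action strings are all hypothetical (GEOM has no such format), and only the decision logic follows the cases in this message, where a fixable DISCONNECT triggers a search for a spare:

```shell
#!/bin/sh
# Hypothetical geom-events(8)-style dispatcher. It looks only at common
# fields and never parses "g<class> status" output.
handle_event() {
    class= geom= event= state= fixable=
    # Split the "key=value" words of the event line (trusted input assumed).
    for kv in $1; do
        case $kv in
            class=*)   class=${kv#class=} ;;
            geom=*)    geom=${kv#geom=} ;;
            event=*)   event=${kv#event=} ;;
            state=*)   state=${kv#state=} ;;
            fixable=*) fixable=${kv#fixable=} ;;
        esac
    done
    case "$event:$fixable" in
        DISCONNECT:yes)
            # Lost a component but repairable: look for a spare.
            echo "find spare for $class/$geom ($state)" ;;
        DISCONNECT:no)
            echo "alert: $class/$geom is $state and cannot be fixed automatically" ;;
        *)
            echo "ignore $event on $class/$geom" ;;
    esac
}

handle_event "class=MIRROR geom=gmirror0 event=DISCONNECT state=DEGRADED fixable=yes"
# prints: find spare for MIRROR/gmirror0 (DEGRADED)
```

Adding a new class would then require no change to the handler, which is the point of keeping the state knowledge inside the class.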

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-06 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 6 October 2011, 1:34:33:

 That works perfectly for the case when a class (geom_raid) is known to work
 on a raw device. Other RAID classes can be used over partitions, so some care
 should be taken to avoid false positives.
  Oh, yes... I see...

   I'm not sure here.
 In that case it is helpful to include the media size in the metadata.
 Comparing that value with the provider size during taste makes it possible
 to avoid these false positives. geom_mirror metadata has included/checked the
 provider size since version 3. It is a pity that MBR and probably others don't.
  Yep.
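The size check described above can be sketched with image files standing in for providers. A minimal sh sketch; the metadata layout (an 8-byte magic followed by the provider size in sectors as ASCII) and the names are hypothetical and do not reproduce geom_mirror's real binary format. The false positive it guards against is a partition that ends at the end of the disk, so its last sector is also the disk's last sector:

```shell
#!/bin/sh
SECTOR=512
MAGIC="DEMOMETA"   # hypothetical 8-byte magic, not any real GEOM class

# Write hypothetical metadata into the provider's last sector:
# the magic followed by the provider size in sectors, as ASCII.
write_meta() {
    dev=$1
    sectors=$(( $(wc -c < "$dev") / SECTOR ))
    printf '%s%s' "$MAGIC" "$sectors" |
        dd of="$dev" bs=$SECTOR seek=$((sectors - 1)) count=1 conv=notrunc 2>/dev/null
}

# Taste: accept the metadata only if the recorded size equals the size
# of the provider being tasted.
taste() {
    dev=$1
    sectors=$(( $(wc -c < "$dev") / SECTOR ))
    meta=$(dd if="$dev" bs=$SECTOR skip=$((sectors - 1)) count=1 2>/dev/null | tr -d '\000')
    case $meta in
        "$MAGIC"*) ;;
        *) echo "no metadata"; return ;;
    esac
    recorded=${meta#"$MAGIC"}
    if [ "$recorded" -eq "$sectors" ]; then
        echo "match"
    else
        echo "ignored: metadata records $recorded sectors, provider has $sectors"
    fi
}

tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/disk" bs=$SECTOR count=100 2>/dev/null
write_meta "$tmp/disk"
# A "partition" made of the disk's last 60 sectors shares its last sector.
dd if="$tmp/disk" of="$tmp/part" bs=$SECTOR skip=40 2>/dev/null
taste "$tmp/disk"   # prints: match
taste "$tmp/part"   # prints: ignored: metadata records 100 sectors, provider has 60
rm -rf "$tmp"
```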

 And what if class is not loaded/supported? There should be a way to
 manage/clear that label.
  Most classes have a clear command which doesn't need the module loaded
and works completely from userland now (as label works from userland
too for these classes). And, if such changes are made, a
generic command to geom itself, without any class at all, could be added
-- of course, it would refuse to clear anything that doesn't start with a
common signature.
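A class-independent clear of that kind might look roughly like this. A minimal sh sketch, working on an image file instead of a real provider; the signature string and the geom_clear name are hypothetical, not an existing geom(8) command:

```shell
#!/bin/sh
SECTOR=512
COMMON_SIG="XGEOMSIG"   # hypothetical common signature, made up for this sketch

# Hypothetical generic "clear": zero the provider's last sector, but only
# if that sector starts with the common signature.
geom_clear() {
    dev=$1
    sectors=$(( $(wc -c < "$dev") / SECTOR ))
    last=$((sectors - 1))
    # Read only the first bytes of the last sector; drop NUL bytes.
    found=$(dd if="$dev" bs=1 skip=$((last * SECTOR)) count=${#COMMON_SIG} 2>/dev/null | tr -d '\000')
    if [ "$found" != "$COMMON_SIG" ]; then
        echo "refusing: no common signature on $dev"
        return 1
    fi
    dd if=/dev/zero of="$dev" bs=$SECTOR seek=$last count=1 conv=notrunc 2>/dev/null
    echo "cleared last sector of $dev"
}

# Demo on an 8-sector image file.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/disk" bs=$SECTOR count=8 2>/dev/null
printf '%sdemo' "$COMMON_SIG" | dd of="$tmp/disk" bs=$SECTOR seek=7 count=1 conv=notrunc 2>/dev/null
geom_clear "$tmp/disk"        # signature present: clears
geom_clear "$tmp/disk" || :   # sector is now all zeros: refuses
rm -rf "$tmp"
```

Refusing unless the signature is present is what keeps such a tool from wiping, say, the backup GPT or file system data that happens to occupy the last sector.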

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-06 Thread Ivan Voras
On 06/10/2011 00:12, Miroslav Lachman wrote:
 Scot Hetzel wrote:
 2011/10/5 Miroslav Lachman 000.f...@quip.cz:
 I have been waiting years for the moment when these GEOM problems will be
 fixed, so I am really glad to see your interest!
 It will be a move in the right direction even if the changes are not
 backward compatible.
 The current state is too fragile to be used in production. Gmirror
 alone can be used, glabel alone can be used, GPT alone can be used...
 but mixing them all stacked together is the way to hell.

 e.g. Using GPT on a glabeled provider always ends with an error message
 about a corrupted secondary GPT table. (But how can I use iSCSI in a
 reliable way if I cannot use glabel on devices, and an iSCSI device can
 have a different number on each reboot? I wrote about it almost 2 years
 ago.)

 You don't need to use glabel on GPT disks, as gpart has its own way
 to label GPT disks:
 
 [...]
 
 The point was that glabel on disk device is successful, gpartitioning on
 glabeled device is successful, but metadata handling / device tasting is
 wrong after reboot and this should be fixed, not worked around.
 
 Otherwise thank you for example with GPT labels, it can be useful in
 some cases.

Um, you do realize this is a physical problem with metadata location
and cannot be solved in any meaningful way? Geom_label stores its label
in the last sector of the device, and GPT stores the secondary /
backup table also at the end of the device. The two can NEVER work
together. The same goes for any other GEOM class which stores metadata
and GPT.

The only way to get this sorted out is to make a label class (or adapt
glabel) which does NOT store metadata anywhere on the devices. Maybe
they can store it in the file system (a file in /etc - though you then
lose bootability, and have to somehow connect devices and labels), or
the device hardware ID can be used as a label (but not all devices have
it, and in case of software constructs like iSCSI the labels can be
changed).
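The off-device variant can be sketched as a lookup table keyed by hardware serial number. A minimal sh sketch; the table format (one "serial label" pair per line in some file under /etc) and the lookup_label name are hypothetical, and FreeBSD has no such file:

```shell
#!/bin/sh
# Hypothetical off-device label table: one "serial label" pair per line.
# Print the label for a serial; exit status 1 if the serial is not listed.
lookup_label() {
    table=$1
    serial=$2
    awk -v s="$serial" '$1 == s { print $2; found = 1 } END { exit !found }' "$table"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
# serial            label
3TB1BKGX9036W9EN    mpath0
EOF
lookup_label "$tmp" 3TB1BKGX9036W9EN   # prints: mpath0
rm -f "$tmp"
```

The caveats above still apply: such a table is only readable once the file system holding it is mounted, so it cannot help with booting. Also, in John's camcontrol output earlier in the thread, da0 and da25 report the same serial number because they are two paths to one disk, so a serial-based label names the disk, not a single device node.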






Re: RFC: Project geom-events

2011-10-06 Thread Daniel Kalchev



On 06.10.11 14:07, Ivan Voras wrote:


Um, you do realize this is a physical problem with metadata location
and cannot be solved in any meaningful way? Geom_label stores its label
in the last sector of the device, and GPT stores the secondary /
backup table also at the end of the device. The two can NEVER work
together. The same goes for any other GEOM class which stores metadata
and GPT.


The proper way for this is to have these things store their metadata in 
the first/last sector of the provider, not the underlying device.


This means that, if you have GPT within GLABEL, for example -- you will 
only see the GPT label if you first see the GLABEL.


I guess the present situation was created out of laziness ;)


Re: RFC: Project geom-events

2011-10-06 Thread Ivan Voras
On 06/10/2011 13:29, Daniel Kalchev wrote:
 
 
 On 06.10.11 14:07, Ivan Voras wrote:

 Um, you do realize this is a physical problem with metadata location
 and cannot be solved in any meaningful way? Geom_label stores its label
 in the last sector of the device, and GPT stores the secondary /
 backup table also at the end of the device. The two can NEVER work
 together. The same goes for any other GEOM class which stores metadata
 and GPT.
 
 The proper way for this is to have these things store their metadata in
 the first/last sector of the provider, not the underlying device.
 
 This means that, if you have GPT within GLABEL, for example -- you will
 only see the GPT label if you first see the GLABEL.
 
 I guess the present situation was created out of laziness ;)

No, I don't think you understand.

The layering *is* correct and you *can* create a GPT inside a glabel
label, but then

1) you get device names like /dev/label/somethingp1,
/dev/label/somethingp2, etc.

2) this makes the device unbootable as the GPT partition is per
definition not valid. It still stores the primary partition table on the
first sector (and the following sectors...), but its secondary table is
stored at one sector short of device's last sector (which is used by
glabel). Any utilities and BIOSes which test for GPT will find the first
table but not the last and depending on how sensitive / broken they are,
they will either recognize a broken GPT (and/or try to fix it,
destroying the glabel label), or not work at all.

You could argue that the GPT design is broken, but it was always, per
design, only made to work on whole drives. There is no way to use it
with any other scheme which uses either the first or the last sectors of
a drive.

Luckily, GPT also provides its own labels (per design) and instead of
labeling the provider, you could just as easily label the individual
partitions and skip glabel in this case.






Re: RFC: Project geom-events

2011-10-06 Thread Miroslav Lachman

Ivan Voras wrote:

The point was that glabel on disk device is successful, gpartitioning on
  glabeled device is successful, but metadata handling / device tasting is
  wrong after reboot and this should be fixed, not worked around.

  Otherwise thank you for example with GPT labels, it can be useful in
  some cases.

Um, you do realize this is a physical problem with metadata location
and cannot be solved in any meaningful way? Geom_label stores its label
in the last sector of the device, and GPT stores the secondary /
backup table also at the end of the device. The two can NEVER work
together. The same goes for any other GEOM class which stores metadata
and GPT.

The only way to get this sorted out is to make a label class (or adapt
glabel) which does NOT store metadata anywhere on the devices. Maybe
they can store it in the file system (a file in /etc - though you then
lose bootability, and have to somehow connect devices and labels), or
the device hardware ID can be used as a label (but not all devices have
it, and in case of software constructs like iSCSI the labels can be
changed).


Then there should be warning in documentation or error message printed 
by command in the time of writing metadata.


I am not a GEOM expert, but isn't it a wrong concept that glabel writes 
its metadata and publishes the original device size? If some GEOM writes 
metadata at the last sector (or first), then it should shrink the published 
size (or offset). Or is the problem in geom_part, that it is writing 
metadata past the advertised end of the device?


e.g. If I have a disk device with a size of 100 sectors and glabel metadata 
is stored in the last sector, then glabel should shrink the advertised 
size to 99 sectors - then the GPT secondary table will be at sector 99 
instead of 100.


I know there is a problem if somebody accesses the device by its normal 
device node (e.g. /dev/ada0): then the secondary GPT table will be in a 
different place, not in the last sector. But this is a mistake in the glabel 
concept, and if it cannot be solved any other way, then glabel should 
not be allowed to place labels on the disk device at all (if we cannot 
be sure it is non-conflicting).


The current state is simply wrong, because the user can do something that 
cannot work and is not documented anywhere.


Miroslav Lachman


Re: RFC: Project geom-events

2011-10-06 Thread Daniel Kalchev



On 06.10.11 15:36, Ivan Voras wrote:

On 06/10/2011 13:29, Daniel Kalchev wrote:


On 06.10.11 14:07, Ivan Voras wrote:

Um, you do realize this is a physical problem with metadata location
and cannot be solved in any meaningful way? Geom_label stores its label
in the last sector of the device, and GPT stores the secondary /
backup table also at the end of the device. The two can NEVER work
together. The same goes for any other GEOM class which stores metadata
and GPT.

The proper way for this is to have these things store their metadata in
the first/last sector of the provider, not the underlying device.

This means that, if you have GPT within GLABEL, for example -- you will
only see the GPT label if you first see the GLABEL.

I guess the present situation was created out of laziness ;)

No, I don't think you understand.

The layering *is* correct and you *can* create a GPT inside a glabel
label, but then

1) you get device names like /dev/label/somethingp1,
/dev/label/somethingp2, etc.


.. and, you overwrite the last sector of the device, not of the 
provider. This is incorrect layering -- GPT should see only the provider 
it was given and nothing at different layers.




2) this makes the device unbootable as the GPT partition is per
definition not valid. It still stores the primary partition table on the
first sector (and the following sectors...), but its secondary table is
stored at one sector short of device's last sector (which is used by
glabel). Any utilities and BIOSes which test for GPT will find the first
table but not the last and depending on how sensitive / broken they are,
they will either recognize a broken GPT (and/or try to fix it,
destroying the glabel label), or not work at all.


This is why I said, lazy. If you use proper layering, the GPT within 
another GEOM provider should be unbootable. No BIOS should ever see 
it. The rationale of using GPT within another GEOM is questionable, but 
apparently has applications.



You could argue that the GPT design is broken, but it was always, per
design, only made to work on whole drives. There is no way to use it
with any other scheme which uses either the first or the last sectors of
a drive.


I am not arguing this.

But suppose you use GPT on HAST providers. These cannot boot, so whether 
GPT is bootable in that case is irrelevant.



Luckily, GPT also provides its own labels (per design) and instead of
labeling the provider, you could just as easily label the individual
partitions and skip glabel in this case.



That is another option, yes. But this does not mean we do not have an issue.

Probably, it should be simply disallowed to use GPT within GLABEL. Or, 
in cases where this might be beneficial, use sort of 'boot-less' GPT.


Probably, the problem here is with GLABEL, which does not use the first 
sector as well.



Re: RFC: Project geom-events

2011-10-06 Thread Ivan Voras

 I am not a GEOM expert, but isn't it a wrong concept that glabel writes
 its metadata and publishes the original device size? 

It does not.

# diskinfo -v /dev/md0
/dev/md0
512 # sectorsize
104857600   # mediasize in bytes (100M)
204800  # mediasize in sectors
0   # stripesize
0   # stripeoffset

# glabel create /dev/label/blah md0

# diskinfo -v /dev/label/blah
/dev/label/blah
512 # sectorsize
104857088   # mediasize in bytes (100M)
204799  # mediasize in sectors
0   # stripesize
0   # stripeoffset

(i.e. one sector is used for glabel).

# gpart create -s GPT /dev/label/blah

# gpart add -t freebsd -l gptpart label/blah

# diskinfo -v /dev/label/blahs1
/dev/label/blahs1
512 # sectorsize
104822784   # mediasize in bytes (100M)
204732  # mediasize in sectors
0   # stripesize
17408   # stripeoffset
710 # Cylinders according to firmware.
32  # Heads according to firmware.
9   # Sectors according to firmware.

(i.e. 67 sectors are used for: protective MBR (1), GPT header (1), GPT
table (32), the backup GPT header (1) and the backup GPT table (32)).
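A minimal C sketch of the sector accounting above, assuming 512-byte sectors, a one-sector glabel footer and the usual 128 GPT entries of 128 bytes each (32 sectors per table copy). The helper names are made up for illustration; they are not GEOM or gpart functions.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE     512u
#define GLABEL_SECTORS  1u                            /* glabel metadata: last sector */
#define PMBR_SECTORS    1u                            /* protective MBR at LBA 0 */
#define GPT_HDR_SECTORS 1u                            /* one GPT header copy */
#define GPT_TBL_SECTORS ((128u * 128u) / SECTOR_SIZE) /* 128 entries x 128 bytes = 32 */

/* Sectors GPT takes for itself: PMBR + primary and backup header/table. */
uint64_t gpt_overhead(void)
{
    return PMBR_SECTORS + 2u * (GPT_HDR_SECTORS + GPT_TBL_SECTORS);
}

/* Size of the provider glabel publishes on top of a device. */
uint64_t glabel_provider_sectors(uint64_t device_sectors)
{
    assert(device_sectors > GLABEL_SECTORS);
    return device_sectors - GLABEL_SECTORS;
}

/* Sectors left for partitions after "gpart create -s GPT" on a provider. */
uint64_t gpt_usable_sectors(uint64_t provider_sectors)
{
    assert(provider_sectors > gpt_overhead());
    return provider_sectors - gpt_overhead();
}

/* First LBA a partition can start at: after PMBR, primary header and table. */
uint64_t gpt_first_usable_lba(void)
{
    return PMBR_SECTORS + GPT_HDR_SECTORS + GPT_TBL_SECTORS;
}
```

For the 204800-sector md0 these give 204799 sectors for label/blah, 204732 sectors for the partition and a first usable LBA of 34 (34 * 512 = 17408 bytes), which agree with the diskinfo output above.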

 If some GEOM write
 metadata at last sector (or first), then it should shrink the published
 size (or offset). 

It does that.

 Or is the problem at geom_part, that it is writing
 metadata past the advertised end of the device?

There is no problem as far as I can see. The only possible problem is
that you are trying to do something outside the specification. This is
not the fault of FreeBSD, GEOM, or glabel, and it's probably not even
the fault of the specification, as it cannot avoid real-world problems
such as this.

See this:

http://www.uefi.org/specs/

If the primary GPT is invalid, the backup GPT is used instead and it is
located on the last logical block on the disk. If the backup GPT is
valid it must be used to restore the primary GPT. If the primary GPT is
valid and the backup GPT is invalid software must restore the backup
GPT. If both the primary and backup GPTs are corrupted this block device
is defined as not having a valid GUID Partition Header.

Even though the primary GPT header contains the LBA of the backup GPT
header, this paragraph says that it must be the last sector of the device.
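The rules in the quoted paragraph amount to a small decision table. A hedged C sketch, assuming (as Ivan reads the spec) that a backup header outside the device's last LBA counts as invalid; the function and enum names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a tool following the quoted UEFI paragraph is expected to do. */
enum gpt_action {
    GPT_OK,              /* both copies valid */
    GPT_RESTORE_PRIMARY, /* primary invalid, backup valid: rewrite primary */
    GPT_RESTORE_BACKUP,  /* primary valid, backup invalid: rewrite backup */
    GPT_NONE             /* both invalid: no valid GUID Partition Header */
};

enum gpt_action gpt_check(bool primary_valid, bool backup_valid)
{
    if (primary_valid && backup_valid)
        return GPT_OK;
    if (backup_valid)
        return GPT_RESTORE_PRIMARY;
    if (primary_valid)
        return GPT_RESTORE_BACKUP;
    return GPT_NONE;
}

/* Assumption: a backup header that is not in the device's last LBA
 * is treated as invalid. */
bool gpt_backup_location_ok(uint64_t backup_lba, uint64_t last_lba)
{
    return backup_lba == last_lba;
}
```

In the glabel + GPT stack the primary copy checks out but the backup sits one LBA short of the end, so a strict tool lands on GPT_RESTORE_BACKUP and rewrites the last sector, which is exactly where glabel keeps its label.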

 e.g. If I have disk device with size of 100 sectors and glabel metadata
 is stored at the last sector, then glabel should shrink the advertised
 size to 99 sectors - then GPT secondary table will be at sector 99
 instead of 100.

Maybe an illustration will help. In the scenario like the above, this is
what you have on the drive:

[ PMBR | GPT 1 | --- random file system data --- | GPT 2 | glabel ]

The $glabeled_size is $device_sectors - 1.
The size of the GPT available space is $glabeled_size - 67.

Since the PMBR and primary GPT headers are located at the start of the
drive, they are detected there. A tool written to the specification
quoted above will detect that the second GPT table (GPT 2) is invalid
in this configuration and will try to fix it by destroying the glabel
metadata.
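The illustration can be turned into a per-sector map. This C sketch assumes the same 512-byte sectors, 32-sector GPT tables and one-sector glabel footer; `stacked_owner` is a hypothetical helper that reports what a raw-disk reader finds at each LBA:

```c
#include <assert.h>
#include <stdint.h>

/* What a reader of the raw device finds at a given LBA. */
enum sector_owner { PMBR, GPT_PRIMARY, DATA, GPT_BACKUP, GLABEL_META };

/* Raw device of device_sectors sectors: glabel in the last sector,
 * GPT created inside the glabel provider (one sector smaller). */
enum sector_owner stacked_owner(uint64_t lba, uint64_t device_sectors)
{
    assert(device_sectors > 68 && lba < device_sectors);
    uint64_t provider_last = device_sectors - 2; /* last LBA of the glabel provider */

    if (lba == device_sectors - 1)
        return GLABEL_META;           /* where a raw-disk tool expects GPT 2 */
    if (lba == 0)
        return PMBR;
    if (lba <= 33)
        return GPT_PRIMARY;           /* header at LBA 1, table at LBA 2..33 */
    if (lba >= provider_last - 32)
        return GPT_BACKUP;            /* table (32 sectors) + header at provider_last */
    return DATA;
}
```

With device_sectors = 204800, LBA 204799 (where a spec-following tool looks for the backup header) holds GLABEL_META, while the backup header GPT actually wrote is at LBA 204798.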

It doesn't matter that from GEOM's point of view, the layering is
correct and the provider sizes are calculated correctly. If you want to
do this, you must do it with something which isn't GPT.







Re: RFC: Project geom-events

2011-10-06 Thread Lev Serebryakov
Hello, Daniel.
You wrote on 6 October 2011, 15:29:58:


 The proper way for this is to have these things store their metadata in
 the first/last sector of the provider, not the underlying device.

 This means that, if you have GPT within GLABEL, for example -- you will
 only see the GPT label if you first see the GLABEL.

 I guess the present situation was created out of laziness ;)
  No. GPT (and MBR) metadata placement is dictated by the outside world,
where there is no GEOM and no geom_label. They are INTENDED to be used on
DISKS. BIOSes should be able to find them :)

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-06 Thread Lev Serebryakov
Hello, Miroslav.
You wrote on 6 October 2011, 16:59:19:

 I am not a GEOM expert, but isn't it wrong concept, that glabel writes
 its metadata and publish original device size? If some GEOM write 
 metadata at last sector (or first), then it should shrink the published
 size (or offset). Or is the problem at geom_part, that it is writing 
 metadata past the advertised end of the device?
  Good point.

 e.g. If I have disk device with size of 100 sectors and glabel metadata
 is stored at the last sector, then glabel should shrink the advertised
 size to 99 sectors - then GPT secondary table will be at sector 99 
 instead of 100.

 The current state is simply wrong, because user can do something what
 cannot work and is not documented anywhere.
  It is OK in the UNIX way, in general. You should be able to shoot
 yourself in the leg, and that is good :)

  But if geom_label doesn't shrink its provider to account for its own
 metadata, it looks like a bug!

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-06 Thread Daniel Kalchev



On 06.10.11 17:04, Pieter de Goeje wrote:

The layering *is* correct and you *can* create a GPT inside a glabel
label, but then

1) you get device names like /dev/label/somethingp1,
/dev/label/somethingp2, etc.

.. and, you overwrite the last sector of the device, not of the
provider. This is incorrect layering -- GPT should see only the provider
it was given and nothing at different layers.

If you stack GPT on top of glabel, then your statement is not true. GPT will
overwrite the last sector of the (glabel) provider, not the underlying device.
There is no layering violation.


I stand corrected. Sorry for creating confusion with this statement.

Most of the time when I was blaming GPT, I was actually blaming GLABEL
(see below).



Because physically the first sector of the device is still GPT data the BIOS
will still try to boot from it, hence it would probably be wise to disallow
GPT on anything other than raw devices.
Yes, but... what is a raw device? Perhaps GPT should be disallowed on
devices that are not bootable, but how can this be indicated? GPT is very
useful for its ability to create labeled partitions.



This problem wouldn't exist if geom classes would write their metadata to the
first sector, but then you could no longer boot from for example
gmirrored/glabeled devices with a MBR.


We seem to blame GPT here, but GLABEL is really the culprit.

If GLABEL wrote to the first sector of the provider, making the disk
non-bootable, there would be little chance that somebody would layer
GLABEL first, then GPT etc., and create the current situation.


Unfortunately, the GLABEL + GMIRROR setup is so common...


Re: RFC: Project geom-events

2011-10-06 Thread Pieter de Goeje
On Thursday, October 06, 2011 02:43:03 PM Daniel Kalchev wrote:
 On 06.10.11 15:36, Ivan Voras wrote:
  On 06/10/2011 13:29, Daniel Kalchev wrote:
  On 06.10.11 14:07, Ivan Voras wrote:
  Um, you do realize this is a physical problem with metadata location
  and cannot be solved in any meaningful way? Geom_label stores its label
  in the last sector of the device, and GPT stores the secondary /
  backup table also at the end of the device. The two can NEVER work
  together. The same goes for any other GEOM class which stores metadata
  and GPT.
  
  The proper way for this is to have these things store their metadata in
  the first/last sector of the provider, not the underlying device.
  
  This means that, if you have GPT within GLABEL, for example -- you will
  only see the GPT label if you first see the GLABEL.
  
  I guess the present situation was created out of laziness ;)
  
  No, I don't think you understand.
  
  The layering *is* correct and you *can* create a GPT inside a glabel
  label, but then
  
  1) you get device names like /dev/label/somethingp1,
  /dev/label/somethingp2, etc.
 
 .. and, you overwrite the last sector of the device, not of the
 provider. This is incorrect layering -- GPT should see only the provider
 it was given and nothing at different layers.

If you stack GPT on top of glabel, then your statement is not true. GPT will 
overwrite the last sector of the (glabel) provider, not the underlying device. 
There is no layering violation.

So you get this physical layout: primary GPT header, data, secondary GPT
header, glabel metadata.

Because physically the first sector of the device is still GPT data, the BIOS
will still try to boot from it, hence it would probably be wise to disallow
GPT on anything other than raw devices.

This problem wouldn't exist if geom classes would write their metadata to the 
first sector, but then you could no longer boot from for example 
gmirrored/glabeled devices with a MBR.

- Pieter



Re: RFC: Project geom-events

2011-10-06 Thread Andrey V. Elsukov
On 06.10.2011 16:36, Ivan Voras wrote:
 2) this makes the device unbootable as the GPT partition is per
 definition not valid. It still stores the primary partition table on the
 first sector (and the following sectors...), but its secondary table is
 stored at one sector short of device's last sector (which is used by
 glabel). Any utilities and BIOSes which test for GPT will find the first
 table but not the last and depending on how sensitive / broken they are,
 they will either recognize a broken GPT (and/or try to fix it,
 destroying the glabel label), or not work at all.

Actually, we support booting from GPT when the secondary GPT header is not
in the last LBA. Our bootcode will complain in this situation, but it works.
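One way a loader could be this lenient, sketched in C. This is not the FreeBSD bootcode, just an illustration under the assumption that the loader trusts the backup location recorded in the primary header (the AlternateLBA field in UEFI terms) and only warns when that is not the last LBA. The function name and message are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical lenient lookup: trust the backup location recorded in the
 * primary header, complain if it is not the last LBA, but carry on. */
uint64_t backup_header_lba(uint64_t alternate_lba, uint64_t last_lba)
{
    assert(alternate_lba <= last_lba);
    if (alternate_lba != last_lba)
        fprintf(stderr,
                "gpt: backup header at LBA %ju, expected last LBA %ju\n",
                (uintmax_t)alternate_lba, (uintmax_t)last_lba);
    return alternate_lba;
}
```

A strict reader would instead reject the header (or "repair" it onto the last LBA); the lenient one boots from the GPT inside label/blah and merely prints a warning.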

-- 
WBR, Andrey V. Elsukov





Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Miroslav.
You wrote on 5 October 2011, 1:27:03:

 I am still missing one thing - dropped provider is not marked as failed
 RAID provider and is accessible for anything like normal disk device. So
 in some edge cases, the system can boot from failed RAID component 
 instead of degraded RAID. This can cause data loss or damage.
  What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
Something else? If a GEOM class drops an underlying provider due to errors,
it doesn't get a chance to update the metadata on it.
  But most classes, if a dropped provider is attached again, will
rebuild themselves, as they track which components are up to date and
which ones are stale.
  Do you want GEOM classes to track dropped components somewhere else
and not even try to attach them automatically when they reappear?

 Is it possible to fix it by something like your geom-events, or should
 it be done in each GEOM RAID class separately?
  geom-events only processes events from GEOM classes in userland. Each
 class has to decide for itself what happens to it, as only the class
 itself knows whether a particular error is fatal or not.
  geom-events could help if it replaces the dropped component with a
 spare drive, as in that case most classes prefer the latest drive, not
 the old one. Without spares, everything will be exactly as it is now,
 plus e-mails to the administrator :)


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Andrey.
You wrote on 5 October 2011, 9:07:16:

 It seems that you could change only geom_dev.c to get most of what you want.
 Actually, the part of your changes related to the DISCONNECT events, and
 maybe DESTROY events could be implemented in the geom_dev.
  Does geom_dev know all the bits of information needed for a report? It
seems to me that it doesn't.
  I mean:

   (1) Class and name of GEOM which is affected.
   (2) Name of provider which is affected.
   (3) Name of underlying provider which is lost (consumer from
   reporting GEOM's point of view).
   (4) Resulting state of affected provider (fixable, alive, dead).
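The four items could travel together in a single record. A hypothetical C sketch of such an event as a userland consumer might see it; the struct, the state names and the output format are assumptions for illustration, not the actual geom-events format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum prov_state { PROV_ALIVE, PROV_FIXABLE, PROV_DEAD };

/* One notification carrying the four items listed above. */
struct geom_event {
    const char *geom_class;  /* (1) class of the affected GEOM, e.g. "MIRROR" */
    const char *geom_name;   /* (1) name of the affected GEOM, e.g. "gm0" */
    const char *provider;    /* (2) affected provider, e.g. "mirror/gm0" */
    const char *lost;        /* (3) lost underlying provider, e.g. "ada1" */
    enum prov_state state;   /* (4) resulting state */
};

static const char *state_name(enum prov_state s)
{
    switch (s) {
    case PROV_ALIVE:   return "alive";
    case PROV_FIXABLE: return "fixable";
    case PROV_DEAD:    return "dead";
    }
    return "unknown";
}

/* Format an event as one line, e.g. for a notification mail. */
int geom_event_format(const struct geom_event *ev, char *buf, size_t len)
{
    assert(ev != NULL && buf != NULL);
    return snprintf(buf, len, "%s %s: provider %s lost %s, now %s",
                    ev->geom_class, ev->geom_name, ev->provider,
                    ev->lost, state_name(ev->state));
}
```

The point of carrying the class and the resulting state in the record is that a generic observer cannot reconstruct them, as the rest of this message argues.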

  Yes, geom_dev knows the name of the FAILED provider, but does it know
 all the others? I'm afraid not, or I don't understand how a generic
 mechanism could know that geom_stripe cannot lose components and still
 be fixable, while geom_mirror can.

  Additionally, some GEOM classes could throw away faulty consumers before
 they disappear from geom_dev's point of view.

  Actually, DESTROY can be observed without my changes at all, through
 the DEVFS message about the removed entry :) But, again, this
 notification will not contain the name and class of the GEOM, only the
 provider's name (the devfs entry).


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Andrey.
You wrote on 5 October 2011, 10:27:10:

 It seems that you could change only geom_dev.c to get most of what you want.
 Actually, the part of your changes related to the DISCONNECT events, and
 maybe DESTROY events could be implemented in the geom_dev.
   Does geom_dev knows all needed bits of information to report? It seems to
 me, that it isn't.
   I mean:

(1) Class and name of GEOM which is affected.
(2) Name of provider which is affected.
(3) Name of underlying provider which is lost (consumer from
reporting GEOM's point of view).
(4) Resulting state of affected provider (fixable, alive, dead).
  And I'm afraid that geom_dev cannot distinguish manual operations on
a geom (performed from userland by the administrator) from real
accidents. I don't want geoms to post DISCONNECTED or DESTROYED events
when the administrator knows what he is doing; that could also lead to
race conditions, for example when the administrator rebuilds an array
and forgets to disable spare drives.
  Another example: geom_label creates and destroys about 10 labels at
boot (on my test VM) and, if DESTROYED is reported by a very generic
mechanism, it will end up with 10 e-mails to the administrator on every
boot. I got exactly this when I put the notifications in too generic a
place on my first try.
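The policy described above (report only component losses that were not administrative actions, and only from classes that actually track components) can be sketched as a small predicate. The enum and function are hypothetical and only illustrate that the decision has to live in the class, not in a generic layer:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons a GEOM can lose one of its components. */
enum loss_reason {
    LOSS_ADMIN,     /* requested from userland, e.g. "gmirror remove gm0 ada4" */
    LOSS_IO_ERROR,  /* the class dropped the component after I/O errors */
    LOSS_ORPHANED   /* the underlying provider went away (cable, detach) */
};

/* tracks_components is false for classes such as geom_label, whose
 * providers routinely come and go during boot, so that churn never
 * turns into e-mail. */
bool should_post_event(bool tracks_components, enum loss_reason why)
{
    return tracks_components && why != LOSS_ADMIN;
}
```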


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Stephane.
You wrote on 5 October 2011, 10:25:51:

 On 10/05/2011 03:19 PM, Lev Serebryakov wrote:
 A bit unrelated, but are there plans to integrate hardware RAID
 (mps/mfi/mpt/amr) failure notification in the same way as this would be
 done for GEOM ? As in, one framework and way to manage both hard and
 soft RAIDs.
  I don't have such plans, as I think only the drivers' authors can
identify the proper places to send events from. Drivers are much more
complicated than RAID classes (I was unable to find the proper places in
geom_vinum, for example, and hardware drivers don't look simpler than
that).

 But from userland's point of view there is nothing special about
hardware RAIDs: geom-events(8) needs only two commands to be configured,
one to remove the failed drive from the array and one to add a new one.

 Of course, the GEOM subsystem name in events will look odd for
hardware controllers, but it could be renamed to something more
generic. And hardware controllers have the same bits of information as
software ones: controller type, name of the failed drive, name of the
affected volume and resulting state. Everything is the same.

 So, if there is interest from hardware RAID driver authors, it could
be integrated, of course.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Andrey V. Elsukov
On 05.10.2011 10:39, Lev Serebryakov wrote:
(1) Class and name of GEOM which is affected.
(2) Name of provider which is affected.
(3) Name of underlying provider which is lost (consumer from
reporting GEOM's point of view).
(4) Resulting state of affected provider (fixable, alive, dead).

All except the last can be obtained from the consumer in the orphan method.

   And, I'm afraid, that geom_dev could not distinguish manual
 operations with geom (performed from userland by administrator) and
 real accidents. I don't want geoms to post DISCONNECTED or DESTROYED
 events when administrator knows what he does -- and it could lead to
 race conditions, when administrator rebuild array and forgot to disable
 spare drives, for example.
   Other example -- geom_label creates and destroys about 10 labels on
 boot (on my test VM) and, if DESTROYED will be reported by very
 generic mechanism, it will end up with 10 e-mails to administrator on
 every boot -- I've got this, when put notifications in too generic
 place for first try.

OK, good point. Can you explain how your script will distinguish which
actions are performed by the administrator? A change made by the
administrator could trigger the disappearance of several child geoms.

-- 
WBR, Andrey V. Elsukov


Re: RFC: Project geom-events

2011-10-05 Thread Miroslav Lachman

Lev Serebryakov wrote:

Hello, Miroslav.
You wrote on 5 October 2011, 1:27:03:


I am still missing one thing - dropped provider is not marked as failed
RAID provider and is accessible for anything like normal disk device. So
in some edge cases, the system can boot from failed RAID component
instead of degraded RAID. This can cause data loss or damage.

   What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
Something else?


I am mostly using geom_mirror.


If GEOM class drops underlying provider due to errors,
it doesn't have chances to update metadata for it.


I understand this, but if there is (stale) metadata on the provider, the
system can read it and should disallow normal operations (for example,
propagating slices, partitions and labels).



   But most of classes, if dropped provider attached again, will
rebuild itself, as they track which components are actual and which
ones are old.


I have many times seen a provider (for example ada1) dropped because of
some DMA timeout (bad cables and so on); sometimes the provider (disk
ada1) was detached from the ATA channel and reattached after a reboot. In
both cases the provider had stale metadata and was marked as broken by
geom_mirror, and the automatic rebuild did not start.


In this case I see gm0 with all of its slices, partitions and labels,
and ada1 with the same slices, partitions and labels. This is the
problem: there are two devices providing the same labels, and the winner
is whichever is tasted first... even though the system (geom_mirror)
knows that ada1 is a broken disk.


I think GEOM should be more robust in this case: if such metadata is
found, it should not publish slices, partitions, labels and so on...



   Do you want GEOM classes to track dropped components somewhere else
and don't even try to attach them automatically when they re-appear?


If a disk is removed and reinserted and synchronisation starts,
everything is OK. But a situation where a component is marked as broken
and yet the system and the user can operate on it like a normal, good
and clean drive is wrong.


The drive's contents should be inaccessible until the operator takes some
action (for example, running gmirror clear on the broken disk device).


Miroslav Lachman


Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Andrey.
You wrote on 5 October 2011, 11:51:36:

 On 05.10.2011 10:39, Lev Serebryakov wrote:
(1) Class and name of GEOM which is affected.
(2) Name of provider which is affected.
(3) Name of underlying provider which is lost (consumer from
reporting GEOM's point of view).
(4) Resulting state of affected provider (fixable, alive, dead).

 All except last could be get from the consumer in the orphan method.
  I'm afraid that (2) cannot be known in a generic way either, as a GEOM
could have several providers and only some of them may be affected by the
disconnection. The consumer contains the geom (with its class) and the
underlying provider, i.e. items (1) and (3)...
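A userland mock illustrating Andrey's point and Lev's reply. The structs keep only a few fields, named after their counterparts in FreeBSD's GEOM headers (consumer -> geom -> class, consumer -> provider); the kernel keeps a GEOM's providers in a list rather than a fixed array, and `orphan_describe` is a hypothetical helper, not a real orphan method:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userland stand-ins for the kernel's GEOM structures. */
struct g_class    { const char *name; };
struct g_provider { const char *name; };
struct g_geom {
    const char *name;
    struct g_class *class;
    struct g_provider *providers[4];  /* the kernel keeps a list instead */
    size_t nproviders;
};
struct g_consumer {
    struct g_geom *geom;          /* GEOM that owns the consumer */
    struct g_provider *provider;  /* underlying provider being consumed */
};

/* What can be derived from the orphaned consumer alone. */
struct orphan_info {
    const char *class_name;  /* item (1) */
    const char *geom_name;   /* item (1) */
    const char *lost;        /* item (3) */
    const char *affected;    /* item (2), NULL when ambiguous */
};

struct orphan_info orphan_describe(const struct g_consumer *cp)
{
    assert(cp != NULL && cp->geom != NULL && cp->provider != NULL);
    const struct g_geom *gp = cp->geom;
    struct orphan_info oi = {
        .class_name = gp->class->name,
        .geom_name  = gp->name,
        .lost       = cp->provider->name,
        /* With several providers a generic layer cannot tell which of them
         * the loss affects; only the class knows. */
        .affected   = gp->nproviders == 1 ? gp->providers[0]->name : NULL,
    };
    return oi;
}
```

For a geom_mirror-style GEOM with one provider the affected provider is unambiguous; for a geom_part-style GEOM such as ada0 with ada0p1 and ada0p2 it is not, which is exactly item (2).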

   Other example -- geom_label creates and destroys about 10 labels on
 boot (on my test VM) and, if DESTROYED will be reported by very
 generic mechanism, it will end up with 10 e-mails to administrator on
 every boot -- I've got this, when put notifications in too generic
 place for first try.
 Ok, good point. Can you explain how your script will distinguish which
 actions are performed by administrator? Since change made by administrator
 could trigger disappearing of several child geoms.
  Not the script, but the GEOMs themselves. They know why a disk
 disappears. Of course, this works only one level deep: if the administrator
 runs "gmirror remove gm0 ada4", geom_mirror knows that ada4 has not
 failed. Yes, I understand that if there is a configuration like this:

   gmirror0
     gstripe0
       ada0
       ada1
     gstripe1
       ada2
       ada3

  and the administrator kills gstripe0, for example, geom_mirror will send
 an event, because from its point of view it is not an administrative
 action...
  But such situations, IMHO, are rare.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Miroslav.
You wrote on 5 October 2011, 12:24:06:

What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
 Something else?
 I am mostly using geom_mirror.
  [SKIPPED]
  Oh, I see. Unfortunately, there is no GEOM metadata infrastructure;
GEOMs are too generic for this. I could design some meta-meta
framework and convert all RAID classes with internal metadata
(geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external
geom_raid5) to use it. In that case it will work: the kernel will not
pass providers with dirty metadata to any GEOM except the owner for
tasting. Of course, classes like geom_part and geom_raid cannot be
changed this way, as they are forced to use predefined metadata
formats.

  It is a good idea, but it should be a separate project. And, yes, it
 will change the metadata format for these GEOMs, so it will not be
 backward-compatible.

  And, yes, it seems to be a much more intrusive change to the GEOM
subsystem (because it will change the tasting sequence), and it should be
supervised by other developers from the very beginning.

  I could write a proposal in the near future, with some design notes.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-05 Thread Alexander Motin
On 05.10.2011 11:58, Lev Serebryakov wrote:
 Hello, Miroslav.
 You wrote on 5 October 2011, 12:24:06:
 
What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
 Something else?
 I am mostly using geom_mirror.
   [SKIPPED]
   Oh, I see. Unfortunately, there is no GEOM metadata infrastructure,
 GEOMs are too generic for this. I could design some meta-meta
 framework, and unify all RAID classes with internal metadata
 (geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external
 geom_raid5) to use it. In such case it will work -- kernel will not
 pass providers with dirty metadata to any GEOMs, but owners, for
 tasting. Of course, classes like geom_part and geom_raid could not be
 changed in such way -- they are forced to use pre-defined metadata
 formats.

geom_raid addresses this problem in its own way. Since RAID BIOSes
expect RAIDs to be built on raw physical devices and the probe order is
not discussed, geom_raid exclusively opens underlying providers
immediately after detecting supported metadata. So even if the volume is
broken or incomplete, or this disk is marked failed, or in any other
case, the disk won't be accessible to other GEOM classes. If the
administrator wishes to reuse the disk for any other purpose, he should
explicitly erase the on-disk metadata using the graid tool, or with dd
after unloading geom_raid.

Until recently the geom tools didn't report geoms without providers. Now
there is a special -a argument to report all of them. There is also -g
to report geoms instead of providers, which is useful in such cases.

   It is good idea, but it should be separate project. And, yes, it
  will change metadata format for these GEOMs, so it will not be
  backward-compatible.
 
   And, yes, it seems to be much more intrusive change in GEOM
 subsystem (because it will change tasting sequence), and should be
 supervised by other developers from very beginning.
 
   I could write proposal in near future, with some design notes.

-- 
Alexander Motin


Re: RFC: Project geom-events

2011-10-05 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 5 October 2011, 13:18:34:

 geom_raid addresses this problem in own way. As soon as RAID BIOSes
 expect RAIDs to be built on raw physical devices and probe order is not
 discussed, geom_raid exclusively opens underlying providers immediately
 after detecting supported metadata. So even if volume is broken or
  But it may not be the first to taste a mirror component, am I
right? If geom_part is first, will it take the component away from
geom_raid? Or can it not?

  If it works in any case (the exclusive open spoils geom_part), it could
be used in all other classes without any metadata infrastructure, but
it seems that geom_mirror, for example, could pick up metadata from the
last partition instead of the raw device...

  I'm not sure here.

  But, in any case, a standard first 16 bytes of metadata in pure-GEOM
classes, plus a filter in the GEOM infrastructure itself (not passing a
provider for tasting to anything but the class named in the first 16
bytes of the last sector), looks like a good idea, IMHO.
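A minimal sketch of such a taste filter, under a made-up on-disk convention: the first 16 bytes of the provider's last sector hold an 8-byte magic followed by an 8-byte class name. Nothing like this exists in GEOM today; the magic value, the layout and `may_taste` are all hypothetical, and the magic is there only so that ordinary data is not mistaken for a tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TAG_MAGIC      "GEOMOWN"  /* 7 characters + NUL = 8 bytes */
#define TAG_MAGIC_LEN  8
#define TAG_CLASS_LEN  8

/* last_sector: the provider's last sector (at least 16 bytes are read).
 * Returns whether class `cls` may taste this provider. */
bool may_taste(const unsigned char *last_sector, const char *cls)
{
    char owner[TAG_CLASS_LEN + 1];

    assert(last_sector != NULL && cls != NULL);
    if (memcmp(last_sector, TAG_MAGIC, TAG_MAGIC_LEN) != 0)
        return true;                    /* no tag: every class tastes, as today */
    memcpy(owner, last_sector + TAG_MAGIC_LEN, TAG_CLASS_LEN);
    owner[TAG_CLASS_LEN] = '\0';        /* the name may fill all 8 bytes */
    return strcmp(owner, cls) == 0;     /* tagged: only the owner tastes */
}
```

An untagged provider is offered to every class as it is now. As Alexander points out in his reply, a real format would also need a way to inspect and clear a tag whose class is not loaded.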



-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: RFC: Project geom-events

2011-10-05 Thread Miroslav Lachman

Lev Serebryakov wrote:

Hello, Miroslav.
You wrote on 5 October 2011, 12:24:06:


What RAID do you mean exactly? geom_stripe? geom_mirror? geom_raid?
Something else?

I am mostly using geom_mirror.

   [SKIPPED]
   Oh, I see. Unfortunately, there is no GEOM metadata infrastructure,
GEOMs are too generic for this. I could design some meta-meta
framework, and unify all RAID classes with internal metadata
(geom_stripe, geom_concat, geom_mirror, geom_raid3 and my external
geom_raid5) to use it. In such case it will work -- kernel will not
pass providers with dirty metadata to any GEOMs, but owners, for
tasting. Of course, classes like geom_part and geom_raid could not be
changed in such way -- they are forced to use pre-defined metadata
formats.

   It is good idea, but it should be separate project. And, yes, it
  will change metadata format for these GEOMs, so it will not be
  backward-compatible.

   And, yes, it seems to be much more intrusive change in GEOM
subsystem (because it will change tasting sequence), and should be
supervised by other developers from very beginning.

   I could write proposal in near future, with some design notes.


I have been waiting years for these GEOM problems to be fixed, so I am
really glad to see your interest!
It will be a move in the right direction even if the changes are not
backward compatible.
The current state is too fragile to be used in production. Gmirror alone
can be used, glabel alone can be used, GPT alone can be used... but
mixing them all stacked together is the way to hell.


For example, using GPT on a glabeled provider always ends with an error
message about a corrupted secondary GPT table. (But how can I use iSCSI
reliably if I cannot use glabel on the devices, and an iSCSI device can
have a different number on each reboot? I wrote about this almost 2
years ago.)


GEOM layering possibilities are really amazing, but metadata, tasting
and robustness in edge cases are not well done.


If you are able to come up with some fixes to the GEOM metadata
implementation / handling, I see a better future :)
Unfortunately, I am not a C programmer, so I cannot write patches, but I
can test whatever you need in this area.


You are right, it should be a separate project. I am looking forward to
your proposal / wiki page.


Thank you again for your work on GEOM improvements!

Miroslav Lachman


Re: RFC: Project geom-events

2011-10-05 Thread Scot Hetzel
2011/10/5 Miroslav Lachman 000.f...@quip.cz:
 I am waiting years for the moment, when these GEOM problems will be fixed,
 so I am really glad to see your interest!
 It will be move to right direction even if changes will not be backward
 compatible.
 The current state is too fragile to be used in production. Gmirror alone can
 be used, glabel alone can be used, GPT alone can be used... but mix it all
 stacked together is way to hell.

 e.g. Using GPT on glabeled provider always ends with error message about
 corrupted secondary GPT table. (But how can I use iSCSI in reliable way if I
 cannot use glabel on devices and iSCSI device can have different number on
 each reboot? I wrote about it almost 2 years ago)

You don't need to use glabel on GPT disks, as gpart has its own way
to label GPT disks:

 Fixit# gpart create -s gpt ad0
 Fixit# gpart add -s 4G -t freebsd-swap -l swap0 ad0
 Fixit# gpart add -t freebsd-zfs -l disk0 ad0

This creates the following in /dev:

/dev/gpt/swap0
/dev/gpt/disk0

Glabel is not needed for GPT-partitioned disks. What should happen is
that glabel fails when attempting to label a GPT disk.

If you wish to add a GPT label after the fact, use:

gpart show geom
gpart modify -i index -l label geom

(i.e. geom = ad0)

Scot


Re: RFC: Project geom-events

2011-10-05 Thread Alexander Motin
On 05.10.2011 12:29, Lev Serebryakov wrote:
 You wrote on 5 October 2011, 13:18:34:
 geom_raid addresses this problem in own way. As soon as RAID BIOSes
 expect RAIDs to be built on raw physical devices and probe order is not
 discussed, geom_raid exclusively opens underlying providers immediately
 after detecting supported metadata. So even if volume is broken or
   But it could be not first, who taste component of mirror, am I
 right? If geom_part will be first, will it take away component from
 geom_raid? Or it could not?

Most GEOM classes are less aggressive. So geom_raid will eventually
taste the device anyway, and geom_part should be automatically spoiled
as soon as geom_raid opens the device.

   If it works in any case (exclusive open spoils geom_part), it could
 be used in all other classes without any metadata infrastructure,

That works perfectly when the class (geom_raid) is known to work on
raw devices. Other RAID classes can be used on top of partitions, so some
care should be taken to avoid false positives.

 but
 it seems that geom_mirror, for example, could pick up metadata from the
 last partition instead of the raw device...
 
   I'm not sure here.

In that case it helps to include the media size in the metadata.
Comparing that value with the provider size during taste avoids
these false positives. geom_mirror metadata has included (and checked) the
provider size since version 3. It's a pity that MBR, and probably others, don't.
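The media-size check can be sketched as follows. This is a minimal illustration assuming a made-up last-sector layout (a 16-byte magic, a version number and a 64-bit media size). It is not the real geom_mirror on-disk format, and every name below is hypothetical.

```python
import struct

SECTOR_SIZE = 512
# Hypothetical on-disk layout, NOT the real geom_mirror metadata format:
# 16-byte magic, 32-bit version, 64-bit media size, little-endian.
MAGIC = b"HYPOTHETICAL-MIR"  # exactly 16 bytes
LAYOUT = struct.Struct("<16sIQ")


def encode_metadata(version, mediasize):
    """Build a last-sector image for a provider of `mediasize` bytes."""
    return LAYOUT.pack(MAGIC, version, mediasize).ljust(SECTOR_SIZE, b"\0")


def taste(last_sector, provider_size):
    """Accept the metadata only if it was written for this exact provider."""
    magic, version, mediasize = LAYOUT.unpack_from(last_sector)
    if magic != MAGIC:
        return False
    if version < 3:
        # Older metadata carries no size, so a partition that ends on the
        # disk's last sector would be accepted too (the false positive).
        return True
    return mediasize == provider_size
```

For example, with a 1000-sector disk whose last partition covers its final 200 sectors, both providers share the same last sector, but only the whole disk passes: taste(encode_metadata(3, 1000 * SECTOR_SIZE), 200 * SECTOR_SIZE) is False.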

   But, in any case, a standard first 16 bytes of metadata in
 pure-GEOM classes, plus a filter in the GEOM infrastructure itself (don't
 pass the provider for tasting to any class except the one named in the first
 16 bytes of the last sector), looks like a good idea, IMHO.

And what if the class is not loaded or supported? There should be a way to
manage/clear that label.
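A sketch of the proposed filter, including the case raised here where the named class is not loaded. The 16-byte tag convention, the class names and the helper functions are hypothetical. This only illustrates the proposal in this thread; it is not existing GEOM code.

```python
TAG_LEN = 16


def owner_tag(last_sector):
    """Return the class named in the first 16 bytes, or None if untagged."""
    name = last_sector[:TAG_LEN].rstrip(b"\0")
    # Treat anything that is not a plain ASCII alphanumeric name as "no tag".
    if not name or not name.isalnum():
        return None
    return name.decode("ascii")


def classes_to_taste(last_sector, loaded_classes):
    """Decide which loaded classes may taste a provider."""
    owner = owner_tag(last_sector)
    if owner is None:
        return sorted(loaded_classes)  # untagged: everyone tastes, as now
    if owner in loaded_classes:
        return [owner]  # tagged: only the owning class tastes
    return []  # owner not loaded: nobody may claim the provider


def clear_tag(last_sector):
    """Management hook: wipe the tag so the provider is tasted normally."""
    return b"\0" * TAG_LEN + last_sector[TAG_LEN:]
```

Returning an empty list for an unknown owner keeps a foreign class from grabbing the provider, which is exactly why some separate way to clear the tag would be needed.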

-- 
Alexander Motin


Re: RFC: Project geom-events

2011-10-05 Thread Miroslav Lachman

Scot Hetzel wrote:

2011/10/5 Miroslav Lachman 000.f...@quip.cz:

I have been waiting for years for the moment when these GEOM problems will be fixed,
so I am really glad to see your interest!
It will be a move in the right direction even if the changes are not backward
compatible.
The current state is too fragile to be used in production. Gmirror alone can
be used, glabel alone can be used, GPT alone can be used... but mixing them all
stacked together is a road to hell.

E.g., using GPT on a glabeled provider always ends with an error message about
a corrupted secondary GPT table. (But how can I use iSCSI reliably if I
cannot use glabel on devices, and an iSCSI device can get a different number on
each reboot? I wrote about this almost 2 years ago.)


You don't need to use glabel on GPT disks, as gpart has its own way
to label GPT disks:


[...]

The point was that glabel on a disk device succeeds, and gpart partitioning
on the glabeled device succeeds, but metadata handling / device tasting is
wrong after a reboot, and this should be fixed, not worked around.
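A worked example of why the secondary GPT table looks corrupted when the raw disk is tasted, assuming glabel keeps its metadata in the provider's last sector and exposes a label provider that starts at offset 0 and is one sector shorter. The helper name and the 1000-sector disk are illustrative.

```python
def gpt_backup_lba(provider_sectors):
    """GPT puts its backup header in the provider's last LBA."""
    return provider_sectors - 1


DISK_SECTORS = 1000
# Assumption: glabel reserves the disk's last sector for its own metadata,
# so label/foo covers disk LBAs 0 .. DISK_SECTORS - 2.
LABEL_SECTORS = DISK_SECTORS - 1

# gpart on label/foo writes the backup header here (the same LBA on the
# disk, because the label provider starts at disk offset 0):
written_at = gpt_backup_lba(LABEL_SECTORS)
# ...but anything tasting the raw disk expects it here, where glabel's
# metadata actually lives:
expected_at = gpt_backup_lba(DISK_SECTORS)
```

The two locations differ by exactly one sector, and the sector the raw-disk taster inspects holds glabel metadata rather than a GPT header.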


Otherwise, thank you for the example with GPT labels; it can be useful in
some cases.


Miroslav Lachman


Re: RFC: Project geom-events

2011-10-05 Thread John-Mark Gurney
Lev Serebryakov wrote this message on Wed, Oct 05, 2011 at 12:51 +0400:
 Hello, Andrey.
 You wrote on October 5, 2011, 11:51:36:
 
  On 05.10.2011 10:39, Lev Serebryakov wrote:
 (1) Class and name of GEOM which is affected.
 (2) Name of provider which is affected.
 (3) Name of underlying provider which is lost (consumer from
 reporting GEOM's point of view).
 (4) Resulting state of affected provider (fixable, alive, dead).
 
  All except the last can be obtained from the consumer in the orphan method.
   I'm afraid (2) cannot be known in a generic way either, as a GEOM
 can have several providers, and only some of them may be affected by the
 disconnection. The consumer contains the geom (with its class) and the
 underlying provider, i.e. items (1) and (3)...
 
Another example: geom_label creates and destroys about 10 labels on
  boot (on my test VM) and, if DESTROYED is reported by a very
  generic mechanism, it will end up with 10 e-mails to the administrator on
  every boot. I got exactly this when I put notifications in too generic a
  place on my first try.
  Ok, good point. Can you explain how your script will distinguish which
  actions were performed by the administrator? A change made by the
  administrator could trigger the disappearance of several child geoms.
   Not the script, but the GEOMs themselves. They know why a disk
  disappears. Of course, it works only one level deep: if the administrator
  calls "gmirror remove gm0 ada4", geom_mirror knows that ada4 has not
  failed. Yes, I understand that if there is a configuration like this:
 
    gmirror0
      gstripe0
        ada0
        ada1
      gstripe1
        ada2
        ada3
 
   and the administrator kills gstripe0, for example, geom_mirror will send
  an event, because from its point of view it is not an administrative
  action...
   But such situations, IMHO, are not very common.

Won't gmirror still report COMPLETE after a gmirror remove?  Then the
script can look at the gmirror device, see that it is still
complete even though one of the providers was dropped, and assume
that an administrative command did it.
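A sketch of the heuristic suggested here, combined with the four event fields Lev listed. The GeomEvent record and should_alert() are hypothetical. The status strings are assumed to be whatever the script reads back for the mirror (e.g. COMPLETE or DEGRADED). Whether gmirror really stays COMPLETE after "gmirror remove" is the open question above.

```python
from dataclasses import dataclass


@dataclass
class GeomEvent:
    """Hypothetical event record carrying fields (1)-(4) from the thread."""
    geom_class: str     # (1) class of the affected GEOM, e.g. "MIRROR"
    geom_name: str      # (1) name of the affected GEOM, e.g. "gm0"
    provider: str       # (2) affected provider, e.g. "mirror/gm0"
    lost_consumer: str  # (3) underlying provider that went away, e.g. "ada4"
    state: str          # (4) resulting state: "fixable", "alive" or "dead"


def should_alert(event, mirror_status):
    """Return True if the administrator should be notified.

    Assumes an administrative `gmirror remove` leaves the mirror COMPLETE,
    while a real failure leaves it in some other state (e.g. DEGRADED).
    """
    if event.geom_class != "MIRROR":
        return True  # the heuristic only applies to gmirror
    return mirror_status != "COMPLETE"
```

As Lev notes, this still misfires one level up: killing gstripe0 under gmirror0 leaves the mirror degraded, so it would be reported.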

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: RFC: Project geom-events

2011-10-04 Thread Lev Serebryakov
Hello, Lev.
You wrote on October 4, 2011, 22:05:07:

Patch against CURRENT is attached.
  Oh, sorry, it seems that the patch is too big for the list.
  http://lev.serebryakov.spb.ru/download/geom-events-1.0.head.patch.gz


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-04 Thread Alexander Motin
On 04.10.2011 21:12, Freddie Cash wrote:
 2011/10/4 Lev Serebryakov l...@freebsd.org mailto:l...@freebsd.org
 
  One thing missing from software RAIDs is spare drives and state
 monitoring (yes, I know that geom_raid supports spare drives for
 metadata formats which support them, but it is not a universal solution).
 
 
 Sounds impressive!  Will be very useful for those using GEOM-based RAID
 (gmirror, gstripe, graid3, graid5, etc).
 
 Just curious:  would the geom-events framework, and in particular the
 geom-events script, be useful for ZFS setups, for initiating
 replacements and providing hot-spare support? 

There is now a projects/zfsd branch that does similar things (disk
auto-insertion and hot spares) specifically for ZFS. It also
uses the devctl interface to receive events, but the user-level part (zfsd
itself) is tightly hardcoded to talk to ZFS, fetching statuses and
performing control actions. I'm not sure whether this functionality could be
scripted within geom-events, but having a single mechanism would indeed be
nice.
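Both zfsd and geom-events consume devctl notifications, which devd delivers as text lines of the form "!system=... subsystem=... type=..." followed by extra key=value pairs. Below is a minimal sketch of parsing such a line and routing it to one consumer. The routing table and the GEOM/ZFS event contents in the example are assumptions, not the actual messages either project posts.

```python
def parse_notify(line):
    """Split a devd-style notify line into a dict of key=value fields.

    Simplified: values containing spaces or quotes are not handled.
    """
    if not line.startswith("!"):
        raise ValueError("not a notify event: %r" % line)
    fields = {}
    for token in line[1:].split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Hypothetical routing table: which user-level daemon handles which system.
HANDLERS = {"GEOM": "geom-events", "ZFS": "zfsd"}


def route(fields):
    """Return the name of the consumer for an event, or None to ignore it."""
    return HANDLERS.get(fields.get("system"))
```

A single dispatcher like this is one way the "single mechanism" could look, with each backend keeping its own logic.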

-- 
Alexander Motin


Re: RFC: Project geom-events

2011-10-04 Thread Freddie Cash
2011/10/4 Lev Serebryakov l...@freebsd.org

  One thing missing from software RAIDs is spare drives and state
 monitoring (yes, I know that geom_raid supports spare drives for
 metadata formats which support them, but it is not a universal solution).


Sounds impressive!  Will be very useful for those using GEOM-based RAID
(gmirror, gstripe, graid3, graid5, etc).

Just curious:  would the geom-events framework, and in particular the
geom-events script, be useful for ZFS setups, for initiating replacements
and providing hot-spare support?


-- 
Freddie Cash
fjwc...@gmail.com


Re: RFC: Project geom-events

2011-10-04 Thread Garrett Cooper
On Oct 4, 2011, at 11:12 AM, Freddie Cash wrote:

 2011/10/4 Lev Serebryakov l...@freebsd.org
 
 One thing missing from software RAIDs is spare drives and state
 monitoring (yes, I know that geom_raid supports spare drives for
 metadata formats which support them, but it is not a universal solution).
 
 
 Sounds impressive!  Will be very useful for those using GEOM-based RAID
 (gmirror, gstripe, graid3, graid5, etc).
 
 Just curious:  would the geom-events framework, and in particular the
 geom-events script, be useful for ZFS setups, for initiating replacements
 and providing hot-spare support?

Work in the zfsd project branch already seems to do this properly. 
Please note that some HBAs (like mps) don't play well with hotswap on some 
branches, whereas others (mfi) might, depending on how things are coded up and 
chipset support.
Thanks,
-Garrett


Re: RFC: Project geom-events

2011-10-04 Thread Lev Serebryakov
Hello, Freddie.
You wrote on October 4, 2011, 22:12:32:

 Sounds impressive!  Will be very useful for those using GEOM-based
 RAID (gmirror, gstripe, graid3, graid5, etc).

 Just curious:  would the geom-events framework, and in particular
 the geom-events script, be useful for ZFS setups, for initiating
 replacements and providing hot-spare support? 
  The script is configurable enough to adapt to any component removal and 
insertion command. ZFS would need to send a proper event (with the pool name in 
question instead of a GEOM name, for example), and some commands would have to be 
added to the config file for the zfs type of GEOM. But I think that, as ZFS has an 
extensive userland library, the zfsd solution could be better for ZFS.

  But anyway, I don't want to touch any ZFS sources, as they are very 
complicated.

  But I am open to suggestions from the ZFS team, of course.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: RFC: Project geom-events

2011-10-04 Thread Freddie Cash
On Tue, Oct 4, 2011 at 12:15 PM, Garrett Cooper yaneg...@gmail.com wrote:

 On Oct 4, 2011, at 11:12 AM, Freddie Cash wrote:

  2011/10/4 Lev Serebryakov l...@freebsd.org
 
  One thing missing from software RAIDs is spare drives and state
  monitoring (yes, I know that geom_raid supports spare drives for
  metadata formats which support them, but it is not a universal solution).
 
  Sounds impressive!  Will be very useful for those using GEOM-based RAID
  (gmirror, gstripe, graid3, graid5, etc).
 
  Just curious:  would the geom-events framework, and in particular the
  geom-events script, be useful for ZFS setups, for initiating replacements
  and providing hot-spare support?

 Work in the zfsd project branch already seems to do this properly.
 Please note that some HBAs (like mps) don't play well with hotswap on some
 branches, whereas others (mfi) might, depending on how things are coded up
 and chipset support.


Cool!  Sounds like we're just around the corner from having a top-notch
software RAID stack via GEOM/ZFS with all the automatic goodies one
expects/hopes for.  :)

Keep up the good work people!!



-- 
Freddie Cash
fjwc...@gmail.com


Re: RFC: Project geom-events

2011-10-04 Thread Alexander Motin
Miroslav Lachman wrote:
 Lev Serebryakov wrote:
One thing missing from software RAIDs is spare drives and state
 monitoring (yes, I know that geom_raid supports spare drives for
 metadata formats which support them, but it is not a universal solution).
 
 I am still missing one thing: a dropped provider is not marked as a failed
 RAID provider and is accessible to anything, like a normal disk device. So
 in some edge cases, the system can boot from a failed RAID component
 instead of the degraded RAID. This can cause data loss or damage.

To boot reliably from a RAID array, you need help from a RAID BIOS.
While booting from a correctly working gmirror is possible, it may not be
reliable when the array is degraded. That is one of the main benefits of
graid compared to gmirror: cooperation with the RAID BIOS.

The ability to track failed devices also depends on the specific metadata
format. For example, for the Intel RAID BIOS, the metadata stored on each disk
includes information about all other disks currently in use. As a result, if a
disk fails and the system can no longer update its metadata, that
information can still be stored on the other devices, preventing the disk's
resurrection in most cases.
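A sketch of how per-disk membership records can stop a stale disk from being resurrected. It assumes each disk's metadata carries a generation counter and the state of every member, and that the copy with the highest generation wins. The field names are made up; this is not the actual Intel metadata layout.

```python
from dataclasses import dataclass, field


@dataclass
class DiskMetadata:
    serial: str      # identity of the disk holding this copy
    generation: int  # assumed to be bumped on every metadata update
    members: dict = field(default_factory=dict)  # serial -> "ok"/"failed"


def may_use(candidate, all_copies):
    """Use `candidate` only if the newest metadata still lists it as healthy."""
    newest = max(all_copies, key=lambda md: md.generation)
    return newest.members.get(candidate.serial) == "ok"
```

A failed disk B may still claim to be healthy in its own stale copy, but a newer copy on disk A that records the failure wins, so B is not reattached.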

-- 
Alexander Motin


Re: RFC: Project geom-events

2011-10-04 Thread Miroslav Lachman

Lev Serebryakov wrote:

[...]


   One thing missing from software RAIDs is spare drives and state
monitoring (yes, I know that geom_raid supports spare drives for
metadata formats which support them, but it is not a universal solution).


I am still missing one thing: a dropped provider is not marked as a failed 
RAID provider and is accessible to anything, like a normal disk device. So 
in some edge cases, the system can boot from a failed RAID component 
instead of the degraded RAID. This can cause data loss or damage.


Is it possible to fix it by something like your geom-events, or should 
it be done in each GEOM RAID class separately?



Anyway, I really appreciate your work in this area! I hope I will 
have time to test it soon.


Thank you!

Miroslav Lachman


Re: RFC: Project geom-events

2011-10-04 Thread Andrey V. Elsukov
On 04.10.2011 22:05, Lev Serebryakov wrote:
   So, here it is. GEOM Events.
 
   The project consists of several parts (all are ready and committed to
  the project branch!):
 

Hi, Lev

   (5) Changes in all geom classes to post these events.

It seems that you could change only geom_dev.c to get most of what you want.
Actually, the part of your changes related to the DISCONNECT events, and
maybe the DESTROY events, could be implemented in geom_dev.

-- 
WBR, Andrey V. Elsukov