Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-17 Thread Goffredo Baroncelli
On 2015-06-15 19:38, Lennart Poettering wrote:
 On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreij...@inwind.it) wrote:
 
 On 2015-06-15 12:46, Lennart Poettering wrote:
 On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:

 Further, the problem is even worse in this example: if you use dd
 to copy device A to device B. After you mount device A, just
 providing device B to the above two commands lets the kernel
 update the device path; again, all the IO (since the device is
 mounted) still goes to device A (not B), but /proc/self/mounts and
 'btrfs fi show' show it as device B (not A).

 It's a bug, very tricky to fix.

 In the past [*] I proposed a mount.btrfs helper; I tried to move the
 logic outside the kernel.
 I think that the problem is that we try to manage all these cases
 from a device point of view: when a device appears, we register the
 device and we try to mount the filesystem... This works very well
 when there is a single-volume filesystem. For the other cases there
 is a mess between the different layers:

 - kernel
 - udev/systemd
 - initrd logic

 My attempt followed a different idea: the mount helper waits for the
 devices if needed, or, when appropriate, mounts the filesystem in
 degraded mode. All devices are passed as mount arguments
 (--device=/dev/sdX); there is no device registration: this avoids
 all these problems.
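The helper idea described above can be sketched roughly as follows. This is a hypothetical Python simulation, not the actual mount.btrfs code; device discovery (`scan`) and the mount call (`mount`) are stubbed out and their names are invented:

```python
import time

def wait_and_mount(fsid, total_devices, scan, mount, timeout=10.0, poll=0.5):
    """Wait until all member devices of a filesystem appear, then mount it.

    If the timeout expires with only some devices present, fall back to a
    degraded mount. `scan` returns the device paths currently visible for
    `fsid`; `mount` receives the device list and a degraded flag.
    """
    deadline = time.monotonic() + timeout
    devices = scan(fsid)
    while len(devices) < total_devices and time.monotonic() < deadline:
        time.sleep(poll)
        devices = scan(fsid)
    if len(devices) == total_devices:
        return mount(devices, degraded=False)
    if devices:  # some devices present: try a degraded mount
        return mount(devices, degraded=True)
    raise RuntimeError("no devices found for fsid %s" % fsid)
```

The point of the sketch is that all waiting happens inside the helper, so no device registration in the kernel is needed; Lennart's objection below is precisely that /bin/mount should not block like this.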

 Hmm, no. /bin/mount should not block for devices. That's generally
 incompatible with how the tool is used, and in particular from
 systemd. We would not make use of such a scheme in
 systemd. /bin/mount should always be short-running.

 Apart from systemd, what are these incompatibilities? 
 
 Well, /bin/mount is not a daemon, and it should not be one.

My helper is not a daemon; you were correct the first time: it blocks until all 
needed (or enough) devices have appeared.
Anyway, this should not be different from mounting an NFS filesystem. In that 
case too the mount helper blocks until the connection is established. The blocking 
time is not negligible, even though not as long as a device timeout ... 

 
 I am pretty sure that if such automatic degraded mounting should be
 supported, then this should be done with some background storage
 daemon that alters the effect of the READY ioctl somehow after the
  timeout, and then retriggers the devices so that systemd takes
 note. (or, alternatively: such a scheme could even be implemented all
 in kernel, based on some configurable kernel setting...)

 I recognize that this solution provides the maximum compatibility
 with the current implementation. However it seems too complex to
  me. Re-triggering a device seems to me more a workaround than a
 solution.
 
 Well, it's not really ugly. I mean, if the state or properties of a
 device change, then udev should update its information about it, and
 that's done via a retrigger. We do that all the time already, for
 example when an existing loopback device gets a backing file assigned
 or removed. I am pretty sure that loopback case is very close to what
 you want to do here, hence retriggering (either from the kernel side,
 or from userspace), appears like an OK thing to do.

What seems strange to me is that in this case the devices have not changed 
their status.
How is this problem managed in the md/dm RAID cases?

 
 Could a generator do this job? I.e. this generator (or storage
 daemon) waits until all (or enough) devices have appeared, then
 creates a .mount unit: do you think that is doable?
 
 systemd generators are a way to extend the systemd unit dep tree with
 units. They are very short running, and are executed only very very
 early at boot. They cannot wait for anything, they don't have access
 to devices and are not run when devices appear.
 
 Lennart
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-17 Thread Andrei Borzenkov
В Wed, 17 Jun 2015 23:02:02 +0200
Lennart Poettering lenn...@poettering.net пишет:

 On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreij...@libero.it) wrote:
 
   Well, /bin/mount is not a daemon, and it should not be one.
  
  My helper is not a daemon; you were correct the first time: it blocks
  until all needed (or enough) devices have appeared.
  Anyway, this should not be different from mounting an NFS
  filesystem. In that case too the mount helper blocks until the
  connection is established. The blocking time is not negligible,
  even though not as long as a device timeout ...
 
 Well, the mount tool doesn't wait for the network to be configured or
 so. It just waits for a response from the server. That's quite a
 difference.
 
   Well, it's not really ugly. I mean, if the state or properties of a
   device change, then udev should update its information about it, and
   that's done via a retrigger. We do that all the time already, for
   example when an existing loopback device gets a backing file assigned
   or removed. I am pretty sure that loopback case is very close to what
   you want to do here, hence retriggering (either from the kernel side,
   or from userspace), appears like an OK thing to do.
  
  What seems strange to me is that in this case the devices have not 
  changed their status.
  How is this problem managed in the md/dm RAID cases?
 
 md has a daemon mdmon to my knowledge.
 

No, mdmon does something different. What mdadm does is start a timer
when the RAID is complete enough to be started in degraded mode. If
notifications for the missing devices appear after that, the RAID is
started normally. If no notification appears before the timer expires,
the RAID is started in degraded mode. 

ACTION=="add|change", IMPORT{program}="BINDIR/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
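The last-resort logic described above can be modeled as a small state machine. This is a hypothetical Python sketch of the behaviour, not mdadm code; names and the time representation are invented:

```python
class LastResort:
    """Model of mdadm's last-resort timer: once enough members are present
    for a degraded start, arm a timer; if the remaining members show up
    before it fires, start normally, otherwise start degraded."""

    def __init__(self, needed, total, timeout):
        self.needed, self.total, self.timeout = needed, total, timeout
        self.members = set()
        self.deadline = None

    def add_member(self, dev, now):
        """Called on each udev add/change event for a member device."""
        self.members.add(dev)
        if len(self.members) == self.total:
            return "start-normal"
        if len(self.members) >= self.needed and self.deadline is None:
            self.deadline = now + self.timeout  # arm the last-resort timer
        return None

    def tick(self, now):
        """Called when the timer fires; starts degraded if still incomplete."""
        if self.deadline is not None and now >= self.deadline:
            return "start-degraded"
        return None
```

This is essentially what Lennart's "background storage daemon" proposal would have to implement for btrfs.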



Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-17 Thread Lennart Poettering
On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreij...@libero.it) wrote:

  Well, /bin/mount is not a daemon, and it should not be one.
 
 My helper is not a daemon; you were correct the first time: it blocks
 until all needed (or enough) devices have appeared.
 Anyway, this should not be different from mounting an NFS
 filesystem. In that case too the mount helper blocks until the
 connection is established. The blocking time is not negligible,
 even though not as long as a device timeout ...

Well, the mount tool doesn't wait for the network to be configured or
so. It just waits for a response from the server. That's quite a
difference.

  Well, it's not really ugly. I mean, if the state or properties of a
  device change, then udev should update its information about it, and
  that's done via a retrigger. We do that all the time already, for
  example when an existing loopback device gets a backing file assigned
  or removed. I am pretty sure that loopback case is very close to what
  you want to do here, hence retriggering (either from the kernel side,
  or from userspace), appears like an OK thing to do.
 
 What seems strange to me is that in this case the devices have not changed 
 their status.
 How is this problem managed in the md/dm RAID cases?

md has a daemon mdmon to my knowledge.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-15 Thread Lennart Poettering
On Sat, 13.06.15 17:35, Anand Jain (anand.j...@oracle.com) wrote:

 Are there any other users?
 
- If the device in the argument is already mounted,
  can it straightaway return 0 (ready)? (As of now it would
  again independently read the SB, determine total_devices,
  and check against num_devices.)
 
 
 I think yes; obvious use case is btrfs mounted in initrd and later
 coldplug. There is no point to wait for anything as filesystem is
 obviously there.
 
 
  There is a subtle difference if the device is already mounted
  and there are two device paths for the same device, PA and PB.
  The path as last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV)
  or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown
  in the 'btrfs filesystem show' or '/proc/self/mounts' output.
  It does not mean that the btrfs kernel will close the first device path
  and reopen the 2nd given device path; it just updates the device path
  in the kernel.

The device path shown in /proc/self/mountinfo is also weird in other
cases: if people boot up without an initrd, and use a btrfs fs as root,
then it will always carry the string /dev/root in there, which is
completely useless, since such a device never exists in userspace or
/sys, and hence one cannot make sense of it. Moreover, if one then asks
the kernel for the devices backing the btrfs fs via the ioctl it will
also return /dev/root for it, which is really useless.

I think in general I'd prefer if btrfs would stop returning the device
paths it got from userspace or the kernel, and would always return
sanitized ones that use the official kernel names for the devices in
them. Specifically, the member devices ioctl should always return
names like /dev/sda5, even if I mount something using root= on the
kernel cmdline, or if I mount /dev/disk/by-uuid/ via a symlink
instead of the real kernel name of the device.

Then, I think it would be a good idea to always update the device
string shown in /proc/self/mountinfo to be a concatenated version of
the list of device names reported by the ioctl. So that a btrfs RAID
would show /dev/sda5:/dev/sdb6:/dev/sdc5 or so. And if I remove or
add backing devices the string really should be updated.
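The suggested mountinfo source string could be produced trivially from the ioctl's device list. An illustrative Python one-liner, with made-up device names:

```python
def mountinfo_source(devices):
    """Join the kernel names of a filesystem's member devices into the
    colon-separated source string proposed for /proc/self/mountinfo,
    recomputed whenever a backing device is added or removed."""
    return ":".join(sorted(devices))

# e.g. a three-device btrfs RAID (hypothetical names)
print(mountinfo_source(["/dev/sdb6", "/dev/sda5", "/dev/sdc5"]))
# prints /dev/sda5:/dev/sdb6:/dev/sdc5
```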

The btrfs client side tools then could use udev to get a list of the
device node symlinks for each device to help the user identifying
which backing devices belong to a btrfs pool.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-15 Thread Lennart Poettering
On Fri, 12.06.15 21:16, Anand Jain (anand.j...@oracle.com) wrote:

 
 
 BTRFS_IOC_DEVICES_READY is to check if all the required devices
 are known by the btrfs kernel, so that an admin/system application
 could mount the FS. It is checked against a device in the argument.
 
 However the actual implementation is a bit more than just that,
 in that it would also scan and register the device
 provided in the argument (same as the btrfs device scan subcommand
 or the BTRFS_IOC_SCAN_DEV ioctl).
 
 So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl,
 but a write command as well.
 
 Next, in the kernel we only check whether total_devices
 (read from the SB) is equal to num_devices (counted in the list)
 to report the status as 0 (ready) or 1 (not ready). But this
 does not work in the remaining device pool states like missing,
 seeding, replacing, since total_devices is actually not equal
 to num_devices in these states even though the device pool is
 ready for the mount; that is a bug which is not part of this discussion.
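The readiness check described here amounts to comparing two counters. A simplified Python model of that kernel logic (field names borrowed from the thread; the real check lives in the btrfs kernel module):

```python
def devices_ready(total_devices, scanned_devices):
    """Model of the DEVICES_READY check: 0 (ready) when every device
    recorded in the superblock has been scanned, 1 (not ready) otherwise.

    As noted in the thread, this breaks down for missing/seeding/replacing
    pools, where the two counters legitimately differ although the pool
    is actually mountable."""
    return 0 if scanned_devices == total_devices else 1
```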
 
 
 Questions:
 
  - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
register the device provided (same as the btrfs device scan
command or the BTRFS_IOC_SCAN_DEV ioctl),
OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
to check the state of the device pool?

I am pretty sure the kernel should not change API on this now. Hence:
stick to the current behaviour, please.

  - If the device in the argument is already mounted,
can it straightaway return 0 (ready)? (As of now it would
again independently read the SB, determine total_devices,
and check against num_devices.)

Yeah, I figure that might make sense to do.

  - What should be the expected return when the FS is mounted
and there is a missing device.

An error, as it already does.

I am pretty sure that mounting degraded file systems should be an
exceptional operation, and not the common scheme. If it should happen
automatically at all, then it should be triggered by some daemon or
so, but not by udev/systemd.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-15 Thread Lennart Poettering
On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:

  Further, the problem will be more intense in this eg. if you use dd
  and copy device A to device B. After you mount device A, by just
  providing device B in the above two commands you could let kernel
  update the device path, again all the IO (since device is mounted)
  are still going to the device A (not B), but /proc/self/mounts and
  'btrfs fi show' shows it as device B (not A).
  
  It's a bug, very tricky to fix.
 
 In the past [*] I proposed a mount.btrfs helper; I tried to move the logic 
 outside the kernel.
 I think that the problem is that we try to manage all these cases
 from a device point of view: when a device appears, we register the
 device and we try to mount the filesystem... This works very well
 when there is a single-volume filesystem. For the other cases there is a
 mess between the different layers:

 - kernel
 - udev/systemd
 - initrd logic
 
 My attempt followed a different idea: the mount helper waits for the
 devices if needed, or, when appropriate, mounts the filesystem in
 degraded mode. All devices are passed as mount arguments
 (--device=/dev/sdX); there is no device registration: this avoids
 all these problems.

Hmm, no. /bin/mount should not block for devices. That's generally
incompatible with how the tool is used, and in particular from
systemd. We would not make use of such a scheme in
systemd. /bin/mount should always be short-running.

I am pretty sure that if such automatic degraded mounting should be
supported, then this should be done with some background storage
daemon that alters the effect of the READY ioctl somehow after the
timeout, and then retriggers the devices so that systemd takes
note. (or, alternatively: such a scheme could even be implemented all
in kernel, based on some configurable kernel setting...)

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-15 Thread David Sterba
On Fri, Jun 12, 2015 at 09:16:30PM +0800, Anand Jain wrote:
 BTRFS_IOC_DEVICES_READY is to check if all the required devices
 are known by the btrfs kernel, so that admin/system-application
 could mount the FS. It is checked against a device in the argument.
 
 However the actual implementation is a bit more than just that,
 in that it would also scan and register the device
 provided in the argument (same as the btrfs device scan subcommand
 or the BTRFS_IOC_SCAN_DEV ioctl).
 
 So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl,
 but a write command as well.

The implemented DEVICES_READY behaviour is intentional, but not a good
example of ioctl interface design. I asked for a more generic interface
for querying devices when this patch was submitted, but to no avail.

 Next, in the kernel we only check whether total_devices
 (read from the SB) is equal to num_devices (counted in the list)
 to report the status as 0 (ready) or 1 (not ready). But this
 does not work in the remaining device pool states like missing,
 seeding, replacing, since total_devices is actually not equal
 to num_devices in these states even though the device pool is
 ready for the mount; that is a bug which is not part of this discussion.

That's an example of why the single-shot ioctl is bad - it relies on some
internal state that is otherwise nontrivial to get.

 Questions:
 
   - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
 register the device provided (same as the btrfs device scan
 command or the BTRFS_IOC_SCAN_DEV ioctl),
 OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
 to check the state of the device pool?

This has been mentioned in the thread, we cannot change the ioctl that
way. Extensions are possible as far as they stay backward compatible
without changes to the existing users.

   - If the device in the argument is already mounted,
 can it straightaway return 0 (ready)? (As of now it would
 again independently read the SB, determine total_devices,
 and check against num_devices.)

We can do that, looks like a safe optimization.

   - What should be the expected return when the FS is mounted
 and there is a missing device.

I think the current ioctl cannot give a good answer to that, similar to
the seeding or dev-replace case. We'd need an improved ioctl or do it
via sysfs which is my preference at the moment.


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-15 Thread Lennart Poettering
On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreij...@inwind.it) wrote:

 On 2015-06-15 12:46, Lennart Poettering wrote:
  On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:
  
  Further, the problem will be more intense in this eg. if you use dd
  and copy device A to device B. After you mount device A, by just
  providing device B in the above two commands you could let kernel
  update the device path, again all the IO (since device is mounted)
  are still going to the device A (not B), but /proc/self/mounts and
  'btrfs fi show' shows it as device B (not A).
 
   It's a bug, very tricky to fix.
 
  In the past [*] I proposed a mount.btrfs helper; I tried to move the 
  logic outside the kernel.
  I think that the problem is that we try to manage all these cases
  from a device point of view: when a device appears, we register the
  device and we try to mount the filesystem... This works very well
  when there is a single-volume filesystem. For the other cases there is a
  mess between the different layers:
  
  - kernel
  - udev/systemd
  - initrd logic
 
  My attempt followed a different idea: the mount helper waits for the
  devices if needed, or, when appropriate, mounts the filesystem in
  degraded mode. All devices are passed as mount arguments
  (--device=/dev/sdX); there is no device registration: this avoids
  all these problems.
  
  Hmm, no. /bin/mount should not block for devices. That's generally
  incompatible with how the tool is used, and in particular from
  systemd. We would not make use of such a scheme in
  systemd. /bin/mount should always be short-running.
 
 Apart from systemd, what are these incompatibilities? 

Well, /bin/mount is not a daemon, and it should not be one.

  I am pretty sure that if such automatic degraded mounting should be
  supported, then this should be done with some background storage
  daemon that alters the effect of the READY ioctl somehow after the
  timeout, and then retriggers the devices so that systemd takes
  note. (or, alternatively: such a scheme could even be implemented all
  in kernel, based on some configurable kernel setting...)
 
 I recognize that this solution provides the maximum compatibility
 with the current implementation. However it seems too complex to
 me. Re-triggering a device seems to me more a workaround than a
 solution.

Well, it's not really ugly. I mean, if the state or properties of a
device change, then udev should update its information about it, and
that's done via a retrigger. We do that all the time already, for
example when an existing loopback device gets a backing file assigned
or removed. I am pretty sure that loopback case is very close to what
you want to do here, hence retriggering (either from the kernel side,
or from userspace), appears like an OK thing to do.

 Could a generator do this job? I.e. this generator (or storage
 daemon) waits until all (or enough) devices have appeared, then
 creates a .mount unit: do you think that is doable?

systemd generators are a way to extend the systemd unit dep tree with
units. They are very short running, and are executed only very very
early at boot. They cannot wait for anything, they don't have access
to devices and are not run when devices appear.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-14 Thread Goffredo Baroncelli
On 2015-06-14 06:05, Duncan wrote:
 Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as
 excerpted:
 
 My attempt followed a different idea: the mount helper waits the devices
 if needed, or if it is the case it mounts the filesystem in degraded
 mode.
 All devices are passed as mount arguments (--device=/dev/sdX), there is
 no a device registration: this avoids all these problems.

 [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
 
 But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb 
 was slow to respond on my last boot, and currently appears as /dev/sdf, 
 with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Please give a look to my patch.

You may mount the filesystem in different ways:
- by device (/dev/sdxxx)
- by UUID (UUID=)
- by LABEL (LABEL=)

The helper finds the right devices and (if necessary) waits for the other devices.
When it has collected all the devices, these are passed to the kernel via 
the device=/dev/sdX mount option. So the registration would not be needed 
anymore.
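Resolving a UUID=/LABEL= spec to the member devices and building the device= option string could look like the following. This is an illustrative Python sketch with stubbed device probing, not the actual helper; `probe` and all device names are invented, and real code would read the btrfs superblocks instead:

```python
def build_mount_options(spec, probe):
    """Resolve a mount spec (/dev/..., UUID=..., LABEL=...) to the member
    devices of the filesystem and build the btrfs device= mount options.

    `probe` maps every visible block device to its (uuid, label, fsid)
    record, standing in for a real superblock scan."""
    def matches(info):
        if spec.startswith("UUID="):
            return info["uuid"] == spec[5:]
        if spec.startswith("LABEL="):
            return info["label"] == spec[6:]
        return False

    anchors = [dev for dev, info in probe.items() if dev == spec or matches(info)]
    if not anchors:
        raise LookupError("no device matches %s" % spec)
    fsid = probe[anchors[0]]["fsid"]
    members = sorted(d for d, i in probe.items() if i["fsid"] == fsid)
    return ",".join("device=%s" % d for d in members)
```

With the full member list in hand, the helper can pass the options straight to the kernel, which is why no prior device registration would be needed.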

 
 Tho if /dev/disk/by-*/* works, I could use that.  Tho AFAIK it's udev 
 that fills that in, so udev would be necessary.

I never wrote that udev is not necessary. I only think that relying on udev
to handle a multi-volume filesystem is too complicated. The responsibility 
is spread across too many layers.



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-13 Thread Duncan
Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as
excerpted:

 My attempt followed a different idea: the mount helper waits the devices
 if needed, or if it is the case it mounts the filesystem in degraded
 mode.
 All devices are passed as mount arguments (--device=/dev/sdX), there is
 no a device registration: this avoids all these problems.
 
 [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb 
was slow to respond on my last boot, and currently appears as /dev/sdf, 
with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Tho if /dev/disk/by-*/* works, I could use that.  Tho AFAIK it's udev 
that fills that in, so udev would be necessary.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-13 Thread Andrei Borzenkov
В Sat, 13 Jun 2015 17:35:53 +0800
Anand Jain anand.j...@oracle.com пишет:

 
 Thanks for your reply Andrei and Goffredo.
 more below...
 
 On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
  On 2015-06-12 20:04, Andrei Borzenkov wrote:
  В Fri, 12 Jun 2015 21:16:30 +0800
  Anand Jain anand.j...@oracle.com пишет:
 
 
 
  BTRFS_IOC_DEVICES_READY is to check if all the required devices
  are known by the btrfs kernel, so that admin/system-application
  could mount the FS. It is checked against a device in the argument.
 
  However the actual implementation is bit more than just that,
  in the way that it would also scan and register the device
  provided in the argument (same as btrfs device scan subcommand
  or BTRFS_IOC_SCAN_DEV ioctl).
 
  So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl,
  but its a write command as well.
 
  Next, since in the kernel we only check if total_devices
  (read from SB)  is equal to num_devices (counted in the list)
  to state the status as 0 (ready) or 1 (not ready). But this
  does not work in rest of the device pool state like missing,
  seeding, replacing since total_devices is actually not equal
  to num_devices in these state but device pool is ready for
  the mount and its a bug which is not part of this discussions.
 
 
  Questions:
 
 - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and
   register the device provided (same as btrfs device scan
   command or the BTRFS_IOC_SCAN_DEV ioctl)
   OR can BTRFS_IOC_DEVICES_READY be read-only ioctl interface
   to check the state of the device pool. ?
 
 
  udev is using it to incrementally assemble multi-device btrfs, so in
  this case I think it should.
 
   Nice. Thanks for letting me know this.
 
  I agree, the ioctl name is confusing, but unfortunately this is an API and
  it has to be stay here forever. Udev uses it, so we know for sure that it
  is widely used.
 
   ok. what goes in stays there forever. its time to update
   the man page rather.
 
  Are there any other users?
 
 - If the the device in the argument is already mounted,
   can it straightaway return 0 (ready) ? (as of now it would
   again independently read the SB determine total_devices
   and check against num_devices.
 
 
  I think yes; obvious use case is btrfs mounted in initrd and later
  coldplug. There is no point to wait for anything as filesystem is
  obviously there.
 
 
   There is a subtle difference if the device is already mounted
   and there are two device paths for the same device, PA and PB.
   The path as last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV)
   or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown
   in the 'btrfs filesystem show' or '/proc/self/mounts' output.
   It does not mean that the btrfs kernel will close the first device path
   and reopen the 2nd given device path; it just updates the device path
   in the kernel.
 
   Further, the problem is even worse in this example:
   if you use dd to copy device A to device B.
   After you mount device A, just providing device B to the
   above two commands lets the kernel update the device path;
   again, all the IO (since the device is mounted) still goes to
   device A (not B), but /proc/self/mounts and 'btrfs fi show'
   show it as device B (not A).
 
   It's a bug, very tricky to fix.
 
- we can't return -EBUSY for subsequent (after mount) calls
for the above two ioctls (if a mounted device is used as an argument),
since an admin/system application might actually call them again to
mount subvols.
 
- we can return success (without updating the device path) but
we would be wrong when device A has been copied into device B using dd,
since we would check against the on-device SB's fsid/uuid/devid.
Checking the device paths using strcmp is not practical since there
can be different paths to the same device (let's say mapper).
 

Neither of these problems is specific to a mounted filesystem. The
order of device discovery is non-deterministic. If you duplicate devices
(snapshot, dd) it is unpredictable which devices will be included in the
btrfs filesystem. I.e. if you have A, B, C and A1, B1, C1 the filesystem
could be assembled as A, B, C1 on one boot and as A, B1, C on the next.

Other systems attempt to mitigate such situations by keeping track of
both the on-disk identification and the physical device properties (e.g.
changing the LU number will cause VMware to block access to a disk on
the assumption that it is a snapshot). One possibility is to store the
disk's physical identity (UUID, serial number) and compare it on access. 

Unless this is done, to guard against such cases a full device scan must
be performed, and any attempt to mount a filesystem that has duplicated
members must be blocked until the admin resolves the issue. If the
filesystem is already mounted, any attempt to add a duplicated member
must be rejected.
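The suggested guard can be sketched as follows. This is a hypothetical Python model, not kernel code; a real implementation would live in the btrfs device registry and read serial numbers from the hardware:

```python
class DeviceRegistry:
    """Register filesystem member devices keyed by (fsid, devid), refusing
    duplicates whose physical identity (here: a serial number) differs from
    the device already registered under the same on-disk identity."""

    def __init__(self):
        self.devices = {}  # (fsid, devid) -> {"path": ..., "serial": ...}

    def scan(self, fsid, devid, path, serial):
        key = (fsid, devid)
        known = self.devices.get(key)
        if known is None:
            self.devices[key] = {"path": path, "serial": serial}
            return "registered"
        if known["serial"] != serial:
            # Same on-disk identity but a different physical device:
            # a dd/snapshot clone. Reject instead of silently updating
            # the path, which is the bug discussed in this thread.
            return "rejected-duplicate"
        known["path"] = path  # same device, possibly seen via a new path
        return "updated"
```

In this model the dd-clone scenario from the thread becomes a rejected scan rather than a silent path swap.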

(any suggestion on how to check if it's the same device in the
kernel?).
 

I do not know kernel interfaces, but 

Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-13 Thread Goffredo Baroncelli
On 2015-06-13 11:35, Anand Jain wrote:
 
 Thanks for your reply Andrei and Goffredo. more below...
 
 On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
 On 2015-06-12 20:04, Andrei Borzenkov wrote:
 В Fri, 12 Jun 2015 21:16:30 +0800 Anand Jain
 anand.j...@oracle.com пишет:
 
 
 
 BTRFS_IOC_DEVICES_READY is to check if all the required
 devices are known by the btrfs kernel, so that
 admin/system-application could mount the FS. It is checked
 against a device in the argument.
 
 However the actual implementation is bit more than just that, 
 in the way that it would also scan and register the device 
 provided in the argument (same as btrfs device scan subcommand 
 or BTRFS_IOC_SCAN_DEV ioctl).
 
 So BTRFS_IOC_DEVICES_READY ioctl isn't a read/view only ioctl, 
 but its a write command as well.
 
 Next, since in the kernel we only check if total_devices (read
 from SB)  is equal to num_devices (counted in the list) to
 state the status as 0 (ready) or 1 (not ready). But this does
 not work in rest of the device pool state like missing, 
 seeding, replacing since total_devices is actually not equal to
 num_devices in these state but device pool is ready for the
 mount and its a bug which is not part of this discussions.
 
 
 Questions:
 
 - Do we want BTRFS_IOC_DEVICES_READY ioctl to also scan and 
 register the device provided (same as btrfs device scan command
 or the BTRFS_IOC_SCAN_DEV ioctl) OR can BTRFS_IOC_DEVICES_READY
 be read-only ioctl interface to check the state of the device
 pool. ?
 
 
 udev is using it to incrementally assemble multi-device btrfs, so
 in this case I think it should.
 
 Nice. Thanks for letting me know this.
 
 I agree, the ioctl name is confusing, but unfortunately this is an
 API and it has to be stay here forever. Udev uses it, so we know
 for sure that it is widely used.
 
 ok. what goes in stays there forever. its time to update the man page
 rather.
 
 Are there any other users?
 
 - If the the device in the argument is already mounted, can it
 straightaway return 0 (ready) ? (as of now it would again
 independently read the SB determine total_devices and check
 against num_devices.
 
 
 I think yes; obvious use case is btrfs mounted in initrd and
 later coldplug. There is no point to wait for anything as
 filesystem is obviously there.
 
 
 There is a subtle difference if the device is already mounted and
 there are two device paths for the same device, PA and PB. The path as
 last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV) or 'btrfs
 device ready' (BTRFS_IOC_DEVICES_READY) will be shown in the 'btrfs
 filesystem show' or '/proc/self/mounts' output. It does not mean that
 the btrfs kernel will close the first device path and reopen the 2nd
 given device path; it just updates the device path in the kernel.
 
 Further, the problem is worse in this example: if you use dd to
 copy device A to device B, then after you mount device A, just
 providing device B to the above two commands makes the kernel
 update the device path. All the I/O (since the device is mounted)
 still goes to device A (not B), but /proc/self/mounts and
 'btrfs fi show' show it as device B (not A).
 
 It's a bug, and very tricky to fix.

In the past [*] I proposed a mount.btrfs helper; I tried to move the logic
outside the kernel.
I think that the problem is that we try to manage all these cases from a device
point of view: when a device appears, we register the device and we try to
mount the filesystem... This works very well when there is a single-volume
filesystem. For the other cases there is a mess between the different layers:
- kernel
- udev/systemd
- initrd logic

My attempt followed a different idea: the mount helper waits for the devices if
needed, or, when appropriate, mounts the filesystem in degraded mode. All
devices are passed as mount arguments (--device=/dev/sdX), and there is no
device registration: this avoids all these problems.

[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
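The waiting logic of such a helper might look like the following sketch. This is not the actual mount.btrfs helper from the proposal; the timeout policy, the minimum-device threshold, and the injectable clock/exists hooks are all assumptions made for illustration:

```python
# Hypothetical sketch of a mount helper that waits for all member
# devices (passed explicitly, e.g. via --device=/dev/sdX) and falls
# back to a degraded mount when a timeout expires.
import os
import time

def wait_for_devices(paths, min_needed, timeout,
                     now=time.monotonic, exists=os.path.exists,
                     poll_interval=0.1, sleep=time.sleep):
    """Return (extra_mount_options, present_paths) once enough devices
    appear; raise TimeoutError if even a degraded mount is impossible."""
    deadline = now() + timeout
    while True:
        present = [p for p in paths if exists(p)]
        if len(present) == len(paths):
            return [], present              # complete pool: normal mount
        if now() >= deadline:
            if len(present) >= min_needed:
                return ["-o", "degraded"], present
            raise TimeoutError("not enough devices, even for degraded mount")
        sleep(poll_interval)
```

The `now`/`exists` parameters make the policy testable without real block devices; a real helper would then exec mount(8) with the returned options.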

Back to your questions:

 - We can't return -EBUSY for subsequent (after mount) calls to the
 above two ioctls (if a mounted device is used as an argument), since
 an admin/system application might call again to mount subvolumes.

I am not sure that the two things are related: mount doesn't use
BTRFS_IOC_DEVICES_READY. After BTRFS_IOC_DEVICES_READY returns OK, all the
filesystems belonging to this FSID should be mounted; but that is the job of
systemd/initramfs/sysv... A further failed BTRFS_IOC_DEVICES_READY shouldn't
cause any problem.


 
 - We can return success (without updating the device path), but we
 would be wrong when device A is copied into device B using dd, since
 we would check against the on-device SB's fsid/uuid/devid. Checking
 the device paths with strcmp is not practical since there can be
 different paths to the same device (let's say via the device mapper).
 
 (Any suggestion on how to check whether it is the same device in the
 kernel?)

Check the major/minor numbers?
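The major/minor idea can be sketched in userspace terms: two paths name the same block device exactly when stat() reports the same device number (st_rdev) for both. In the kernel the analogous comparison would be on the bdev's dev_t; this Python version is only an illustration:

```python
# Sketch: compare two device paths by major/minor number rather than
# by string comparison, so /dev/mapper/foo and its /dev/dm-N alias
# (different paths, same device) compare equal.
import os
import stat

def same_block_device(path_a, path_b):
    """True iff both paths are block device nodes with the same
    major/minor device number."""
    sa, sb = os.stat(path_a), os.stat(path_b)
    # st_rdev is only meaningful for device nodes.
    if not (stat.S_ISBLK(sa.st_mode) and stat.S_ISBLK(sb.st_mode)):
        return False
    return (os.major(sa.st_rdev), os.minor(sa.st_rdev)) == \
           (os.major(sb.st_rdev), os.minor(sb.st_rdev))
```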

 

Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-13 Thread Anand Jain


Thanks for your reply, Andrei and Goffredo.
More below...

On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:

On 2015-06-12 20:04, Andrei Borzenkov wrote:

On Fri, 12 Jun 2015 21:16:30 +0800
Anand Jain anand.j...@oracle.com wrote:




 BTRFS_IOC_DEVICES_READY is to check if all the required devices
 are known by the btrfs kernel, so that an admin/system application
 could mount the FS. It is checked against a device in the argument.
 
 However, the actual implementation is a bit more than just that,
 in that it also scans and registers the device provided in the
 argument (same as the btrfs device scan subcommand or the
 BTRFS_IOC_SCAN_DEV ioctl).
 
 So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl;
 it's a write command as well.
 
 Next, in the kernel we only check whether total_devices (read from
 the SB) is equal to num_devices (counted in the list) to report the
 status as 0 (ready) or 1 (not ready). But this does not work for the
 other device pool states such as missing, seeding, or replacing,
 since total_devices is not equal to num_devices in those states even
 though the device pool is ready for the mount. That is a bug, but it
 is not part of this discussion.


Questions:

   - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
 register the device provided (same as the btrfs device scan
 command or the BTRFS_IOC_SCAN_DEV ioctl),
 OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
 to check the state of the device pool?



udev is using it to incrementally assemble multi-device btrfs, so in
this case I think it should.


 Nice. Thanks for letting me know this.


I agree, the ioctl name is confusing, but unfortunately this is an API and
it has to stay here forever. Udev uses it, so we know for sure that it
is widely used.


 OK, what goes in stays there forever. It's time to update
 the man page instead.


Are there any other users?


   - If the device in the argument is already mounted,
 can it straightaway return 0 (ready)? (As of now it would
 again independently read the SB, determine total_devices,
 and check it against num_devices.)



I think yes; the obvious use case is btrfs mounted in the initrd and later
coldplugged. There is no point in waiting for anything, as the
filesystem is obviously there.



 There is a subtle difference if the device is already mounted
 and there are two device paths, PA and PB, for the same device.
 The path last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV)
 or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown
 in the 'btrfs filesystem show' or '/proc/self/mounts' output.
 This does not mean that the btrfs kernel will close the first device
 path and reopen the second given device path; it just updates the
 device path in the kernel.
 
 Further, the problem is worse in this example:
 if you use dd to copy device A to device B,
 then after you mount device A, just providing device B to the
 above two commands makes the kernel update the device path.
 All the I/O (since the device is mounted) still goes to
 device A (not B), but /proc/self/mounts and 'btrfs fi show'
 show it as device B (not A).
 
 It's a bug, and very tricky to fix.

  - We can't return -EBUSY for subsequent (after mount) calls
  to the above two ioctls (if a mounted device is used as an argument),
  since an admin/system application might call again to
  mount subvolumes.
 
  - We can return success (without updating the device path), but
  we would be wrong when device A is copied into device B using dd,
  since we would check against the on-device SB's fsid/uuid/devid.
  Checking the device paths with strcmp is not practical since there
  can be different paths to the same device (let's say via the device
  mapper).
 
  (Any suggestion on how to check whether it is the same device in the
  kernel?)
 
  - Also, if we don't allow updating the device path after the device
  is mounted, are there chances that we would be stuck with the device
  path from the initrd, which does not make any sense to the user?



   - What should be the expected return when the FS is mounted
 and there is a missing device?


I suggest not investing further energy in an ioctl API. If you want this kind
of information, you (we) should export it in sysfs.
In an ideal world:

- a new btrfs device appears
- udev registers it with BTRFS_IOC_SCAN_DEV
- udev (or mount?) checks the status of the filesystem by reading the sysfs
entries (total devices, present devices, seed devices, raid level); on the
basis of the local policy (allow degraded mount, device timeout, how many
devices are missing, filesystem redundancy level) udev (or mount) may mount
the filesystem with the appropriate parameters (ro, degraded, or even insert a
spare device to replace a missing device)
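The policy step in that list can be sketched as a small decision function. Note the assumptions: the sysfs attributes (total/present device counts, raid profile) are the *proposed* interface, not one that exists, and the per-profile missing-device tolerances below are illustrative:

```python
# Sketch of the local mount policy described above: given the counts a
# udev/mount helper would read from the proposed sysfs entries, decide
# whether to mount normally, mount degraded, or keep waiting.

# Illustrative tolerance: how many missing devices each profile survives.
MAX_MISSING = {"raid1": 1, "raid10": 1, "single": 0, "dup": 0}

def mount_decision(total, present, profile, allow_degraded=True):
    """Return 'mount', 'mount -o degraded', or 'wait'."""
    missing = total - present
    if missing == 0:
        return "mount"
    if allow_degraded and missing <= MAX_MISSING.get(profile, 0):
        return "mount -o degraded"
    return "wait"

assert mount_decision(2, 2, "raid1") == "mount"
assert mount_decision(2, 1, "raid1") == "mount -o degraded"
assert mount_decision(2, 1, "single") == "wait"
```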


 Yes, the sysfs interface is coming. A few framework patches were sent some
 time back; any comments will help. On the ioctl part I am trying to fix the
 bug(s).






[systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-12 Thread Anand Jain



BTRFS_IOC_DEVICES_READY is to check if all the required devices
are known by the btrfs kernel, so that an admin/system application
could mount the FS. It is checked against a device in the argument.

However, the actual implementation is a bit more than just that,
in that it also scans and registers the device provided in the
argument (same as the btrfs device scan subcommand or the
BTRFS_IOC_SCAN_DEV ioctl).

So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl;
it's a write command as well.

Next, in the kernel we only check whether total_devices (read from
the SB) is equal to num_devices (counted in the list) to report the
status as 0 (ready) or 1 (not ready). But this does not work for the
other device pool states such as missing, seeding, or replacing,
since total_devices is not equal to num_devices in those states even
though the device pool is ready for the mount. That is a bug, but it
is not part of this discussion.


Questions:

 - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
   register the device provided (same as the btrfs device scan
   command or the BTRFS_IOC_SCAN_DEV ioctl),
   OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
   to check the state of the device pool?

 - If the device in the argument is already mounted,
   can it straightaway return 0 (ready)? (As of now it would
   again independently read the SB, determine total_devices,
   and check it against num_devices.)

 - What should be the expected return when the FS is mounted
   and there is a missing device?


Thanks, Anand
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status

2015-06-12 Thread Andrei Borzenkov
On Fri, 12 Jun 2015 21:16:30 +0800
Anand Jain anand.j...@oracle.com wrote:

 
 
 BTRFS_IOC_DEVICES_READY is to check if all the required devices
 are known by the btrfs kernel, so that an admin/system application
 could mount the FS. It is checked against a device in the argument.
 
 However, the actual implementation is a bit more than just that,
 in that it also scans and registers the device provided in the
 argument (same as the btrfs device scan subcommand or the
 BTRFS_IOC_SCAN_DEV ioctl).
 
 So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl;
 it's a write command as well.
 
 Next, in the kernel we only check whether total_devices (read from
 the SB) is equal to num_devices (counted in the list) to report the
 status as 0 (ready) or 1 (not ready). But this does not work for the
 other device pool states such as missing, seeding, or replacing,
 since total_devices is not equal to num_devices in those states even
 though the device pool is ready for the mount. That is a bug, but it
 is not part of this discussion.
 
 
 Questions:
 
   - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and
 register the device provided (same as the btrfs device scan
 command or the BTRFS_IOC_SCAN_DEV ioctl),
 OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface
 to check the state of the device pool?
 

udev is using it to incrementally assemble multi-device btrfs, so in
this case I think it should. Are there any other users?

   - If the device in the argument is already mounted,
 can it straightaway return 0 (ready)? (As of now it would
 again independently read the SB, determine total_devices,
 and check it against num_devices.)
 

I think yes; the obvious use case is btrfs mounted in the initrd and later
coldplugged. There is no point in waiting for anything, as the filesystem is
obviously there.

   - What should be the expected return when the FS is mounted
 and there is a missing device.
 

This is similar to a problem mdadm had to solve. mdadm starts a timer as
soon as enough RAID devices are present; if the timer expires before the
RAID is complete, the RAID is started in degraded mode. This avoids
spurious rebuilds. So it would be good if btrfs could distinguish between
"enough devices to mount" and "all devices".
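The mdadm-style policy can be sketched as a small state function (illustrative only; the state names and thresholds are hypothetical, not mdadm's actual implementation):

```python
# Sketch of timer-based degraded assembly: assemble immediately when
# the array is complete, assemble degraded only once a timer has
# expired with at least the minimum device count, otherwise keep
# waiting. This is what avoids spurious rebuilds.

def assembly_state(present, enough, total, timer_expired):
    """Return 'assemble', 'assemble-degraded', or 'wait'."""
    if present >= total:
        return "assemble"                 # complete: no rebuild needed
    if present >= enough and timer_expired:
        return "assemble-degraded"        # give up waiting, go degraded
    return "wait"

# All 4 devices present: assemble at once, even before the timer fires.
assert assembly_state(4, 3, 4, timer_expired=False) == "assemble"
# 3 of 4 present but timer still running: keep waiting.
assert assembly_state(3, 3, 4, timer_expired=False) == "wait"
# Timer fired with enough devices: start degraded.
assert assembly_state(3, 3, 4, timer_expired=True) == "assemble-degraded"
```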