Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On 2015-06-15 19:38, Lennart Poettering wrote:
> On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreij...@inwind.it) wrote:
>> On 2015-06-15 12:46, Lennart Poettering wrote:
>>> On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:
>>>>> Further, the problem gets worse in this example: if you use dd to copy device A to device B, then after you mount device A, just by providing device B to the above two commands you can make the kernel update the device path. Again, all the IO (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.
>>>>
>>>> In the past [*] I proposed a mount.btrfs helper. I tried to move the logic outside the kernel. I think the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a 1-volume filesystem. For the other cases there is a mess between the different layers:
>>>> - kernel
>>>> - udev/systemd
>>>> - initrd logic
>>>>
>>>> My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.
>>>
>>> Hmm, no. /bin/mount should not block for devices. That's generally incompatible with how the tool is used, and in particular from systemd. We would not make use of such a scheme in systemd. /bin/mount should always be short-running.
>>
>> Apart from systemd, what are these incompatibilities?
>
> Well, /bin/mount is not a daemon, and it should not be one.

My helper is not a daemon; you were correct the first time: it blocks until all needed/enough devices have appeared. Anyway, this should not be different from mounting an NFS filesystem. Even in that case the mount helper blocks until the connection is established. The blocking time is not negligible, even though it is not as long as a device timeout...

>>> I am pretty sure that if such automatic degraded mounting should be supported, then this should be done with some background storage daemon that alters the effect of the READY ioctl somehow after the timeout, and then retriggers the devices so that systemd takes note. (Or, alternatively: such a scheme could even be implemented all in kernel, based on some configurable kernel setting...)
>>
>> I recognize that this solution provides the maximum compatibility with the current implementation. However, it seems too complex to me. Re-triggering a device seems to me more a workaround than a solution.
>
> Well, it's not really ugly. I mean, if the state or properties of a device change, then udev should update its information about it, and that's done via a retrigger. We do that all the time already, for example when an existing loopback device gets a backing file assigned or removed. I am pretty sure that loopback case is very close to what you want to do here, hence retriggering (either from the kernel side, or from userspace) appears like an OK thing to do.

What seems strange to me is that in this case the devices haven't changed their status. How is this problem managed in the md/dm raid cases?

>> Could a generator do this job? I.e. this generator (or storage daemon) waits until all (or enough) devices have appeared, then it creates a .mount unit: do you think that it is doable?
>
> systemd generators are a way to extend the systemd unit dep tree with units. They are very short-running, and are executed only very, very early at boot. They cannot wait for anything, they don't have access to devices, and they are not run when devices appear.
>
> Lennart

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Wed, 17 Jun 2015 23:02:02 +0200, Lennart Poettering lenn...@poettering.net wrote:
> On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreij...@libero.it) wrote:
>>> Well, /bin/mount is not a daemon, and it should not be one.
>>
>> My helper is not a daemon; you were correct the first time: it blocks until all needed/enough devices have appeared. Anyway, this should not be different from mounting an NFS filesystem. Even in that case the mount helper blocks until the connection is established. The blocking time is not negligible, even though it is not as long as a device timeout...
>
> Well, the mount tool doesn't wait for the network to be configured or so. It just waits for a response from the server. That's quite a difference.
>
>>> Well, it's not really ugly. I mean, if the state or properties of a device change, then udev should update its information about it, and that's done via a retrigger. We do that all the time already, for example when an existing loopback device gets a backing file assigned or removed. I am pretty sure that loopback case is very close to what you want to do here, hence retriggering (either from the kernel side, or from userspace) appears like an OK thing to do.
>>
>> What seems strange to me is that in this case the devices haven't changed their status. How is this problem managed in the md/dm raid cases?
>
> md has a daemon mdmon to my knowledge.

No, mdmon does something different. What mdadm does is start a timer when the RAID is complete enough to be started in degraded mode. If notifications for the missing devices appear after that, the RAID is started normally. If no notification appears before the timer expires, the RAID is started in degraded mode.

    ACTION=="add|change", IMPORT{program}="BINDIR/mdadm --incremental --export $devnode --offroot ${DEVLINKS}"
    ACTION=="add|change", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
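The timer scheme described above can be modelled in a few lines. This is only a sketch of the decision logic as stated in the message, not mdadm's code: the function name `assemble`, its parameters, and the 30-second timeout are assumptions made for illustration.

```python
def assemble(arrivals, total, minimum, timeout):
    """Model the "last resort" timer described in the thread.

    arrivals: times (in seconds) at which member devices show up.
    total:    number of members in the complete array.
    minimum:  number of members needed to start in degraded mode.
    timeout:  hypothetical grace period once `minimum` is reached.
    Returns (mode, time), where mode is "normal", "degraded" or "not started".
    """
    seen = 0
    deadline = None
    for t in sorted(arrivals):
        if deadline is not None and t > deadline:
            # The timer expired before this device arrived.
            return ("degraded", deadline)
        seen += 1
        if seen == total:
            # Every member is present: start normally.
            return ("normal", t)
        if seen >= minimum and deadline is None:
            # Complete enough for degraded mode: arm the timer.
            deadline = t + timeout
    if deadline is not None:
        return ("degraded", deadline)
    return ("not started", None)


print(assemble([0, 1, 2], total=3, minimum=2, timeout=30))   # ('normal', 2)
print(assemble([0, 1, 50], total=3, minimum=2, timeout=30))  # ('degraded', 31)
```

The point of the model is that nothing blocks: each device arrival is an event, and the only waiting is done by a timer that is armed once a degraded start becomes possible.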
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Wed, 17.06.15 21:10, Goffredo Baroncelli (kreij...@libero.it) wrote:
>> Well, /bin/mount is not a daemon, and it should not be one.
>
> My helper is not a daemon; you were correct the first time: it blocks until all needed/enough devices have appeared. Anyway, this should not be different from mounting an NFS filesystem. Even in that case the mount helper blocks until the connection is established. The blocking time is not negligible, even though it is not as long as a device timeout...

Well, the mount tool doesn't wait for the network to be configured or so. It just waits for a response from the server. That's quite a difference.

>> Well, it's not really ugly. I mean, if the state or properties of a device change, then udev should update its information about it, and that's done via a retrigger. We do that all the time already, for example when an existing loopback device gets a backing file assigned or removed. I am pretty sure that loopback case is very close to what you want to do here, hence retriggering (either from the kernel side, or from userspace) appears like an OK thing to do.
>
> What seems strange to me is that in this case the devices haven't changed their status. How is this problem managed in the md/dm raid cases?

md has a daemon mdmon to my knowledge.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Sat, 13.06.15 17:35, Anand Jain (anand.j...@oracle.com) wrote:
> Are there any other users?
>
>>> - If the device in the argument is already mounted, can it straightaway return 0 (ready)? (As of now it would again independently read the SB, determine total_devices, and check against num_devices.)
>>
>> I think yes; the obvious use case is btrfs mounted in initrd and later coldplug. There is no point in waiting for anything, as the filesystem is obviously there.
>
> There is a little difference. If the device is already mounted and there are two device paths for the same device, PA and PB, then the path last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV) or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown in the 'btrfs filesystem show' or '/proc/self/mounts' output. It does not mean that the btrfs kernel will close the first device path and reopen the second one; it just updates the device path in the kernel.

The device paths shown in /proc/self/mountinfo are also weird in other cases: if people boot up without an initrd and use a btrfs fs as root, then it will always carry the string /dev/root in there, which is completely useless, since such a device never exists in userspace or /sys, and hence one cannot make sense of it. Moreover, if one then asks the kernel for the devices backing the btrfs fs via the ioctl, it will also return /dev/root for it, which is really useless.

I think in general I'd prefer if btrfs would stop returning the device paths it got from userspace or the kernel, and would always return sanitized ones that use the official kernel names for the devices in them. Specifically, the member devices ioctl should always return names like /dev/sda5, even if I mount something using root= on the kernel cmdline, or if I mount /dev/disk/by-uuid/ via a symlink instead of the real kernel name of the device.

Then, I think it would be a good idea to always update the device string shown in /proc/self/mountinfo to be a concatenated version of the list of device names reported by the ioctl, so that a btrfs RAID would show /dev/sda5:/dev/sdb6:/dev/sdc5 or so. And if I remove or add backing devices, the string really should be updated.

The btrfs client side tools could then use udev to get a list of the device node symlinks for each device, to help the user identify which backing devices belong to a btrfs pool.

Lennart

-- 
Lennart Poettering, Red Hat
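A userspace illustration of the proposal above (canonical names, joined with ':') might look like the following. It is a sketch only: the function name `mount_source` is hypothetical, the sorting and de-duplication are choices made here for determinism, and the thread proposes that the kernel do this, not a script.

```python
import os


def mount_source(paths):
    """Resolve symlinks (e.g. by-uuid links) to their target names and
    join the distinct results with ':' in sorted order."""
    resolved = sorted({os.path.realpath(p) for p in paths})
    return ":".join(resolved)
```

Given a by-uuid symlink that points at /dev/sda5, plus /dev/sdb6, this would yield "/dev/sda5:/dev/sdb6" no matter which alias was originally handed to mount.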
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Fri, 12.06.15 21:16, Anand Jain (anand.j...@oracle.com) wrote:
> BTRFS_IOC_DEVICES_READY checks whether all the required devices are known to the btrfs kernel module, so that an admin or system application can mount the FS. It is checked against the device given in the argument.
>
> However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.
>
> Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the other device pool states, like missing, seeding, or replacing, since total_devices is not equal to num_devices in these states even though the device pool is ready for the mount. That's a bug which is not part of this discussion.
>
> Questions:
>
> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), or can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface to check the state of the device pool?

I am pretty sure the kernel should not change API on this now. Hence: stick to the current behaviour, please.

> - If the device in the argument is already mounted, can it straightaway return 0 (ready)? (As of now it would again independently read the SB, determine total_devices, and check against num_devices.)

Yeah, I figure that might make sense to do.

> - What should be the expected return when the FS is mounted and there is a missing device?

An error, as it already does.

I am pretty sure that mounting degraded file systems should be an exceptional operation, and not the common scheme. If it should happen automatically at all, then it should be triggered by some daemon or so, but not by udev/systemd.

Lennart

-- 
Lennart Poettering, Red Hat
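The readiness rule quoted in this message (0 when total_devices equals num_devices, 1 otherwise) is small enough to model directly. The sketch below is a Python model of that rule as the thread describes it, not kernel code; the `PoolState` class and the function name are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    total_devices: int  # device count recorded in the superblock
    num_devices: int    # devices registered (scanned) so far


def devices_ready_status(pool: PoolState) -> int:
    """Return 0 (ready) or 1 (not ready), following the rule in the thread."""
    return 0 if pool.total_devices == pool.num_devices else 1


print(devices_ready_status(PoolState(total_devices=3, num_devices=3)))  # 0
print(devices_ready_status(PoolState(total_devices=3, num_devices=2)))  # 1
```

The second call also shows the limitation raised in the survey: a pool with one missing member reports 1 even if it could be mounted in degraded mode, because the rule only compares two counters.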
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:
>> Further, the problem gets worse in this example: if you use dd to copy device A to device B, then after you mount device A, just by providing device B to the above two commands you can make the kernel update the device path. Again, all the IO (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.
>
> In the past [*] I proposed a mount.btrfs helper. I tried to move the logic outside the kernel. I think the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a 1-volume filesystem. For the other cases there is a mess between the different layers:
> - kernel
> - udev/systemd
> - initrd logic
>
> My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.

Hmm, no. /bin/mount should not block for devices. That's generally incompatible with how the tool is used, and in particular from systemd. We would not make use of such a scheme in systemd. /bin/mount should always be short-running.

I am pretty sure that if such automatic degraded mounting should be supported, then this should be done with some background storage daemon that alters the effect of the READY ioctl somehow after the timeout, and then retriggers the devices so that systemd takes note. (Or, alternatively: such a scheme could even be implemented all in kernel, based on some configurable kernel setting...)

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Fri, Jun 12, 2015 at 09:16:30PM +0800, Anand Jain wrote:
> BTRFS_IOC_DEVICES_READY checks whether all the required devices are known to the btrfs kernel module, so that an admin or system application can mount the FS. It is checked against the device given in the argument.
>
> However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.

The implemented DEVICES_READY behaviour is intentional, but not a good example of ioctl interface design. I asked for a more generic interface for querying devices when this patch was submitted, but to no outcome.

> Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the other device pool states, like missing, seeding, or replacing, since total_devices is not equal to num_devices in these states even though the device pool is ready for the mount. That's a bug which is not part of this discussion.

That's an example of why the single-shot ioctl is bad: it relies on some internal state that's otherwise nontrivial to get.

> Questions:
>
> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), or can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface to check the state of the device pool?

This has been mentioned in the thread: we cannot change the ioctl that way. Extensions are possible as long as they stay backward compatible and require no changes to the existing users.

> - If the device in the argument is already mounted, can it straightaway return 0 (ready)? (As of now it would again independently read the SB, determine total_devices, and check against num_devices.)

We can do that; it looks like a safe optimization.

> - What should be the expected return when the FS is mounted and there is a missing device?

I think the current ioctl cannot give a good answer to that, similar to the seeding or dev-replace case. We'd need an improved ioctl, or do it via sysfs, which is my preference at the moment.
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Mon, 15.06.15 19:23, Goffredo Baroncelli (kreij...@inwind.it) wrote:
> On 2015-06-15 12:46, Lennart Poettering wrote:
>> On Sat, 13.06.15 17:09, Goffredo Baroncelli (kreij...@libero.it) wrote:
>>>> Further, the problem gets worse in this example: if you use dd to copy device A to device B, then after you mount device A, just by providing device B to the above two commands you can make the kernel update the device path. Again, all the IO (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.
>>>
>>> In the past [*] I proposed a mount.btrfs helper. I tried to move the logic outside the kernel. I think the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a 1-volume filesystem. For the other cases there is a mess between the different layers:
>>> - kernel
>>> - udev/systemd
>>> - initrd logic
>>>
>>> My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.
>>
>> Hmm, no. /bin/mount should not block for devices. That's generally incompatible with how the tool is used, and in particular from systemd. We would not make use of such a scheme in systemd. /bin/mount should always be short-running.
>
> Apart from systemd, what are these incompatibilities?

Well, /bin/mount is not a daemon, and it should not be one.

>> I am pretty sure that if such automatic degraded mounting should be supported, then this should be done with some background storage daemon that alters the effect of the READY ioctl somehow after the timeout, and then retriggers the devices so that systemd takes note. (Or, alternatively: such a scheme could even be implemented all in kernel, based on some configurable kernel setting...)
>
> I recognize that this solution provides the maximum compatibility with the current implementation. However, it seems too complex to me. Re-triggering a device seems to me more a workaround than a solution.

Well, it's not really ugly. I mean, if the state or properties of a device change, then udev should update its information about it, and that's done via a retrigger. We do that all the time already, for example when an existing loopback device gets a backing file assigned or removed. I am pretty sure that loopback case is very close to what you want to do here, hence retriggering (either from the kernel side, or from userspace) appears like an OK thing to do.

> Could a generator do this job? I.e. this generator (or storage daemon) waits until all (or enough) devices have appeared, then it creates a .mount unit: do you think that it is doable?

systemd generators are a way to extend the systemd unit dep tree with units. They are very short-running, and are executed only very, very early at boot. They cannot wait for anything, they don't have access to devices, and they are not run when devices appear.

Lennart

-- 
Lennart Poettering, Red Hat
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On 2015-06-14 06:05, Duncan wrote:
> Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as excerpted:
>> My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.
>>
>> [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767
>
> But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb was slow to respond on my last boot, and currently appears as /dev/sdf, with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Please have a look at my patch. You may mount the filesystem in different ways:
- by device (/dev/sdxxx)
- by UUID (UUID=)
- by LABEL (LABEL=)

The helper finds the right devices and (if needed) waits for the other devices. When it has collected all the devices, they are passed to the kernel via the device=/dev/sdx mount option, so the registration would not be needed anymore.

> Tho if /dev/disk/by-*/* works, I could use that. Tho AFAIK it's udev that fills that in, so udev would be necessary.

I never wrote that udev is not necessary. I only think that relying on udev to handle a multi-volume filesystem is too complicated. The responsibility is spread across too many layers.

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
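For readers unfamiliar with the three ways of naming a filesystem listed in this message, here is a minimal sketch of how a helper could turn such a spec into a device node path using the symlinks udev maintains under /dev/disk. This is not taken from the patch referenced in the thread; `spec_to_path` is a hypothetical name, and the sketch ignores the character escaping udev applies to labels.

```python
def spec_to_path(spec):
    """Map a mount spec to a device node path.

    UUID=<uuid>   -> /dev/disk/by-uuid/<uuid>
    LABEL=<label> -> /dev/disk/by-label/<label>
    anything else is taken to be a device path already.
    """
    if spec.startswith("UUID="):
        return "/dev/disk/by-uuid/" + spec[len("UUID="):]
    if spec.startswith("LABEL="):
        return "/dev/disk/by-label/" + spec[len("LABEL="):]
    return spec
```

As noted in the quoted reply, the by-uuid and by-label links are created by udev, so a helper built this way still depends on udev having processed the devices.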
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
Goffredo Baroncelli posted on Sat, 13 Jun 2015 17:09:19 +0200 as excerpted:
> My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.
>
> [*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

But /dev/sdX doesn't always work, because, for instance, my usual /dev/sdb was slow to respond on my last boot, and currently appears as /dev/sdf, with sdb/c/d/e being my (multi-type) sdcard, etc, adapter, medialess.

Tho if /dev/disk/by-*/* works, I could use that. Tho AFAIK it's udev that fills that in, so udev would be necessary.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Sat, 13 Jun 2015 17:35:53 +0800, Anand Jain anand.j...@oracle.com wrote:
> Thanks for your reply Andrei and Goffredo. More below...
>
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> On Fri, 12 Jun 2015 21:16:30 +0800, Anand Jain anand.j...@oracle.com wrote:
>>>> BTRFS_IOC_DEVICES_READY checks whether all the required devices are known to the btrfs kernel module, so that an admin or system application can mount the FS. It is checked against the device given in the argument.
>>>>
>>>> However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.
>>>>
>>>> Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the other device pool states, like missing, seeding, or replacing, since total_devices is not equal to num_devices in these states even though the device pool is ready for the mount. That's a bug which is not part of this discussion.
>>>>
>>>> Questions:
>>>>
>>>> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), or can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface to check the state of the device pool?
>>>
>>> udev is using it to incrementally assemble multi-device btrfs, so in this case I think it should.
>
> Nice. Thanks for letting me know this.
>
>> I agree, the ioctl name is confusing, but unfortunately this is an API and it has to stay here forever. Udev uses it, so we know for sure that it is widely used.
>
> OK, what goes in stays there forever. It's time to update the man page instead.
>
> Are there any other users?
>
>>>> - If the device in the argument is already mounted, can it straightaway return 0 (ready)? (As of now it would again independently read the SB, determine total_devices, and check against num_devices.)
>>>
>>> I think yes; the obvious use case is btrfs mounted in initrd and later coldplug. There is no point in waiting for anything, as the filesystem is obviously there.
>
> There is a little difference. If the device is already mounted and there are two device paths for the same device, PA and PB, then the path last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV) or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown in the 'btrfs filesystem show' or '/proc/self/mounts' output. It does not mean that the btrfs kernel will close the first device path and reopen the second one; it just updates the device path in the kernel.
>
> Further, the problem gets worse in this example: if you use dd to copy device A to device B, then after you mount device A, just by providing device B to the above two commands you can make the kernel update the device path. Again, all the IO (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.
>
> - We can't return -EBUSY for subsequent (after mount) calls to the above two ioctls (if a mounted device is used as the argument), since the admin or system application might actually call them again to mount subvols.
>
> - We can return success (without updating the device path), but we would be wrong when device A has been copied onto device B using dd, since we would check against the on-device SB's fsid/uuid/devid. Comparing the device paths with strcmp is not practical, since there can be different paths to the same device (let's say mapper).

Neither of those problems is specific to mounted filesystems. The order of device discovery is non-deterministic. If you duplicate devices (snapshot, dd), it is unpredictable which devices will be included in btrfs. I.e. if you have A, B, C and A1, B1, C1, the filesystem could be assembled as A, B, C1 and on the next boot as A, B1, C.

Other systems attempt to mitigate such situations by keeping track of both on-disk identification and physical device properties (e.g. changing the LU number will cause VMware to block access to the disk on the assumption that it is a snapshot). One possibility is to store the disk's physical identity (UUID, serial number) and compare on access. Unless this is done, to guard against such a case a full device scan must be performed, and any attempt to mount such a filesystem (one that has duplicated members) must be blocked until the admin resolves the issue. If the filesystem is already mounted, any attempt to add a duplicated member must be rejected.

> (Any suggestion on how to check if it's the same device in the kernel?)

I do not know kernel interfaces, but
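The "full device scan, then block on duplicated members" guard described in this message can be sketched as follows. This is an illustrative userspace model under stated assumptions, not btrfs code: the function name and the (path, fsid, devid) tuple layout are hypothetical, and the on-disk identity is reduced to the fsid/devid pair mentioned in the thread.

```python
from collections import defaultdict


def find_duplicate_members(scanned):
    """Group scanned devices by on-disk identity.

    scanned: iterable of (path, fsid, devid) tuples from a full scan.
    Returns {(fsid, devid): [paths]} for every identity claimed by
    more than one path.
    """
    by_identity = defaultdict(list)
    for path, fsid, devid in scanned:
        by_identity[(fsid, devid)].append(path)
    return {ident: paths for ident, paths in by_identity.items()
            if len(paths) > 1}
```

A non-empty result would be the signal to refuse the mount until an admin intervenes. On-disk identity alone cannot tell a dd copy apart from a second path to the same disk, so a real check would first have to collapse aliases of one device.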
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On 2015-06-13 11:35, Anand Jain wrote:
> Thanks for your reply Andrei and Goffredo. More below...
>
> On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
>> On 2015-06-12 20:04, Andrei Borzenkov wrote:
>>> On Fri, 12 Jun 2015 21:16:30 +0800, Anand Jain anand.j...@oracle.com wrote:
>>>> BTRFS_IOC_DEVICES_READY checks whether all the required devices are known to the btrfs kernel module, so that an admin or system application can mount the FS. It is checked against the device given in the argument.
>>>>
>>>> However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.
>>>>
>>>> Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the other device pool states, like missing, seeding, or replacing, since total_devices is not equal to num_devices in these states even though the device pool is ready for the mount. That's a bug which is not part of this discussion.
>>>>
>>>> Questions:
>>>>
>>>> - Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), or can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface to check the state of the device pool?
>>>
>>> udev is using it to incrementally assemble multi-device btrfs, so in this case I think it should.
>
> Nice. Thanks for letting me know this.
>
>> I agree, the ioctl name is confusing, but unfortunately this is an API and it has to stay here forever. Udev uses it, so we know for sure that it is widely used.
>
> OK, what goes in stays there forever. It's time to update the man page instead.
>
> Are there any other users?
>
>>>> - If the device in the argument is already mounted, can it straightaway return 0 (ready)? (As of now it would again independently read the SB, determine total_devices, and check against num_devices.)
>>>
>>> I think yes; the obvious use case is btrfs mounted in initrd and later coldplug. There is no point in waiting for anything, as the filesystem is obviously there.
>
> There is a little difference. If the device is already mounted and there are two device paths for the same device, PA and PB, then the path last given to either 'btrfs dev scan' (BTRFS_IOC_SCAN_DEV) or 'btrfs device ready' (BTRFS_IOC_DEVICES_READY) will be shown in the 'btrfs filesystem show' or '/proc/self/mounts' output. It does not mean that the btrfs kernel will close the first device path and reopen the second one; it just updates the device path in the kernel.
>
> Further, the problem gets worse in this example: if you use dd to copy device A to device B, then after you mount device A, just by providing device B to the above two commands you can make the kernel update the device path. Again, all the IO (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.

In the past [*] I proposed a mount.btrfs helper. I tried to move the logic outside the kernel. I think the problem is that we try to manage all these cases from a device point of view: when a device appears, we register the device and we try to mount the filesystem... This works very well for a 1-volume filesystem. For the other cases there is a mess between the different layers:
- kernel
- udev/systemd
- initrd logic

My attempt followed a different idea: the mount helper waits for the devices if needed, or, if that is the case, mounts the filesystem in degraded mode. All devices are passed as mount arguments (--device=/dev/sdX); there is no device registration: this avoids all these problems.

[*] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/40767

Back to your questions:

> - We can't return -EBUSY for subsequent (after mount) calls to the above two ioctls (if a mounted device is used as the argument), since the admin or system application might actually call them again to mount subvols.

I am not sure that the two things are related: the mount doesn't use BTRFS_IOC_DEVICES_READY. After BTRFS_IOC_DEVICES_READY returns OK, all the filesystems belonging to this FSID should be mounted, but that is the job of systemd/initramfs/sysv... A further failed BTRFS_IOC_DEVICES_READY shouldn't cause any problem...

> - We can return success (without updating the device path), but we would be wrong when device A has been copied onto device B using dd, since we would check against the on-device SB's fsid/uuid/devid. Comparing the device paths with strcmp is not practical, since there can be different paths to the same device (let's say mapper). (Any suggestion on how to check if it's the same device in the kernel?)

check minor/major ? -
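The major/minor suggestion can be illustrated from userspace: two paths name the same block device when stat() reports the same device number for both, regardless of how the path strings differ. This is a sketch of the idea only; `same_block_device` is a hypothetical helper, and the kernel would compare its own device numbers, not call stat().

```python
import os
import stat


def same_block_device(st_a, st_b):
    """Compare two os.stat() results by block device number.

    Returns True only if both describe block devices with the same
    (major, minor) pair.
    """
    if not (stat.S_ISBLK(st_a.st_mode) and stat.S_ISBLK(st_b.st_mode)):
        return False
    return (os.major(st_a.st_rdev), os.minor(st_a.st_rdev)) == (
        os.major(st_b.st_rdev), os.minor(st_b.st_rdev))
```

Because os.stat() follows symlinks, passing it a by-uuid link and the kernel name of the same partition yields matching numbers, whereas a dd copy on another disk does not match. A device-mapper node stacked on a disk has its own numbers, so this check treats it as a different device from the disk underneath.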
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
Thanks for your reply, Andrei and Goffredo. More below...

On 06/13/2015 04:08 AM, Goffredo Baroncelli wrote:
On 2015-06-12 20:04, Andrei Borzenkov wrote:
On Fri, 12 Jun 2015 21:16:30 +0800, Anand Jain anand.j...@oracle.com wrote:

BTRFS_IOC_DEVICES_READY is meant to check whether all the required devices are known to the btrfs kernel code, so that an admin or system application can mount the FS. The check is made against the device given in the argument.

However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.

Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the remaining device pool states, such as missing, seeding and replacing, since total_devices is not actually equal to num_devices in these states even though the device pool is ready for the mount. That is a bug which is not part of this discussion.

Questions:

- Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface that checks the state of the device pool?

udev is using it to incrementally assemble multi-device btrfs, so in this case I think it should.

Nice. Thanks for letting me know this.

I agree, the ioctl name is confusing, but unfortunately this is an API and it has to stay here forever. Udev uses it, so we know for sure that it is widely used.

OK. What goes in stays there forever. It's time to update the man page instead.

Are there any other users?

- If the device in the argument is already mounted, can it straightaway return 0 (ready)?
(as of now it would again independently read the SB, determine total_devices and check it against num_devices.)

I think yes; the obvious use case is btrfs mounted in the initrd and later coldplug. There is no point in waiting for anything, as the filesystem is obviously there.

There is a little difference. If the device is already mounted and there are two device paths, PA and PB, for the same device, the path last given to either 'btrfs dev scan (BTRFS_IOC_SCAN_DEV)' or 'btrfs device ready (BTRFS_IOC_DEVICES_READY)' will be shown in the 'btrfs filesystem show' or '/proc/self/mounts' output. This does not mean that the btrfs kernel code closes the first device path and reopens the second one; it only updates the device path recorded in the kernel.

The problem gets worse in this example: use dd to copy device A to device B. After you mount device A, just passing device B to either of the two commands above makes the kernel update the device path. All the I/O (since the device is mounted) still goes to device A (not B), but /proc/self/mounts and 'btrfs fi show' show it as device B (not A). It's a bug, and very tricky to fix.

- we can't return -EBUSY for subsequent (after mount) calls to the above two ioctls (if a mounted device is used as an argument), since an admin or system application might actually call them again to mount subvols.

- we can return success (without updating the device path), but we would be wrong when device A has been copied onto device B using dd, since we would check against the on-device SB's fsid/uuid/devid. Comparing the device paths with strcmp is not practical, since there can be different paths to the same device (let's say mapper). (Any suggestion on how to check whether it's the same device in the kernel?)

- Also, if we don't allow the device path to be updated after the device is mounted, is there a chance that we would be stuck with the device path from the initrd, which does not make any sense to the user?
- What should be the expected return when the FS is mounted and there is a missing device?

I suggest not investing further energy in an ioctl API. If you want this kind of information, you (we) should export it in sysfs. In an ideal world:

- a new btrfs device appears
- udev registers it with BTRFS_IOC_SCAN_DEV
- udev (or mount?) checks the status of the filesystem by reading the sysfs entries (total devices, present devices, seed devices, raid level); on the basis of the local policy (allow degraded mount, device timeout, how many devices are missing, filesystem redundancy level), udev (mount) may mount the filesystem with the appropriate parameters (ro, degraded, or even insert a spare device to correct for a missing device)

Yes. A sysfs interface is coming. A few framework patches were sent some time back; any comments will help. On the ioctl part I am trying to fix the bug(s).

This is similar to the problem mdadm had to solve. mdadm starts a timer as soon
[systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
BTRFS_IOC_DEVICES_READY is meant to check whether all the required devices are known to the btrfs kernel code, so that an admin or system application can mount the FS. The check is made against the device given in the argument.

However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.

Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the remaining device pool states, such as missing, seeding and replacing, since total_devices is not actually equal to num_devices in these states even though the device pool is ready for the mount. That is a bug which is not part of this discussion.

Questions:

- Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface that checks the state of the device pool?

- If the device in the argument is already mounted, can it straightaway return 0 (ready)? (as of now it would again independently read the SB, determine total_devices and check it against num_devices.)

- What should be the expected return when the FS is mounted and there is a missing device?

Thanks, Anand

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
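The readiness check described in this post can be modelled in a few lines of Python. This is a toy model for discussion, not the kernel code: `PoolState` and `devices_ready` are names invented here, and only the comparison of total_devices against num_devices and the 0/1 return convention are taken from the post.

```python
from dataclasses import dataclass


@dataclass
class PoolState:
    total_devices: int  # device count read from the superblock (SB)
    num_devices: int    # devices currently counted in the kernel's list


def devices_ready(pool: PoolState) -> int:
    """Model of the check as described: 0 means ready, 1 means not ready."""
    return 0 if pool.total_devices == pool.num_devices else 1


if __name__ == "__main__":
    # Every device has been registered, so the pool is reported as ready.
    print(devices_ready(PoolState(total_devices=2, num_devices=2)))
    # The counts differ, so the pool is reported as not ready. According to the
    # post, this also happens in states (missing, seeding, replacing) where the
    # pool could in fact be mounted.
    print(devices_ready(PoolState(total_devices=2, num_devices=1)))
```

The model makes the limitation visible: a plain equality test has no way to express "not all devices, but enough to mount".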
Re: [systemd-devel] [survey] BTRFS_IOC_DEVICES_READY return status
On Fri, 12 Jun 2015 21:16:30 +0800, Anand Jain anand.j...@oracle.com wrote:

BTRFS_IOC_DEVICES_READY is meant to check whether all the required devices are known to the btrfs kernel code, so that an admin or system application can mount the FS. The check is made against the device given in the argument.

However, the actual implementation does a bit more than that: it also scans and registers the device provided in the argument (same as the btrfs device scan subcommand or the BTRFS_IOC_SCAN_DEV ioctl). So the BTRFS_IOC_DEVICES_READY ioctl isn't a read/view-only ioctl; it's a write command as well.

Next, in the kernel we only check whether total_devices (read from the SB) is equal to num_devices (counted in the list) to report the status as 0 (ready) or 1 (not ready). But this does not work in the remaining device pool states, such as missing, seeding and replacing, since total_devices is not actually equal to num_devices in these states even though the device pool is ready for the mount. That is a bug which is not part of this discussion.

Questions:

- Do we want the BTRFS_IOC_DEVICES_READY ioctl to also scan and register the device provided (same as the btrfs device scan command or the BTRFS_IOC_SCAN_DEV ioctl), OR can BTRFS_IOC_DEVICES_READY be a read-only ioctl interface that checks the state of the device pool?

udev is using it to incrementally assemble multi-device btrfs, so in this case I think it should. Are there any other users?

- If the device in the argument is already mounted, can it straightaway return 0 (ready)? (as of now it would again independently read the SB, determine total_devices and check it against num_devices.)

I think yes; the obvious use case is btrfs mounted in the initrd and later coldplug. There is no point in waiting for anything, as the filesystem is obviously there.

- What should be the expected return when the FS is mounted and there is a missing device?

This is similar to the problem mdadm had to solve.
mdadm starts a timer as soon as enough raid devices are present; if the timer expires before the raid is complete, the raid is started in degraded mode. This avoids spurious rebuilds. So it would be good if btrfs could distinguish between enough devices to mount and all devices.
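The timer policy described above can be sketched as a small decision function. This is a simplified illustration in Python, not mdadm's or btrfs's actual logic; the function name `mount_decision` and all of its parameters are assumptions made for this example.

```python
def mount_decision(present, total, enough, elapsed_since_enough, timeout):
    """Decide what to do with a multi-device pool (hypothetical policy sketch).

    present: devices seen so far
    total: devices the pool expects in total
    enough: minimum number of devices needed for a degraded start
    elapsed_since_enough: seconds since 'enough' devices first became present
    timeout: seconds to keep waiting for the remaining devices
    """
    if present >= total:
        return "mount"            # complete pool, start normally
    if present >= enough and elapsed_since_enough >= timeout:
        return "mount-degraded"   # timer expired before the pool was complete
    return "wait"                 # too few devices, or timer still running


if __name__ == "__main__":
    print(mount_decision(present=2, total=2, enough=1,
                         elapsed_since_enough=0, timeout=30))
    print(mount_decision(present=1, total=2, enough=1,
                         elapsed_since_enough=5, timeout=30))
    print(mount_decision(present=1, total=2, enough=1,
                         elapsed_since_enough=30, timeout=30))
```

The point of the sketch is that the caller needs two separate thresholds, "enough" and "all", which a single ready/not-ready status cannot provide.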