On May 16, 2014, at 11:31 AM, Goffredo Baroncelli <kreij...@inwind.it> wrote:

> On 05/15/2014 11:54 PM, Chris Murphy wrote:
>> 
>> On May 15, 2014, at 2:57 PM, Goffredo Baroncelli <kreij...@libero.it>
>> wrote:
> [....]
>> 
>> The udev rule right now is asking whether all Btrfs member devices are
>> present, and with a missing device the answer is no; so systemd doesn't
>> attempt a mount at all, rather than attempting a degraded mount
>> specifically for the root=UUID device(s).
> 
> Who is in charge of mounting the filesystem?

Ultimately systemd mounts the defined root file system to /sysroot. It knows 
which volume to mount from the boot parameter root=UUID=, but it doesn't even 
try to mount it until the device with that UUID has appeared. Until then, the 
mount command isn't issued at all.
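
For context, in the initramfs that translation is done by systemd-fstab-generator, 
which turns root= and rootflags= into a sysroot.mount unit that in turn waits on 
the corresponding .device unit. A quick way to see both from a dracut shell (the 
UUID is the one from the log below; the generator path may differ by version):

cat /run/systemd/generator/sysroot.mount
systemctl status 'dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device'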


> 
> What I found is that dracut waits until all the btrfs devices are present:
> 
> cat /usr/lib/dracut/modules.d/90btrfs/80-btrfs.rules
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION!="add|change", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
> RUN+="/sbin/btrfs device scan $env{DEVNAME}"
> 
> RUN+="/sbin/initqueue --finished --unique --name btrfs_finished 
> /sbin/btrfs_finished"
> 
> LABEL="btrfs_end"
> 
> 
> and 
> 
> 
> cat btrfs_finished.sh 
> #!/bin/sh
> # -*- mode: shell-script; indent-tabs-mode: nil; sh-basic-offset: 4; -*-
> # ex: ts=8 sw=4 sts=4 et filetype=sh
> 
> type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh
> 
> btrfs_check_complete() {
>    local _rootinfo _dev
>    _dev="${1:-/dev/root}"
>    [ -e "$_dev" ] || return 0
>    _rootinfo=$(udevadm info --query=env "--name=$_dev" 2>/dev/null)
>    if strstr "$_rootinfo" "ID_FS_TYPE=btrfs"; then
>        info "Checking, if btrfs device complete"
>        unset __btrfs_mount
>        mount -o ro "$_dev" /tmp >/dev/null 2>&1
>        __btrfs_mount=$?
>        [ $__btrfs_mount -eq 0 ] && umount "$_dev" >/dev/null 2>&1
>        return $__btrfs_mount
>    fi
>    return 0
> }
> 
> btrfs_check_complete $1
> exit $?
> 

> 
> It seems that when a new btrfs device appears, the system attempts to mount 
> it. If it succeeds, it is assumed that all devices are present.

No, the system definitely does not attempt to mount it if there's a missing 
device. Systemd never executes /bin/mount at all in that case. A prerequisite 
for the mount attempt is this line:

[    1.621517] localhost.localdomain systemd[1]: 
dev-disk-by\x2duuid-9ff63135\x2dce42\x2d4447\x2da6de\x2dd7c9b4fb6d66.device 
changed dead -> plugged

That line only appears once all devices are present. With a device missing, the 
mount attempt never happens and the system just hangs.
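
For reference, the rule that keeps the device unit dead while members are 
missing is systemd's own 64-btrfs.rules; from memory it looks roughly like 
this (an approximation, check the installed copy):

# /usr/lib/udev/rules.d/64-btrfs.rules (approximate)
SUBSYSTEM!="block", GOTO="btrfs_end"
ACTION=="remove", GOTO="btrfs_end"
ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"

# ask the kernel whether all members of this btrfs filesystem are present
IMPORT{builtin}="btrfs ready $devnode"

# if not, mark the device as not ready; systemd then keeps the .device unit dead
ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

LABEL="btrfs_end"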

However, if I boot with rd.break=pre-mount and get to a dracut shell, this 
command works:

mount -t btrfs -o subvol=root,ro,degraded -U <uuid> /sysroot

The volume UUID is definitely present even though not all devices are, so it's 
confusing why this device unit hasn't gone from dead to plugged. Until it's 
plugged, the mount command won't be issued.
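
From that same dracut shell, udev's view of the device can be checked directly; 
with a member missing I'd expect something like this (device path and output 
are illustrative):

udevadm info --query=property --name=/dev/disk/by-uuid/<uuid> | grep -E 'ID_BTRFS_READY|SYSTEMD_READY'
    ID_BTRFS_READY=0
    SYSTEMD_READY=0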



> To allow a degraded boot, it should be sufficient replace
> 
> 
>       mount -o ro "$_dev" /tmp >/dev/null 2>&1
> 
> with
> 
>       OPTS="ro"
>       grep -q degraded /proc/cmdline && OPTS="$OPTS,degraded"
>       mount -o "$OPTS" "$_dev" /tmp >/dev/null 2>&1

The problem isn't that the degraded mount option isn't being used by systemd. 
The problem is that systemd isn't changing the device from dead to plugged.

And the problem there is that there are actually four possible states for an 
array, yet 'btrfs device ready' apparently only distinguishes between state 1 
and everything else (states 2, 3, and 4), as the example after the list shows.

1. All devices ready.
2. Minimum number of data/metadata devices ready, allow degraded rw mount.
3. Minimum number of data devices not ready, but enough metadata devices are 
ready, allow degraded ro mount.
4. Minimum number of data/metadata devices not ready, degraded mount not 
possible.
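
Today the only probe available collapses those four states into two, e.g. 
(device path illustrative):

btrfs device ready /dev/sda2; echo $?
# 0 = all member devices present, 1 = anything else -- even when a degraded
# mount (state 2 or 3) would actually succeed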

So I think it's a question for the btrfs list: what is the long-term strategy, 
given that rootflags=degraded alone does not work on systemd systems? Once I'm 
on systemd 208-16 on Fedora 20, I get the same hang as on Rawhide. So I have to 
force power off, reboot with the boot parameter rd.break=pre-mount, mount the 
volume manually, and exit twice. That's fine for me, but it's non-obvious for 
most users.
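
Spelled out, that manual recovery is (UUID placeholder as in the earlier 
example):

# added to the kernel command line:
#   rd.break=pre-mount
# then, from the dracut shell:
mount -t btrfs -o subvol=root,ro,degraded -U <uuid> /sysroot
exit    # leave the breakpoint shell
exit    # let systemd continue the boot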

The thing to put to the Btrfs list is how they expect this to work down the 
road.

Right now, the way md handles this, systemd itself doesn't do anything at all. 
It's actually dracut scripts that check for the existence of the rootfs volume 
UUID up to 240 times, with a 0.5 second sleep between attempts. After 240 failed 
attempts, dracut runs mdadm -R, which forcibly runs the array with the available 
devices (i.e. degraded assembly). At that moment the volume UUID becomes 
available, the device goes from dead to plugged, systemd mounts it, and boot 
continues normally.
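
The shape of that dracut fallback is roughly this (a sketch, not the actual 
script; the device path and md name are illustrative):

i=0
while [ ! -e /dev/disk/by-uuid/$ROOT_UUID ] && [ $i -lt 240 ]; do
    sleep 0.5
    i=$((i + 1))
done
# still not there after ~120 seconds: force-run the array degraded
[ -e /dev/disk/by-uuid/$ROOT_UUID ] || mdadm -R /dev/md0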

So maybe Btrfs can leverage the same loop used for md degraded booting. But 
after the loop completes, then what? I don't see how systemd gets informed to 
conditionally use the additional mount option "degraded". I think the 
equivalent of dracut's mdadm -R for btrfs would be something like 'btrfs device 
allowdegraded -U <uuid>' to set a state on the volume that permits normal mounts 
to work. Then the device would go from dead to plugged, and systemd would just 
issue the usual mount command.
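
In other words, the btrfs analogue of that fallback might look like this 
('btrfs device allowdegraded' is hypothetical, exactly as proposed above; 
'btrfs device ready' does exist):

i=0
while ! btrfs device ready /dev/disk/by-uuid/$ROOT_UUID && [ $i -lt 240 ]; do
    sleep 0.5
    i=$((i + 1))
done
# give up waiting for the full set and permit a degraded mount (hypothetical)
btrfs device ready /dev/disk/by-uuid/$ROOT_UUID || \
    btrfs device allowdegraded -U $ROOT_UUID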

*shrug*



Chris Murphy

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
