Re: [PATCH v3] btrfs: handle dynamically reappearing missing device
On 12/18/2017 08:01 PM, Nikolay Borisov wrote: On 17.12.2017 05:04, Anand Jain wrote: If the device is not present at the time of (-o degrade) mount, the mount context will create a dummy missing struct btrfs_device. Later this device may reappear after the FS is mounted and then device is included in the device list but it missed the open_device part. So this patch handles that case by going through the open_device steps which this device missed and finally adds to the device alloc list. So now with this patch, to bring back the missing device user can run, btrfs dev scan Without this kernel patch, even though 'btrfs fi show' and 'btrfs dev ready' would tell you that missing device has reappeared successfully but actually in kernel FS layer it didn't. Signed-off-by: Anand Jain--- This patch needs: [PATCH 0/4] factor __btrfs_open_devices() v3: The check for missing in the device_list_add() is now a another patch as its not related. btrfs: fix inconsistency during missing device rejoin v2: Add more comments. Add more change log. Add to check if device missing is set, to handle the case dev open fail and user will rerun the dev scan. fs/btrfs/volumes.c | 57 -- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 93d65c72b731..5c3190c65f81 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -812,8 +812,61 @@ static noinline int device_list_add(const char *path, rcu_string_free(device->name); rcu_assign_pointer(device->name, name); if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) { - fs_devices->missing_devices--; - clear_bit(BTRFS_DEV_STATE_MISSING, >dev_state); + int ret; + struct btrfs_fs_info *fs_info = fs_devices->fs_info; + fmode_t fmode = FMODE_READ | FMODE_WRITE | FMODE_EXCL; + + if (btrfs_super_flags(disk_super) & + BTRFS_SUPER_FLAG_SEEDING) + fmode &= ~FMODE_WRITE; + + /* +* Missing can be set only when FS is mounted. +* So here its always fs_devices->opened > 0 and most +* of the struct device members are already updated by +* the mount process even if this device was missing, so +* now follow the normal open device procedure for this +* device. The scrub will take care of filling the So how is scrub supposed to be initiated - automatically or by the user since it's assumed he will have seen the message that a device has been added? Reading the comment I'd expect scrub is kicked automatically. Oh. Right. it isn't a good idea to reset degraded option (below) in this thread, instead let scrub do it. Then user can notice too. In fact even if all the devices are present but if chunks does not match to what user has configured for, then its a degraded volume. When I was resetting the degraded option below. I had that question, but now the thought process and reasoning is satisfying. Will update. Thanks. +* missing stripes for raid56 and balance for raid1 and +* raid10. +*/ + ASSERT(fs_devices->opened); + mutex_lock(_devices->device_list_mutex); + mutex_lock(_info->chunk_mutex); + /* +* As of now do not fail the dev scan thread for the +* reason that btrfs_open_one_device() fails and keep +* the legacy dev scan requisites as it is. +* And reset missing only if open is successful, as +* user can rerun dev scan after fixing the device +* for which the device open (below) failed. +*/ + ret = btrfs_open_one_device(fs_devices, device, fmode, + fs_info->bdev_holder); + if (!ret) { + fs_devices->missing_devices--; + clear_bit(BTRFS_DEV_STATE_MISSING, + >dev_state); +btrfs_clear_opt(fs_info->mount_opt, DEGRADED); <-- + btrfs_warn(fs_info, + "BTRFS: device %s devid %llu joined\n", + path, devid); why do we have to warn, this is considered to be a "good" thing so perhaps btrfs_info? you are right, it should be btrfs_info. Will change. + } + +
Re: [PATCH v3] btrfs: handle dynamically reappearing missing device
On 17.12.2017 05:04, Anand Jain wrote: > If the device is not present at the time of (-o degrade) mount, > the mount context will create a dummy missing struct btrfs_device. > Later this device may reappear after the FS is mounted and > then device is included in the device list but it missed the > open_device part. So this patch handles that case by going > through the open_device steps which this device missed and finally > adds to the device alloc list. > > So now with this patch, to bring back the missing device user can run, > >btrfs dev scan > > Without this kernel patch, even though 'btrfs fi show' and 'btrfs > dev ready' would tell you that missing device has reappeared > successfully but actually in kernel FS layer it didn't. > > Signed-off-by: Anand Jain> --- > This patch needs: > [PATCH 0/4] factor __btrfs_open_devices() > v3: > The check for missing in the device_list_add() is now a another > patch as its not related. > btrfs: fix inconsistency during missing device rejoin > > v2: > Add more comments. > Add more change log. > Add to check if device missing is set, to handle the case > dev open fail and user will rerun the dev scan. > > fs/btrfs/volumes.c | 57 > -- > 1 file changed, 55 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 93d65c72b731..5c3190c65f81 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -812,8 +812,61 @@ static noinline int device_list_add(const char *path, > rcu_string_free(device->name); > rcu_assign_pointer(device->name, name); > if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) { > - fs_devices->missing_devices--; > - clear_bit(BTRFS_DEV_STATE_MISSING, >dev_state); > + int ret; > + struct btrfs_fs_info *fs_info = fs_devices->fs_info; > + fmode_t fmode = FMODE_READ | FMODE_WRITE | FMODE_EXCL; > + > + if (btrfs_super_flags(disk_super) & > + BTRFS_SUPER_FLAG_SEEDING) > + fmode &= ~FMODE_WRITE; > + > + /* > + * Missing can be set only when FS is mounted. > + * So here its always fs_devices->opened > 0 and most > + * of the struct device members are already updated by > + * the mount process even if this device was missing, so > + * now follow the normal open device procedure for this > + * device. The scrub will take care of filling the So how is scrub supposed to be initiated - automatically or by the user since it's assumed he will have seen the message that a device has been added? Reading the comment I'd expect scrub is kicked automatically. > + * missing stripes for raid56 and balance for raid1 and > + * raid10. > + */ > + ASSERT(fs_devices->opened); > + mutex_lock(_devices->device_list_mutex); > + mutex_lock(_info->chunk_mutex); > + /* > + * As of now do not fail the dev scan thread for the > + * reason that btrfs_open_one_device() fails and keep > + * the legacy dev scan requisites as it is. > + * And reset missing only if open is successful, as > + * user can rerun dev scan after fixing the device > + * for which the device open (below) failed. > + */ > + ret = btrfs_open_one_device(fs_devices, device, fmode, > + fs_info->bdev_holder); > + if (!ret) { > + fs_devices->missing_devices--; > + clear_bit(BTRFS_DEV_STATE_MISSING, > + >dev_state); > + btrfs_clear_opt(fs_info->mount_opt, DEGRADED); > + btrfs_warn(fs_info, > + "BTRFS: device %s devid %llu joined\n", > + path, devid); why do we have to warn, this is considered to be a "good" thing so perhaps btrfs_info? > + } > + > + if (test_bit(BTRFS_DEV_STATE_WRITEABLE, > + >dev_state) && Can a device that is missing really be writable? So we have 2 cases where we add a missing device: 1. Is via add_missing_dev which sets dev_state_missing so writable will be false. 2. In read_one_dev when we have successfully found the device from btrfs_find_device but it doesn't have a ->bdev member, in which case we don't really
[PATCH v3] btrfs: handle dynamically reappearing missing device
If the device is not present at the time of (-o degrade) mount, the mount context will create a dummy missing struct btrfs_device. Later this device may reappear after the FS is mounted and then device is included in the device list but it missed the open_device part. So this patch handles that case by going through the open_device steps which this device missed and finally adds to the device alloc list. So now with this patch, to bring back the missing device user can run, btrfs dev scan Without this kernel patch, even though 'btrfs fi show' and 'btrfs dev ready' would tell you that missing device has reappeared successfully but actually in kernel FS layer it didn't. Signed-off-by: Anand Jain--- This patch needs: [PATCH 0/4] factor __btrfs_open_devices() v3: The check for missing in the device_list_add() is now a another patch as its not related. btrfs: fix inconsistency during missing device rejoin v2: Add more comments. Add more change log. Add to check if device missing is set, to handle the case dev open fail and user will rerun the dev scan. fs/btrfs/volumes.c | 57 -- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 93d65c72b731..5c3190c65f81 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -812,8 +812,61 @@ static noinline int device_list_add(const char *path, rcu_string_free(device->name); rcu_assign_pointer(device->name, name); if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) { - fs_devices->missing_devices--; - clear_bit(BTRFS_DEV_STATE_MISSING, >dev_state); + int ret; + struct btrfs_fs_info *fs_info = fs_devices->fs_info; + fmode_t fmode = FMODE_READ | FMODE_WRITE | FMODE_EXCL; + + if (btrfs_super_flags(disk_super) & + BTRFS_SUPER_FLAG_SEEDING) + fmode &= ~FMODE_WRITE; + + /* +* Missing can be set only when FS is mounted. +* So here its always fs_devices->opened > 0 and most +* of the struct device members are already updated by +* the mount process even if this device was missing, so +* now follow the normal open device procedure for this +* device. The scrub will take care of filling the +* missing stripes for raid56 and balance for raid1 and +* raid10. +*/ + ASSERT(fs_devices->opened); + mutex_lock(_devices->device_list_mutex); + mutex_lock(_info->chunk_mutex); + /* +* As of now do not fail the dev scan thread for the +* reason that btrfs_open_one_device() fails and keep +* the legacy dev scan requisites as it is. +* And reset missing only if open is successful, as +* user can rerun dev scan after fixing the device +* for which the device open (below) failed. +*/ + ret = btrfs_open_one_device(fs_devices, device, fmode, + fs_info->bdev_holder); + if (!ret) { + fs_devices->missing_devices--; + clear_bit(BTRFS_DEV_STATE_MISSING, + >dev_state); + btrfs_clear_opt(fs_info->mount_opt, DEGRADED); + btrfs_warn(fs_info, + "BTRFS: device %s devid %llu joined\n", + path, devid); + } + + if (test_bit(BTRFS_DEV_STATE_WRITEABLE, +>dev_state) && + !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, + >dev_state)) { + fs_devices->total_rw_bytes += + device->total_bytes; + atomic64_add(device->total_bytes - + device->bytes_used, + _info->free_chunk_space); + } + set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, + >dev_state); + mutex_unlock(_info->chunk_mutex); + mutex_unlock(_devices->device_list_mutex);
[PATCH v3] btrfs: handle dynamically reappearing missing device
If the device is not present at the time of (-o degrade) mount, the mount context will create a dummy missing struct btrfs_device. Later this device may reappear after the FS is mounted and then device is included in the device list but it missed the open_device part. So this patch handles that case by going through the open_device steps which this device missed and finally adds to the device alloc list. So now with this patch, to bring back the missing device user can run, btrfs dev scan Without this kernel patch, even though 'btrfs fi show' and 'btrfs dev ready' would tell you that missing device has reappeared successfully but actually in kernel FS layer it didn't. Signed-off-by: Anand Jain--- This patch needs: [PATCH 0/4] factor __btrfs_open_devices() v2: Add more comments. Add more change log. Add to check if device missing is set, to handle the case dev open fail and user will rerun the dev scan v3: Reword comments in the code. The device missing check added in v2, is sent as a separate patch [patch] btrfs: fix inconsistency during missing device rejoin fs/btrfs/volumes.c | 57 -- 1 file changed, 55 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index ac0c4eb5107f..04164337ac69 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -760,8 +760,61 @@ static noinline int device_list_add(const char *path, rcu_string_free(device->name); rcu_assign_pointer(device->name, name); if (test_bit(BTRFS_DEV_STATE_MISSING, >dev_state)) { - fs_devices->missing_devices--; - clear_bit(BTRFS_DEV_STATE_MISSING, >dev_state); + int ret; + struct btrfs_fs_info *fs_info = fs_devices->fs_info; + fmode_t fmode = FMODE_READ | FMODE_WRITE | FMODE_EXCL; + + if (btrfs_super_flags(disk_super) & + BTRFS_SUPER_FLAG_SEEDING) + fmode &= ~FMODE_WRITE; + + /* +* Missing can be set only when FS is mounted. +* So here its always fs_devices->opened > 0 and most +* of the struct device members are already updated by +* the mount process even if this device was missing, so +* now follow the normal open device procedure for this +* device. The scrub will take care of filling the +* missing stripes for raid56 and balance for raid1 and +* raid10. +*/ + ASSERT(fs_devices->opened); + mutex_lock(_devices->device_list_mutex); + mutex_lock(_info->chunk_mutex); + /* +* As of now do not fail the dev scan thread for the +* reason that btrfs_open_one_device() fails and keep +* the legacy dev scan requisites as it is. +* And reset missing only if open is successful, as +* user can rerun dev scan after fixing the device +* for which the device open (below) failed. +*/ + ret = btrfs_open_one_device(fs_devices, device, fmode, + fs_info->bdev_holder); + if (!ret) { + fs_devices->missing_devices--; + clear_bit(BTRFS_DEV_STATE_MISSING, + >dev_state); + btrfs_clear_opt(fs_info->mount_opt, DEGRADED); + btrfs_warn(fs_info, + "BTRFS: device %s devid %llu joined\n", + path, devid); + } + + if (test_bit(BTRFS_DEV_STATE_WRITEABLE, + >dev_state) && + !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, + >dev_state)) { + fs_devices->total_rw_bytes += + device->total_bytes; + atomic64_add(device->total_bytes - + device->bytes_used, + _info->free_chunk_space); + } + set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, + >dev_state); + mutex_unlock(_info->chunk_mutex); +