Re: [PATCH 0/9 v2] fix replace-start and replace-cancel racing

2018-11-16 Thread Anand Jain




On 11/15/2018 11:41 PM, David Sterba wrote:

> On Sun, Nov 11, 2018 at 10:22:15PM +0800, Anand Jain wrote:
> > v1->v2:
> >   2/9: Drop writeback required
> >   3/9: Drop writeback required
> >   7/9: Use the condition within the WARN_ON()
> >   6/9: Use the condition within the ASSERT()
> >
> > Replace-start and replace-cancel threads can race and create a messy
> > situation leading to a UAF. We use the scrub code to write the blocks
> > to the replace target. So if we haven't set replace-scrub-running yet,
> > without this patchset we just ignore the error and free the target
> > device. When this happens the system panics with a UAF error.
> >
> > It's nice to see that btrfs_dev_replace_finishing() already handles
> > the ECANCELED (replace canceled) situation, but for an unknown reason
> > we aren't using it to clean up after a replace cancel. Instead we just
> > let the replace cancel ioctl thread clean up the target device and
> > return, out of sync with the scrub code.
> >
> > Patches 4/9, 5/9 and 6/9 use the return code of btrfs_scrub_cancel()
> > to check if the scrub was really running. If it was not, we return an
> > error to the user (replace not started error) so that the user can
> > retry the replace cancel. They also use the
> > btrfs_dev_replace_finishing() code to clean up after a successful
> > cancel of the replace scrub.
> >
> > Further, when a suspended replace tries to restart and fails (for
> > example, target device missing or excl ops running), it goes to the
> > started state, and so the CLI 'btrfs replace status /mnt' hangs with
> > no progress. Patches 2/9 and 3/9 fix that.
> >
> > As the original code's idea of ECANCELED was limited to the error
> > situation and not a user-requested cancel, there are unnecessary error
> > and warning logs, which patches 7/9 and 8/9 fix.
> >
> > Patches 1/9 and 9/9 are good-to-have fixes. They make a function
> > static and improve code readability.
> >
> > Testing: (I made some attempts to convert these into xfstests but need
> > a mechanism where a kernel thread can wait for a userland script. I
> > thought I could do it using eBPF, but it needs more digging on how.)
> > For now, hand tested using procfs to hold the kernel thread at
> > (wait_for_user(..)) until userland issues go.
>
> This could be tricky to implement but would of course be useful. I
> saw the crash about once a week, so I will watch whether this still
> happens.


 That will be nice.


> > Anand Jain (9):
> >   btrfs: mark btrfs_dev_replace_start() as static
> >   btrfs: replace go back to suspended if target missing
> >   btrfs: replace back to suspend state if EXCL OP is running
> >   btrfs: fix UAF due to race between replace start and cancel
> >   btrfs: replace cancel is successful if scrub cancel is successful
> >   btrfs: replace's scrub must not be running in replace suspended state
> >   btrfs: quiten warn if the replace is canceled at finish
> >   btrfs: user requsted replace cancel is not an error
> >   btrfs: add explicit check for replace result no error
>
> The above is merged to misc-next, except:
>
> btrfs: quiten warn if the replace is canceled at finish
> btrfs: user requsted replace cancel is not an error
>
> with replies under the patches on what could be improved. The changes
> can be sent independently if you need to do that in several patches.
> Thanks.


 We need these patches, otherwise you will see a WARN_ON and a btrfs_err
 after a successful replace cancel. Will send revised patches.

Thanks, Anand


Re: [PATCH 0/9 v2] fix replace-start and replace-cancel racing

2018-11-15 Thread David Sterba
On Sun, Nov 11, 2018 at 10:22:15PM +0800, Anand Jain wrote:
> v1->v2:
>   2/9: Drop writeback required
>   3/9: Drop writeback required
>   7/9: Use the condition within the WARN_ON()
>   6/9: Use the condition within the ASSERT()
> 
> Replace-start and replace-cancel threads can race and create a messy
> situation leading to a UAF. We use the scrub code to write the blocks
> to the replace target. So if we haven't set replace-scrub-running yet,
> without this patchset we just ignore the error and free the target
> device. When this happens the system panics with a UAF error.
> 
> It's nice to see that btrfs_dev_replace_finishing() already handles
> the ECANCELED (replace canceled) situation, but for an unknown reason
> we aren't using it to clean up after a replace cancel. Instead we just
> let the replace cancel ioctl thread clean up the target device and
> return, out of sync with the scrub code.
> 
> Patches 4/9, 5/9 and 6/9 use the return code of btrfs_scrub_cancel()
> to check if the scrub was really running. If it was not, we return an
> error to the user (replace not started error) so that the user can
> retry the replace cancel. They also use the
> btrfs_dev_replace_finishing() code to clean up after a successful
> cancel of the replace scrub.
> 
> Further, when a suspended replace tries to restart and fails (for
> example, target device missing or excl ops running), it goes to the
> started state, and so the CLI 'btrfs replace status /mnt' hangs with
> no progress. Patches 2/9 and 3/9 fix that.
> 
> As the original code's idea of ECANCELED was limited to the error
> situation and not a user-requested cancel, there are unnecessary error
> and warning logs, which patches 7/9 and 8/9 fix.
> 
> Patches 1/9 and 9/9 are good-to-have fixes. They make a function
> static and improve code readability.
> 
> Testing: (I made some attempts to convert these into xfstests but need
> a mechanism where a kernel thread can wait for a userland script. I
> thought I could do it using eBPF, but it needs more digging on how.)
> For now, hand tested using procfs to hold the kernel thread at
> (wait_for_user(..)) until userland issues go.

This could be tricky to implement but would of course be useful. I
saw the crash about once a week, so I will watch whether this still
happens.

> Anand Jain (9):
>   btrfs: mark btrfs_dev_replace_start() as static
>   btrfs: replace go back to suspended if target missing
>   btrfs: replace back to suspend state if EXCL OP is running
>   btrfs: fix UAF due to race between replace start and cancel
>   btrfs: replace cancel is successful if scrub cancel is successful
>   btrfs: replace's scrub must not be running in replace suspended state
>   btrfs: quiten warn if the replace is canceled at finish
>   btrfs: user requsted replace cancel is not an error
>   btrfs: add explicit check for replace result no error

The above is merged to misc-next, except:

btrfs: quiten warn if the replace is canceled at finish
btrfs: user requsted replace cancel is not an error

with replies under the patches on what could be improved. The changes
can be sent independently if you need to do that in several patches.
Thanks.


Re: [PATCH 0/9 v2] fix replace-start and replace-cancel racing

2018-11-13 Thread David Sterba
On Sun, Nov 11, 2018 at 10:22:15PM +0800, Anand Jain wrote:
> v1->v2:
>   2/9: Drop writeback required
>   3/9: Drop writeback required
>   7/9: Use the condition within the WARN_ON()
>   6/9: Use the condition within the ASSERT()
> 
> Replace-start and replace-cancel threads can race and create a messy
> situation leading to a UAF. We use the scrub code to write the blocks
> to the replace target. So if we haven't set replace-scrub-running yet,
> without this patchset we just ignore the error and free the target
> device. When this happens the system panics with a UAF error.
> 
> It's nice to see that btrfs_dev_replace_finishing() already handles
> the ECANCELED (replace canceled) situation, but for an unknown reason
> we aren't using it to clean up after a replace cancel. Instead we just
> let the replace cancel ioctl thread clean up the target device and
> return, out of sync with the scrub code.
> 
> Patches 4/9, 5/9 and 6/9 use the return code of btrfs_scrub_cancel()
> to check if the scrub was really running. If it was not, we return an
> error to the user (replace not started error) so that the user can
> retry the replace cancel. They also use the
> btrfs_dev_replace_finishing() code to clean up after a successful
> cancel of the replace scrub.
> 
> Further, when a suspended replace tries to restart and fails (for
> example, target device missing or excl ops running), it goes to the
> started state, and so the CLI 'btrfs replace status /mnt' hangs with
> no progress. Patches 2/9 and 3/9 fix that.
> 
> As the original code's idea of ECANCELED was limited to the error
> situation and not a user-requested cancel, there are unnecessary error
> and warning logs, which patches 7/9 and 8/9 fix.

This fixes quite a few problems, namely the crash in scrub_setup_ctx,
thanks. I'm going to add the patchset to for-next; the code looks good
at first glance.


[PATCH 0/9 v2] fix replace-start and replace-cancel racing

2018-11-11 Thread Anand Jain
v1->v2:
  2/9: Drop writeback required
  3/9: Drop writeback required
  7/9: Use the condition within the WARN_ON()
  6/9: Use the condition within the ASSERT()

Replace-start and replace-cancel threads can race and create a messy
situation leading to a UAF. We use the scrub code to write the blocks
to the replace target. So if we haven't set replace-scrub-running yet,
without this patchset we just ignore the error and free the target
device. When this happens the system panics with a UAF error.

It's nice to see that btrfs_dev_replace_finishing() already handles
the ECANCELED (replace canceled) situation, but for an unknown reason
we aren't using it to clean up after a replace cancel. Instead we just
let the replace cancel ioctl thread clean up the target device and
return, out of sync with the scrub code.

Patches 4/9, 5/9 and 6/9 use the return code of btrfs_scrub_cancel()
to check if the scrub was really running. If it was not, we return an
error to the user (replace not started error) so that the user can
retry the replace cancel. They also use the
btrfs_dev_replace_finishing() code to clean up after a successful
cancel of the replace scrub.
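
As a rough illustration of the cancel flow described above, here is a
small user-space model in C. It is only a sketch, not the kernel code:
every name in it (replace_model, model_scrub_cancel() and so on) is
hypothetical, the use of -ENOTCONN and -EINVAL is an assumption of the
model, and the real finishing path runs in the replace-start thread
rather than being called directly from cancel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; these are not kernel symbols. */
enum model_state { MODEL_STARTED, MODEL_CANCELED };

struct replace_model {
	enum model_state state;
	bool scrub_running;	/* has the replace scrub been marked running? */
	bool target_freed;	/* set once the target device is released */
};

/* Stand-in for btrfs_scrub_cancel(): report whether a scrub was running. */
int model_scrub_cancel(struct replace_model *m)
{
	if (!m->scrub_running)
		return -ENOTCONN;	/* assumed error code for "no scrub" */
	m->scrub_running = false;
	return 0;
}

/* Stand-in for the finishing path that cleans up after a cancel. */
void model_replace_finishing(struct replace_model *m)
{
	/* The scrub must have stopped before the target is released. */
	assert(!m->scrub_running);
	m->target_freed = true;
	m->state = MODEL_CANCELED;
}

/*
 * Cancel only cleans up when the scrub cancel succeeded. Otherwise the
 * error goes back to the caller, the target is left alone, and the
 * caller can retry the cancel later.
 */
int model_replace_cancel(struct replace_model *m)
{
	int ret;

	if (m->state != MODEL_STARTED)
		return -EINVAL;		/* model's choice for "nothing to cancel" */

	ret = model_scrub_cancel(m);
	if (ret)
		return ret;

	model_replace_finishing(m);
	return 0;
}
```

The point of the model is that the target device is never freed on the
path where the scrub was not yet running, which is the window where the
UAF could happen.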

Further, when a suspended replace tries to restart and fails (for
example, target device missing or excl ops running), it goes to the
started state, and so the CLI 'btrfs replace status /mnt' hangs with
no progress. Patches 2/9 and 3/9 fix that.
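
The intended state handling on a failed resume can be sketched with
another small user-space model in C. Again this is only an illustration
under assumptions: resume_model and resume_model_try() are hypothetical
names, and the two boolean fields merely stand in for the "target
device missing" and "exclusive operation running" checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not kernel symbols. */
enum resume_state { RESUME_SUSPENDED, RESUME_STARTED };

struct resume_model {
	enum resume_state state;
	bool target_present;	/* is the replace target device available? */
	bool excl_op_running;	/* is another exclusive operation running? */
};

/*
 * Try to resume a suspended replace. The state is first moved to
 * started; if the resume cannot proceed, it is put back to suspended
 * so that a status query does not report a replace that makes no
 * progress. Returns true only when the replace really resumed.
 */
bool resume_model_try(struct resume_model *r)
{
	assert(r != NULL);

	if (r->state != RESUME_SUSPENDED)
		return false;

	r->state = RESUME_STARTED;

	if (!r->target_present || r->excl_op_running) {
		r->state = RESUME_SUSPENDED;	/* roll back on failure */
		return false;
	}

	return true;
}
```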

As the original code's idea of ECANCELED was limited to the error
situation and not a user-requested cancel, there are unnecessary error
and warning logs, which patches 7/9 and 8/9 fix.

Patches 1/9 and 9/9 are good-to-have fixes. They make a function
static and improve code readability.

Testing: (I made some attempts to convert these into xfstests but need
a mechanism where a kernel thread can wait for a userland script. I
thought I could do it using eBPF, but it needs more digging on how.)
For now, hand tested using procfs to hold the kernel thread at
(wait_for_user(..)) until userland issues go.


1.
umount /btrfs; wipefs -a /dev/sd[b-f] && mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs && fillfs /btrfs 1
btrfs replace start /dev/sdb /dev/sdc /btrfs
  wait_for_user("scrub running is set..waiting"); and/or
  wait_for_user("scrub running is NOT set..waiting");
btrfs replace cancel /btrfs
  wait_for_user_go();
btrfs replace status /btrfs

2.
umount /btrfs; wipefs -a /dev/sd[b-f] && mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs && fillfs /btrfs 1
btrfs replace start /dev/sdb /dev/sdc /btrfs
  wait_for_user("scrub running is set..waiting"); and/or
  wait_for_user("scrub running is NOT set..waiting");
reboot
mount -o device=/dev/sdc /dev/sdb /btrfs
  wait_for_user_go();
btrfs replace status /btrfs
btrfs replace cancel /btrfs
btrfs replace status /btrfs

3.
umount /btrfs; wipefs -a /dev/sd[b-f] && mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs && fillfs /btrfs 1
btrfs replace start /dev/sdb /dev/sdc /btrfs
  wait_for_user("scrub running is set..waiting"); and/or
  wait_for_user("scrub running is NOT set..waiting");
reboot
mount -o degraded /dev/sdb /btrfs
btrfs replace status /btrfs
btrfs replace cancel /btrfs
btrfs replace status /btrfs
umount /btrfs
mount /dev/sdb /btrfs

Anand Jain (9):
  btrfs: mark btrfs_dev_replace_start() as static
  btrfs: replace go back to suspended if target missing
  btrfs: replace back to suspend state if EXCL OP is running
  btrfs: fix UAF due to race between replace start and cancel
  btrfs: replace cancel is successful if scrub cancel is successful
  btrfs: replace's scrub must not be running in replace suspended state
  btrfs: quiten warn if the replace is canceled at finish
  btrfs: user requsted replace cancel is not an error
  btrfs: add explicit check for replace result no error

 fs/btrfs/dev-replace.c | 87 ++
 fs/btrfs/dev-replace.h |  3 --
 2 files changed, 59 insertions(+), 31 deletions(-)

-- 
1.8.3.1

From e5a84ccf4f73ec405bc7f5ad812b04762f287e2d Mon Sep 17 00:00:00 2001
From: Anand Jain 
Date: Wed, 7 Nov 2018 18:51:19 +0800


Anand Jain (9):
  btrfs: mark btrfs_dev_replace_start() as static
  btrfs: replace go back to suspended if target missing
  btrfs: replace back to suspend state if EXCL OP is running
  btrfs: fix UAF due to race between replace start and cancel
  btrfs: replace cancel is successful if scrub cancel is successful
  btrfs: replace's scrub must not be running in replace suspended state
  btrfs: quiten warn if the replace is canceled at finish
  btrfs: user requsted replace cancel is not an error
  btrfs: add explicit check for replace result no error

 fs/btrfs/dev-replace.c | 90 ++
 fs/btrfs/dev-replace.h |  3 --
 2 files changed, 62 insertions(+), 31 deletions(-)

-- 
1.8.3.1