Re: [PATCH 0/3] Handle bad dev_root properly with rescue=all

2021-03-18 Thread Josef Bacik

On 3/18/21 11:43 AM, David Sterba wrote:

On Thu, Mar 11, 2021 at 11:23:13AM -0500, Josef Bacik wrote:
[...]


rescue=all working without panicing the machine,
and I verified everything by
using btrfs-corrupt-block to corrupt a dev root of a file system.  Thanks,


We need to have such testing part of some suite but it depends on the
utilities that we don't want to ship by default. Last time we discussed
how to make btrfs-corrupt-block or similar available in source form in
fstests it was not pretty, like having half of the btrfs-progs sources
providing code to read and modify the data structures.

So, what if: we have such tests in the btrfs-progs testsuite. As they're
potentially dangerous, not run by default, but available for easy setup
and test in a VM. The testsuite can be exported into a tarball, with the
binaries included. Or simply run it built from git, that works too just
needs more dependencies installed.

I was thinking about collecting all the stress tests into one place,
fstests already has some but my idea is to provide stress testing of
btrfs-specific features, more at once. Or require some 3rd party tools
and data sources to provide the load.

It would be some infrastructure duplication with fstests, but lots of
that is already done in btrfs-progs and I'd rather go with some
duplication instead of development-time-only testing.



I actually had Boris working on this, basically allow us to set 
BTRFS_CORRUPT_PROG inside our fstests config, and then allow us to write these 
tests for fstests.  I'd prefer to keep this inside xfstests for mostly selfish 
reasons, it's easier for me to adapt the existing continual testing to take 
advantage of this and automatically run the xfstests stuff and post the results. 
 If we keep it in btrfs-progs I have to write a bunch of code to run those to 
post the results to our continual testing stuff.  Thanks,


Josef


Re: [PATCH 0/3] Handle bad dev_root properly with rescue=all

2021-03-18 Thread David Sterba
On Thu, Mar 11, 2021 at 11:23:13AM -0500, Josef Bacik wrote:
[...]

> rescue=all working without panicing the machine,
> and I verified everything by
> using btrfs-corrupt-block to corrupt a dev root of a file system.  Thanks,

We need to have such testing part of some suite but it depends on the
utilities that we don't want to ship by default. Last time we discussed
how to make btrfs-corrupt-block or similar available in source form in
fstests it was not pretty, like having half of the btrfs-progs sources
providing code to read and modify the data structures.

So, what if: we have such tests in the btrfs-progs testsuite. As they're
potentially dangerous, not run by default, but available for easy setup
and test in a VM. The testsuite can be exported into a tarball, with the
binaries included. Or simply run it built from git, that works too just
needs more dependencies installed.

I was thinking about collecting all the stress tests into one place,
fstests already has some but my idea is to provide stress testing of
btrfs-specific features, more at once. Or require some 3rd party tools
and data sources to provide the load.

It would be some infrastructure duplication with fstests, but lots of
that is already done in btrfs-progs and I'd rather go with some
duplication instead of development-time-only testing.


Re: [PATCH 0/3] Handle bad dev_root properly with rescue=all

2021-03-17 Thread Josef Bacik

On 3/17/21 8:27 AM, David Sterba wrote:

On Thu, Mar 11, 2021 at 11:23:13AM -0500, Josef Bacik wrote:

Hello,

My recent debugging session with Neal's broken filesystem uncovered a glaring
hole in my rescue=all patches, they don't deal with a NULL dev_root properly.
In testing I only ever tested corrupting the extent tree or the csum tree, since
those are the most problematic.  The following 3 fixes allowed Neal to get
rescue=all working without panicing the machine, and I verified everything by
using btrfs-corrupt-block to corrupt a dev root of a file system.  Thanks,


When rescue= is set lots of things can't work, I was wondering if we
should add messages once the "if (!dev_root)" cases are hit but I don't
think we need it, it should be clear enough from the other rescue=
related messages.



Yeah I went back and forth on this, and I came to the same conclusion as you. 
If we're doing rescue=all we know things are bad, and we just want our data 
back.  Thanks,


Josef


Re: [PATCH 0/3] Handle bad dev_root properly with rescue=all

2021-03-17 Thread David Sterba
On Thu, Mar 11, 2021 at 11:23:13AM -0500, Josef Bacik wrote:
> Hello,
> 
> My recent debugging session with Neal's broken filesystem uncovered a glaring
> hole in my rescue=all patches, they don't deal with a NULL dev_root properly.
> In testing I only ever tested corrupting the extent tree or the csum tree, 
> since
> those are the most problematic.  The following 3 fixes allowed Neal to get
> rescue=all working without panicing the machine, and I verified everything by
> using btrfs-corrupt-block to corrupt a dev root of a file system.  Thanks,

When rescue= is set lots of things can't work, I was wondering if we
should add messages once the "if (!dev_root)" cases are hit but I don't
think we need it, it should be clear enough from the other rescue=
related messages.


Re: [PATCH 0/3] Handle bad dev_root properly with rescue=all

2021-03-11 Thread Neal Gompa
On Thu, Mar 11, 2021 at 11:25 AM Josef Bacik  wrote:
>
> Hello,
>
> My recent debugging session with Neal's broken filesystem uncovered a glaring
> hole in my rescue=all patches, they don't deal with a NULL dev_root properly.
> In testing I only ever tested corrupting the extent tree or the csum tree, 
> since
> those are the most problematic.  The following 3 fixes allowed Neal to get
> rescue=all working without panicing the machine, and I verified everything by
> using btrfs-corrupt-block to corrupt a dev root of a file system.  Thanks,
>
> Josef
>
> Josef Bacik (3):
>   btrfs: init devices always
>   btrfs: do not init dev stats if we have no dev_root
>   btrfs: don't init dev replace for bad dev root
>
>  fs/btrfs/dev-replace.c | 3 +++
>  fs/btrfs/disk-io.c | 2 +-
>  fs/btrfs/volumes.c | 3 +++
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> --
> 2.26.2
>

I'm very happy that my freak disk controller issue yielded some good
fixes for Btrfs rescue mode code. :)

Reviewed-by: Neal Gompa 


-- 
真実はいつも一つ!/ Always, there's only one truth!