Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Ben Parsons
Fantastic!
btrfs check had no error on either disk!

I have now mounted the pool and am running a scrub.

Thank you so much for all of your help.

Thanks,
Ben


On 8 April 2018 at 13:05, Qu Wenruo  wrote:
>
>
> On 2018年04月08日 10:55, Ben Parsons wrote:
>> just to confirm:
>>
>> I should run the following dd commands to fix the superblocks:
>> dd if=super_dump.sdb of=/dev/sdb bs=1 count=4096 skip=64k
>> dd if=super_dump.sdc1 of=/dev/sdc1 bs=1 count=4096 skip=64k
>
> Nope.
>
> it's seek=64K
>
> Thanks,
> Qu
>>
>> Thanks,
>> Ben
>>
>> On 8 April 2018 at 12:27, Qu Wenruo  wrote:
>>> Here you go, all patched superblocks attached.
>>>
>>> Thanks,
>>> Qu
>>>
>>> On 2018年04月08日 10:14, Ben Parsons wrote:
 Super block of sdb as requested

 Thanks,
 Ben

 On 8 April 2018 at 11:53, Qu Wenruo  wrote:
>
>
> On 2018年04月08日 08:57, Ben Parsons wrote:
>> See attached for requested output.
>>
>> Do I still need to recover the super block of sdb?
>
> Yep. Please also attach the binary dump of superblock of sdb.
>
>>
>> Could you please point me in the right direction for doing the in-place
>> recovery?
>
> I'll provide the patched superblock for both disks (sdb and sdc1)
>
> And with them written back to disk, just run "btrfs check" first, if
> nothing wrong, mount it RW and run scrub.
>
> Pretty straightforward.
>
> Thanks,
> Qu
>>
>> I have not rebooted or tried to recover / mount the disk, btw.
>>
>> Thanks,
>> Ben
>>
>> On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年04月08日 07:29, Ben Parsons wrote:
 On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>
>
> On 2018年04月07日 10:31, Ben Parsons wrote:
> [snip]
>>> Pretty common hard power reset.
>>>
 looking at journalctl, there is a large stacktrace from kernel: amdgpu
 (see attached).
 then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>>
>>> I'd say such corruption is pretty serious.
>>>
>>> And what's the profile of the btrfs? If metadata is raid1, we could at
>>> least try to recover the superblock from the remaining disk.
>>
>> I am not sure what the metadata was but the two disks had no parity
>> and just appeared as a single disk with total space of the two disks
>
> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
> While for the 1st disk, it's sda, without a partition table at all.
> Is there any possibility that you just took the wrong partition?
> (Or did some program use it incorrectly?)
>

 I don't quite understand what you are asking.
 I was always under the impression I could run mount on either
 partition and it would mount the pool

>>
>> how would I go about recovering the 2nd disk? Attached is
>
> The 2nd disk looks good, however its csum_type is wrong.
> 41700 looks like garbage.
>
> Besides that, incompat_flags also has garbage.
>
> The good news is, the system (and metadata) profile is RAID1, so it's
> highly possible for us to salvage (to be more accurate, rebuild) the
> superblock for the 1st device.
>
> Please dump the superblock of the 2nd device (sdc1) by the following
> command:
>
> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>

 See attached.

>
> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
> fields, so I'm afraid I need to manually modify it.
>
> And just in case, please paste the following output to help us verify if
> it's really sda without offset:
>
> # lsblk /dev/sda
> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>

 dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
 cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

 65600:_BHRfS_M
 67108928:_BHRfS_M
>>>
>>> Well, the magic number is completely correct, and at correct location.
>>>
>>> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
>>> This time it should provide good data.
>>>

>
> Above grep could be very slow since it will try to iterate the whole
> disk. It's recommended to dump the first 128M of the disk and then grep
> on that 128M image.
>
>
> BTW, with superblock of sdc1 patched, you should be able to mount the fs
> with -o ro,degraded, and salvage some data.
>
> Thanks,

Re: [PATCH v2] fstests: btrfs/159 superblock corruption test case

2018-04-07 Thread Eryu Guan
On Thu, Apr 05, 2018 at 02:28:49PM +0800, Anand Jain wrote:
> Verify if the superblock corruption is handled correctly.
> 
> Signed-off-by: Anand Jain 
> ---
> v1->v2:
>  $subject slightly changed
>  Added more info about the test-case
>  Keep the stuff from the ./new btrfs
>  Add mount_opt_minus_args() to get the options (if any) set in the config file
>  Move DEV_GOOD & DEV_BAD to where they are first used
>  To help debugging, added run_check where possible
>  Remove {} in the out file
>  Use _filter_error_mount for mount fail cases other than -EINVAL
> 
>  tests/btrfs/159 | 177 
> 
>  tests/btrfs/159.out |  23 +++
>  tests/btrfs/group   |   1 +
>  3 files changed, 201 insertions(+)
>  create mode 100755 tests/btrfs/159
>  create mode 100644 tests/btrfs/159.out
> 
> diff --git a/tests/btrfs/159 b/tests/btrfs/159
> new file mode 100755
> index ..521cfdab0242
> --- /dev/null
> +++ b/tests/btrfs/159
> @@ -0,0 +1,177 @@
> +#! /bin/bash
> +# FS QA Test 159
> +#
> +# Test if the superblock corruption is handled correctly:
> +#- Test fsid mismatch (csum ok) between primary and copy superblock
> +#Fixed by the ML patch:
> +#btrfs: check if the fsid in the primary sb and copy sb are same
> +#- Test if the mount fails if the primary superblock csum is
> +#corrupted on any disk
> +#- Test if the mount does not fail if the copy1 sb csum is corrupted
> +#Fixed by the ML patches:
> +#btrfs: verify superblock checksum during scan
> +#btrfs: verify checksum for all devices in mount context
> +#
> +#-
> +# Copyright (c) 2018 Oracle.  All Rights Reserved.
> +# Author: Anand Jain 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> + _scratch_dev_pool_put
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/module
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch_dev_pool 2
> +_require_loadable_fs_module "btrfs"
> +_require_command "$WIPEFS_PROG" wipefs
> +
> +_scratch_dev_pool_get 2
> +
> +mount_opt_minus_args()
> +{
> + local nr
> + local mnt_opt=""
> +
> + nr=`_scratch_mount_options | awk '{print NF}'`
> + if [ $nr -eq 4 ]; then

Seems this only works when "_scratch_mount_options" returns something
like:

"-o opt dev mnt" or "-o opt1,opt2 dev mnt"

but not

"-oopt dev mnt" nor
"-o opt1 -o opt2 dev mnt" nor if $SELINUX_MOUNT_OPTIONS not empty.

Also if MOUNT_OPTIONS is "-oopt1 -oopt2", mount_opt_minus_args would
return something like "-o -oopt2,device=", which are not valid
mount options.
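
To illustrate with hypothetical values: the awk field count cannot tell
these apart, since both forms produce 4 fields:

  $ set -- -oopt1 -oopt2 /dev/sdb /mnt/scratch  # what the options line expands to
  $ echo $#   # 4 fields, so the "-eq 4" check still passes
  4
  $ echo $2   # but field 2 is "-oopt2", not a bare option list
  -oopt2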

> + #gets only the mount option set in the config file
> + mnt_opt=`_scratch_mount_options | awk '{print $2}'`
> + fi
> + #Append the additional opts provided as func args.
> + #Make sure -o is not echo-ed if both config file mnt-option
> + #and the test case mnt-option are null.
> + if [ -z $mnt_opt ]; then
> + if [ ! -z $* ]; then
> + echo "-o $*"
> + fi
> + else
> + if [ -z $* ]; then
> + echo "-o $mnt_opt"
> + else
> + echo "-o $mnt_opt,$*"
> + fi
> + fi
> +}
> +
> +wipe()
> +{
> + $WIPEFS_PROG -a $DEV_GOOD > /dev/null 2>&1
> + $WIPEFS_PROG -a $DEV_BAD > /dev/null 2>&1
> +}
> +
> +# Test for fsid mismatch (csum ok) with primary and copy superblock.
> +check_copy1_fsid()
> +{
> + local bytenr=67108864
> + echo -e "\\ncheck_copy1_fsid\\n" | tee -a $seqres.full
> +
> + wipe
> + $MKFS_BTRFS_PROG -fq $DEV_GOOD
> + $MKFS_BTRFS_PROG -fq $DEV_BAD
> + _reload_fs_module "btrfs"
> +
> + run_check dd status=none of=$DEV_BAD if=$DEV_GOOD ibs=1 obs=1\
> + skip=$bytenr seek=$bytenr count=4K
> +
> + 

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Qu Wenruo


On 2018年04月08日 10:55, Ben Parsons wrote:
> just to confirm:
> 
> I should run the following dd commands to fix the superblocks:
> dd if=super_dump.sdb of=/dev/sdb bs=1 count=4096 skip=64k
> dd if=super_dump.sdc1 of=/dev/sdc1 bs=1 count=4096 skip=64k

Nope.

it's seek=64K

Thanks,
Qu
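
In other words, the 64k offset has to be applied to the output device with
seek=, not to the input dump with skip=. A sketch of the corrected
write-back commands:

  dd if=super_dump.sdb of=/dev/sdb bs=1 count=4096 seek=64k
  dd if=super_dump.sdc1 of=/dev/sdc1 bs=1 count=4096 seek=64k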
> 
> Thanks,
> Ben
> 
> On 8 April 2018 at 12:27, Qu Wenruo  wrote:
>> Here you go, all patched superblocks attached.
>>
>> Thanks,
>> Qu
>>
>> On 2018年04月08日 10:14, Ben Parsons wrote:
>>> Super block of sdb as requested
>>>
>>> Thanks,
>>> Ben
>>>
>>> On 8 April 2018 at 11:53, Qu Wenruo  wrote:


 On 2018年04月08日 08:57, Ben Parsons wrote:
> See attached for requested output.
>
> Do I still need to recover the super block of sdb?

 Yep. Please also attach the binary dump of superblock of sdb.

>
> Could you please point me in the right direction for doing the in-place
> recovery?

 I'll provide the patched superblock for both disks (sdb and sdc1)

 And with them written back to disk, just run "btrfs check" first, if
 nothing wrong, mount it RW and run scrub.

 Pretty straightforward.

 Thanks,
 Qu
>
> I have not rebooted or tried to recover / mount the disk, btw.
>
> Thanks,
> Ben
>
> On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>>
>>
>> On 2018年04月08日 07:29, Ben Parsons wrote:
>>> On 7 April 2018 at 22:09, Qu Wenruo  wrote:


 On 2018年04月07日 10:31, Ben Parsons wrote:
 [snip]
>> Pretty common hard power reset.
>>
>>> looking at journalctl, there is a large stacktrace from kernel: amdgpu
>>> (see attached).
>>> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>
>> I'd say such corruption is pretty serious.
>>
>> And what's the profile of the btrfs? If metadata is raid1, we could at
>> least try to recover the superblock from the remaining disk.
>
> I am not sure what the metadata was but the two disks had no parity
> and just appeared as a single disk with total space of the two disks

 Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
 While for the 1st disk, it's sda, without a partition table at all.
 Is there any possibility that you just took the wrong partition?
 (Or did some program use it incorrectly?)

>>>
>>> I don't quite understand what you are asking.
>>> I was always under the impression I could run mount on either
>>> partition and it would mount the pool
>>>
>
> how would I go about recovering the 2nd disk? Attached is

 The 2nd disk looks good, however its csum_type is wrong.
 41700 looks like garbage.

 Besides that, incompat_flags also has garbage.

 The good news is, the system (and metadata) profile is RAID1, so it's
 highly possible for us to salvage (to be more accurate, rebuild) the
 superblock for the 1st device.

 Please dump the superblock of the 2nd device (sdc1) by the following
 command:

 # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k

>>>
>>> See attached.
>>>

 Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
 fields, so I'm afraid I need to manually modify it.

 And just in case, please paste the following output to help us verify if
 it's really sda without offset:

 # lsblk /dev/sda
 # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

>>>
>>> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
>>> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>>
>>> 65600:_BHRfS_M
>>> 67108928:_BHRfS_M
>>
>> Well, the magic number is completely correct, and at correct location.
>>
>> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
>> This time it should provide good data.
>>
>>>

 Above grep could be very slow since it will try to iterate the whole
 disk. It's recommended to dump the first 128M of the disk and then grep
 on that 128M image.


 BTW, with superblock of sdc1 patched, you should be able to mount the fs
 with -o ro,degraded, and salvage some data.

 Thanks,
 Qu
>>>
>>> Thank you so much!
>>>
>>> Am I better off copying the data to another disk and then rebuilding
>>> the pool?
>>> Or can I just run a scrub after the superblock is fixed?
>>
>> According to your latest grep output, strangely the 1st device is not
>> as corrupted as it first appeared.
>>
>> So I think in-place recovery should save you a lot of time.

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Ben Parsons
just to confirm:

I should run the following dd commands to fix the superblocks:
dd if=super_dump.sdb of=/dev/sdb bs=1 count=4096 skip=64k
dd if=super_dump.sdc1 of=/dev/sdc1 bs=1 count=4096 skip=64k

Thanks,
Ben

On 8 April 2018 at 12:27, Qu Wenruo  wrote:
> Here you go, all patched superblocks attached.
>
> Thanks,
> Qu
>
> On 2018年04月08日 10:14, Ben Parsons wrote:
>> Super block of sdb as requested
>>
>> Thanks,
>> Ben
>>
>> On 8 April 2018 at 11:53, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年04月08日 08:57, Ben Parsons wrote:
 See attached for requested output.

 Do I still need to recover the super block of sdb?
>>>
>>> Yep. Please also attach the binary dump of superblock of sdb.
>>>

 Could you please point me in the right direction for doing the in-place
 recovery?
>>>
>>> I'll provide the patched superblock for both disks (sdb and sdc1)
>>>
>>> And with them written back to disk, just run "btrfs check" first, if
>>> nothing wrong, mount it RW and run scrub.
>>>
>>> Pretty straightforward.
>>>
>>> Thanks,
>>> Qu

 I have not rebooted or tried to recover / mount the disk, btw.

 Thanks,
 Ben

 On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>
>
> On 2018年04月08日 07:29, Ben Parsons wrote:
>> On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年04月07日 10:31, Ben Parsons wrote:
>>> [snip]
> Pretty common hard power reset.
>
>> looking at journalctl, there is a large stacktrace from kernel: amdgpu
>> (see attached).
>> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>
> I'd say such corruption is pretty serious.
>
> And what's the profile of the btrfs? If metadata is raid1, we could at
> least try to recover the superblock from the remaining disk.

 I am not sure what the metadata was but the two disks had no parity
 and just appeared as a single disk with total space of the two disks
>>>
>>> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
>>> While for the 1st disk, it's sda, without a partition table at all.
>>> Is there any possibility that you just took the wrong partition?
>>> (Or did some program use it incorrectly?)
>>>
>>
>> I don't quite understand what you are asking.
>> I was always under the impression I could run mount on either
>> partition and it would mount the pool
>>

 how would I go about recovering the 2nd disk? Attached is
>>>
>>> The 2nd disk looks good, however its csum_type is wrong.
>>> 41700 looks like garbage.
>>>
>>> Besides that, incompat_flags also has garbage.
>>>
>>> The good news is, the system (and metadata) profile is RAID1, so it's
>>> highly possible for us to salvage (to be more accurate, rebuild) the
>>> superblock for the 1st device.
>>>
>>> Please dump the superblock of the 2nd device (sdc1) by the following
>>> command:
>>>
>>> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>>>
>>
>> See attached.
>>
>>>
>>> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
>>> fields, so I'm afraid I need to manually modify it.
>>>
>>> And just in case, please paste the following output to help us verify if
>>> it's really sda without offset:
>>>
>>> # lsblk /dev/sda
>>> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>>
>>
>> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
>> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>
>> 65600:_BHRfS_M
>> 67108928:_BHRfS_M
>
> Well, the magic number is completely correct, and at correct location.
>
> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
> This time it should provide good data.
>
>>
>>>
>>> Above grep could be very slow since it will try to iterate the whole
>>> disk. It's recommended to dump the first 128M of the disk and then grep
>>> on that 128M image.
>>>
>>>
>>> BTW, with superblock of sdc1 patched, you should be able to mount the fs
>>> with -o ro,degraded, and salvage some data.
>>>
>>> Thanks,
>>> Qu
>>
>> Thank you so much!
>>
>> Am I better off copying the data to another disk and then rebuilding the
>> pool?
>> Or can I just run a scrub after the superblock is fixed?
>
> According to your latest grep output, strangely the 1st device is not
> as corrupted as it first appeared.
>
> So I think in-place recovery should save you a lot of time.
>
> Thanks,
> Qu
>
>>
>> For reference here is lsblk:
>>
>> sda      8:0    0 465.8G  0 disk
>> ├─sda1   8:1    0   512M  0 part /boot
>> ├─sda2   8:2    0 455.3G  0 part /
>> └─sda3   8:3    0    10G  0 part [SWAP]

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Qu Wenruo
Here you go, all patched superblocks attached.

Thanks,
Qu

On 2018年04月08日 10:14, Ben Parsons wrote:
> Super block of sdb as requested
> 
> Thanks,
> Ben
> 
> On 8 April 2018 at 11:53, Qu Wenruo  wrote:
>>
>>
>> On 2018年04月08日 08:57, Ben Parsons wrote:
>>> See attached for requested output.
>>>
>>> Do I still need to recover the super block of sdb?
>>
>> Yep. Please also attach the binary dump of superblock of sdb.
>>
>>>
>>> Could you please point me in the right direction for doing the in-place
>>> recovery?
>>
>> I'll provide the patched superblock for both disks (sdb and sdc1)
>>
>> And with them written back to disk, just run "btrfs check" first, if
>> nothing wrong, mount it RW and run scrub.
>>
>> Pretty straightforward.
>>
>> Thanks,
>> Qu
>>>
>>> I have not rebooted or tried to recover / mount the disk, btw.
>>>
>>> Thanks,
>>> Ben
>>>
>>> On 8 April 2018 at 10:02, Qu Wenruo  wrote:


 On 2018年04月08日 07:29, Ben Parsons wrote:
> On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>>
>>
>> On 2018年04月07日 10:31, Ben Parsons wrote:
>> [snip]
 Pretty common hard power reset.

> looking at journalctl, there is a large stacktrace from kernel: amdgpu
> (see attached).
> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.

 I'd say such corruption is pretty serious.

 And what's the profile of the btrfs? If metadata is raid1, we could at
 least try to recover the superblock from the remaining disk.
>>>
>>> I am not sure what the metadata was but the two disks had no parity
>>> and just appeared as a single disk with total space of the two disks
>>
>> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
>> While for the 1st disk, it's sda, without a partition table at all.
>> Is there any possibility that you just took the wrong partition?
>> (Or did some program use it incorrectly?)
>>
>
> I don't quite understand what you are asking.
> I was always under the impression I could run mount on either
> partition and it would mount the pool
>
>>>
>>> how would I go about recovering the 2nd disk? Attached is
>>
>> The 2nd disk looks good, however its csum_type is wrong.
>> 41700 looks like garbage.
>>
>> Besides that, incompat_flags also has garbage.
>>
>> The good news is, the system (and metadata) profile is RAID1, so it's
>> highly possible for us to salvage (to be more accurate, rebuild) the
>> superblock for the 1st device.
>>
>> Please dump the superblock of the 2nd device (sdc1) by the following
>> command:
>>
>> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>>
>
> See attached.
>
>>
>> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
>> fields, so I'm afraid I need to manually modify it.
>>
>> And just in case, please paste the following output to help us verify if
>> it's really sda without offset:
>>
>> # lsblk /dev/sda
>> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>
>
> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>
> 65600:_BHRfS_M
> 67108928:_BHRfS_M

 Well, the magic number is completely correct, and at correct location.

 Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
 This time it should provide good data.

>
>>
>> Above grep could be very slow since it will try to iterate the whole
>> disk. It's recommended to dump the first 128M of the disk and then grep
>> on that 128M image.
>>
>>
>> BTW, with superblock of sdc1 patched, you should be able to mount the fs
>> with -o ro,degraded, and salvage some data.
>>
>> Thanks,
>> Qu
>
> Thank you so much!
>
> Am I better off copying the data to another disk and then rebuilding the
> pool?
> Or can I just run a scrub after the superblock is fixed?

 According to your latest grep output, strangely the 1st device is not
 as corrupted as it first appeared.

 So I think in-place recovery should save you a lot of time.

 Thanks,
 Qu

>
> For reference here is lsblk:
>
> sda      8:0    0 465.8G  0 disk
> ├─sda1   8:1    0   512M  0 part /boot
> ├─sda2   8:2    0 455.3G  0 part /
> └─sda3   8:3    0    10G  0 part [SWAP]
>
> sdb      8:16   0 931.5G  0 disk
> -- first disk
>
> sdc      8:32   0   1.8T  0 disk
> └─sdc1   8:33   0   1.8T  0 part
> -- 2nd disk
>


super_dump.sdb
Description: Binary data


super_dump.sdc1
Description: Binary data


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Ben Parsons
Super block of sdb as requested

Thanks,
Ben

On 8 April 2018 at 11:53, Qu Wenruo  wrote:
>
>
> On 2018年04月08日 08:57, Ben Parsons wrote:
>> See attached for requested output.
>>
>> Do I still need to recover the super block of sdb?
>
> Yep. Please also attach the binary dump of superblock of sdb.
>
>>
>> Could you please point me in the right direction for doing the in-place recovery?
>
> I'll provide the patched superblock for both disks (sdb and sdc1)
>
> And with them written back to disk, just run "btrfs check" first, if
> nothing wrong, mount it RW and run scrub.
>
> Pretty straightforward.
>
> Thanks,
> Qu
>>
>> I have not rebooted or tried to recover / mount the disk, btw.
>>
>> Thanks,
>> Ben
>>
>> On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年04月08日 07:29, Ben Parsons wrote:
 On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>
>
> On 2018年04月07日 10:31, Ben Parsons wrote:
> [snip]
>>> Pretty common hard power reset.
>>>
 looking at journalctl, there is a large stacktrace from kernel: amdgpu
 (see attached).
 then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>>
>>> I'd say such corruption is pretty serious.
>>>
>>> And what's the profile of the btrfs? If metadata is raid1, we could at
>>> least try to recover the superblock from the remaining disk.
>>
>> I am not sure what the metadata was but the two disks had no parity
>> and just appeared as a single disk with total space of the two disks
>
> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
> While for the 1st disk, it's sda, without a partition table at all.
> Is there any possibility that you just took the wrong partition?
> (Or did some program use it incorrectly?)
>

 I don't quite understand what you are asking.
 I was always under the impression I could run mount on either
 partition and it would mount the pool

>>
>> how would I go about recovering the 2nd disk? Attached is
>
> The 2nd disk looks good, however its csum_type is wrong.
> 41700 looks like garbage.
>
> Besides that, incompat_flags also has garbage.
>
> The good news is, the system (and metadata) profile is RAID1, so it's
> highly possible for us to salvage (to be more accurate, rebuild) the
> superblock for the 1st device.
>
> Please dump the superblock of the 2nd device (sdc1) by the following
> command:
>
> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>

 See attached.

>
> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
> fields, so I'm afraid I need to manually modify it.
>
> And just in case, please paste the following output to help us verify if
> it's really sda without offset:
>
> # lsblk /dev/sda
> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>

 dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
 cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

 65600:_BHRfS_M
 67108928:_BHRfS_M
>>>
>>> Well, the magic number is completely correct, and at correct location.
>>>
>>> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
>>> This time it should provide good data.
>>>

>
> Above grep could be very slow since it will try to iterate the whole
> disk. It's recommended to dump the first 128M of the disk and then grep
> on that 128M image.
>
>
> BTW, with superblock of sdc1 patched, you should be able to mount the fs
> with -o ro,degraded, and salvage some data.
>
> Thanks,
> Qu

 Thank you so much!

 Am I better off copying the data to another disk and then rebuilding the
 pool?
 Or can I just run a scrub after the superblock is fixed?
>>>
>>> According to your latest grep output, strangely the 1st device is not
>>> as corrupted as it first appeared.
>>>
>>> So I think in-place recovery should save you a lot of time.
>>>
>>> Thanks,
>>> Qu
>>>

 For reference here is lsblk:

 sda      8:0    0 465.8G  0 disk
 ├─sda1   8:1    0   512M  0 part /boot
 ├─sda2   8:2    0 455.3G  0 part /
 └─sda3   8:3    0    10G  0 part [SWAP]

 sdb      8:16   0 931.5G  0 disk
 -- first disk

 sdc      8:32   0   1.8T  0 disk
 └─sdc1   8:33   0   1.8T  0 part
 -- 2nd disk



super_dump.sdb
Description: Binary data


Re: [PATCH] fstests: generic test for fsync after fallocate

2018-04-07 Thread Eryu Guan
On Thu, Apr 05, 2018 at 10:56:14PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that fsync operations preserve extents allocated with fallocate(2)
> that are placed beyond a file's size.
> 
> This test is motivated by a bug found in btrfs where unwritten extents
> beyond the inode's i_size were not preserved after a fsync and power
> failure. The btrfs bug is fixed by the following patch for the linux
> kernel:
> 
>  "Btrfs: fix loss of prealloc extents past i_size after fsync log replay"
> 
> Signed-off-by: Filipe Manana 

Hmm, xfs fails this test, while ext4 passes.

# diff -u tests/generic/483.out /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad
--- tests/generic/483.out   2018-04-07 23:35:00.55511 +0800
+++ /root/workspace/xfstests/results//xfs_4k_crc/generic/483.out.bad   2018-04-07 23:39:48.780659707 +0800
@@ -6,5 +6,5 @@
 0: [0..511]: data
 1: [512..2559]: unwritten
 File baz fiemap:
-0: [0..511]: data
-1: [512..6143]: unwritten
+0: [0..895]: data
+1: [896..6143]: unwritten

I'm not sure what the problem is yet, but IMHO controlling on-disk
layout of a file from userspace is hard and should be avoided if
possible.

Why not dump md5sums to the .out file like other dmflakey tests do? I've
checked the md5sums of all three test files, and they're the same on
xfs as on ext4, so the files are not corrupted on xfs.

Thanks,
Eryu
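
A minimal sketch of that approach, assuming the _md5_checksum helper from
common/rc and the file names used by the test:

  echo "foo: $(_md5_checksum $SCRATCH_MNT/foo)"
  echo "bar: $(_md5_checksum $SCRATCH_MNT/bar)"
  echo "baz: $(_md5_checksum $SCRATCH_MNT/baz)"

with the three checksums captured in the golden .out file instead of the
fiemap layouts.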

> ---
>  tests/generic/482 | 118 
> ++
>  tests/generic/482.out |  10 +
>  tests/generic/group   |   1 +
>  3 files changed, 129 insertions(+)
>  create mode 100755 tests/generic/482
>  create mode 100644 tests/generic/482.out
> 
> diff --git a/tests/generic/482 b/tests/generic/482
> new file mode 100755
> index ..43bbc913
> --- /dev/null
> +++ b/tests/generic/482
> @@ -0,0 +1,118 @@
> +#! /bin/bash
> +# FSQA Test No. 482
> +#
> +# Test that fsync operations preserve extents allocated with fallocate(2) that
> +# are placed beyond a file's size.
> +#
> +#---
> +#
> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
> +# Author: Filipe Manana 
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + _cleanup_flakey
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +. ./common/dmflakey
> +. ./common/punch
> +
> +# real QA test starts here
> +_supported_fs generic
> +_supported_os Linux
> +_require_scratch
> +_require_dm_target flakey
> +_require_xfs_io_command "falloc" "-k"
> +_require_xfs_io_command "fiemap"
> +
> +rm -f $seqres.full
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_require_metadata_journaling $SCRATCH_DEV
> +_init_flakey
> +_mount_flakey
> +
> +# Create our test files.
> +$XFS_IO_PROG -f -c "pwrite -S 0xea 0 256K" $SCRATCH_MNT/foo >/dev/null
> +
> +# Create a file with many extents. We later want to shrink truncate it and
> +# add a prealloc extent beyond its new size.
> +for ((i = 1; i <= 500; i++)); do
> + offset=$(((i - 1) * 4 * 1024))
> + $XFS_IO_PROG -f -s -c "pwrite -S 0xcf $offset 4K" \
> + $SCRATCH_MNT/bar >/dev/null
> +done
> +
> +# A file which already has a prealloc extent beyond its size.
> +# The fsync done on it is motivated by differences in the btrfs implementation
> +# of fsync (first fsync has different logic from subsequent fsyncs).
> +$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \
> +  -c "falloc -k 256K 768K" \
> +  -c "fsync" \
> +  $SCRATCH_MNT/baz >/dev/null
> +
> +# Make sure everything done so far is durably persisted.
> +sync
> +
> +# Allocate an extent beyond the size of the first test file and fsync it.
> +$XFS_IO_PROG -c "falloc -k 256K 1M"\
> +  -c "fsync" \
> +  $SCRATCH_MNT/foo
> +
> +# Do a shrinking truncate of our test file, add a prealloc extent to it after
> +# its new size and fsync it.
> +$XFS_IO_PROG -c "truncate 256K" \
> +  -c "falloc -k 256K 1M"\
> +  -c "fsync" \
> +  $S

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Qu Wenruo


On 2018年04月08日 08:57, Ben Parsons wrote:
> See attached for requested output.
> 
> Do I still need to recover the super block of sdb?

Yep. Please also attach the binary dump of superblock of sdb.

> 
> Could you please point me in the right direction for doing the in-place recovery?

I'll provide the patched superblock for both disks (sdb and sdc1)

And with them written back to disk, just run "btrfs check" first, if
nothing wrong, mount it RW and run scrub.

Pretty straightforward.

Thanks,
Qu
> 
> I have not rebooted or tried to recover / mount the disk, btw.
> 
> Thanks,
> Ben
> 
> On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>>
>>
>> On 2018年04月08日 07:29, Ben Parsons wrote:
>>> On 7 April 2018 at 22:09, Qu Wenruo  wrote:


 On 2018年04月07日 10:31, Ben Parsons wrote:
 [snip]
>> Pretty common hard power reset.
>>
>>> looking at journalctl, there is a large stacktrace from kernel: amdgpu
>>> (see attached).
>>> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>
>> I'd say such corruption is pretty serious.
>>
>> And what's the profile of the btrfs? If metadata is raid1, we could at
>> least try to recover the superblock from the remaining disk.
>
> I am not sure what the metadata was but the two disks had no parity
> and just appeared as a single disk with total space of the two disks

 Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
 While for the 1st disk, it's sda, without a partition table at all.
 Is there any possibility that you just took the wrong partition?
 (Or did some program use it incorrectly?)

>>>
>>> I don't quite understand what you are asking.
>>> I was always under the impression I could run mount on either
>>> partition and it would mount the pool
>>>
>
> how would I go about recovering the 2nd disk? Attached is

 The 2nd disk looks good, however its csum_type is wrong.
 41700 looks like garbage.

 Besides that, incompat_flags also has garbage.

 The good news is, the system (and metadata) profile is RAID1, so it's
 highly possible for us to salvage (to be more accurate, rebuild) the
 superblock for the 1st device.

 Please dump the superblock of the 2nd device (sdc1) by the following
 command:

 # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k

>>>
>>> See attached.
>>>

 Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
 fields, so I'm afraid I need to manually modify it.

 And just in case, please paste the following output to help us verify if
 it's really sda without offset:

 # lsblk /dev/sda
 # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

>>>
>>> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
>>> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>>
>>> 65600:_BHRfS_M
>>> 67108928:_BHRfS_M
>>
>> Well, the magic number is completely correct, and at correct location.
>>
>> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
>> This time it should provide good data.
>>
>>>

 Above grep could be very slow since it will try to iterate the whole
 disk. It's recommended to dump the first 128M of the disk and then grep
 on that 128M image.


 BTW, with superblock of sdc1 patched, you should be able to mount the fs
 with -o ro,degraded, and salvage some data.

 Thanks,
 Qu
>>>
>>> Thank you so much!
>>>
>>> Am I better off copying the data to another disk and then rebuilding the
>>> pool?
>>> Or can I just run a scrub after the superblock is fixed?
>>
>> According to your latest grep output, strangely the 1st device is not
>> as corrupted as it first appeared.
>>
>> So I think in-place recovery should save you a lot of time.
>>
>> Thanks,
>> Qu
>>
>>>
>>> For reference here is lsblk:
>>>
>>> sda      8:0    0 465.8G  0 disk
>>> ├─sda1   8:1    0   512M  0 part /boot
>>> ├─sda2   8:2    0 455.3G  0 part /
>>> └─sda3   8:3    0    10G  0 part [SWAP]
>>>
>>> sdb      8:16   0 931.5G  0 disk
>>> -- first disk
>>>
>>> sdc      8:32   0   1.8T  0 disk
>>> └─sdc1   8:33   0   1.8T  0 part
>>> -- 2nd disk
>>>


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Ben Parsons
See attached for requested output.

Do I still need to recover the super block of sdb?

Could you please point me in the right direction for doing the in-place recovery?

I have not rebooted or tried to recover / mount the disk, btw.

Thanks,
Ben

On 8 April 2018 at 10:02, Qu Wenruo  wrote:
>
>
> On 2018年04月08日 07:29, Ben Parsons wrote:
>> On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>>>
>>>
>>> On 2018年04月07日 10:31, Ben Parsons wrote:
>>> [snip]
> Pretty common hard power reset.
>
>> looking at journalctl, there is a large stacktrace from kernel: amdgpu
>> (see attached).
>> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>
> I'd say such corruption is pretty serious.
>
> And what's the profile of the btrfs? If metadata is raid1, we could at
> least try to recover the superblock from the remaining disk.

 I am not sure what the metadata was but the two disks had no parity
 and just appeared as a single disk with total space of the two disks
>>>
>>> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
>>> While for the 1st disk, it's sda, without a partition table at all.
>>> Is there any possibility that you just took the wrong partition?
>>> (Or did some program use it incorrectly?)
>>>
>>
>> I don't quite understand what you are asking.
>> I was always under the impression I could run mount on either
>> partition and it would mount the pool
>>

 how would I go about recovering the 2nd disk? Attached is
>>>
>>> The 2nd disk looks good, however its csum_type is wrong.
>>> 41700 looks like garbage.
>>>
>>> Besides that, incompat_flags also has garbage.
>>>
>>> The good news is, the system (and metadata) profile is RAID1, so it's
>>> highly possible for us to salvage (to be more accurate, rebuild) the
>>> superblock for the 1st device.
>>>
>>> Please dump the superblock of the 2nd device (sdc1) by the following
>>> command:
>>>
>>> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>>>
>>
>> See attached.
>>
>>>
>>> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
>>> fields, so I'm afraid I need to manually modify it.
>>>
>>> And just in case, please paste the following output to help us verify if
>>> it's really sda without offset:
>>>
>>> # lsblk /dev/sda
>>> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>>
>>
>> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
>> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>
>> 65600:_BHRfS_M
>> 67108928:_BHRfS_M
>
> Well, the magic number is completely correct, and at correct location.
>
> Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
> This time it should provide good data.
>
>>
>>>
>>> Above grep could be very slow since it will try to iterate the whole
>>> disk. It's recommended to dump the first 128M of the disk and then grep
>>> on that 128M image.
>>>
>>>
>>> BTW, with superblock of sdc1 patched, you should be able to mount the fs
>>> with -o ro,degraded, and salvage some data.
>>>
>>> Thanks,
>>> Qu
>>
>> Thank you so much!
>>
>> Am I better off copying the data to another disk and then rebuilding the
>> pool?
>> Or can I just run a scrub after the superblock is fixed?
>
> According to your latest grep output, strangely the 1st device is not
> as corrupted as it first appeared.
>
> So I think in-place recovery should save you a lot of time.
>
> Thanks,
> Qu
>
>>
>> For reference here is lsblk:
>>
>> sda      8:0    0 465.8G  0 disk
>> ├─sda1   8:1    0   512M  0 part /boot
>> ├─sda2   8:2    0 455.3G  0 part /
>> └─sda3   8:3    0    10G  0 part [SWAP]
>>
>> sdb      8:16   0 931.5G  0 disk
>> -- first disk
>>
>> sdc      8:32   0   1.8T  0 disk
>> └─sdc1   8:33   0   1.8T  0 part
>> -- 2nd disk
>>
superblock: bytenr=65536, device=/dev/sdb
-
csum_type   41700 (INVALID)
csum_size   32
csum            0x76e0389d [match]
bytenr  65536
flags   0x1
( WRITTEN )
magic   _BHRfS_M [match]
fsid08e51c76-0068-45ba-bac8-9c1f57363ec6
label   
generation  1285351
root6485326479360
sys_array_size  129
chunk_root_generation   1273669
root_level  1
chunk_root  5518540881920
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
total_bytes 2979127844864
bytes_used  2924699414528
sectorsize  4096
nodesize16384
leafsize (deprecated)   16384
stripesize  4096
root_dir6
num_devices 2
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x5b224169
( MIXED_BACKREF |

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Qu Wenruo


On 2018年04月08日 07:29, Ben Parsons wrote:
> On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>>
>>
>> On 2018年04月07日 10:31, Ben Parsons wrote:
>> [snip]
 Pretty common hard power reset.

> looking at journalctl, there is a large stacktrace from kernel: amdgpu
> (see attached).
> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.

 I'd say such corruption is pretty serious.

 And what's the profile of the btrfs? If metadata is raid1, we could at
 least try to recover the superblock from the remaining disk.
>>>
>>> I am not sure what the metadata was but the two disks had no parity
>>> and just appeared as a single disk with total space of the two disks
>>
>> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
>> While for the 1st disk, it's sda, without a partition table at all.
>> Is there any possibility that you just took the wrong partition?
>> (Or did some program use it incorrectly?)
>>
> 
> I don't quite understand what you are asking.
> I was always under the impression I could run mount on either
> partition and it would mount the pool
> 
>>>
>>> how would I go about recovering the 2nd disk? Attached is
>>
>> The 2nd disk looks good, however its csum_type is wrong.
>> 41700 looks like garbage.
>>
>> Besides that, incompat_flags also has garbage.
>>
>> The good news is, the system (and metadata) profile is RAID1, so it's
>> highly possible for us to salvage (to be more accurate, rebuild) the
>> superblock for the 1st device.
>>
>> Please dump the superblock of the 2nd device (sdc1) by the following
>> command:
>>
>> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>>
> 
> See attached.
> 
>>
>> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
>> fields, so I'm afraid I need to manually modify it.
>>
>> And just in case, please paste the following output to help us verify if
>> it's really sda without offset:
>>
>> # lsblk /dev/sda
>> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>>
> 
> dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
> cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
> 
> 65600:_BHRfS_M
> 67108928:_BHRfS_M

Well, the magic number is completely correct, and at correct location.

Would you please run "btrfs inspect dump-super -fFa /dev/sdb" again?
This time it should provide good data.

> 
>>
>> Above grep could be very slow since it will try to iterate the whole
>> disk. It's recommended to dump the first 128M of the disk and then grep
>> on that 128M image.
>>
>>
>> BTW, with superblock of sdc1 patched, you should be able to mount the fs
>> with -o ro,degraded, and salvage some data.
>>
>> Thanks,
>> Qu
> 
> Thank you so much!
> 
> Am I better off copying the data to another disk and then rebuilding the pool?
> Or can I just run a scrub after the superblock is fixed?

According to your latest grep output, strangely the 1st device is not
as corrupted as it first appeared.

So I think in-place recovery should save you a lot of time.

Thanks,
Qu

> 
> For reference here is lsblk:
> 
> sda      8:0    0 465.8G  0 disk
> ├─sda1   8:1    0   512M  0 part /boot
> ├─sda2   8:2    0 455.3G  0 part /
> └─sda3   8:3    0    10G  0 part [SWAP]
>
> sdb      8:16   0 931.5G  0 disk
> -- first disk
>
> sdc      8:32   0   1.8T  0 disk
> └─sdc1   8:33   0   1.8T  0 part
> -- 2nd disk
> 


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Ben Parsons
On 7 April 2018 at 22:09, Qu Wenruo  wrote:
>
>
> On 2018年04月07日 10:31, Ben Parsons wrote:
> [snip]
>>> Pretty common hard power reset.
>>>
 looking at journalctl, there is a large stacktrace from kernel: amdgpu
 (see attached).
 then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>>
>>> I'd say such corruption is pretty serious.
>>>
>>> And what's the profile of the btrfs? If metadata is raid1, we could at
>>> least try to recover the superblock from the remaining disk.
>>
>> I am not sure what the metadata was but the two disks had no parity
>> and just appeared as a single disk with total space of the two disks
>
> Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
> While for the 1st disk, it's sda, without a partition table at all.
> Is there any possibility that you just took the wrong partition?
> (Or did some program use it incorrectly?)
>

I don't quite understand what you are asking.
I was always under the impression I could run mount on either
partition and it would mount the pool

>>
>> how would I go about recovering the 2nd disk? Attached is
>
> The 2nd disk looks good, however its csum_type is wrong.
> 41700 looks like garbage.
>
> Besides that, incompat_flags also has garbage.
>
> The good news is, the system (and metadata) profile is RAID1, so it's
> highly possible for us to salvage (to be more accurate, rebuild) the
> superblock for the 1st device.
>
> Please dump the superblock of the 2nd device (sdc1) by the following
> command:
>
> # dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
>

See attached.

>
> Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
> fields, so I'm afraid I need to manually modify it.
>
> And just in case, please paste the following output to help us verify if
> it's really sda without offset:
>
> # lsblk /dev/sda
> # grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"
>

dd if=/dev/sdb of=toGrep.sdb bs=1 count=128M status=progress
cat toGrep.sdb | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

65600:_BHRfS_M
67108928:_BHRfS_M
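
(Those offsets are exactly where healthy superblocks would put the magic:
the _BHRfS_M string sits 0x40 = 64 bytes into the superblock, and btrfs
keeps the primary superblock at 64KiB and the first copy at 64MiB, so
65536 + 64 = 65600 and 67108864 + 64 = 67108928.)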

>
> Above grep could be very slow since it will try to iterate the whole
> disk. It's recommended to dump the first 128M of the disk and then grep
> on that 128M image.
>
>
> BTW, with superblock of sdc1 patched, you should be able to mount the fs
> with -o ro,degraded, and salvage some data.
>
> Thanks,
> Qu

Thank you so much!

Am I better off copying the data to another disk and then rebuilding the pool?
Or can I just run a scrub after the superblock is fixed?

For reference here is lsblk:

sda      8:0    0 465.8G  0 disk
├─sda1   8:1    0   512M  0 part /boot
├─sda2   8:2    0 455.3G  0 part /
└─sda3   8:3    0    10G  0 part [SWAP]

sdb      8:16   0 931.5G  0 disk
-- first disk

sdc      8:32   0   1.8T  0 disk
└─sdc1   8:33   0   1.8T  0 part
-- 2nd disk

>>
>> btrfs inspect dump-super -Ffa
>>
>> for the second disk
>>
>>> And is there special mount options used here like discard?
>>
>> compress=lzo, noauto
>>
>>> Thanks,
>>> Qu
>>>
>>
>> Thank you for all your help so far.
>> Does this mean that the first disk is definitely gone? Is there any way
>> to recover?
>>
>>
>> Thanks,
>> Ben
>>

 Thanks,
 Ben

 On 7 April 2018 at 09:44, Qu Wenruo  wrote:
>
>
> On 2018年04月07日 01:03, David Sterba wrote:
>> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
>>> Hi,
>>>
>>> I just had an unexpected restart and now my btrfs pool won't mount.
>>> The error on mount is:
>>>
>>> "ERROR: unsupported checksum algorithm 41700"
>>>
>>> and when running
>>>
>>> btrfs inspect-internal dump-super /dev/sda
>>> ERROR: bad magic on superblock on /dev/sda at 65536
>>>
>>> I saw a thread in the mailing list about it:
>>> https://www.spinics.net/lists/linux-btrfs/msg75326.html
>>> However I am told on IRC that Qu fixed it using magic.
>>>
>>> Any help would be much appreciated.
>>
>> In the previous report, there were 2 isolated areas of superblock
>> damaged. Please post output of
>>
>>   btrfs inspect dump-super /path
>
> And don't forget the -Ffa options.
> -F forces btrfs-progs to recognize it as btrfs no matter what the magic is
> -f shows all data so we can find all corruption and fix it if possible
> -a shows all backup superblocks, and if some backup is good, the "btrfs
> rescue super-recover" mentioned by Nikolay would be the best solution.
>
> Besides that, any extra info on how this happened is also appreciated,
> as a similar problem has now happened twice, which means we need to pay
> attention to this.
>
> Thanks,
> Qu
>
> Thanks,
> Qu
>
>>
>> so we can see if it's a similar issue.
>>
>> In case it is, there's a tool in the btrfs-progs repo that can fix the
>> individual values.

Re: Unable to compile btrfs progs 4.16 on ubuntu Xenial

2018-04-07 Thread Menion
I am adding --prefix=/usr, which it seems you are not using

2018-04-07 21:55 GMT+02:00 Nikolay Borisov :
>
>
> On  7.04.2018 20:16, Menion wrote:
>> Hi all
>> Apparently it is not possible to compile btrfs-progs with the Python
>> bindings on Ubuntu Xenial:
>>
>> checking for a Python interpreter with version >= 3.4... python3
>> checking for python3... /usr/bin/python3
>> checking for python3 version... 3.5
>> checking for python3 platform... linux
>> checking for python3 script directory... 
>> ${prefix}/lib/python3.5/site-packages
>> checking for python3 extension module directory...
>> ${exec_prefix}/lib/python3.5/site-packages
>> checking for python-3.5... no
>> configure: error: Package requirements (python-3.5) were not met:
>>
>> No package 'python-3.5' found
>>
>> Consider adjusting the PKG_CONFIG_PATH environment variable if you
>> installed software in a non-standard prefix.
>>
>> Alternatively, you may set the environment variables PYTHON_CFLAGS
>> and PYTHON_LIBS to avoid the need to call pkg-config.
>> See the pkg-config man page for more details.
>>
>> /usr/lib/python3.5/site-packages exists, but on Ubuntu the package
>> name is python3.5 and not python-3.5
>
>
> Works for me, I'm also on xenial:
>
> checking for python3... /usr/bin/python3
> checking for python3 version... 3.5
> checking for python3 platform... linux
> checking for python3 script directory... ${prefix}/lib/python3.5/site-packages
> checking for python3 extension module directory... 
> ${exec_prefix}/lib/python3.5/site-packages
> checking for PYTHON... yes
> checking for lzo_version in -llzo2... yes
> configure: creating ./config.status
> config.status: creating Makefile.inc
> config.status: creating Documentation/Makefile
> config.status: creating version.h
> config.status: creating config.h
>
> btrfs-progs v4.16
>
> prefix: /usr/local
> exec prefix:${prefix}
>
> bindir: ${exec_prefix}/bin
> libdir: ${exec_prefix}/lib
> includedir: ${prefix}/include
>
> compiler:   gcc
> cflags: -g -O1 -Wall -D_FORTIFY_SOURCE=2
> ldflags:
>
> documentation:  no
> doc generator:  none
> backtrace support:  yes
> btrfs-convert:  no
> btrfs-restore zstd: no
> Python bindings:yes
> Python interpreter: /usr/bin/python3
>
>
>
>
>>
>> Bye


Re: Unable to compile btrfs progs 4.16 on ubuntu Xenial

2018-04-07 Thread Nikolay Borisov


On  7.04.2018 20:16, Menion wrote:
> Hi all
> Apparently it is not possible to compile btrfs-progs with the Python
> bindings on Ubuntu Xenial:
> 
> checking for a Python interpreter with version >= 3.4... python3
> checking for python3... /usr/bin/python3
> checking for python3 version... 3.5
> checking for python3 platform... linux
> checking for python3 script directory... ${prefix}/lib/python3.5/site-packages
> checking for python3 extension module directory...
> ${exec_prefix}/lib/python3.5/site-packages
> checking for python-3.5... no
> configure: error: Package requirements (python-3.5) were not met:
> 
> No package 'python-3.5' found
> 
> Consider adjusting the PKG_CONFIG_PATH environment variable if you
> installed software in a non-standard prefix.
> 
> Alternatively, you may set the environment variables PYTHON_CFLAGS
> and PYTHON_LIBS to avoid the need to call pkg-config.
> See the pkg-config man page for more details.
> 
> /usr/lib/python3.5/site-packages exists, but on Ubuntu the package
> name is python3.5 and not python-3.5


Works for me, I'm also on xenial: 

checking for python3... /usr/bin/python3
checking for python3 version... 3.5
checking for python3 platform... linux
checking for python3 script directory... ${prefix}/lib/python3.5/site-packages
checking for python3 extension module directory... 
${exec_prefix}/lib/python3.5/site-packages
checking for PYTHON... yes
checking for lzo_version in -llzo2... yes
configure: creating ./config.status
config.status: creating Makefile.inc
config.status: creating Documentation/Makefile
config.status: creating version.h
config.status: creating config.h

btrfs-progs v4.16

prefix: /usr/local
exec prefix:${prefix}

bindir: ${exec_prefix}/bin
libdir: ${exec_prefix}/lib
includedir: ${prefix}/include

compiler:   gcc
cflags: -g -O1 -Wall -D_FORTIFY_SOURCE=2
ldflags:

documentation:  no
doc generator:  none
backtrace support:  yes
btrfs-convert:  no 
btrfs-restore zstd: no
Python bindings:yes
Python interpreter: /usr/bin/python3




> 
> Bye


Unable to compile btrfs progs 4.16 on ubuntu Xenial

2018-04-07 Thread Menion
Hi all
Apparently it is not possible to compile btrfs-progs with the Python
bindings on Ubuntu Xenial:

checking for a Python interpreter with version >= 3.4... python3
checking for python3... /usr/bin/python3
checking for python3 version... 3.5
checking for python3 platform... linux
checking for python3 script directory... ${prefix}/lib/python3.5/site-packages
checking for python3 extension module directory...
${exec_prefix}/lib/python3.5/site-packages
checking for python-3.5... no
configure: error: Package requirements (python-3.5) were not met:

No package 'python-3.5' found

Consider adjusting the PKG_CONFIG_PATH environment variable if you
installed software in a non-standard prefix.

Alternatively, you may set the environment variables PYTHON_CFLAGS
and PYTHON_LIBS to avoid the need to call pkg-config.
See the pkg-config man page for more details.

/usr/lib/python3.5/site-packages exists, but on Ubuntu the package
name is python3.5 and not python-3.5

Bye
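
A possible workaround, following configure's own hint about PYTHON_CFLAGS
and PYTHON_LIBS (an untested sketch; python3-config ships with the
python3-dev package):

  PYTHON_CFLAGS="$(python3-config --includes)" \
  PYTHON_LIBS="$(python3-config --libs)" \
  ./configure --prefix=/usr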


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-07 Thread Qu Wenruo


On 2018年04月07日 10:31, Ben Parsons wrote:
[snip]
>> Pretty common hard power reset.
>>
>>> looking at journalctl, there is a large stacktrace from kernel: amdgpu
>>> (see attached).
>>> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount.
>>
>> I'd say such corruption is pretty serious.
>>
>> And what's the profile of the btrfs? If metadata is raid1, we could at
>> least try to recover the superblock from the remaining disk.
> 
> I am not sure what the metadata was but the two disks had no parity
> and just appeared as a single disk with total space of the two disks

Strangely, for the 2nd disk, it's sdc1, which means it has a partition table.
While for the 1st disk, it's sda, without a partition table at all.
Is there any possibility that you just took the wrong partition?
(Or did some program use it incorrectly?)

> 
> how would I go about recovering the 2nd disk? Attached is

The 2nd disk looks good, however its csum_type is wrong.
41700 looks like garbage (the only csum type btrfs defines is 0, crc32c).

Besides that, incompat_flags also has garbage.

The good news is, the system (and metadata) profile is RAID1, so it's
highly possible for us to salvage (to be more accurate, rebuild) the
superblock for the 1st device.

Please dump the superblock of the 2nd device (sdc1) by the following
command:

# dd if=/dev/sdc1 of=super_dump.sdc1 bs=1 count=4096 skip=64k
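
Note: bs=1 copies one byte at a time; since 16 * 4096 = 65536, an
equivalent but much faster form would be:

# dd if=/dev/sdc1 of=super_dump.sdc1 bs=4096 count=1 skip=16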

Unfortunately, btrfs-sb-mod tool added recently doesn't have all needed
fields, so I'm afraid I need to manually modify it.

And just in case, please paste the following output to help us verify if
it's really sda without offset:

# lsblk /dev/sda
# grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"

Above grep could be very slow since it will try to iterate the whole
disk. It's recommended to dump the first 128M of the disk and then grep
on that 128M image.
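
For example, piping the dump straight into grep avoids the temporary file
(the byte offsets grep reports stay relative to the start of the disk):

# dd if=/dev/sda bs=1M count=128 status=none | grep -obUaP "\x5F\x42\x48\x52\x66\x53\x5F\x4D"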


BTW, with superblock of sdc1 patched, you should be able to mount the fs
with -o ro,degraded, and salvage some data.

Thanks,
Qu
> 
> btrfs inspect dump-super -Ffa
> 
> for the second disk
> 
>> And is there special mount options used here like discard?
> 
> compress=lzo, noauto
> 
>> Thanks,
>> Qu
>>
> 
> Thank you for all your help so far.
> Does this mean that the first disk is definitely gone? Is there any way
> to recover?
> 
> 
> Thanks,
> Ben
> 
>>>
>>> Thanks,
>>> Ben
>>>
>>> On 7 April 2018 at 09:44, Qu Wenruo  wrote:


 On 2018年04月07日 01:03, David Sterba wrote:
> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
>> Hi,
>>
>> I just had an unexpected restart and now my btrfs pool won't mount.
>> The error on mount is:
>>
>> "ERROR: unsupported checksum algorithm 41700"
>>
>> and when running
>>
>> btrfs inspect-internal dump-super /dev/sda
>> ERROR: bad magic on superblock on /dev/sda at 65536
>>
>> I saw a thread in the mailing list about it:
>> https://www.spinics.net/lists/linux-btrfs/msg75326.html
>> However I am told on IRC that Qu fixed it using magic.
>>
>> Any help would be much appreciated.
>
> In the previous report, there were 2 isolated areas of superblock
> damaged. Please post output of
>
>   btrfs inspect dump-super /path

 And don't forget the -Ffa options.
 -F forces btrfs-progs to recognize it as btrfs no matter what the magic is
 -f shows all data so we can find all corruption and fix it if possible
 -a shows all backup superblocks, and if some backup is good, the "btrfs
 rescue super-recover" mentioned by Nikolay would be the best solution.

 Besides that, any extra info on how this happened is also appreciated,
 as a similar problem has now happened twice, which means we need to pay
 attention to this.

 Thanks,
 Qu

 Thanks,
 Qu

>
> so we can see if it's a similar issue.
>
> In case it is, there's a tool in the btrfs-progs repo that can fix the
> individual values.


Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace

2018-04-07 Thread Anand Jain



On 04/07/2018 02:42 PM, Anand Jain wrote:



On 04/04/2018 02:34 AM, David Sterba wrote:

The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.

The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations to run. In this case, the
balance can also be paused and resumed under the same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT the ioctl would never let both start at the same time.

Replace the WARN_ON with a message and return 0, indicating no error, as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or by cancelling the paused state.


  So if there is a paused balance and replace is triggered, a write
  remount will reverse the process? 


 Ok, I am answering my own question:
 Even if the balance is paused, that's considered an exclusive
 operation in progress and does not let replace start. So there
 is no scenario where paused balance and replace could co-exist.

 So an assert would be better instead of returning 0.

+   if (test_and_set_bit(BTRFS_FS_EXCL_OP, &fs_info->flags)) {
+   btrfs_info(fs_info,
+   "cannot resume dev-replace, other exclusive operation running");
+   return 0;
+   }


Thanks, Anand


that is, balance will start and
  replace will hit EXCL_OP and thus be canceled? How does it work in
  this case?

Thanks, Anand