Re: Fatal failure, btrfs raid with duplicated metadata

2017-10-11 Thread Ian Kumlien
On Wed, Oct 11, 2017 at 2:42 PM, Jeff Mahoney wrote:
> On 10/11/17 2:20 PM, Ian Kumlien wrote:
>>
>>
>> On Wed, Oct 11, 2017 at 2:10 PM Jeff Mahoney wrote:
>>
>> On 10/11/17 12:41 PM, Ian Kumlien wrote:
>>
>> [--8<--]
>>
>> > Eventually the filesystem becomes read-only and everything is odd...
>>
>> Are you still able to mount it?  I'd be surprised if you could if check
>> can't open the file system.
>>
>>
>> Nope, it's like there never was a filesystem in the first place...
>>
>> But since metadata should be duplicated all over, I'd assume that it
>> would be able to mount it and survive =)
>
> If you'd been using RAID1 or something instead, you'd be able to mount
> the file system and replace the disk.

Yep, it wasn't a problem, I had a backup and so on (and the disks had
been acting oddly before)

>> > Trying to run btrfs check on the disks results in:
>> > btrfs check -b /dev/disk/by-uuid/8d431da9-dad4-481c-a5ad-5e6844f31da0
>> > bytenr mismatch, want=912228352, have=0
>> > Couldn't read tree root
>> > ERROR: cannot open file system
>> >
>> > (Same result with both the normal and the backup roots)
>> >
>> > So even if the data is duplicated on all disks, something in the above
>> > errors seemed to cause it to abort
>> > (These disks are Seagate SSHD disks, never ever buying them again)
>>
>> If you have metadata: dup, that doesn't mean the metadata is duplicated
>> on every disk.  It means that there are two copies of the metadata on a
>> single disk.  If that disk is going bad and returning failures for both
>> copies of the metadata, you may be out of luck.  It's really intended
>> for single spinning disks to get a little bit more resiliency in the
>> face of bad sectors.
>>
>>
>> Oh? It looked like it would be two copies per device, but OK - then I
>> could have had an issue where the drive holding the metadata is gone...
>> I assumed that I had DUP across multiple devices
>>
>> from the man page:
>>     Note 1: DUP may exist on more than 1 device if it starts on a
>>     single device and another one is added. Since version 4.5.1,
>>     mkfs.btrfs will let you create DUP on multiple devices.
>
> I can see how you'd reach that conclusion.  The wording is somewhat
> confusing.  We allocate space in chunks that are usually about 1GB in
> size.  When DUP is used, we allocate two chunks on the same device and
> that is presented as a single usable chunk.  The constituent chunks will
> be allocated on the same device, but which device is used can change
> with each allocation.
>
> Say you have 5 disks and 8 metadata chunks.  They can be allocated like so:
>
> sda: A A D D
> sdb: B B
> sdc: C C
> sde: F F G G
> sdf: H H I I
>
> There is no redundancy in the case of a disk failure, only for sector
> failures.  To spread metadata across disks for redundancy you'll need to
> use a raid mode instead.  If one of those disks is failing and it
> contains a critical part of the metadata, the file system won't be
> mountable.

Yeah, thanks =)

>> The check error above means that it wasn't able to map a logical address
>> to a physical address.  Typically that means that the mapping was lost.
>>
>>
>> I was mostly reporting that it happened, and asking whether there's any
>> useful data we could extract from this in case it's a failure that
>> shouldn't happen :)
>>
>> I haven't wiped anything yet - preparing to replace the disks though
>
> Thanks for reporting it, but in this context, it's somewhat of an
> expected failure mode.

Yep, I see that now, thanks =)


Re: Fatal failure, btrfs raid with duplicated metadata

2017-10-11 Thread Jeff Mahoney
On 10/11/17 2:20 PM, Ian Kumlien wrote:
> 
> 
> On Wed, Oct 11, 2017 at 2:10 PM Jeff Mahoney wrote:
> 
> On 10/11/17 12:41 PM, Ian Kumlien wrote:
> 
> [--8<--] 
> 
> > Eventually the filesystem becomes read-only and everything is odd...
> 
> Are you still able to mount it?  I'd be surprised if you could if check
> can't open the file system.
> 
> 
> Nope, it's like there never was a filesystem in the first place... 
> 
> But since metadata should be duplicated all over, I'd assume that it
> would be able to mount it and survive =)

If you'd been using RAID1 or something instead, you'd be able to mount
the file system and replace the disk.
> > Trying to run btrfs check on the disks results in:
> > btrfs check -b /dev/disk/by-uuid/8d431da9-dad4-481c-a5ad-5e6844f31da0
> > bytenr mismatch, want=912228352, have=0
> > Couldn't read tree root
> > ERROR: cannot open file system
> >
> > (Same result with both the normal and the backup roots)
> >
> > So even if the data is duplicated on all disks, something in the above
> > errors seemed to cause it to abort
> > (These disks are Seagate SSHD disks, never ever buying them again)
> 
> If you have metadata: dup, that doesn't mean the metadata is duplicated
> on every disk.  It means that there are two copies of the metadata on a
> single disk.  If that disk is going bad and returning failures for both
> copies of the metadata, you may be out of luck.  It's really intended
> for single spinning disks to get a little bit more resiliency in the
> face of bad sectors.
> 
> 
> Oh? It looked like it would be two copies per device, but OK - then I
> could have had an issue where the drive holding the metadata is gone...
> I assumed that I had DUP across multiple devices
> 
> from the man page:
>     Note 1: DUP may exist on more than 1 device if it starts on a
>     single device and another one is added. Since version 4.5.1,
>     mkfs.btrfs will let you create DUP on multiple devices.

I can see how you'd reach that conclusion.  The wording is somewhat
confusing.  We allocate space in chunks that are usually about 1GB in
size.  When DUP is used, we allocate two chunks on the same device and
that is presented as a single usable chunk.  The constituent chunks will
be allocated on the same device, but which device is used can change
with each allocation.

Say you have 5 disks and 8 metadata chunks.  They can be allocated like so:

sda: A A D D
sdb: B B
sdc: C C
sde: F F G G
sdf: H H I I

There is no redundancy in the case of a disk failure, only for sector
failures.  To spread metadata across disks for redundancy you'll need to
use a raid mode instead.  If one of those disks is failing and it
contains a critical part of the metadata, the file system won't be
mountable.
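
To make that concrete, here's a rough Python sketch (a toy model, not the
real allocator) that reuses the device and chunk letters from the diagram
above. It shows why DUP can't survive losing a whole device while a raid1
metadata profile can. The raid1 layout is invented for the illustration:

# Toy model of metadata chunk placement. Illustration only; this is not
# the real btrfs allocator, and the raid1 layout below is made up.

# DUP: both copies of each chunk live on the same device.
dup = {
    "A": ["sda", "sda"], "D": ["sda", "sda"],
    "B": ["sdb", "sdb"],
    "C": ["sdc", "sdc"],
    "F": ["sde", "sde"], "G": ["sde", "sde"],
    "H": ["sdf", "sdf"], "I": ["sdf", "sdf"],
}

# RAID1: the two copies of each chunk go to two different devices.
raid1 = {
    "A": ["sda", "sdb"], "B": ["sdb", "sdc"], "C": ["sdc", "sde"],
    "D": ["sda", "sde"], "F": ["sde", "sdf"], "G": ["sdf", "sda"],
    "H": ["sdb", "sdf"], "I": ["sdc", "sdf"],
}

def lost_chunks(placement, failed_dev):
    """Chunks with no surviving copy once failed_dev is gone."""
    return sorted(chunk for chunk, devs in placement.items()
                  if all(dev == failed_dev for dev in devs))

for name, placement in (("dup", dup), ("raid1", raid1)):
    print(f"{name}: chunks lost if sdf dies: {lost_chunks(placement, 'sdf') or 'none'}")

# dup loses H and I outright (both copies were on sdf), so any tree
# blocks in them are gone; raid1 still has a copy of every chunk on a
# surviving device, so the fs can still be mounted degraded.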

> The check error above means that it wasn't able to map a logical address
> to a physical address.  Typically that means that the mapping was lost.
> 
> 
> I was mostly reporting that it happened, and asking whether there's any
> useful data we could extract from this in case it's a failure that
> shouldn't happen :)
> 
> I haven't wiped anything yet - preparing to replace the disks though

Thanks for reporting it, but in this context, it's somewhat of an
expected failure mode.

-Jeff

-- 
Jeff Mahoney
SUSE Labs





Re: Fatal failure, btrfs raid with duplicated metadata

2017-10-11 Thread Ian Kumlien
Resent since Google Inbox is still not doing plain-text emails...

On Wed, Oct 11, 2017 at 2:09 PM, Jeff Mahoney wrote:
> On 10/11/17 12:41 PM, Ian Kumlien wrote:

[--8<---]

>> Eventually the filesystem becomes read-only and everything is odd...
>
> Are you still able to mount it?  I'd be surprised if you could if check
> can't open the file system.

Nope, it's like there never was a filesystem in the first place...

But since metadata should be duplicated all over, I'd assume that it
would be able to mount it and survive =)

>> Trying to run btrfs check on the disks results in:
>> btrfs check -b /dev/disk/by-uuid/8d431da9-dad4-481c-a5ad-5e6844f31da0
>> bytenr mismatch, want=912228352, have=0
>> Couldn't read tree root
>> ERROR: cannot open file system
>>
>> (Same result with both the normal and the backup roots)
>>
>> So even if the data is duplicated on all disks, something in the above
>> errors seemed to cause it to abort
>> (These disks are Seagate SSHD disks, never ever buying them again)
>
> If you have metadata: dup, that doesn't mean the metadata is duplicated
> on every disk.  It means that there are two copies of the metadata on a
> single disk.  If that disk is going bad and returning failures for both
> copies of the metadata, you may be out of luck.  It's really intended
> for single spinning disks to get a little bit more resiliency in the
> face of bad sectors.

Oh? It looked like it would be two copies per device, but OK - then I
could have had an issue where the drive holding the metadata is gone...
I assumed that I had DUP across multiple devices

from the man page:
    Note 1: DUP may exist on more than 1 device if it starts on a
    single device and another one is added. Since version 4.5.1,
    mkfs.btrfs will let you create DUP on multiple devices.

> The check error above means that it wasn't able to map a logical address
> to a physical address.  Typically that means that the mapping was lost.

I was mostly reporting that it happened, and asking whether there's any
useful data we could extract from this in case it's a failure that
shouldn't happen :)

I haven't wiped anything yet - preparing to replace the disks though


Re: Fatal failure, btrfs raid with duplicated metadata

2017-10-11 Thread Jeff Mahoney
On 10/11/17 12:41 PM, Ian Kumlien wrote:
> Hi,
> 
> I was running a btrfs raid with 6 disks, metadata: dup and data: raid 6
> 
> Two of the disks started behaving oddly:
> [436823.570296] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=244, sector_sz=512)
> [436823.578604] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=52, sector_sz=512)
> [436823.617593] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=56, sector_sz=512)
> [436823.617771] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=222, sector_sz=512)
> [436823.618386] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=246, sector_sz=512)
> [436823.618463] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=56, sector_sz=512)
> [436977.701944] scsi_io_completion: 68 callbacks suppressed
> [436977.701973] sd 3:1:0:4: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [436977.701982] sd 3:1:0:4: [sdf] tag#0 Sense Key : Hardware Error [current]
> [436977.701991] sd 3:1:0:4: [sdf] tag#0 Add. Sense: Logical unit failure
> [436977.702000] sd 3:1:0:4: [sdf] tag#0 CDB: Read(10) 28 00 02 fb fb 80 00 00 28 00
> [436977.702005] print_req_error: 68 callbacks suppressed
> [436977.702010] print_req_error: critical target error, dev sdf, sector 50068352
> [498132.144319] print_req_error: 450 callbacks suppressed
> [498132.144324] print_req_error: critical target error, dev sdf, sector 41777640
> [498132.144590] btrfs_dev_stat_print_on_error: 540 callbacks suppressed
> [498132.144600] BTRFS error (device sdb1): bdev /dev/sdf1 errs: wr 632, rd 1526, flush 0, corrupt 0, gen 0
> 
> Eventually the filesystem becomes read-only and everything is odd...

Are you still able to mount it?  I'd be surprised if you could if check
can't open the file system.

> Trying to run btrfs check on the disks results in:
> btrfs check -b /dev/disk/by-uuid/8d431da9-dad4-481c-a5ad-5e6844f31da0
> bytenr mismatch, want=912228352, have=0
> Couldn't read tree root
> ERROR: cannot open file system
> 
> (Same result with both the normal and the backup roots)
> 
> So even if the data is duplicated on all disks, something in the above
> errors seemed to cause it to abort
> (These disks are Seagate SSHD disks, never ever buying them again)

If you have metadata: dup, that doesn't mean the metadata is duplicated
on every disk.  It means that there are two copies of the metadata on a
single disk.  If that disk is going bad and returning failures for both
copies of the metadata, you may be out of luck.  It's really intended
for single spinning disks to get a little bit more resiliency in the
face of bad sectors.

The check error above means that it wasn't able to map a logical address
to a physical address.  Typically that means that the mapping was lost.
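
If a sketch helps, here's roughly what "map a logical address to a
physical address" means, as a toy Python example. The chunk ranges and
device names are invented; only the 912228352 bytenr is taken from the
check output above:

# Toy illustration of resolving a btrfs logical address through a chunk
# map. Not btrfs-progs internals; the ranges and devices are invented.
chunk_map = [
    # (logical start, length, device, physical start)
    (1103101952, 1073741824, "/dev/sdb1", 1048576),
    (2176843776, 1073741824, "/dev/sdc1", 1048576),
    # ... the chunk that covered bytenr 912228352 is missing ...
]

def logical_to_physical(bytenr):
    """Return (device, physical offset) for a logical address, or None."""
    for logical, length, dev, physical in chunk_map:
        if logical <= bytenr < logical + length:
            return dev, physical + (bytenr - logical)
    return None  # no chunk covers this address: the mapping is lost

want = 912228352
hit = logical_to_physical(want)
if hit is None:
    # Nothing valid to read back for the tree root, which is roughly the
    # situation behind "bytenr mismatch, want=912228352, have=0" and
    # "Couldn't read tree root" above.
    print(f"cannot map logical address {want}")
else:
    print(f"read {want} from {hit[0]} at offset {hit[1]}")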

-Jeff


-- 
Jeff Mahoney
SUSE Labs





Fatal failure, btrfs raid with duplicated metadata

2017-10-11 Thread Ian Kumlien
Hi,

I was running a btrfs raid with 6 disks, metadata: dup and data: raid 6

Two of the disks started behaving oddly:
[436823.570296] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=244, sector_sz=512)
[436823.578604] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=52, sector_sz=512)
[436823.617593] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=56, sector_sz=512)
[436823.617771] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=222, sector_sz=512)
[436823.618386] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=246, sector_sz=512)
[436823.618463] sd 3:1:0:4: [sdf] Unaligned partial completion (resid=56, sector_sz=512)
[436977.701944] scsi_io_completion: 68 callbacks suppressed
[436977.701973] sd 3:1:0:4: [sdf] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[436977.701982] sd 3:1:0:4: [sdf] tag#0 Sense Key : Hardware Error [current]
[436977.701991] sd 3:1:0:4: [sdf] tag#0 Add. Sense: Logical unit failure
[436977.702000] sd 3:1:0:4: [sdf] tag#0 CDB: Read(10) 28 00 02 fb fb 80 00 00 28 00
[436977.702005] print_req_error: 68 callbacks suppressed
[436977.702010] print_req_error: critical target error, dev sdf, sector 50068352
[498132.144319] print_req_error: 450 callbacks suppressed
[498132.144324] print_req_error: critical target error, dev sdf, sector 41777640
[498132.144590] btrfs_dev_stat_print_on_error: 540 callbacks suppressed
[498132.144600] BTRFS error (device sdb1): bdev /dev/sdf1 errs: wr 632, rd 1526, flush 0, corrupt 0, gen 0

Eventually the filesystem becomes read-only and everything is odd...

Trying to run btrfs check on the disks results in:
btrfs check -b /dev/disk/by-uuid/8d431da9-dad4-481c-a5ad-5e6844f31da0
bytenr mismatch, want=912228352, have=0
Couldn't read tree root
ERROR: cannot open file system

(Same result with both the normal and the backup roots)

So even if the data is duplicated on all disks, something in the above
errors seemed to cause it to abort
(These disks are Seagate SSHD disks, never ever buying them again)