Re: possible raid6 corruption

2015-06-02 Thread Sander
Christoph Anton Mitterer wrote (ao):
 May 19 03:25:50 lcg-lrz-dc10 kernel: [903106.581205] sd 0:0:14:0: Device 
 offlined - not ready after error recovery

 May 28 16:38:43 lcg-lrz-dc10 kernel: [1727488.984810] sd 0:0:14:0: rejecting 
 I/O to offline device

 May 28 16:39:19 lcg-lrz-dc10 kernel: [1727524.067182] BTRFS: lost page write 
 due to I/O error on /dev/sdm
 May 28 16:39:19 lcg-lrz-dc10 kernel: [1727524.067426] BTRFS: bdev /dev/sdm 
 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0

 May 28 21:03:06 lcg-lrz-dc10 kernel: [1743336.347191] sd 0:0:14:0: rejecting 
 I/O to offline device
 May 28 21:03:06 lcg-lrz-dc10 kernel: [1743336.369569] BTRFS: lost page write 
 due to I/O error on /dev/sdm

 Well, as I've said, maybe it's not an issue at all, but at least it's
 strange that this happens on brand new hardware only with the
 btrfs-raid56 node, especially given the gazillions of megasas messages.

Brand new hardware is the most likely to show (hardware) issues: it has no
proven track record yet, and it may have been subjected to all kinds of
abuse during transport. I'm sure you will see the same if you put sw raid +
ext4 on this server.
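
For a like-for-like comparison, the rough equivalent on the sw raid side
would be something along these lines (just a sketch, using the same
sixteen disks, sda through sdp, as in the original mkfs.btrfs command
further down the thread):

# mdadm --create /dev/md0 --level=6 --raid-devices=16 /dev/sd[a-p]
# mkfs.ext4 -L data-test /dev/md0
# mount /dev/md0 /mnt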

Nice hardware btw, please share your findings.

Sander


possible raid6 corruption

2015-06-01 Thread Christoph Anton Mitterer
Hi.

The following is a possible corruption of a btrfs with RAID6... it may
however also just be an issue with the megasas driver or the PERC
controller behind it.
Anyway, since RAID56 is quite new in btrfs, an expert may want to have a
look at whether it's something that needs to be focused on.

I cannot mount the btrfs since the incident:
I.e.
# mount /dev/sd[any disk of the btrfs raid] /mnt/

gives a:
[358466.484374] BTRFS info (device sda): disk space caching is enabled
[358466.484426] BTRFS: has skinny extents
[358466.485421] BTRFS: failed to read the system array on sda
[358466.543422] BTRFS: open_ctree failed

But no valuable data was on these devices, and I haven't really tried any
of the recovery methods.
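
(For reference, the obvious non-destructive candidates would presumably be
something like the following; none of them have been tried here:)

# mount -o recovery /dev/sda /mnt
# mount -o degraded /dev/sda /mnt
# btrfs rescue super-recover -v /dev/sda
# btrfs check /dev/sda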



What I did:
At the university we run a Tier-2 for the LHC computing grid (i.e. we
have loads of storage).
Recently we bought a number of Dell nodes, each with 16 6TB SATA disks;
the disks are connected via a Dell PERC H730P controller (which is based
on some LSI Mega*-whatever, AFAICT).

Since I had 10 new nodes, I wanted to use the opportunity to do some
extensive benchmarking, i.e. HW RAID vs. MD RAID vs. btrfs-RAID, with
btrfs and ext4 on top, in all reasonable combinations.
The nodes which were used for MD/btrfs-RAID obviously used the PERC in
pass-through mode.

As said, the nodes are brand new and during the tests the one with
btrfs-raid6 had a fs crash (all others continued to run fine).

System is Debian jessie, except for the kernel, which is 4.0.0 from sid
(or experimental at that time), and btrfs-progs 4.0.

The fs was created pretty much standard: 
# mkfs.btrfs -L data-test -d raid6 -m raid6 /dev/sda /dev/sdb /dev/sdc /dev/sdd \
  /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl \
  /dev/sdm /dev/sdn /dev/sdo /dev/sdp

And then there came some heavy iozone stressing:
# iozone -Rb $(hostname)_1.xls -s 128g -i 0 -i 1 -i 2 -i 5 -j 12 -r 64 -t 1 -F \
  /mnt/iozone
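
(For reference, roughly what those iozone options do: -s 128g is the file
size, -i 0/1/2/5 select the write, read, random-I/O and stride-read tests,
-j 12 is the stride for the stride-read test, -r 64 is the record size in
KB, -t 1 runs a single thread, -F names the target file, and -R/-b produce
the Excel report.)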


Some excerpts from the kernel log, which might be of interest:


May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.479387] Btrfs loaded
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.479680] BTRFS: device label 
data-test devid 1 transid 3 /dev/sda
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.482080] BTRFS: device label 
data-test devid 2 transid 3 /dev/sdb
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.484047] BTRFS: device label 
data-test devid 3 transid 3 /dev/sdc
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.486021] BTRFS: device label 
data-test devid 4 transid 3 /dev/sdd
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.487892] BTRFS: device label 
data-test devid 5 transid 3 /dev/sde
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.489849] BTRFS: device label 
data-test devid 6 transid 3 /dev/sdf
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.491819] BTRFS: device label 
data-test devid 7 transid 3 /dev/sdg
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.493919] BTRFS: device label 
data-test devid 8 transid 3 /dev/sdh
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.495761] BTRFS: device label 
data-test devid 9 transid 3 /dev/sdi
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.497645] BTRFS: device label 
data-test devid 10 transid 3 /dev/sdj
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.499477] BTRFS: device label 
data-test devid 11 transid 3 /dev/sdk
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.501307] BTRFS: device label 
data-test devid 12 transid 3 /dev/sdl
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.503208] BTRFS: device label 
data-test devid 13 transid 3 /dev/sdm
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.505037] BTRFS: device label 
data-test devid 14 transid 3 /dev/sdn
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.506837] BTRFS: device label 
data-test devid 15 transid 3 /dev/sdo
May 10 00:26:39 lcg-lrz-dc10 kernel: [115511.508800] BTRFS: device label 
data-test devid 16 transid 3 /dev/sdp
May 10 00:27:34 lcg-lrz-dc10 kernel: [115566.351260] BTRFS info (device sdp): 
disk space caching is enabled
May 10 00:27:34 lcg-lrz-dc10 kernel: [115566.351307] BTRFS: has skinny extents
May 10 00:27:34 lcg-lrz-dc10 kernel: [115566.351333] BTRFS: flagging fs with 
big metadata feature
May 10 00:27:34 lcg-lrz-dc10 kernel: [115566.354089] BTRFS: creating UUID tree



Literally gazillions of these: 
May 19 02:39:19 lcg-lrz-dc10 kernel: [900318.402678] megasas:span 0 rowDataSize 
1
May 19 02:39:19 lcg-lrz-dc10 kernel: [900318.402705] megasas:span 0 rowDataSize 
1

While I saw the above lines on all the other nodes as well, there were
only about 30 of them in total, and that was it.
But on the one node with btrfs, the log file was flooded to 1.6 GB with
these.
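
(If anyone wants to reproduce the count, something like the following
should do; the log path is just an example and may differ:)

# grep -c rowDataSize /var/log/kern.log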


At some point I've had this: 
May 19 03:25:19 lcg-lrz-dc10 kernel: [903075.511076] megasas: [ 0]waiting for 1 
commands to complete for scsi0
May 19 03:25:24 lcg-lrz-dc10 kernel: [903080.526184] megasas: [ 5]waiting for 1 
commands to complete for scsi0
May 19 03:25:29 lcg-lrz-dc10 kernel: [903085.541375] megasas: [10]waiting for 1 
commands to complete for scsi0
May 19 03:25:34 lcg-lrz-dc10 kernel: 

Re: possible raid6 corruption

2015-06-01 Thread Chris Murphy
I'm seeing three separate problems:

May 19 03:25:39 lcg-lrz-dc10 kernel: [903095.585150] megasas:
megasas_aen_polling waiting for controller reset to finish for scsi0
May 19 03:25:50 lcg-lrz-dc10 kernel: [903106.581205] sd 0:0:14:0:
Device offlined - not ready after error recovery

I don't know if that's controller related or drive related. In either
case it's hardware related. And then:

May 28 16:40:43 lcg-lrz-dc10 kernel: [1727608.170703] BTRFS: bdev
/dev/sdm errs: wr 12, rd 0, flush 0, corrupt 0, gen 0
May 28 16:40:50 lcg-lrz-dc10 kernel: [1727615.608552] BTRFS: bdev
/dev/sdm errs: wr 12, rd 1, flush 0, corrupt 0, gen 0
...
May 28 16:43:16 lcg-lrz-dc10 kernel: [1727761.077607] BTRFS: bdev
/dev/sdm errs: wr 28, rd 21596, flush 0, corrupt 0, gen 0

This is just the fs saying it can't write to one particular drive, and
then also reporting many read failures. And then:


May 28 21:03:06 lcg-lrz-dc10 kernel: [1743336.369569] BTRFS: lost page
write due to I/O error on /dev/sdm
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.093299] sd 0:0:14:0:
rejecting I/O to offline device
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.094348] BTRFS (device
sdp): bad tree block start 3328214216270427953 3448651776

So another lost write to the same drive, sdm, and then a new problem,
which is a bad tree block on a different drive, sdp. And then:

May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.096927] BTRFS: error -5
while searching for dev_stats item for device /dev/sdm!
May 28 21:03:07 lcg-lrz-dc10 kernel: [1743337.097314] BTRFS warning
(device sdp): Skipping commit of aborted transaction.

It still hasn't given up on sdm (which seems kind of odd, given that by
now there are thousands of read errors and the kernel thinks it's offline
anyway), but now it also has to deal with problems on sdp. The resulting
stack trace, though, suggests a umount was in progress?


May 28 22:55:56 lcg-lrz-dc10 kernel: [1750099.616565] CPU: 4 PID:
134844 Comm: umount Tainted: GW   4.0.0-trunk-amd64 #1
Debian 4.0-1~exp1
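
As an aside, the wr/rd/flush/corrupt/gen numbers above are btrfs's
persistent per-device error counters; if the fs ever mounts again they can
be read back (and zeroed) with something like:

# btrfs device stats /mnt
# btrfs device stats -z /mnt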



https://bugs.launchpad.net/ubuntu/+source/linux/+bug/891115
That's an old bug, kernel 3.2 era. But ultimately it looks like it was
hardware related.
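
To narrow down whether it's the drive or the controller, a first pass
might be to map 0:0:14:0 back to its block device and pull the drive's own
error log, roughly:

# lsscsi
# smartctl -x /dev/sdm

(In pass-through mode plain smartctl should work; behind a configured
megaraid volume it would need -d megaraid,N instead.)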


Chris Murphy