Re: put 2 hard drives in mdadm raid 1 and detect bitrot like btrfs does, what's that called?

2021-02-04 Thread Andy Smith
Hi Cedric,

On Wed, Feb 03, 2021 at 08:33:18PM +0100, Cedric wrote:
> it's called "dm-integrity", as mentioned in this e-mail:
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg93037.html

If you do this, it would be very interesting to see performance
figures for the following setups:

- btrfs with raid1 meta and data allocation
- mdadm raid1 on raw devices
- mdadm raid1 on dm-integrity (no encryption) on raw devices
- mdadm raid1 on dm-integrity (encryption) on raw devices

just to see what kind of performance loss dm-integrity and
encryption are going to impose.

After doing it, the results would find a nice home on the Linux RAID wiki:

https://raid.wiki.kernel.org/index.php/Dm-integrity
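
For anyone who wants to try this, here is a rough sketch of how I
understand the third setup would be built (untested as written here;
/dev/sdb, /dev/sdc and the md device name are placeholders, and
integritysetup's default crc32c checksum is assumed):

# placeholder devices throughout; "format" wipes the device
$ sudo integritysetup format /dev/sdb
$ sudo integritysetup open /dev/sdb int-sdb
$ sudo integritysetup format /dev/sdc
$ sudo integritysetup open /dev/sdc int-sdc
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/int-sdb /dev/mapper/int-sdc

For the fourth setup, my understanding is that LUKS2 can stack
dm-integrity underneath the encryption in one step, e.g.:

$ sudo cryptsetup luksFormat --type luks2 --integrity hmac-sha256 /dev/sdb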

Cheers,
Andy


Problems with "btrfs dev remove" of dead disk

2016-02-14 Thread Andy Smith
Hi,

One of my drives died earlier in a fairly emphatic way: not only did
it show IO errors and get removed as a device by the kernel, but it
was also making audible grinding/screeching noises until I
hot-unplugged it.

Feb 14 18:29:36 specialbrew kernel: [27576156.070961] ata6.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Feb 14 18:29:37 specialbrew kernel: [27576157.215312] ata6.00: hard resetting link
Feb 14 18:29:37 specialbrew kernel: [27576157.555369] ata6.00: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:37 specialbrew kernel: [27576157.560028] ata6.01: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576157.915797] ata6.01: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576157.920591] ata6.02: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.275759] ata6.02: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:38 specialbrew kernel: [27576158.280603] ata6.03: hard resetting link
Feb 14 18:29:38 specialbrew kernel: [27576158.603658] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:38 specialbrew kernel: [27576158.608844] ata6.04: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576158.947805] ata6.04: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576158.953058] ata6.05: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.291801] ata6.05: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.297143] ata6.06: hard resetting link
Feb 14 18:29:39 specialbrew kernel: [27576159.639850] ata6.06: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 14 18:29:39 specialbrew kernel: [27576159.645411] ata6.07: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576159.971581] ata6.07: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576159.977251] ata6.08: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.303533] ata6.08: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.310056] ata6.09: hard resetting link
Feb 14 18:29:40 specialbrew kernel: [27576160.635541] ata6.09: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:40 specialbrew kernel: [27576160.641371] ata6.10: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576160.967639] ata6.10: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576160.973591] ata6.11: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.299570] ata6.11: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.305670] ata6.12: hard resetting link
Feb 14 18:29:41 specialbrew kernel: [27576161.631589] ata6.12: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:41 specialbrew kernel: [27576161.637725] ata6.13: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576161.963597] ata6.13: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576161.969538] ata6.14: hard resetting link
Feb 14 18:29:42 specialbrew kernel: [27576162.295657] ata6.14: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:42 specialbrew kernel: [27576162.303094] ata6.00: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.310674] ata6.01: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.317928] ata6.02: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.326589] ata6.04: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.337178] ata6.05: configured for UDMA/100
Feb 14 18:29:42 specialbrew kernel: [27576162.344438] ata6.06: configured for UDMA/100
Feb 14 18:29:43 specialbrew kernel: [27576163.607145] ata6.03: hard resetting link
Feb 14 18:29:44 specialbrew kernel: [27576163.935962] ata6.03: SATA link down (SStatus 0 SControl 320)
Feb 14 18:29:44 specialbrew kernel: [27576163.942835] ata6.03: limiting SATA link speed to 1.5 Gbps
Feb 14 18:29:49 specialbrew kernel: [27576168.939422] ata6.03: hard resetting link
Feb 14 18:29:49 specialbrew kernel: [27576169.264031] ata6.03: SATA link down (SStatus 0 SControl 310)
Feb 14 18:29:49 specialbrew kernel: [27576169.270519] ata6.03: disabled
Feb 14 18:29:49 specialbrew kernel: [27576169.276874] end_request: I/O error, dev sdh, sector 0
Feb 14 18:29:49 specialbrew kernel: [27576169.282908] btrfs_dev_stat_print_on_error: 965 callbacks suppressed
Feb 14 18:29:49 specialbrew kernel: [27576169.282929] ata6: EH complete
Feb 14 18:29:49 specialbrew kernel: [27576169.294246] BTRFS: bdev /dev/sdh errs: wr 125, rd 8, flush 1, corrupt 0, gen 0
Feb 14 18:29:49 specialbrew kernel: [27576169.300987] sd 5:3:0:0: rejecting I/O to offline device
Feb 14 18:29:49 specialbrew kernel: [27576169.307016] BTRFS: lost page write due to I/O error on /dev/sdh
Feb 14 18:29:49 specialbrew kernel: [27576169.312976] BTRFS: bdev /dev/sdh errs: wr 126, rd 8, flush

Re: Problems with "btrfs dev remove" of dead disk

2016-02-14 Thread Andy Smith
Hi Chris,

On Sun, Feb 14, 2016 at 04:49:29PM -0700, Chris Murphy wrote:
> On Sun, Feb 14, 2016 at 2:55 PM, Andy Smith wrote:
> > $ sudo btrfs dev remove /dev/sdh /srv/tank
> > ERROR: not a block device: /dev/sdh
> 
> 
> Since now it's a missing device, it should be
> 
> sudo btrfs device remove missing /srv/tank

$ sudo btrfs device remove missing /srv/tank
ERROR: error removing device 'missing': no missing devices found to remove

> But I'm not sure if this works when the volume is not already mounted
> degraded.

I have now done:

# mount -oremount,degraded /srv/tank 

and tried again, but it produced the same response ("mount" now does
show "degraded" as one of the mount flags, however).

I have not yet tried completely unmounting it and mounting it again.
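
What I have in mind for that, when I get the chance, is something
like (a sketch only, not yet attempted):

$ sudo umount /srv/tank
$ sudo mount -o degraded /srv/tank
$ sudo btrfs device remove missing /srv/tank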

> it really doesn't make sense to me that you'd want to increase the
> risk of more Btrfs problems when such known things are now fixed.
> Consider 4.1.15 if you want a stable, long-term yet currently
> supported kernel.

It is inconvenient to reboot just now, so if I'm able to fix things
without doing so (e.g. by balance or replace) then I would like to.

If that isn't possible then I will of course boot into a newer
kernel at the same time.

If I end up booting into 4.1.15, should it then be possible to mount
degraded and remove the missing device?
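
(If replace turns out to be the better route, I assume it would go
something like this, with 1 standing in for the dead disk's devid
and /dev/sdnew a placeholder for its replacement:

$ sudo btrfs replace start 1 /dev/sdnew /srv/tank
$ sudo btrfs replace status /srv/tank

but I have not tried that yet either.)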

Cheers,
Andy


Is this normal? Should I use scrub?

2015-04-01 Thread Andy Smith
Hello,

I have a 6 device RAID-1 filesystem:

$ sudo btrfs fi df /srv/tank
Data, RAID1: total=1.24TiB, used=1.24TiB
System, RAID1: total=32.00MiB, used=184.00KiB
Metadata, RAID1: total=3.00GiB, used=1.65GiB
unknown, single: total=512.00MiB, used=0.00
$ sudo btrfs fi sh /srv/tank
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
Total devices 6 FS bytes used 1.24TiB
devid    2 size 1.82TiB used 384.03GiB path /dev/sdh
devid    3 size 1.82TiB used 383.00GiB path /dev/sdg
devid    4 size 1.82TiB used 384.00GiB path /dev/sdf
devid    5 size 2.73TiB used 1.13TiB path /dev/sdk
devid    6 size 1.82TiB used 121.00GiB path /dev/sdj
devid    7 size 2.73TiB used 116.00GiB path /dev/sde

Btrfs v3.14.2

All of these devices are in an external eSATA enclosure.

A few days ago (I believe) something went wrong with the enclosure
hardware and the SCSI bus kept getting reset over and over. At one
point three of the six devices were kicked out and the filesystem
was left running (read-only) on three devices.

Through some trial and error I determined that the enclosure was
taking exception to one of the devices, and by removing it I was
able to get things up and running with five devices, writeable,
mounted in degraded mode. /dev/sdk is the device that was kept out
of the filesystem.

I do not believe that there is anything wrong with /dev/sdk as I put
it in another system and was able to read it entirely, do SMART long
tests on it, etc.
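
(That was just the usual smartctl routine, something like:

$ sudo smartctl -t long /dev/sdk
$ sudo smartctl -a /dev/sdk

with the second command run after the long test had finished.)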

I wasn't able to prove it was a hardware problem before taking the
enclosure out of service, as it was the only enclosure I had. So
that's a task for later.

I have now got a new enclosure and put this system back together
with all six devices. I was not expecting this filesystem to mount
without assistance on boot because of /dev/sdk being "stale"
compared to the other devices. I suppose this incorrect view is a
holdover from my experience with mdadm.

Anyway, I booted it and /srv/tank was mounted automatically with all
six devices.  I got a bunch of these messages as soon as it was
mounted:

http://pastie.org/private/2ghahjwtzlcm6hwp66hkg

There's lots more of it but it's all like that. That paste is from
the end of the log and there haven't been any more such messages
since, so that's about 20 minutes (the times are in GMT).

Is that normal output indicating that btrfs is repairing the
"staleness" of sdk from the other copy?

I seem to be able to use the filesystem and a cursory inspection
isn't turning up anything that I can't read or that seems
corrupted. I will now run checksums against my last good backup.

Should I run a scrub as well?

Cheers,
Andy


Re: Is this normal? Should I use scrub?

2015-04-02 Thread Andy Smith
Hi Hugo,

Thanks for your help.

On Wed, Apr 01, 2015 at 03:42:02PM +, Hugo Mills wrote:
> On Wed, Apr 01, 2015 at 03:11:14PM +0000, Andy Smith wrote:
> > Should I run a scrub as well?
> 
>Yes. The output you've had so far will be just the pieces that the
> FS has tried to read, and where, as a result, it's been able to detect
> the out-of-date data. A scrub will check and fix everything.

Thanks, things seem to be fine now. :)
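
For anyone finding this in the archive, starting the scrub and
checking on it afterwards amounted to something like:

$ sudo btrfs scrub start /srv/tank
$ sudo btrfs scrub status -d /srv/tank

(-d gives the per-device breakdown shown below.)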

What's the difference between "verify" and "csum" here?

scrub status for 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
scrub device /dev/sdh (id 2) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14642 seconds
        total bytes scrubbed: 383.42GiB with 0 errors
scrub device /dev/sdg (id 3) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14504 seconds
        total bytes scrubbed: 382.62GiB with 0 errors
scrub device /dev/sdf (id 4) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 14436 seconds
        total bytes scrubbed: 383.00GiB with 0 errors
scrub device /dev/sdk (id 5) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 21156 seconds
        total bytes scrubbed: 1.13TiB with 14530 errors
        error details: verify=10909 csum=3621
        corrected errors: 14530, uncorrectable errors: 0, unverified errors: 0
scrub device /dev/sdj (id 6) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 5693 seconds
        total bytes scrubbed: 119.42GiB with 0 errors
scrub device /dev/sde (id 7) history
        scrub started at Wed Apr  1 20:05:58 2015 and finished after 5282 seconds
        total bytes scrubbed: 114.45GiB with 0 errors

Cheers,
Andy


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-09 Thread Andy Smith
Hello,

On Sat, Aug 09, 2014 at 01:38:34PM +1000, Russell Coker wrote:
> On Fri, 8 Aug 2014 16:35:29 Jose Ildefonso Camargo Tolosa wrote:
> > Then, after reading here and there, decided to try to use a newer
> > kernel, tried 3.15.8.  Well, it is still mounting after ~16 hours, and
> > I got messages like these at first:
> 
> I recommend trying a 3.14 kernel.  I had ongoing problems with kernels before 
> 3.14 which included infinite loops in kernel space.  Based on reports on this 
> list I haven't been inclined to test 3.15 kernels.  But 3.14 has been working 
> well for me on many systems.

I'm in a similar position with a filesystem that won't mount except
read-only, but am already on 3.14 and am also wondering whether to
try a 3.16 kernel.

https://bugzilla.kernel.org/show_bug.cgi?id=81981

Jose, maybe you could try -oro in the hope of at least getting back
to a read-only mount?
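
That is, with a placeholder member device and mount point, something
like:

$ sudo mount -o ro /dev/sdc1 /mnt

or add ro to the options of the existing mount attempt.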

Cheers,
Andy

-- 
"I remember the first time I made love.  Perhaps it was not love exactly but I
 made it and it still works." — The League Against Tedium