Re: BTRFS did it's job nicely (thanks!)
On Mon, Nov 5, 2018 at 6:27 AM, Austin S. Hemmelgarn wrote:
> On 11/4/2018 11:44 AM, waxhead wrote:
>> Sterling Windmill wrote:
>>> Out of curiosity, what led to you choosing RAID1 for data but RAID10
>>> for metadata?
>>>
>>> I've flip-flopped between these two modes myself after finding out
>>> that BTRFS RAID10 doesn't work how I would've expected.
>>>
>>> Wondering what made you choose your configuration.
>>>
>>> Thanks!
>>
>> Sure,
>>
>> The "RAID"1 profile for data was chosen to maximize disk space
>> utilization, since I have a lot of mixed-size devices.
>>
>> The "RAID"10 profile for metadata was chosen simply because it *feels*
>> a bit faster for some of my (previous) workload, which was reading a
>> lot of small files (which I guess were embedded in the metadata).
>> While I never measured any performance increase, the system simply
>> felt smoother (which is strange, since "RAID"10 should hog more disks
>> at once).
>>
>> I would love to try "RAID"10 for both data and metadata, but I have to
>> delete some files first (or add yet another drive).
>>
>> Would you like to elaborate a bit more yourself about how BTRFS
>> "RAID"10 does not work as you expected?
>>
>> As far as I know, BTRFS' version of "RAID"10 ensures that 2 copies
>> (1 replica) are striped over as many disks as it can (as long as there
>> is free space).
>>
>> So if I am not terribly mistaken, a "RAID"10 with 20 devices will
>> stripe over (20/2) x 2, and if you run out of space on 10 of the
>> devices it will continue to stripe over (5/2) x 2. So your stripe
>> width varies with the available space, essentially... I may be
>> terribly wrong about this (until someone corrects me, that is...)
>
> He's probably referring to the fact that instead of there being a
> roughly 50% chance of it surviving the failure of at least 2 devices,
> like classical RAID10 is technically able to do, it's currently
> functionally 100% certain it won't survive more than one device
> failing.
Right. Classic RAID10 makes *two block device* copies: you have mirror1
drives and mirror2 drives, each mirror pair becomes a single virtual
block device, and those virtual devices are then striped across. If you
lose a single mirror1 drive, its mirror2 data is still available and
statistically unlikely to also go away.

With Btrfs raid10, by contrast, it's *two block group* copies, and it is
the block group that's striped. That means block group copy 1 is striped
across half the available drives (at the time the block group is
allocated), and block group copy 2 is striped across the other drives.
When a drive dies, there is no single remaining drive that contains all
the missing copies; they're distributed. Which means that with a 2-drive
failure you have a very good chance of losing both copies of some
metadata, or data, or both. While I'm not certain it's 100%
unsurvivable, the real gotcha is that it's possible, maybe even likely,
that the filesystem will mount and seem to work fine, but as soon as it
runs into two missing block groups it will face-plant.

--
Chris Murphy
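[Editor's note: the two-failure odds described above can be illustrated
with a toy Monte Carlo model. This is an editorial sketch, not real
failure statistics: the 8-drive array, the fixed mirror pairing, and the
random per-block-group halves are all illustrative assumptions, not how
btrfs actually picks devices.]

```python
import itertools
import random

def classic_raid10_survives(n_drives, failed):
    # Classic RAID10: drives are paired as fixed mirrors (0,1), (2,3), ...
    # The array survives as long as no mirror pair has lost both members.
    return all(not ({2 * i, 2 * i + 1} <= failed)
               for i in range(n_drives // 2))

def btrfs_raid10_survives(n_drives, failed, n_block_groups, rng):
    # Btrfs raid10 (toy model): each block group stripes copy 1 over half
    # the drives chosen at allocation time and copy 2 over the other half.
    # A block group is lost if the failed drives hit both halves.
    for _ in range(n_block_groups):
        drives = list(range(n_drives))
        rng.shuffle(drives)
        half1 = set(drives[: n_drives // 2])
        if (failed & half1) and (failed - half1):
            return False
    return True

rng = random.Random(42)
n = 8
pairs = list(itertools.combinations(range(n), 2))
classic = sum(classic_raid10_survives(n, set(f)) for f in pairs) / len(pairs)
btrfs = sum(btrfs_raid10_survives(n, set(f), 1000, rng)
            for f in pairs) / len(pairs)
print(f"classic RAID10, two failed drives, survival: {classic:.0%}")
print(f"btrfs raid10,   two failed drives, survival: {btrfs:.0%}")
```

With many block groups, some block group almost surely has the two
failed drives in opposite halves, so the toy btrfs raid10 survival rate
collapses to zero while classic RAID10 only dies when both members of
one mirror pair fail.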
Re: BTRFS did it's job nicely (thanks!)
On 11/4/2018 11:44 AM, waxhead wrote:
> Sterling Windmill wrote:
>> Out of curiosity, what led to you choosing RAID1 for data but RAID10
>> for metadata?
>>
>> I've flip-flopped between these two modes myself after finding out
>> that BTRFS RAID10 doesn't work how I would've expected.
>>
>> Wondering what made you choose your configuration.
>>
>> Thanks!
>
> Sure,
>
> The "RAID"1 profile for data was chosen to maximize disk space
> utilization, since I have a lot of mixed-size devices.
>
> The "RAID"10 profile for metadata was chosen simply because it *feels*
> a bit faster for some of my (previous) workload, which was reading a
> lot of small files (which I guess were embedded in the metadata). While
> I never measured any performance increase, the system simply felt
> smoother (which is strange, since "RAID"10 should hog more disks at
> once).
>
> I would love to try "RAID"10 for both data and metadata, but I have to
> delete some files first (or add yet another drive).
>
> Would you like to elaborate a bit more yourself about how BTRFS
> "RAID"10 does not work as you expected?
>
> As far as I know, BTRFS' version of "RAID"10 ensures that 2 copies
> (1 replica) are striped over as many disks as it can (as long as there
> is free space).
>
> So if I am not terribly mistaken, a "RAID"10 with 20 devices will
> stripe over (20/2) x 2, and if you run out of space on 10 of the
> devices it will continue to stripe over (5/2) x 2. So your stripe width
> varies with the available space, essentially... I may be terribly wrong
> about this (until someone corrects me, that is...)

He's probably referring to the fact that instead of there being a
roughly 50% chance of it surviving the failure of at least 2 devices,
like classical RAID10 is technically able to do, it's currently
functionally 100% certain it won't survive more than one device failing.
Re: BTRFS did it's job nicely (thanks!)
Sterling Windmill wrote:
> Out of curiosity, what led to you choosing RAID1 for data but RAID10
> for metadata?
>
> I've flip-flopped between these two modes myself after finding out that
> BTRFS RAID10 doesn't work how I would've expected.
>
> Wondering what made you choose your configuration.
>
> Thanks!

Sure,

The "RAID"1 profile for data was chosen to maximize disk space
utilization, since I have a lot of mixed-size devices.

The "RAID"10 profile for metadata was chosen simply because it *feels* a
bit faster for some of my (previous) workload, which was reading a lot
of small files (which I guess were embedded in the metadata). While I
never measured any performance increase, the system simply felt smoother
(which is strange, since "RAID"10 should hog more disks at once).

I would love to try "RAID"10 for both data and metadata, but I have to
delete some files first (or add yet another drive).

Would you like to elaborate a bit more yourself about how BTRFS "RAID"10
does not work as you expected?

As far as I know, BTRFS' version of "RAID"10 ensures that 2 copies
(1 replica) are striped over as many disks as it can (as long as there
is free space).

So if I am not terribly mistaken, a "RAID"10 with 20 devices will stripe
over (20/2) x 2, and if you run out of space on 10 of the devices it
will continue to stripe over (5/2) x 2. So your stripe width varies with
the available space, essentially... I may be terribly wrong about this
(until someone corrects me, that is...)
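[Editor's note: the varying-stripe-width idea above can be sketched as a
toy function. This is an editorial simplification, not the real btrfs
chunk allocator (which also weighs per-device free space); the minimum
of 4 devices matches the documented raid10 requirement, but whether the
exact arithmetic matches waxhead's numbers is left to the list to
confirm.]

```python
def raid10_stripe_width(free_space_per_device, min_devices=4):
    """Toy model: how many devices each raid10 copy stripes across."""
    # btrfs allocates a raid10 block group across devices that still have
    # unallocated space; it needs an even device count (at least 4) and
    # stripes each of the two copies over half of them.
    usable = sum(1 for free in free_space_per_device if free > 0)
    usable -= usable % 2   # an odd device is left out of this block group
    if usable < min_devices:
        return 0           # raid10 allocation is no longer possible
    return usable // 2     # stripe width of each copy

# 20 devices with free space: each copy striped across 10 devices
print(raid10_stripe_width([100] * 20))
# 10 of them full: each copy now striped across the remaining half
print(raid10_stripe_width([0] * 10 + [100] * 10))
```

So in this toy model the stripe width shrinks as devices fill up, which
is the "stripe width varies with the available space" behavior described
above.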
Re: BTRFS did it's job nicely (thanks!)
Out of curiosity, what led to you choosing RAID1 for data but RAID10 for
metadata?

I've flip-flopped between these two modes myself after finding out that
BTRFS RAID10 doesn't work how I would've expected.

Wondering what made you choose your configuration.

Thanks!

On Fri, Nov 2, 2018 at 3:55 PM waxhead wrote:
>
> Hi,
>
> my main computer runs on a 7x SSD BTRFS as rootfs with
> data:RAID1 and metadata:RAID10.
>
> One SSD is probably about to fail, and it seems that BTRFS fixed it
> nicely (thanks everyone!)
>
> I decided to just post the ugly details in case someone just wants to
> have a look. Note that I tend to interpret the btrfs de st / output as
> if the error was NOT fixed even if (it seems clear that) it was, so I
> think the output is a bit misleading... just saying...
>
> -- below are the details for those curious (just for fun) ---
>
> scrub status for [YOINK!]
>     scrub started at Fri Nov 2 17:49:45 2018 and finished after 00:29:26
>     total bytes scrubbed: 1.15TiB with 1 errors
>     error details: csum=1
>     corrected errors: 1, uncorrectable errors: 0, unverified errors: 0
>
> btrfs fi us -T /
> Overall:
>     Device size:         1.18TiB
>     Device allocated:    1.17TiB
>     Device unallocated:  9.69GiB
>     Device missing:      0.00B
>     Used:                1.17TiB
>     Free (estimated):    6.30GiB (min: 6.30GiB)
>     Data ratio:          2.00
>     Metadata ratio:      2.00
>     Global reserve:      512.00MiB (used: 0.00B)
>
>              Data       Metadata   System
> Id Path      RAID1      RAID10     RAID10    Unallocated
> -- --------- ---------- ---------- --------- -----------
>  6 /dev/sda1  236.28GiB  704.00MiB  32.00MiB   485.00MiB
>  7 /dev/sdb1  233.72GiB    1.03GiB  32.00MiB     2.69GiB
>  2 /dev/sdc1  110.56GiB  352.00MiB         -   904.00MiB
>  8 /dev/sdd1  234.96GiB    1.03GiB  32.00MiB     1.45GiB
>  1 /dev/sde1  164.90GiB    1.03GiB  32.00MiB     1.72GiB
>  9 /dev/sdf1  109.00GiB    1.03GiB  32.00MiB   744.00MiB
> 10 /dev/sdg1  107.98GiB    1.03GiB  32.00MiB     1.74GiB
> -- --------- ---------- ---------- --------- -----------
>    Total      598.70GiB    3.09GiB  96.00MiB     9.69GiB
>    Used       597.25GiB    1.57GiB 128.00KiB
>
> uname -a
> Linux main 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64
> GNU/Linux
>
> btrfs --version
> btrfs-progs v4.17
>
> dmesg | grep -i btrfs
> [    7.801817] Btrfs loaded, crc32c=crc32c-generic
> [    8.163288] BTRFS: device label btrfsroot devid 10 transid 669961 /dev/sdg1
> [    8.163433] BTRFS: device label btrfsroot devid 9 transid 669961 /dev/sdf1
> [    8.163591] BTRFS: device label btrfsroot devid 1 transid 669961 /dev/sde1
> [    8.163734] BTRFS: device label btrfsroot devid 8 transid 669961 /dev/sdd1
> [    8.163974] BTRFS: device label btrfsroot devid 2 transid 669961 /dev/sdc1
> [    8.164117] BTRFS: device label btrfsroot devid 7 transid 669961 /dev/sdb1
> [    8.164262] BTRFS: device label btrfsroot devid 6 transid 669961 /dev/sda1
> [    8.206174] BTRFS info (device sde1): disk space caching is enabled
> [    8.206236] BTRFS info (device sde1): has skinny extents
> [    8.348610] BTRFS info (device sde1): enabling ssd optimizations
> [    8.854412] BTRFS info (device sde1): enabling free space tree
> [    8.854471] BTRFS info (device sde1): using free space tree
> [   68.170580] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.185973] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.185991] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186003] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186015] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186028] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186041] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186052] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186063] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.186075] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
> [   68.199237]
Re: BTRFS did it's job nicely (thanks!)
Duncan wrote:
> waxhead posted on Fri, 02 Nov 2018 20:54:40 +0100 as excerpted:
>> Note that I tend to interpret the btrfs de st / output as if the error
>> was NOT fixed even if (it seems clear that) it was, so I think the
>> output is a bit misleading... just saying...
>
> See the btrfs-device manpage, stats subcommand, -z|--reset option, and
> device stats section:
>
>     -z|--reset
>         Print the stats and reset the values to zero afterwards.
>
>     DEVICE STATS
>         The device stats keep persistent record of several error
>         classes related to doing IO. The current values are printed at
>         mount time and updated during filesystem lifetime or from a
>         scrub run.
>
> So stats keeps a count of historic errors and is only reset when you
> specifically reset it, *NOT* when the error is fixed.

Yes, I am perfectly aware of all that. The issue I have is that the
manpage describes corruption errors as "A block checksum mismatched or
corrupted metadata header was found". This does not tell me whether the
corruption was permanent or whether it was fixed. That is why I think
the output is a bit misleading (and I should have said that more
clearly).

My point being that "btrfs device stats /mnt" would have been a lot
easier to read and understand if it distinguished between permanent
corruption, e.g. unfixable errors, and fixed errors.

> (There's actually a recent patch, I believe in the current dev kernel
> 4.20/5.0, that will reset a device's stats automatically for the btrfs
> replace case, since it's actually a different device afterward anyway.
> Apparently, it doesn't even do /that/ automatically yet. Keep that in
> mind if you replace that device.)

Oh, thanks for the heads-up. I was under the impression that the device
stats were tracked by btrfs devid, but apparently they are (were) not.
Good to know!
Re: BTRFS did it's job nicely (thanks!)
waxhead posted on Fri, 02 Nov 2018 20:54:40 +0100 as excerpted:

> Note that I tend to interpret the btrfs de st / output as if the error
> was NOT fixed even if (it seems clear that) it was, so I think the
> output is a bit misleading... just saying...

See the btrfs-device manpage, stats subcommand, -z|--reset option, and
device stats section:

    -z|--reset
        Print the stats and reset the values to zero afterwards.

    DEVICE STATS
        The device stats keep persistent record of several error classes
        related to doing IO. The current values are printed at mount
        time and updated during filesystem lifetime or from a scrub run.

So stats keeps a count of historic errors and is only reset when you
specifically reset it, *NOT* when the error is fixed.

(There's actually a recent patch, I believe in the current dev kernel
4.20/5.0, that will reset a device's stats automatically for the btrfs
replace case, since it's actually a different device afterward anyway.
Apparently, it doesn't even do /that/ automatically yet. Keep that in
mind if you replace that device.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
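[Editor's note: the persist-until-reset behavior described above can be
seen with the stats subcommand itself. Illustrative only; "/mnt" stands
in for the actual mount point, and these commands require a mounted
btrfs filesystem.]

```shell
# Print per-device error counters; these persist across reboots and are
# NOT cleared when a scrub repairs the underlying error.
btrfs device stats /mnt

# Print the counters, then zero them (the -z/--reset option from the
# btrfs-device manpage).
btrfs device stats -z /mnt
```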
BTRFS did it's job nicely (thanks!)
Hi,

my main computer runs on a 7x SSD BTRFS as rootfs with data:RAID1 and
metadata:RAID10.

One SSD is probably about to fail, and it seems that BTRFS fixed it
nicely (thanks everyone!)

I decided to just post the ugly details in case someone just wants to
have a look. Note that I tend to interpret the btrfs de st / output as
if the error was NOT fixed even if (it seems clear that) it was, so I
think the output is a bit misleading... just saying...

-- below are the details for those curious (just for fun) ---

scrub status for [YOINK!]
    scrub started at Fri Nov 2 17:49:45 2018 and finished after 00:29:26
    total bytes scrubbed: 1.15TiB with 1 errors
    error details: csum=1
    corrected errors: 1, uncorrectable errors: 0, unverified errors: 0

btrfs fi us -T /
Overall:
    Device size:         1.18TiB
    Device allocated:    1.17TiB
    Device unallocated:  9.69GiB
    Device missing:      0.00B
    Used:                1.17TiB
    Free (estimated):    6.30GiB (min: 6.30GiB)
    Data ratio:          2.00
    Metadata ratio:      2.00
    Global reserve:      512.00MiB (used: 0.00B)

             Data       Metadata   System
Id Path      RAID1      RAID10     RAID10    Unallocated
-- --------- ---------- ---------- --------- -----------
 6 /dev/sda1  236.28GiB  704.00MiB  32.00MiB   485.00MiB
 7 /dev/sdb1  233.72GiB    1.03GiB  32.00MiB     2.69GiB
 2 /dev/sdc1  110.56GiB  352.00MiB         -   904.00MiB
 8 /dev/sdd1  234.96GiB    1.03GiB  32.00MiB     1.45GiB
 1 /dev/sde1  164.90GiB    1.03GiB  32.00MiB     1.72GiB
 9 /dev/sdf1  109.00GiB    1.03GiB  32.00MiB   744.00MiB
10 /dev/sdg1  107.98GiB    1.03GiB  32.00MiB     1.74GiB
-- --------- ---------- ---------- --------- -----------
   Total      598.70GiB    3.09GiB  96.00MiB     9.69GiB
   Used       597.25GiB    1.57GiB 128.00KiB

uname -a
Linux main 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-10-07) x86_64 GNU/Linux

btrfs --version
btrfs-progs v4.17

dmesg | grep -i btrfs
[    7.801817] Btrfs loaded, crc32c=crc32c-generic
[    8.163288] BTRFS: device label btrfsroot devid 10 transid 669961 /dev/sdg1
[    8.163433] BTRFS: device label btrfsroot devid 9 transid 669961 /dev/sdf1
[    8.163591] BTRFS: device label btrfsroot devid 1 transid 669961 /dev/sde1
[    8.163734] BTRFS: device label btrfsroot devid 8 transid 669961 /dev/sdd1
[    8.163974] BTRFS: device label btrfsroot devid 2 transid 669961 /dev/sdc1
[    8.164117] BTRFS: device label btrfsroot devid 7 transid 669961 /dev/sdb1
[    8.164262] BTRFS: device label btrfsroot devid 6 transid 669961 /dev/sda1
[    8.206174] BTRFS info (device sde1): disk space caching is enabled
[    8.206236] BTRFS info (device sde1): has skinny extents
[    8.348610] BTRFS info (device sde1): enabling ssd optimizations
[    8.854412] BTRFS info (device sde1): enabling free space tree
[    8.854471] BTRFS info (device sde1): using free space tree
[   68.170580] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185973] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.185991] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186003] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186015] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186028] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186041] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186052] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186063] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.186075] BTRFS warning (device sde1): csum failed root 3760 ino 3247424 off 125434560512 csum 0x2e395164 expected csum 0x6514b2c2 mirror 2
[   68.199237] BTRFS info (device sde1): read error corrected: ino 3247424 off 36700160 (dev /dev/sda1 sector 244987192)
[   68.202602] BTRFS info (device sde1): read error corrected: ino 3247424 off 36704256 (dev /dev/sda1 sector 244987192)
[   68.203176] BTRFS info (device sde1): read error corrected: ino 3247424 off 36712448 (dev /dev/sda1 sector 244987192)
[   68.206762] BTRFS info (device sde1): read error corrected: ino 3247424 off 36708352 (dev /dev/sda1 sector 244987192)
[   68.212071] BTRFS info