Re: Filesystem Corruption
On Mon, Dec 3, 2018, at 4:31 AM, Stefan Malte Schumacher wrote:
> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But lets start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
>
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read

Excuse me for butting in when there are *many* more qualified people on this list. But assuming the rar crc errors are related to your unexplained buffer I/O errors (and not some weird coincidence of simply bad downloads), I would start, immediately, by testing the memory. RAM corruption can wreak havoc with any filesystem, but I think btrfs has special challenges in this regard, and this looks like a memory error to me.
Re: Filesystem Corruption
On 2018/12/3 5:31 PM, Stefan Malte Schumacher wrote:
> Hello,
>
> I have noticed an unusual amount of crc-errors in downloaded rars,
> beginning about a week ago. But lets start with the preliminaries. I
> am using Debian Stretch.
> Kernel: Linux mars 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4
> (2018-08-21) x86_64 GNU/Linux
> BTRFS-Tools btrfs-progs 4.7.3-1
> Smartctl shows no errors for any of the drives in the filesystem.
>
> Btrfs /dev/stats shows zero errors, but dmesg gives me a lot of
> filesystem related error messages.
>
> [5390748.884929] Buffer I/O error on dev dm-0, logical block
> 976701312, async page read
>
> This error is shown many times in the log.

No "btrfs:" prefix; this looks more like an error message from the block level, so it is no wonder btrfs shows no errors at all. What is the underlying device mapper? And furthermore, is there any kernel message with "btrfs" (case-insensitive) in it?

Thanks,
Qu

> This seems to affect just newly written files. This is the output of
> btrfs scrub status:
> scrub status for 1609e4e1-4037-4d31-bf12-f84a691db5d8
> scrub started at Tue Nov 27 06:02:04 2018 and finished after 07:34:16
> total bytes scrubbed: 17.29TiB with 0 errors
>
> What is the probable cause of these errors? How can I fix this?
>
> Thanks in advance for your advice
> Stefan
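Qu's point, that "Buffer I/O error" comes from the block layer while btrfs messages carry a "btrfs" tag, amounts to two greps over the kernel log. A minimal sketch of that triage (the sample lines are from the report above; the function name is illustrative, not a real tool):

```python
# Split kernel log lines into block-layer I/O errors and btrfs messages,
# the two categories Qu asks about. Anything in the first bucket but not
# the second points below the filesystem (device mapper, disk, cable).
import re

def classify(lines):
    block_errors, btrfs_msgs = [], []
    for line in lines:
        if "Buffer I/O error" in line:
            block_errors.append(line)
        if re.search(r"btrfs", line, re.IGNORECASE):
            btrfs_msgs.append(line)
    return block_errors, btrfs_msgs

sample = [
    "[5390748.884929] Buffer I/O error on dev dm-0, logical block 976701312, async page read",
    "[5390750.000000] BTRFS info (device dm-0): disk space caching is enabled",
]
blk, bt = classify(sample)
print(len(blk), len(bt))  # -> 1 1
```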
Re: Filesystem corruption?
On 2018/10/23 4:02 AM, Gervais, Francois wrote:
> Hi,
>
> I think I lost power on my btrfs disk and it looks like it is now in an
> unfunctional state.

What does the word "unfunctional" mean? Unable to mount? Or something else?

> Any idea how I could debug that issue?
>
> Here is what I have:
>
> kernel 4.4.0-119-generic

The kernel is somewhat old now.

> btrfs-progs v4.4

The progs is definitely too old. It's highly recommended to use the latest btrfs-progs for its better "btrfs check" code.

> sudo btrfs check /dev/sdd
> Checking filesystem on /dev/sdd
> UUID: 9a14b7a1-672c-44da-b49a-1f6566db3e44
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs

So no error is reported from any of these essential trees. Unless there is some bug in btrfs-progs 4.4, your fs should be mostly OK.

> checking quota groups
> Ignoring qgroup relation key 310
[snip]
> Ignoring qgroup relation key 71776119061217590

Just a lot of qgroup relation key problems. Not a big problem, especially considering you're using an older kernel without proper qgroup fixes.

Just in case, please run "btrfs check" with the latest btrfs-progs (v4.17.1) to see if it reports any extra errors. Despite that, if the fs can be mounted RW, mounting it and then executing "btrfs quota disable " should disable quota and solve the problem.
Thanks, Qu > found 29301522460 bytes used err is 0 > total csum bytes: 27525424 > total tree bytes: 541573120 > total fs tree bytes: 494223360 > total extent tree bytes: 16908288 > btree space waste bytes: 85047903 > file data blocks allocated: 273892241408 > referenced 44667650048 > extent buffer leak: start 29360128 len 16384 > extent buffer leak: start 740524032 len 16384 > extent buffer leak: start 446840832 len 16384 > extent buffer leak: start 142819328 len 16384 > extent buffer leak: start 143179776 len 16384 > extent buffer leak: start 184107008 len 16384 > extent buffer leak: start 190513152 len 16384 > extent buffer leak: start 190939136 len 16384 > extent buffer leak: start 239943680 len 16384 > extent buffer leak: start 29392896 len 16384 > extent buffer leak: start 295223296 len 16384 > extent buffer leak: start 30556160 len 16384 > extent buffer leak: start 29376512 len 16384 > extent buffer leak: start 29409280 len 16384 > extent buffer leak: start 29491200 len 16384 > extent buffer leak: start 29556736 len 16384 > extent buffer leak: start 29720576 len 16384 > extent buffer leak: start 29884416 len 16384 > extent buffer leak: start 30097408 len 16384 > extent buffer leak: start 30179328 len 16384 > extent buffer leak: start 30228480 len 16384 > extent buffer leak: start 30277632 len 16384 > extent buffer leak: start 30343168 len 16384 > extent buffer leak: start 30392320 len 16384 > extent buffer leak: start 30457856 len 16384 > extent buffer leak: start 30507008 len 16384 > extent buffer leak: start 30572544 len 16384 > extent buffer leak: start 30621696 len 16384 > extent buffer leak: start 30670848 len 16384 > extent buffer leak: start 3072 len 16384 > extent buffer leak: start 30769152 len 16384 > extent buffer leak: start 30801920 len 16384 > extent buffer leak: start 30867456 len 16384 > extent buffer leak: start 30916608 len 16384 > extent buffer leak: start 102498304 len 16384 > extent buffer leak: start 204488704 len 16384 > extent buffer 
leak: start 237912064 len 16384 > extent buffer leak: start 328499200 len 16384 > extent buffer leak: start 684539904 len 16384 > extent buffer leak: start 849362944 len 16384
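The two "Ignoring qgroup relation key" values quoted above are consistent with how btrfs encodes qgroup IDs: the upper 16 bits hold the qgroup level and the lower 48 bits the subvolume/qgroup id (BTRFS_QGROUP_LEVEL_SHIFT is 48 in the kernel sources), so the huge key 71776119061217590 is just the higher-level qgroup 255/310 related to 0/310. A small decoder as a sketch:

```python
# Decode a btrfs qgroup ID into (level, id): level lives in the upper
# 16 bits, the id in the lower 48 bits.
LEVEL_SHIFT = 48

def decode_qgroupid(qgroupid):
    return qgroupid >> LEVEL_SHIFT, qgroupid & ((1 << LEVEL_SHIFT) - 1)

print(decode_qgroupid(310))                 # -> (0, 310)
print(decode_qgroupid(71776119061217590))   # -> (255, 310)
```

This is why the seemingly unrelated key values in the check output actually name the same qgroup at two levels.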
Re: filesystem corruption
Zygo Blaxell posted on Mon, 03 Nov 2014 23:31:45 -0500 as excerpted: On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote: On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote: btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different metadata on each disk. It doesn't take long before the filesystem becomes data soup or crashes the kernel. This is a pretty significant problem to still be present, honestly. I can understand the catchup mechanism is probably not built yet, but clearly the two devices don't have the same generation. The lower generation device should probably be booted/ignored or declared missing in the meantime to prevent trashing the file system. The problem with generation numbers is when both devices get divergent generation numbers but we can't tell them apart [snip very reasonable scenario] Now we have two disks with equal generation numbers. Generations 6..9 on sda are not the same as generations 6..9 on sdb, so if we mix the two disks' metadata we get bad confusion. It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one.

Zygo's absolutely correct. There is an existing catchup mechanism, but the tracking is /purely/ sequential generation number based, and if the two generation sequences diverge, welcome to the (data) Twilight Zone! I noted this in my own early pre-deployment raid1 mode testing as well, except that I didn't at that point know about sequence numbers and never got as far as letting the filesystem make data soup of itself. What I did was this:

1) Create a two-device raid1 data and metadata filesystem, mount it and stick some data on it.
2) Unmount, pull a device, mount degraded the remaining device.
3) Change a file.
4) Unmount, switch devices, mount degraded the other device.
5) Change the same file in a different/incompatible way.
6) Unmount, plug both devices in again, mount (not degraded).
7) Wait for the sync I was used to from mdraid, which of course didn't occur.
8) Check the file to see which version showed up. I don't recall which version it was, but it wasn't the common pre-change version.
9) Unmount, pull each device one at a time, mounting the other one degraded and checking the file again.
10) The file on each device remained different, without a warning or indication of any problem at all when I mounted undegraded in 6/7.

Had I initiated a scrub, presumably it would have seen the difference and if one was a newer generation, it would have taken it, overwriting the other. I don't know what it would have done if both were the same generation, tho the file being small (just a few line text file, big enough to test the effect of differing edits), I guess it would take one version or the other. If the file was large enough to be multiple extents, however, I've no idea whether it'd take one or the other, or possibly combine the two, picking extents where they differed more or less randomly. By that time the lack of warning and absolute resolution to one version or the other, even after mounting undegraded and accessing the file with incompatible versions on each of the two devices, was bothering me sufficiently that I didn't test any further. With just me to worry about (unlike a multi-admin corporate scenario where you can never be /sure/ what the other admins will do regardless of agreed procedure), I simply set myself a set of rules very similar to what Zygo proposed:

1) If for whatever reason I ever split a btrfs raid1 with the intent or even the possibility of bringing the pieces back together again, if at all possible, never mount the split pieces writable -- mount read-only.
2) If a writable mount is required, keep the writable mounts to one device of the split. As long as the other device is never mounted writable, it will have an older generation when they're reunited, and a scrub should take care of things, reliably resolving to the updated written device, rewriting the older generation on the other device. What I'd do here is physically put the removed side of the raid1 in storage, far enough from the remaining side that I couldn't possibly get them mixed up. I'd clearly label it as well, creating a defense in depth of at least two: the labeling, and the physical separation and storage of the read-only device.

3) If for whatever reason the originally read-only side must be mounted writable, very clearly mark the originally mounted-writable device POISONED/TOXIC!! *NEVER* *EVER* let such a POISONED device anywhere near its original raid1 mate, until it is wiped, such that there's no possibility of btrfs getting confused and contaminated with the poisoned data.

Given how unimpressed I was
Re: filesystem corruption
On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote: On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote: On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote: btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different metadata on each disk. It doesn't take long before the filesystem becomes data soup or crashes the kernel. This is a pretty significant problem to still be present, honestly. I can understand the catchup mechanism is probably not built yet, but clearly the two devices don't have the same generation. The lower generation device should probably be booted/ignored or declared missing in the meantime to prevent trashing the file system. The problem with generation numbers is when both devices get divergent generation numbers but we can't tell them apart, e.g. 1. sda generation = 5, sdb generation = 5 2. sdb temporarily disconnects, so we are degraded on just sda 3. sda gets more generations 6..9 4. sda temporarily disconnects, so we have no disks at all. 5. the machine reboots, gets sdb back but not sda If we allow degraded here, then: 6. sdb gets more generations 6..9 7. sdb disconnects, no disks so no filesystem 8. the machine reboots again, this time with sda and sdb present Now we have two disks with equal generation numbers. Generations 6..9 on sda are not the same as generations 6..9 on sdb, so if we mix the two disks' metadata we get bad confusion. It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one. I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. 
As far as I know the only way to do that on Btrfs right now is a full balance; it doesn't catch up just by being reconnected with a normal mount.

Chris Murphy
-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: filesystem corruption
Chris Murphy posted on Tue, 04 Nov 2014 11:28:39 -0700 as excerpted: It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one. I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just by being reconnected with a normal mount.

I thought it was a scrub that would take care of that, not a balance? (Maybe do both to be sure?)

-- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: filesystem corruption
On 11/04/2014 10:28 AM, Chris Murphy wrote: On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote: Now we have two disks with equal generation numbers. Generations 6..9 on sda are not the same as generations 6..9 on sdb, so if we mix the two disks' metadata we get bad confusion. It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one. I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just by being reconnected with a normal mount. I would think that any time any system or fraction thereof is mounted with both degraded and rw status, a degraded flag should be set somewhere/somehow in the superblock etc. The only way to clear this flag would be to reach a reconciled state. That state could be reached in one of several ways. Removing the missing mirror element would be a fast reconcile; doing a balance or scrub would be a slow reconcile for a filesystem where all the media are returned to service (e.g. the missing volume of a RAID 1 etc is returned.) Generation numbers are pretty good, but I'd put on a rider that any generation number or equivalent incremented while the system is degraded should have a unique quanta (say a GUID) generated and stored along with the generation number. The mere existence of this quanta would act as the degraded flag. Any check/compare/access related to the generation number would know to notice that the GUID is in place and do the necessary resolution. If successful the GUID would be discarded. As to how this could be implemented, I'm not fully conversant on the internal layout.
One possibility would be to add a block reference, or, indeed, replace the current storage for generation numbers completely with a block reference to a block containing the generation number and the potential GUID. The main value of having an out-of-structure reference is that its content is less space constrained, and it could be shared by multiple usages. In the case, for instance, where the block is added (as opposed to replacing the generation number), only one such block would be needed per degraded,rw mount, and it could be attached to as many filesystem structures as needed. Just as metadata under DUP is divergent after a degraded mount, a generation block would be divergent, and likely in a different location than its peers on a subsequent restored geometry. A generation block could have other niceties like the date/time and the devices present (or absent); such information could conceivably be used to intelligently disambiguate references. For instance if one degraded mount had sda and sdb, and a second had sdb and sdc, then it'd be known that sdb was dominant for having been present every time.
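The proposal above, tagging any generation bumped during a degraded,rw mount with a freshly minted GUID so that a bare numeric compare can never silently merge divergent histories, can be sketched like this. The names and structures are hypothetical; nothing like this exists in btrfs today:

```python
import uuid

# Hypothetical sketch: a generation is (number, epoch), where epoch is
# a GUID minted on each degraded,rw mount. Two devices agree only if
# both number AND epoch match; a plain integer compare would wrongly
# treat two divergent histories with equal counters as identical.

def degraded_mount_bump(gen):
    number, _ = gen
    return (number + 1, uuid.uuid4())   # new epoch marks degraded writes

def compatible(gen_a, gen_b):
    return gen_a == gen_b               # number and epoch must both match

common = (5, None)                      # both disks in sync at gen 5
sda = degraded_mount_bump(common)       # degraded writes land on sda
sdb = degraded_mount_bump(common)       # later, degraded writes on sdb
assert sda[0] == sdb[0]                 # equal generation numbers...
assert not compatible(sda, sdb)         # ...but the epochs expose divergence
print("divergence detected")
```

The design choice being illustrated: the GUID's mere existence is the degraded flag, and it is discarded only once the mirrors are reconciled.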
Re: filesystem corruption
On Tue, Nov 04, 2014 at 11:28:39AM -0700, Chris Murphy wrote: On Nov 3, 2014, at 9:31 PM, Zygo Blaxell zblax...@furryterror.org wrote: It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one. I agree this is also a problem. But the most common case is where we know that sda generation is newer (larger value) and most recently modified, and sdb has not since been modified but needs to be caught up. As far as I know the only way to do that on Btrfs right now is a full balance, it doesn't catch up just by being reconnected with a normal mount.

The data on the disks might be inconsistent, so resynchronization must read from only the good copy. A balance could just spread corruption around if it reads from two out-of-sync mirrors. (Maybe it already does the right thing if sdb was not modified...?) The full resync operation is more like btrfs device replace, except that it's replacing a disk in-place (i.e. without removing it first), and it would not read from the non-good disk.
Re: filesystem corruption
On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote: On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote: For example if I have a two device Btrfs raid1 for both data and metadata, and one device is removed and I mount -o degraded,rw one of them and make some small changes, unmount, then reconnect the missing device and mount NOT degraded - what happens? I haven't tried this. I have. It's a filesystem-destroying disaster. Never do it, never let it happen accidentally. Make sure that if a disk gets temporarily disconnected, you either never mount it degraded, or never let it come back (i.e. take the disk to another machine and wipefs it). Don't ever, ever put 'degraded' in /etc/fstab mount options. Nope. No. Well I guess I now see why opensuse's plan for Btrfs by default proscribes multiple device Btrfs volumes. The described scenario is really common with users, I see it often on linux-raid@. And md doesn't have this problem. The worst case scenario is if devices don't have bitmaps, and then a whole device rebuild has to happen rather than just a quick catchup. btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different metadata on each disk. It doesn't take long before the filesystem becomes data soup or crashes the kernel. This is a pretty significant problem to still be present, honestly. I can understand the catchup mechanism is probably not built yet, but clearly the two devices don't have the same generation. The lower generation device should probably be booted/ignored or declared missing in the meantime to prevent trashing the file system. Chris Murphy
Re: filesystem corruption
On Mon, Nov 03, 2014 at 10:11:18AM -0700, Chris Murphy wrote: On Nov 2, 2014, at 8:43 PM, Zygo Blaxell zblax...@furryterror.org wrote: btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different metadata on each disk. It doesn't take long before the filesystem becomes data soup or crashes the kernel. This is a pretty significant problem to still be present, honestly. I can understand the catchup mechanism is probably not built yet, but clearly the two devices don't have the same generation. The lower generation device should probably be booted/ignored or declared missing in the meantime to prevent trashing the file system. The problem with generation numbers is when both devices get divergent generation numbers but we can't tell them apart, e.g.

1. sda generation = 5, sdb generation = 5
2. sdb temporarily disconnects, so we are degraded on just sda
3. sda gets more generations 6..9
4. sda temporarily disconnects, so we have no disks at all.
5. the machine reboots, gets sdb back but not sda

If we allow degraded here, then:

6. sdb gets more generations 6..9
7. sdb disconnects, no disks so no filesystem
8. the machine reboots again, this time with sda and sdb present

Now we have two disks with equal generation numbers. Generations 6..9 on sda are not the same as generations 6..9 on sdb, so if we mix the two disks' metadata we get bad confusion. It needs to be more than a sequential number. If one of the disks disappears we need to record this fact on the surviving disks, and also cope with _both_ disks claiming to be the surviving one. Chris Murphy
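Zygo's eight-step timeline can be replayed with a plain integer counter to make the failure concrete. This is a toy illustration of the argument, not btrfs code:

```python
# Replay the split-brain timeline with a bare sequential generation
# counter: each transaction commit bumps the counter by one.

def degraded_writes(gen, n):
    return gen + n

sda = sdb = 5                  # step 1: both disks in sync at generation 5
sda = degraded_writes(sda, 4)  # steps 2-3: sdb gone, sda reaches gen 9
                               # steps 4-5: sda gone too, reboot finds only sdb
sdb = degraded_writes(sdb, 4)  # steps 6-7: sdb reaches gen 9 with DIFFERENT data
                               # step 8: both disks back
print(sda, sdb, sda == sdb)    # -> 9 9 True: the counter alone cannot tell
                               #    the two divergent histories apart
```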
Re: filesystem corruption
On Nov 1, 2014, at 10:49 PM, Robert White rwh...@pobox.com wrote: On 10/31/2014 10:34 AM, Tobias Holst wrote: I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add the second one as there are only two slots in that server. This is what I got: tobby@ubuntu: sudo btrfs check /dev/sdb1 warning, device 2 is missing warning devid 2 not found already root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check --repair /dev/sdb1 enabling repair mode warning, device 2 is missing warning devid 2 not found already Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. The read-only snapshots taken under 3.17.1 are your core problem. Now btrfsck is refusing to operate on the degraded RAID because degraded RAID is degraded so it's read-only. (this is an educated guess). Degradedness and writability are orthogonal. If there's some problem with the fs that prevents it from being mountable rw, then that'd apply for both normal and degraded operation. If the fs is OK, it should permit writable degraded mounts. Since btrfsck is _not_ a mount type of operation its got no degraded mode that would let you deal with half a RAID as far as I know. That's a problem. I can see why a repair might need an additional flag (maybe force) to repair a volume that has the minimum number of devices for degraded mounting, but not all are present. Maybe we wouldn't want it to be easy to accidentally run a repair that changes the file system when a device happens to be missing inadvertently that could be found and connected later. I think related to this is a btrfs equivalent of a bitmap. 
The metadata already has this information in it, but possibly right now btrfs lacks the equivalent behavior of mdadm readd when a previously missing device is reconnected. If it has a bitmap then it doesn't have to be completely rebuilt, the bitmap contains information telling md how to catch up the readded device, i.e. only that which is different needs to be written upon a readd. For example if I have a two device Btrfs raid1 for both data and metadata, and one device is removed and I mount -o degraded,rw one of them and make some small changes, unmount, then reconnect the missing device and mount NOT degraded - what happens? I haven't tried this. And I also don't know if a full balance (hours) is needed to catch up the formerly missing device. With md this is very fast - seconds/minutes depending on how much has been changed. Chris Murphy
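The md write-intent bitmap behavior described above can be sketched as: while a mirror is absent, record which chunks were written; on re-add, copy only those chunks instead of rebuilding the whole device. This is only an illustration of the mechanism, btrfs has no such structure and the real md on-disk format differs:

```python
# Sketch of a write-intent bitmap: track dirty chunks while degraded,
# then resync only those chunks when the missing mirror returns.
CHUNK = 64 * 1024  # bytes covered per bitmap bit (md typically uses more)

class Mirror:
    def __init__(self, size):
        self.data = bytearray(size)

def degraded_write(primary, dirty, offset, payload):
    """Write to the surviving mirror and mark the touched chunks dirty."""
    primary.data[offset:offset + len(payload)] = payload
    for chunk in range(offset // CHUNK, (offset + len(payload) - 1) // CHUNK + 1):
        dirty.add(chunk)

def readd(primary, stale, dirty):
    """Catch up the returning mirror by copying only the dirty chunks."""
    for chunk in sorted(dirty):
        lo, hi = chunk * CHUNK, (chunk + 1) * CHUNK
        stale.data[lo:hi] = primary.data[lo:hi]
    dirty.clear()

a, b = Mirror(1 << 20), Mirror(1 << 20)
dirty = set()
degraded_write(a, dirty, 130000, b"new data")   # b is offline
readd(a, b, dirty)                              # fast catch-up on re-add
print(a.data == b.data)                         # -> True
```

This is why an md re-add takes seconds or minutes: the work is proportional to what changed, not to the device size.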
Re: filesystem corruption
Thank you for your reply. I'll answer in-line. 2014-11-02 5:49 GMT+01:00 Robert White rwh...@pobox.com: On 10/31/2014 10:34 AM, Tobias Holst wrote: I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 to it. I can't add the second one as there are only two slots in that server. This is what I got: tobby@ubuntu: sudo btrfs check /dev/sdb1 warning, device 2 is missing warning devid 2 not found already root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check --repair /dev/sdb1 enabling repair mode warning, device 2 is missing warning devid 2 not found already Unable to find block group for 0 extent-tree.c:289: find_search_start: Assertion `1` failed. The read-only snapshots taken under 3.17.1 are your core problem. OK Now btrfsck is refusing to operate on the degraded RAID because degraded RAID is degraded so it's read-only. (this is an educated guess). Since btrfsck is _not_ a mount type of operation its got no degraded mode that would let you deal with half a RAID as far as I know. OK, good to know. In your case... It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are going to make read-only snapshots safely. It is _known_ that you need to be running 3.17.2 to get a number of fixes that impact your circumstance. It is _known_ that you need to be running btrfs-progs 3.17 to repair the read-only snapshots that are borked up, and that you must _not_ have previously tried to repair the problem with an older btrfsck. No, I didn't try to repair it with older kernels/btrfs-tools. Were I you, I would... Put the two disks back in the same computer before something bad happens. Upgrade that computer to 3.17.2 and 3.17 respectively.
As I mentioned before I only have two slots and my system on this btrfs-raid1 is not working anymore. Not just when accessing ro-snapshots - it crashes every time at the login prompt. So now I installed Ubuntu 14.04 to a USB stick (so I can readd both btrfs HDDs) and upgraded the kernel to 3.17.2 and btrfs-tools to 3.17. Take a backup (because I am paranoid like that, though current threat seems negligible). I already have a backup. :) btrfsck your raid with --repair. OK. And this is what I get now: tobby@ubuntu: sudo btrfs check /dev/sda1 root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Found 1 roots with an outdated root item. Please run a filesystem check with the option --repair to fix them. tobby@ubuntu: sudo btrfs check /dev/sda1 --repair enabling repair mode fixing root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2 Fixed 1 roots. Checking filesystem on /dev/sda1 UUID: 3ad065be-2525-4547-87d3-0e195497f9cf checking extents checking free space cache cache and super generation don't match, space cache will be invalidated checking fs roots root 18446744073709551607 inode 258 errors 1000, some csum missing found 36031450184 bytes used err is 1 total csum bytes: 59665716 total tree bytes: 3523330048 total fs tree bytes: 3234054144 total extent tree bytes: 202358784 btree space waste bytes: 755547262 file data blocks allocated: 122274091008 referenced 211741990912 Btrfs v3.17 Alternately, if you previously tried to btrfsck the raid with a version prior to 3.17 tools after the read-only snapshot(s) problem, you will need to resort to mkfs.btrfs to solve the problem. But Hey! you have two disks, so break the RAID, then mkfs one of them, then copy the data, then re-make the RAID such that the new FS rules. Enjoy your system no longer taking racy read-only snapshots... 8-) And this worked!
:) Server is back online without restoring any files from the backup. Looks good to me! But I can't do a balance anymore? root@t-mon:~# btrfs balance start /dev/sda1 ERROR: can't access '/dev/sda1' Regards Tobias
Re: filesystem corruption
On Sun, Nov 02, 2014 at 02:57:22PM -0700, Chris Murphy wrote:
> On Nov 1, 2014, at 10:49 PM, Robert White rwh...@pobox.com wrote:
>> On 10/31/2014 10:34 AM, Tobias Holst wrote:
>>> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 into it. I can't add the second one as there are only two slots in that server. This is what I got:
>>>
>>> tobby@ubuntu: sudo btrfs check /dev/sdb1
>>> warning, device 2 is missing
>>> warning devid 2 not found already
>>> root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2
>>> Found 1 roots with an outdated root item.
>>> Please run a filesystem check with the option --repair to fix them.
>>>
>>> tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
>>> enabling repair mode
>>> warning, device 2 is missing
>>> warning devid 2 not found already
>>> Unable to find block group for 0
>>> extent-tree.c:289: find_search_start: Assertion `1` failed.
>>
>> The read-only snapshots taken under 3.17.1 are your core problem. Now btrfsck is refusing to operate on the degraded RAID because the degraded RAID is read-only (this is an educated guess).
>
> Degradedness and writability are orthogonal. If there's some problem with the fs that prevents it from being mountable rw, then that'd apply to both normal and degraded operation. If the fs is OK, it should permit writable degraded mounts.
>
>> Since btrfsck is _not_ a mount type of operation, it's got no degraded mode that would let you deal with half a RAID, as far as I know.
>
> That's a problem. I can see why a repair might need an additional flag (maybe force) to repair a volume that has the minimum number of devices for degraded mounting, but not all are present. Maybe we wouldn't want it to be easy to accidentally run a repair that changes the file system when a device happens to be missing inadvertently that could be found and connected later.
>
> I think related to this is a btrfs equivalent of a bitmap. The metadata already has this information in it, but possibly right now btrfs lacks the equivalent behavior of mdadm readd when a previously missing device is reconnected. If it has a bitmap then it doesn't have to be completely rebuilt; the bitmap contains information telling md how to catch up the readded device, i.e. only that which is different needs to be written upon a readd.
>
> For example, if I have a two-device btrfs raid1 for both data and metadata, and one device is removed and I mount -o degraded,rw one of them and make some small changes, unmount, then reconnect the missing device and mount NOT degraded - what happens? I haven't tried this.

I have. It's a filesystem-destroying disaster. Never do it, and never let it happen accidentally. Make sure that if a disk gets temporarily disconnected, you either never mount the filesystem degraded, or never let the disk come back (i.e. take the disk to another machine and wipefs it). Don't ever, ever put 'degraded' in /etc/fstab mount options. Nope. No.

btrfs seems to assume the data is correct on both disks (the generation numbers and checksums are OK) but gets confused by equally plausible but different metadata on each disk. It doesn't take long before the filesystem becomes data soup or crashes the kernel.

There is more than one way to get to this point. Take LVM snapshots of the devices in a btrfs RAID1 array, and 'btrfs device scan' will see two different versions of each btrfs device in the filesystem (one for the origin LV and one for the snapshot). btrfs then assembles LVs of different vintages at random (e.g. one from the mount command line, one from an earlier LVM snapshot of the second disk) with disastrous results similar to the above. IMHO, if btrfs sees multiple devices with the same UUIDs, it should reject all of them and require an explicit device list; however, mdadm has a way to deal with this that would also work.

mdadm puts event counters and timestamps in the device superblocks to prevent any such accidental disjoint assembly and modification of members of an array. If disks go temporarily offline with separate modifications, mdadm refuses to accept disks with different counter+timestamp data (so you'll get all the disks but one rejected, or only one disk with all others rejected). The rejected disk(s) have to go through full device recovery before rejoining the array - someone has to use mdadm to add the rejected disk as if it were a new, blank one.

Currently btrfs won't mount a degraded array by default, which prevents unrecoverable inconsistency. That's a safe behavior for now, but sooner or later btrfs will need to be able to safely boot unattended on a degraded RAID1 root filesystem. And I also don't know if a full balance (hours) is needed to catch up the formerly missing device. With md this is very fast - seconds/minutes depending on how much has been changed. I schedule
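The mdadm behavior described above can be sketched concretely. This is an illustrative command sequence only - the device and array names are hypothetical placeholders, and these commands modify disks, so they are not meant to be pasted onto a live system:

```shell
# Create a RAID1 with an internal write-intent bitmap, so that a
# briefly-absent member can be caught up instead of fully rebuilt:
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal /dev/sda1 /dev/sdb1

# If /dev/sdb1 drops out and later reappears unmodified, the bitmap
# lets md resync only the blocks written while it was gone:
mdadm /dev/md0 --re-add /dev/sdb1

# A member whose event counter diverged (it was modified separately)
# is rejected; it has to come back as if it were a new, blank disk,
# which triggers a full recovery:
mdadm --zero-superblock /dev/sdb1
mdadm /dev/md0 --add /dev/sdb1
```

The key design point is the superblock event counter: it is what prevents the "two equally plausible but different versions of the array" situation that btrfs currently has no defense against.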
Re: filesystem corruption
On 11/02/2014 06:55 PM, Tobias Holst wrote:
> But I can't do a balance anymore?
>
> root@t-mon:~# btrfs balance start /dev/sda1
> ERROR: can't access '/dev/sda1'

Balance takes place on a mounted filesystem, not on a raw block device. So...

mount -t btrfs /dev/sda1 /some/path/somewhere
btrfs balance start /some/path/somewhere

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
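As an aside, the same mounted path is what the balance monitoring subcommands take as well. A small sketch - the mount point is a placeholder:

```shell
# Start a balance on the mounted filesystem (not the block device):
mount -t btrfs /dev/sda1 /mnt/btrfs
btrfs balance start /mnt/btrfs

# From another shell, the same path is used to watch or stop it:
btrfs balance status /mnt/btrfs
btrfs balance cancel /mnt/btrfs
```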
Re: filesystem corruption
On 10/31/2014 10:34 AM, Tobias Holst wrote:
> I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 into it. I can't add the second one as there are only two slots in that server. This is what I got:
>
> tobby@ubuntu: sudo btrfs check /dev/sdb1
> warning, device 2 is missing
> warning devid 2 not found already
> root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2
> Found 1 roots with an outdated root item.
> Please run a filesystem check with the option --repair to fix them.
>
> tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
> enabling repair mode
> warning, device 2 is missing
> warning devid 2 not found already
> Unable to find block group for 0
> extent-tree.c:289: find_search_start: Assertion `1` failed.

The read-only snapshots taken under 3.17.1 are your core problem. Now btrfsck is refusing to operate on the degraded RAID because the degraded RAID is read-only (this is an educated guess). Since btrfsck is _not_ a mount type of operation, it's got no degraded mode that would let you deal with half a RAID, as far as I know.

In your case...

It is _known_ that you need to be _not_ running 3.17.0 or 3.17.1 if you are going to make read-only snapshots safely.

It is _known_ that you need to be running 3.17.2 to get a number of fixes that affect your circumstance.

It is _known_ that you need to be running btrfs-progs 3.17 to repair the read-only snapshots that are borked up, and that you must _not_ have previously tried to repair the problem with an older btrfsck.

Were I you, I would...

1. Put the two disks back in the same computer before something bad happens.
2. Upgrade that computer to kernel 3.17.2 and btrfs-progs 3.17 respectively.
3. Take a backup (because I am paranoid like that, though the current threat seems negligible).
4. btrfsck your RAID with --repair.
Alternately, if you previously tried to btrfsck the RAID with tools older than 3.17 after the read-only snapshot(s) problem appeared, you will need to resort to mkfs.btrfs to solve the problem. But hey! You have two disks, so break the RAID, then mkfs one of them, then copy the data over, then re-make the RAID such that the new filesystem rules.

Enjoy your system no longer taking racy read-only snapshots... 8-)
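The break-and-rebuild procedure above could look roughly like the following. This is a destructive sketch under assumed names: /dev/sda1 and /dev/sdb1 stand in for the two RAID1 members and the mount points are placeholders. The profile-conversion step is my addition, since btrfs won't let you delete a member of a two-device raid1 while the profile still requires two copies:

```shell
# DESTRUCTIVE - illustrative only; all names are placeholders.

# 1. Break the RAID: convert to single copies, then drop one member.
mount -t btrfs /dev/sda1 /mnt/old
btrfs balance start -dconvert=single -mconvert=single /mnt/old
btrfs device delete /dev/sdb1 /mnt/old

# 2. Make a fresh filesystem on the freed disk and copy the data.
mkfs.btrfs /dev/sdb1
mkdir -p /mnt/new
mount -t btrfs /dev/sdb1 /mnt/new
cp -a /mnt/old/. /mnt/new/

# 3. Re-make the RAID such that the new filesystem rules.
umount /mnt/old
wipefs -a /dev/sda1
btrfs device add /dev/sda1 /mnt/new
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/new
```

Copying with cp -a loses btrfs-specific state (subvolumes, snapshots, reflinks), so anything that must survive would need btrfs send/receive per subvolume instead.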
Re: filesystem corruption
I am now using another system with kernel 3.17.2 and btrfs-tools 3.17 and inserted one of the two HDDs of my btrfs-RAID1 into it. I can't add the second one as there are only two slots in that server. This is what I got:

tobby@ubuntu: sudo btrfs check /dev/sdb1
warning, device 2 is missing
warning devid 2 not found already
root item for root 1746, current bytenr 80450240512, current gen 163697, current level 2, new bytenr 40074067968, new gen 163707, new level 2
Found 1 roots with an outdated root item.
Please run a filesystem check with the option --repair to fix them.

tobby@ubuntu: sudo btrfs check --repair /dev/sdb1
enabling repair mode
warning, device 2 is missing
warning devid 2 not found already
Unable to find block group for 0
extent-tree.c:289: find_search_start: Assertion `1` failed.
btrfs[0x42bd62]
btrfs[0x42ffe5]
btrfs[0x430211]
btrfs[0x4246ec]
btrfs[0x424d11]
btrfs[0x426af3]
btrfs[0x41b18c]
btrfs[0x40b46a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffca1119ec5]
btrfs[0x40b497]

This can be repeated as often as I want ;) Nothing changed.

Regards
Tobias

2014-10-31 3:41 GMT+01:00 Rich Freeman r-bt...@thefreemanclan.net:
> On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote:
>> Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: Can I fix that or do I have to reinstall? Haven't touched the filesystem, just did a scrub (found 0 errors).
>
> Yup - ro-snapshots is a big problem in 3.17. You can probably recover now by:
>
> 1. Update your kernel to 3.17.2 - that takes care of all the big known 3.16/3.17 issues in general.
> 2. Run btrfs check using btrfs-tools 3.17. That can clean up the broken snapshots in your filesystem.
>
> That is fairly likely to get your filesystem working normally again. It worked for me.
> I was getting some balance issues when trying to add another device, and I'm not sure if 3.17.2 totally fixed that - I ended up cancelling the balance, and it will be a while before I have to balance this particular filesystem again, so I'll just hold off and hope things stabilize.
>
> --
> Rich
Re: filesystem corruption
Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: can I fix that, or do I have to reinstall? I haven't touched the filesystem, just did a scrub (found 0 errors).

Regards
Tobias

2014-10-31 1:29 GMT+01:00 Tobias Holst to...@tobby.eu:

Hi

I was using a btrfs RAID1 with two disks under Ubuntu 14.04, kernel 3.13 and btrfs-tools 3.14.1 for weeks without issues. Now I updated to kernel 3.17.1 and btrfs-tools 3.17. After a reboot everything looked fine and I started some tests. While running duperemove (just scanning, not doing anything) and a balance at the same time, the load suddenly went up to 30 and the system stopped responding. Everything working with the filesystem stopped responding, so I did a hard reset. I was able to reboot, but at the login prompt nothing happened except a kernel bug. Same back on kernel 3.13. I then started a live system (Ubuntu 14.10, kernel 3.16.x, btrfs-tools 3.14.1) and mounted the btrfs filesystem. I can browse through the files, but sometimes, especially when accessing my snapshots or trying to create a new snapshot, the kernel bug appears and the filesystem hangs.
It shows this:

Oct 31 00:09:14 ubuntu kernel: [  187.661731] ------------[ cut here ]------------
Oct 31 00:09:14 ubuntu kernel: [  187.661770] WARNING: CPU: 1 PID: 4417 at /build/buildd/linux-3.16.0/fs/btrfs/relocation.c:924 build_backref_tree+0xcab/0x1240 [btrfs]()
Oct 31 00:09:14 ubuntu kernel: [  187.661772] Modules linked in: nls_iso8859_1 dm_crypt gpio_ich coretemp lpc_ich kvm_intel kvm dm_multipath scsi_dh serio_raw xgifb(C) bnep rfcomm bluetooth 6lowpan_iphc i3000_edac edac_core parport_pc mac_hid ppdev shpchp lp parport squashfs overlayfs nls_utf8 isofs btrfs xor raid6_pq dm_mirror dm_region_hash dm_log hid_generic usbhid hid uas usb_storage ahci e1000e libahci ptp pps_core
Oct 31 00:09:14 ubuntu kernel: [  187.661800] CPU: 1 PID: 4417 Comm: btrfs-balance Tainted: G C 3.16.0-23-generic #31-Ubuntu
Oct 31 00:09:14 ubuntu kernel: [  187.661802] Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 03/06/2009
Oct 31 00:09:14 ubuntu kernel: [  187.661804] 0009 8800a0ae7a00 8177fcbc
Oct 31 00:09:14 ubuntu kernel: [  187.661807] 8800a0ae7a38 8106fd8d 8800a1440750 8800a1440b48
Oct 31 00:09:14 ubuntu kernel: [  187.661809] 88020a8ce000 0001 88020b6b0d00 8800a0ae7a48
Oct 31 00:09:14 ubuntu kernel: [  187.661812] Call Trace:
Oct 31 00:09:14 ubuntu kernel: [  187.661820] [8177fcbc] dump_stack+0x45/0x56
Oct 31 00:09:14 ubuntu kernel: [  187.661825] [8106fd8d] warn_slowpath_common+0x7d/0xa0
Oct 31 00:09:14 ubuntu kernel: [  187.661827] [8106fe6a] warn_slowpath_null+0x1a/0x20
Oct 31 00:09:14 ubuntu kernel: [  187.661842] [c01b734b] build_backref_tree+0xcab/0x1240 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661857] [c01b7ae1] relocate_tree_blocks+0x201/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661872] [c01b88d8] ? add_data_references+0x268/0x2a0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661887] [c01b96fd] relocate_block_group+0x25d/0x6b0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661902] [c01b9d36] btrfs_relocate_block_group+0x1e6/0x2f0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661916] [c0190988] btrfs_relocate_chunk.isra.27+0x58/0x720 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661926] [c0140dc1] ? btrfs_set_path_blocking+0x41/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661935] [c0145dfd] ? btrfs_search_slot+0x48d/0xa40 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661950] [c018b49b] ? release_extent_buffer+0x2b/0xd0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661964] [c018b95f] ? free_extent_buffer+0x4f/0xa0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661979] [c01936c3] __btrfs_balance+0x4d3/0x8d0 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.661993] [c0193d48] btrfs_balance+0x288/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662008] [c019411d] balance_kthread+0x5d/0x80 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662022] [c01940c0] ? btrfs_balance+0x600/0x600 [btrfs]
Oct 31 00:09:14 ubuntu kernel: [  187.662026] [81094aeb] kthread+0xdb/0x100
Oct 31 00:09:14 ubuntu kernel: [  187.662029] [81094a10] ? kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662032] [81787c3c] ret_from_fork+0x7c/0xb0
Oct 31 00:09:14 ubuntu kernel: [  187.662035] [81094a10] ? kthread_create_on_node+0x1c0/0x1c0
Oct 31 00:09:14 ubuntu kernel: [  187.662037] ---[ end trace fb7849e4a6f20424 ]---

and this:

Oct 31 00:09:14 ubuntu kernel: [  187.682629] [ cut here
Re: filesystem corruption
On Thu, Oct 30, 2014 at 9:02 PM, Tobias Holst to...@tobby.eu wrote:
> Addition: I found some posts here about a general file system corruption in 3.17 and 3.17.1 - is this the cause? Additionally I am using ro-snapshots - maybe this is the cause, too? Anyway: Can I fix that or do I have to reinstall? Haven't touched the filesystem, just did a scrub (found 0 errors).

Yup - ro-snapshots is a big problem in 3.17. You can probably recover now by:

1. Update your kernel to 3.17.2 - that takes care of all the big known 3.16/3.17 issues in general.
2. Run btrfs check using btrfs-tools 3.17. That can clean up the broken snapshots in your filesystem.

That is fairly likely to get your filesystem working normally again. It worked for me.

I was getting some balance issues when trying to add another device, and I'm not sure if 3.17.2 totally fixed that - I ended up cancelling the balance, and it will be a while before I have to balance this particular filesystem again, so I'll just hold off and hope things stabilize.

--
Rich
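The two recovery steps above amount to a short checklist. A hedged sketch - the device name is a placeholder, and the filesystem should be unmounted before checking:

```shell
# Confirm the running kernel and btrfs-progs versions first:
uname -r          # want 3.17.2 or later
btrfs --version   # want btrfs-progs v3.17 or later

# Do a read-only check first; btrfs check does not write to the
# filesystem unless --repair is given:
btrfs check /dev/sdX1

# Only if it reports the outdated root items from the ro-snapshot
# bug, let the matching-version tool repair them:
btrfs check --repair /dev/sdX1
```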