Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
First, my apologies for the broken threads: I had one message where I updated the subject line, but it got cut in two and sent part of the headers in the body :( (operator mistake, sorry)

On Sun, May 11, 2014 at 02:28:23AM +, Duncan wrote:
>> That's a fair point but I run scrub every day with errors, if any, mailed to me. Can scrub miss latent corruption?
>
> Depends on the type of corruption. Scrub simply checks the checksums, replacing any bad copies it finds with good copies if there are good copies to do so with (thus my raid1 here, giving me an alternate to look at; too bad I can't get N-way-mirroring yet and have a second alternate just in case). Bitflipping and random corruption, it should detect and if possible fix, no problem.

So I was under the mistaken impression that scrub had to go through the filesystem structure and would find corrupted files, but also pointers that went nowhere, or filesystems that had obvious damage. It sounds like I was overly optimistic on this one, so, as per another message, having an online btrfsck that tells me something is wrong, even if it can't fix it, would indeed be a big plus.

Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On May 10, 2014 10:09 AM, Hugo Mills h...@carfax.org.uk wrote:
> As in, "Your filesystem got corruption as a result of a bug in some earlier version. Upgrading to the new version isn't magically going to make that corruption go away." (Not saying that's what's happened here, but it's common, and commonly misunderstood).

That's a fair point but I run scrub every day with errors, if any, mailed to me. Can scrub miss latent corruption?

Marc
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On Fri, May 09, 2014 at 07:54:20PM -0600, Chris Murphy wrote:
> However, I have a recent case in a VBox guest, with guest additions built. That causes the kernel to be tainted G because it's an out of tree kernel module for guest additions. I'm getting a bunch of Btrfs errors that aren't reproducible with an untainted kernel. So

Oh, really? Then, considering my crash happened soon after I tried to run vbox but didn't succeed due to a module that was out of date, I'd say that there is a decent chance it's related. That would be a pretty severe bug if it allows it to corrupt data that btrfs uses, but it's possible.
However, I'm surprised that btrfs would have gotten so damaged that it can't even reopen its filesystem with btrfs recovery when given the right find-root value. For that to be possible, if it's not a bug in btrfs, it must have been some massive corruption :-/

> I'm not filing a bug against Btrfs, instead I've filed a bug against VirtualBox because I'm also getting a pile of read/write errors with /dev/sda, which is backed by a VDI. A virtual device producing hardware

Note that in my case, I wasn't trying to run linux inside vbox, just to start a win7 vm guest on my linux laptop. Is that a case that also is known to cause problems?
The win7 VM was backed by a vdi image on my btrfs FS; however, since the image never was able to start, I'm not certain it could have done much. Then again, you never know.

Given the multiple problems in 3.14 that only seem to be fixed in 3.15rc (that in itself is a bit troubling, by the way), I'm going to switch to 3.15rc5, but for the reasons we discussed, this doesn't fill me with joy :-/

Marc
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On May 10, 2014, at 7:51 AM, Marc MERLIN m...@merlins.org wrote:
> Note that in my case, I wasn't trying to run linux inside vbox, just to start a win7 vm guest on my linux laptop. Is that a case that also is known to cause problems?

No, the host experiences no issues, although in my case the host is OS X, so it's a completely different kernel. I don't think they're related. Mine was just an example of a tainted kernel correlating with some other problem; while it's not known (yet) to be causation, the source of the taintedness is suspect.

https://www.virtualbox.org/ticket/13022

Chris Murphy
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
Marc MERLIN posted on Fri, 09 May 2014 20:40:26 -0700 as excerpted:

> On May 10, 2014 10:09 AM, Hugo Mills h...@carfax.org.uk wrote:
>> As in, "Your filesystem got corruption as a result of a bug in some earlier version. Upgrading to the new version isn't magically going to make that corruption go away." (Not saying that's what's happened here, but it's common, and commonly misunderstood).
>
> That's a fair point but I run scrub every day with errors, if any, mailed to me. Can scrub miss latent corruption?

Depends on the type of corruption.

Scrub simply checks the checksums, replacing any bad copies it finds with good copies if there are good copies to do so with (thus my raid1 here, giving me an alternate to look at; too bad I can't get N-way-mirroring yet and have a second alternate just in case). Bitflipping and random corruption, it should detect and if possible fix, no problem.

But if the bug was a logic error and btrfs validly checksummed bad (meta)data due to that faulty logic, scrub won't do anything to find that, because all it does is validate the checksum, and that's perfectly fine -- the result of the faulty logic was still faulty, but perfectly retained. =:^\

Faulty logic is what rebalance and btrfs check will try to detect, except that unlike checksums, which are a binary case and either match or don't, there are all /sorts/ of ways logic can be faulty. Given the immaturity of the tools, there are still some decent gaps in what they'll detect -- there are a LOT more ways that the filesystem can be wrong and the logic faulty than we know about yet, and if we don't know about it, it's pretty hard to test for it. (Let alone the case of btrfs check /thinking/ it detects something wrong, but either it's fine, or it's wrong in a different way than btrfs check thinks, such that btrfs check --repair could actually make things worse... thus the recommendation not to blindly run --repair, only as a last resort before a new mkfs, or on the specific recommendation of a dev.)

Bottom line, if the logic was wrong, scrub isn't likely to catch the problem, since the checksum on the faulty logic output can and likely will still be perfectly valid. It's simply the wrong tool to detect that sort of error.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
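A toy model of Duncan's point may help: a scrub-style pass can only compare each stored checksum against its data, so it flags a flipped bit but passes data that a buggy writer checksummed "correctly". This is a sketch, not btrfs code. The `Block` type, `write_block`, `scrub_ok`, and the "refcount" payloads are hypothetical; the one grounded detail is that btrfs's default checksum is CRC-32C, implemented here bit by bit for clarity.

```python
from dataclasses import dataclass


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


@dataclass
class Block:
    """Hypothetical stand-in for an on-disk block: payload plus stored checksum."""
    data: bytes
    csum: int


def write_block(data: bytes) -> Block:
    # The writer checksums whatever it is handed, whether or not it is correct.
    return Block(data, crc32c(data))


def scrub_ok(block: Block) -> bool:
    # Scrub-style check: does the stored checksum still match the data?
    return crc32c(block.data) == block.csum


# Case 1: media corruption. One bit flips after the checksum was written.
good = write_block(b"extent refcount=2")
flipped = Block(bytes([good.data[0] ^ 0x01]) + good.data[1:], good.csum)

# Case 2: logic bug. A wrong value (hypothetical) is written and checksummed as usual.
buggy = write_block(b"extent refcount=1")

print(scrub_ok(good), scrub_ok(flipped), scrub_ok(buggy))  # prints: True False True
```

Only a checker that knows what the data is supposed to mean (the role Duncan assigns to btrfs check) could notice that the second refcount is wrong; the checksum alone cannot.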
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
Ok, first for the devs: I found the real trace that happened just before the system went read only. My apologies for pasting the bad one first.
I'll wipe/rebuild the FS tonight unless you ask me to wait for one more day and/or want data off it. Please advise if I should rebuild with 3.14.3 or 3.15rc4.
Thanks.

Details:
It looks like my corruption came from there. I'm still not sure why it's apparently so severe that btrfs recovery cannot open the FS now.

WARNING: CPU: 6 PID: 555 at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x359/0x712()
CPU: 6 PID: 555 Comm: btrfs-cleaner Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
Hardware name: LENOVO 20BECT0/20BECT0, BIOS GMET28WW (1.08 ) 09/18/2013
 8800cd9f1b38 8160a06d 8800cd9f1b70 81050025
 812170f6 88013c9cbdf0 fffe 01856000
 8800cd9f1b80
Call Trace:
 [8160a06d] dump_stack+0x4e/0x7a
 [81050025] warn_slowpath_common+0x7f/0x98
 [812170f6] ? __btrfs_free_extent+0x359/0x712
 [810500ec] warn_slowpath_null+0x1a/0x1c
 [812170f6] __btrfs_free_extent+0x359/0x712
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8126518b] ? btrfs_check_delayed_seq+0x84/0x90
 [8121c262] __btrfs_run_delayed_refs+0xa94/0xbdf
 [8113fcf3] ? __cache_free.isra.39+0x1b4/0x1c3
 [8121df46] btrfs_run_delayed_refs+0x81/0x18f
 [8121ac3a] ? walk_up_tree+0x72/0xf9
 [8122af08] btrfs_should_end_transaction+0x52/0x5b
 [8121cba9] btrfs_drop_snapshot+0x36f/0x610
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8114020e] ? kfree+0x66/0x85
 [8122c73d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
 [81224f09] cleaner_kthread+0x103/0x136
 [81224e06] ? btrfs_alloc_root+0x26/0x26
 [8106bc62] kthread+0xae/0xb6
 [8106bbb4] ? __kthread_parkme+0x61/0x61
 [8161637c] ret_from_fork+0x7c/0xb0
 [8106bbb4] ? __kthread_parkme+0x61/0x61

On Fri, May 09, 2014 at 10:19:46AM -0600, Chris Murphy wrote:
> On May 9, 2014, at 4:35 AM, Marc MERLIN m...@merlins.org wrote:
>> Howdy, I won't have the time to rebuild my laptop tonight, so I'll wait one more day to see if anyone would like data from that fs to see why it crashed and why btrfs recovery doesn't even seem able to open it.
>
> There's some underlying reason why it went read only, but we don't have those messages. The message we do have says the kernel is already tainted, so something (possibly entirely unrelated) happened earlier.

Oh, I missed that.
May 2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
This is weird because I don't use any 3rd party binary modules.
Right now, I do see:
legolas:~# cat /proc/sys/kernel/tainted
512

Mmmh, so I messed up and pasted the wrong error. I found the real one now, pasted below.

>> Also I'm not sure if I should risk 3.15rc to rebuild the filesystem and I'd love not to have to say during my talk that even almost-latest btrfs corrupts itself without reason and without working recovery methods :-/
>
> Just because the reason isn't known or understood yet doesn't mean it's happened without reason. And we also don't know whether it corrupted itself, or had help earlier on. Neither is good, but depending on the cause of the corruption, recovery may not even be realistic.

You're right that there is always a reason :) (especially now that I see the real error, my fault for missing it the first time)
But I was fairly dismayed that btrfs recovery couldn't even open the filesystem. I was somehow thinking maybe I gave it the wrong options.

> I'd probably consider 3.13.11 if I simply had work that needs to get done rather than testing. If the problem happens there too, then you've stumbled on something that isn't likely a regression.
True, although most devs tell you to run the latest, or any problems or bugs are your fault :) (loosely paraphrased :)

> If you've done any suspend/hibernate at all, I'd stop doing that until you're in a position to do a lot more rigorous testing. I say that

Thanks for warning me of that. I only use S3 sleep; oh, but you say that's bad too? I've been using it for more than 10 years; is it now suddenly a cause of kernel and/or filesystem corruption?

> because suspend and hibernate have become so completely unreliable for so many people I know doing testing, including myself, that it's worth avoiding. I've had lots of corruptions, not just Btrfs, related to suspend testing in particular (hibernate doesn't work either but it hasn't corrupted the file system). And there's a bunch of new work happening on suspend in 3.15 so things are probably about to change yet again.
>
> Chris Murphy
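Several replies turn on how much a call trace like the one above actually shows. Each frame is printed as `[address] symbol+offset/size`, and a leading `?` marks an address the kernel's unwinder found on the stack but could not confirm is part of the call chain; dropping those leaves the real path. A hedged sketch of reading it mechanically, written against the truncated `[address]` form shown in this archive; the regex and the `reliable_frames` name are illustrative assumptions, not a kernel tool.

```python
import re

# Matches "[addr] ? symbol+0xoff/0xsize" or "[addr] symbol+0xoff/0xsize" as quoted here.
FRAME_RE = re.compile(
    r"\[([0-9a-f]+)\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def reliable_frames(trace: str) -> list[str]:
    """Return 'symbol+0xoff/0xsize' for frames not marked '?', innermost first."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        _addr, unsure, symbol, offset, size = match.groups()
        if unsure:
            continue  # '?' entries are stack leftovers, not confirmed callers
        frames.append(f"{symbol}+0x{offset}/0x{size}")
    return frames
```

Applied to the full trace in Marc's message, the unmarked frames are dump_stack, warn_slowpath_common, warn_slowpath_null, __btrfs_free_extent, __btrfs_run_delayed_refs, btrfs_run_delayed_refs, btrfs_should_end_transaction, btrfs_drop_snapshot, btrfs_clean_one_deleted_snapshot, cleaner_kthread, kthread and ret_from_fork: the warning fired while the cleaner thread was dropping a deleted snapshot.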
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On May 9, 2014, at 4:36 PM, Marc MERLIN m...@merlins.org wrote:
> Details:
> It looks like my corruption came from there. I'm still not sure why it's apparently so severe that btrfs recovery cannot open the FS now.
>
> WARNING: CPU: 6 PID: 555 at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x359/0x712()
> CPU: 6 PID: 555 Comm: btrfs-cleaner Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
> [call trace snipped]

Well, I'm sorta dense, so I only find a complete dmesg useful, because with storage problems it seems much is due to some other problem happening earlier. Maybe a fs developer would say "yeah, that's not good, but we maybe should do better failing gracefully." Call traces don't mean much of anything to me; I think the real problem happened before this, unless it's strictly a Btrfs bug, in which case the evidence may be localized in just the trace.

Also, you said it went read only overnight, but I'm seeing a reference here to cleaning up a deleted snapshot? Are you running something that's taking and deleting snapshots on a schedule?

> On Fri, May 09, 2014 at 10:19:46AM -0600, Chris Murphy wrote:
>> On May 9, 2014, at 4:35 AM, Marc MERLIN m...@merlins.org wrote:
>>> Howdy, I won't have the time to rebuild my laptop tonight, so I'll wait one more day to see if anyone would like data from that fs to see why it crashed and why btrfs recovery doesn't even seem able to open it.
>>
>> There's some underlying reason why it went read only, but we don't have those messages. The message we do have says the kernel is already tainted, so something (possibly entirely unrelated) happened earlier.
>
> Oh, I missed that.
> May 2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
> This is weird because I don't use any 3rd party binary modules.

The G means it's not a proprietary driver involved. You'd have to go through a full dmesg to find out what's causing it, but the point of the tainted state notification is that the kernel is in a state likely no one, or very few, other people are experiencing, and any subsequent problems are suspect.

> Right now, I do see:
> legolas:~# cat /proc/sys/kernel/tainted
> 512
>
> Mmmh, so I messed up and pasted the wrong error. I found the real one now, pasted below.
>
>> Also I'm not sure if I should risk 3.15rc to rebuild the filesystem and I'd love not to have to say during my talk that even almost-latest btrfs corrupts itself without reason and without working recovery methods :-/
>>
>> Just because the reason isn't known or understood yet doesn't mean it's happened without reason. And we also don't know whether it corrupted itself, or had help earlier on. Neither is good, but depending on the cause of the corruption, recovery may not even be realistic.
>
> You're right that there is always a reason :) (especially now that I see the real error, my fault for missing it the first time)
> But I was fairly dismayed that btrfs recovery couldn't even open the filesystem. I was somehow thinking maybe I gave it the wrong options.

There are still ZFS corruptions from time to time. And they happen even on file systems that get pounded on mercilessly, like NTFS, XFS and HFS+. Almost always it's not the file system itself; something else instigated the problem. Still, such mature file systems have bugs being found and fixed. So recovery not working doesn't itself surprise me; I don't even know what caused the problem.

>> I'd probably consider 3.13.11 if I simply had work that needs to get done rather than
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
Hi Marc,

On Fri, 9 May 2014 03:36:59 PM Marc MERLIN wrote:
> Oh, I missed that.
> May 2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
> This is weird because I don't use any 3rd party binary modules.

There are actually a bunch of reasons a kernel can be tainted.

> Right now, I do see:
> legolas:~# cat /proc/sys/kernel/tainted
> 512

IIUC that's an array of bit flags, and that value means you've had a previous kernel warning at that point, according to:

https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

# tainted:
#
# Non-zero if the kernel has been tainted. Numeric values,
# which can be ORed together:
#
# [...]
#
# 512 - A kernel warning has occurred.

Best of luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
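Chris's reading can be checked by decoding the value bit by bit. A minimal sketch, assuming the thirteen flags that Documentation/sysctl/kernel.txt listed for kernels of this era (later kernels define more bits, which this table reports as unknown rather than guessing); the letters are the ones the kernel prints in its "Tainted:" banner, and `decode_taint` is a hypothetical helper name.

```python
# Taint bits as listed in Documentation/sysctl/kernel.txt for 3.14-era
# kernels, with the letter the kernel prints for each in its "Tainted:" banner.
TAINT_FLAGS = {
    0: ("P", "proprietary (non-GPL) module loaded"),
    1: ("F", "module force-loaded"),
    2: ("S", "SMP with CPUs not designed for SMP"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check error"),
    5: ("B", "bad page discovered"),
    6: ("U", "taint requested by user"),
    7: ("D", "system died (oops or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel warning occurred"),
    10: ("C", "staging driver loaded"),
    11: ("I", "working around firmware bug"),
    12: ("O", "out-of-tree module loaded"),
}


def decode_taint(mask: int) -> list[tuple[int, str, str]]:
    """Return (value, letter, meaning) for every bit set in a taint mask."""
    decoded = []
    for bit in range(mask.bit_length()):
        if mask & (1 << bit):
            # Bits newer than this table are reported rather than dropped.
            letter, meaning = TAINT_FLAGS.get(bit, ("?", "not in this table"))
            decoded.append((1 << bit, letter, meaning))
    return decoded
```

On a live system, `decode_taint(int(open("/proc/sys/kernel/tainted").read()))` decodes the current value. For Marc's 512 it returns `[(512, 'W', 'kernel warning occurred')]`: the warning bit alone, with neither the proprietary-module nor the out-of-tree-module bit set.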
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
>> Right now, I do see:
>> legolas:~# cat /proc/sys/kernel/tainted
>> 512
>
> IIUC that's an array of bit flags, and that value means you've had a previous kernel warning at that point according to:
> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Yep, I meant to say that I don't have the 'G' now.
It's likely that vbox did 'G' even if I didn't successfully start it, and even if I haven't had problems with it 'till now, it's a possible culprit (more details below).

Anyway, it sounds like the FS is toast and there isn't much useful that can be gleaned from it, so I'll just wipe it and start over.
I think really my biggest disappointment is that no recovery tools seem to be able to open the FS now, even though it was accessible and seemingly working well enough when it was read only before I rebooted.

On Fri, May 09, 2014 at 06:00:50PM -0600, Chris Murphy wrote:
> Well I'm sorta dense, so I only find a complete dmesg useful because with storage problems it seems much is due to some other problem happening earlier. Maybe a fs developer would say yeah that's not

True, although I didn't find anything earlier that looked relevant.

> good, but we maybe should do better failing gracefully. Call traces don't mean much of anything to me, I think the real problem happened before this, unless it's strictly a Btrfs bug in which case the evidence may be localized in just the trace.

Sure, the corruption could have happened before the cleaner process uncovered it and then turned my FS read only. But to be honest, before cleaner ran, the FS worked (I was using it); after that, it was read only, and upon reboot it became unmountable by anything. That seems suspect to me :-/

> Also you said it went read only overnight but I'm seeing a reference here to cleaning up a deleted snapshot? Are you running something that's taking and deleting snapshots on a schedule?

Yes, hourly snapshot rotations and hourly btrfs send/receive to my secondary drive, which is still working as of now and which I'm using to type this. (I'll format the SSD and copy things back tonight, since I'm worried that if anything happens to my HD, my laptop will be toast until I get home.)

> The G means it's not a proprietary driver involved. You'd have to go through a full dmesg to find out what's causing it, but the point of the tainted state notification is that the kernel is in a state likely no one, or very few, other people are experiencing and any subsequent problems are suspect.

Mmmh, I did try to start virtualbox, but it didn't start because the driver was out of date. I have not compiled and installed the new one yet, nor actually used virtualbox.

> There are still ZFS corruptions from time to time. And they happen even on file systems that get pounded on mercilessly like NTFS, XFS and HFS+. Almost always it's not the file system itself, something else instigated the problem. Still such mature file systems have bugs being found and fixed. So recovery not working itself doesn't surprise me, I don't even know what caused the problem.

True. Never had this with ext2/3/4 in 15 years, but as you say, it's possible.

> I think Btrfs in general is still buyer beware, but that's in the category of Not News because I think all free software distributions say the same thing, essentially. None of it comes with support or a warranty unless you've bought an SLA.
>
> If you really suspect a problem in 3.14.x that may not yet be fixed in 3.15rc, or you don't want to run rc kernels, it's reasonable to run the kernel prior to the current one, which is 3.13.11. The way kernel fixes work, a fix has to be demonstrated in

Right. I'd want to avoid 3.15rc unless someone tells me I really should be running it.

> Well you think you've been using it successfully for 10 years. If you've had exactly 0 cases of any kind of fs corruption in 10 years, or can exclude suspend/resume from a corruption incident by assurance there was a reboot in between the suspend/resume and the corruption, then maybe you haven't experienced a problem. But Google is full of users who have not merely immediate corruption on suspend/resume

Point taken, thanks. But not suspending (S3 sleep) on my laptop isn't exactly practical either :-/

Marc
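Chris's snapshot question ties in with the trace: deleting a btrfs snapshot returns quickly, and the btrfs-cleaner kernel thread reclaims its space later, which is the cleaner_kthread, btrfs_clean_one_deleted_snapshot, btrfs_drop_snapshot path in the warning. An hourly rotation therefore queues cleaner work every hour. A sketch of the pruning decision such a rotation makes, assuming a simple "keep the newest N" policy; the function name and the policy are hypothetical and are not Marc's actual script.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots: list[datetime], keep: int) -> list[datetime]:
    """Return every snapshot older than the newest `keep`, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, reverse=True)
    return sorted(newest_first[keep:])


# Hypothetical example: 30 hourly snapshots, keeping the newest 24.
start = datetime(2014, 5, 9, 0, 0)
hourly = [start + timedelta(hours=h) for h in range(30)]
doomed = snapshots_to_delete(hourly, keep=24)
print(len(doomed), doomed[0], doomed[-1])  # prints: 6 2014-05-09 00:00:00 2014-05-09 05:00:00
```

Each entry returned would become one subvolume deletion, and each deletion becomes a snapshot drop that the cleaner thread processes in the background.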
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On Fri, May 09, 2014 at 05:42:54PM -0700, Marc MERLIN wrote:
> On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
>>> Right now, I do see:
>>> legolas:~# cat /proc/sys/kernel/tainted
>>> 512
>>
>> IIUC that's an array of bit flags, and that value means you've had a previous kernel warning at that point according to:
>> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
>
> Yep, I meant to say that I don't have the 'G' now.

G is actually good, I think. IIRC, it's "everything we've had to this point has been under a license where we have the source available". It's when you load a proprietary module that you get the P and the G goes away.

> It's likely that vbox did 'G' even if I didn't successfully start it, and even if I haven't had problems with it 'till now, it's a possible culprit (more details below)

I think G is actually a default state, and is good.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I write in C because using pointer arithmetic lets people ---
know that you're virile. -- Matthew Garrett
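Hugo's recollection matches how kernels of this era format the banner: the first column shows `P` if a proprietary module has been loaded and `G` otherwise, so `G` is not itself a taint flag. Every other flag prints its letter or a blank; an out-of-tree module, for instance, shows up as a separate `O`. A minimal sketch of that formatting, assuming the thirteen-column ordering used by the 3.14-era print_tainted() (newer kernels append more columns); `taint_banner` is a hypothetical name.

```python
# (letter if set, letter if clear) for taint bits 0..12, in the order the
# 3.14-era kernel prints them.
TAINT_COLUMNS = [
    ("P", "G"),  # bit 0: proprietary module loaded; 'G' means none was
    ("F", " "),  # bit 1: module force-loaded
    ("S", " "),  # bit 2: SMP on CPUs not designed for SMP
    ("R", " "),  # bit 3: module force-unloaded
    ("M", " "),  # bit 4: machine check error
    ("B", " "),  # bit 5: bad page
    ("U", " "),  # bit 6: user-requested taint
    ("D", " "),  # bit 7: system died (oops or BUG)
    ("A", " "),  # bit 8: ACPI table overridden
    ("W", " "),  # bit 9: kernel warning
    ("C", " "),  # bit 10: staging driver loaded
    ("I", " "),  # bit 11: firmware bug workaround
    ("O", " "),  # bit 12: out-of-tree module loaded
]


def taint_banner(mask: int) -> str:
    """Render a taint mask the way a kernel warning/oops header does."""
    if mask == 0:
        return "Not tainted"
    columns = [
        on if mask & (1 << bit) else off
        for bit, (on, off) in enumerate(TAINT_COLUMNS)
    ]
    return "Tainted: " + "".join(columns)
```

With only the warning bit (512) set, this yields `G`, eight blanks, `W` and three trailing blanks, which is the "Tainted: G W" banner in Marc's log once the archive collapsed the spaces.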
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On Fri, May 09, 2014 at 06:00:50PM -0600, Chris Murphy wrote:
> Well I'm sorta dense, so I only find a complete dmesg useful because with storage problems it seems much is due to some other problem happening earlier.

Life would be so much easier if filesystems didn't store any persistent state... :)

The number of people who don't quite get that that's the function and natural behaviour of a filesystem is... surprising. As in, "Your filesystem got corruption as a result of a bug in some earlier version. Upgrading to the new version isn't magically going to make that corruption go away." (Not saying that's what's happened here, but it's common, and commonly misunderstood).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- The makers of Steinway pianos would like me to tell you that ---
this is a Bechstein.
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
On May 9, 2014, at 7:05 PM, Hugo Mills h...@carfax.org.uk wrote:
> On Fri, May 09, 2014 at 05:42:54PM -0700, Marc MERLIN wrote:
>> On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
>>>> Right now, I do see:
>>>> legolas:~# cat /proc/sys/kernel/tainted
>>>> 512
>>>
>>> IIUC that's an array of bit flags, and that value means you've had a previous kernel warning at that point according to:
>>> https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
>>
>> Yep, I meant to say that I don't have the 'G' now.
>
> G is actually good, I think. IIRC, it's "everything we've had to this point has been under a license where we have the source available". It's when you load a proprietary module that you get the P and the G goes away.
>
>> It's likely that vbox did 'G' even if I didn't successfully start it, and even if I haven't had problems with it 'till now, it's a possible culprit (more details below)
>
> I think G is actually a default state, and is good.

The G just means it's not a proprietary kernel module, but it's still out of tree. So the kernel is in a state that we don't really know, without finding out what's causing it to be tainted. If it's a video or wireless driver (pretty likely), then it's probably sufficiently unrelated to fs to not matter.

However, I have a recent case in a VBox guest, with guest additions built. That causes the kernel to be tainted G because it's an out of tree kernel module for guest additions. I'm getting a bunch of Btrfs errors that aren't reproducible with an untainted kernel. So I'm not filing a bug against Btrfs; instead I've filed a bug against VirtualBox, because I'm also getting a pile of read/write errors with /dev/sda, which is backed by a VDI. A virtual device producing hardware read/write errors (as far as the linux kernel is concerned). But only with guest additions loaded. And the sustained copy event that triggers it doesn't even involve sda: it's a shared folder copy as the source, to a raw device as the destination. Yet I get dozens of read/write errors on sda, and ensuing Btrfs complaints as well. But in this case Btrfs is behaving exactly as I'd expect. What's unexpected is the virtual SATA device misbehaving, but apparently only with guest additions loaded.

Chris Murphy
Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)
Hugo Mills posted on Sat, 10 May 2014 02:09:02 +0100 as excerpted:

> Life would be so much easier if filesystems didn't store any persistent state... :)
>
> The number of people who don't quite get that that's the function and natural behaviour of a filesystem is... surprising. As in, "Your filesystem got corruption as a result of a bug in some earlier version. Upgrading to the new version isn't magically going to make that corruption go away." (Not saying that's what's happened here, but it's common, and commonly misunderstood).

FWIW, this is why I'm currently doing a mkfs.btrfs and copying over from primary backup (an identically sized partition on the same set of physical devices, also btrfs; secondary backup is reiserfs on a different device, just in case) every few kernel cycles, perhaps twice a year or every eight months.

My thinking is that even if scrub/balance/btrfs-check report no problems:

a) There are new on-device filesystem features I can now take advantage of (at least, there have been in each of the two mkfs.btrfs cycles I've done so far). And...

b) Recreating the filesystem and copying everything over fresh limits the time window during which I'm exposed to old, potentially latent bugs that may in fact have been fixed for new deployments without ever having triggered at the time (masked by some other bug or happenstance that may eventually go away), which would otherwise leave me exposed to this strange corner-case bug from two years or whatever ago.

I'll probably continue to do that until btrfs is considered stable, or even past that (tho then likely at a rather lower frequency, say every year to year and a half), because it's relatively easy to do with the way I handle backups.

--
Duncan