Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-11 Thread Marc MERLIN
First, my apologies for the broken threading: I updated the subject line on
one message, but it got cut in two and sent part of the headers in the
body :(
(operator mistake, sorry)

On Sun, May 11, 2014 at 02:28:23AM +, Duncan wrote:
  That's a fair point, but I run scrub every day, with errors (if any)
  mailed to me.
  Can scrub miss latent corruption?
 
 Depends on the type of corruption.  Scrub simply checks the checksums, 
 replacing any bad copies it finds with good copies if there are good copies 
 to do so with (thus my raid1 here, giving me an alternate to look at, too 
 bad I can't get N-way-mirroring yet and have a second alternate just in 
 case).  Bitflipping and random corruption, it should detect and if 
 possible fix, no problem.
 
So I was under the mistaken impression that scrub had to walk the
filesystem structure and would find not only corrupted files but also
pointers that went nowhere, or filesystems that had obvious damage.
It sounds like I was overoptimistic on this one, so, as per another
message, an online btrfsck that tells me something is wrong, even
if it can't fix it, would indeed be a big plus.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-10 Thread Marc MERLIN
On May 10, 2014 10:09 AM, Hugo Mills h...@carfax.org.uk wrote:
As in, "Your filesystem got corruption as a result of a bug in some
 earlier version. Upgrading to the new version isn't magically going to
 make that corruption go away." (Not saying that's what's happened
 here, but it's common, and commonly misunderstood).

That's a fair point, but I run scrub every day, with errors (if any) mailed
to me.
Can scrub miss latent corruption?

Marc


Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-10 Thread Marc MERLIN
On Fri, May 09, 2014 at 07:54:20PM -0600, Chris Murphy wrote:
 However, I have a recent case in VBox guest, with guest additions
 built. That causes the kernel to be tainted G because it's an out
 of tree kernel module for guest additions. I'm getting a bunch of
 Btrfs errors that aren't reproducible with an untainted kernel. So

Oh, really?

Then considering my crash happened soon after I tried to run vbox but
didn't succeed due to a module that was out of date, I'd say that there
is a decent chance it's related.

That would be a pretty severe bug if it allows it to corrupt data that
btrfs uses, but it's possible.

However, I'm surprised that btrfs would have gotten so damaged that it
can't even reopen its filesystem with btrfs recovery when given the
right find-root value. For that to be possible, if it's not a bug in
btrfs, it must have been some massive corruption :-/

 I'm not filing a bug against Btrfs, instead I've filed a bug against
 VirtualBox because I'm also getting a pile of read write errors with
 /dev/sda which is backed by a VDI. A virtual device producing hardware

Note that in my case, I wasn't trying to run Linux inside vbox, just to
start a Win7 VM guest on my Linux laptop.
Is that case also known to cause problems?

The Win7 VM was backed by a VDI image on my btrfs FS; however, since the
image never managed to start, I'm not certain it could have done much.
Then again, you never know.

Given the multiple problems in 3.14 that only seem to be fixed in 3.15rc
(that in itself is a bit troubling by the way), I'm going to switch to
3.15rc5, but for the reasons we discussed, this doesn't fill me with joy
:-/

Marc


Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-10 Thread Chris Murphy

On May 10, 2014, at 7:51 AM, Marc MERLIN m...@merlins.org wrote:

 Note that in my case, I wasn't trying to run Linux inside vbox, just to
 start a Win7 VM guest on my Linux laptop.
 Is that case also known to cause problems?

No, the host experiences no issues, although in my case the host is OS X, so 
it's a completely different kernel. I don't think they're related. Mine was 
just an example of a tainted kernel correlating with some other problem; while 
it's not known (yet) to be the cause, the source of the taint is suspect.

https://www.virtualbox.org/ticket/13022


Chris Murphy



Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-10 Thread Duncan
Marc MERLIN posted on Fri, 09 May 2014 20:40:26 -0700 as excerpted:

 On May 10, 2014 10:09 AM, Hugo Mills h...@carfax.org.uk wrote:
As in, "Your filesystem got corruption as a result of a bug in some
 earlier version. Upgrading to the new version isn't magically going to
 make that corruption go away." (Not saying that's what's happened here,
 but it's common, and commonly misunderstood).
 
 That's a fair point, but I run scrub every day, with errors (if any)
 mailed to me.
 Can scrub miss latent corruption?

Depends on the type of corruption.  Scrub simply checks the checksums, 
replacing any bad copies it finds with good copies if there are good copies 
to do so with (thus my raid1 here, giving me an alternate to look at, too 
bad I can't get N-way-mirroring yet and have a second alternate just in 
case).  Bitflipping and random corruption, it should detect and if 
possible fix, no problem.

But if the bug was a logic error and btrfs validly checksummed bad 
(meta)data due to that faulty logic, scrub won't do anything to find 
that, because all it does is validate the checksum and that's perfectly 
fine -- the result of the faulty logic was still faulty, but perfectly 
retained. =:^\

Faulty logic is what rebalance and btrfs check will try to detect, except 
that unlike checksums, which are a binary case (they either match or they 
don't), there are all /sorts/ of ways logic can be faulty, and given the 
immaturity of the tools, there are still some decent gaps in what they'll 
detect -- there are a LOT more ways that the filesystem can be wrong and the 
logic faulty than we know about yet, and if we don't know about it, it's 
pretty hard to test for it.

(Let alone the case of btrfs check /thinking/ it detects something wrong, 
but either it's fine, or it's wrong in a different way than btrfs check 
thinks, such that btrfs check --repair could actually make things 
worse... thus the recommendation not to blindly run --repair, only as a 
last resort before a new mkfs, or on the specific recommendation of a 
dev.)

Bottom line, if the logic was wrong, scrub isn't likely to catch the 
problem, since the checksum on the faulty logic output can and likely 
will still be perfectly valid.  It's simply the wrong tool to detect that 
sort of error.
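A minimal Python sketch of that distinction, under loud assumptions: zlib.crc32 stands in for the crc32c checksum btrfs actually uses, and Block, write_block and scrub_ok are hypothetical names, not btrfs code. It only illustrates that a checksum comparison catches damage that happens after the write, not wrong data that was checksummed at write time.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    csum: int  # checksum stored alongside the block at write time


def write_block(data: bytes) -> Block:
    # The (hypothetical) write path checksums whatever it is handed,
    # whether or not the content is logically correct.
    return Block(data, zlib.crc32(data))


def scrub_ok(block: Block) -> bool:
    # A scrub-style check: recompute the checksum and compare it with the
    # stored one. It has no idea whether the data itself makes sense.
    return zlib.crc32(block.data) == block.csum


# Case 1: a correct write, then one bit flips on the device afterwards.
good = write_block(b"refs=2")
flipped = Block(bytes([good.data[0] ^ 0x01]) + good.data[1:], good.csum)

# Case 2: a hypothetical logic bug computes the wrong value before writing.
buggy = write_block(b"refs=1")  # the intended value was refs=2

if __name__ == "__main__":
    print("good:", scrub_ok(good))          # True
    print("bit flip:", scrub_ok(flipped))   # False: CRC-32 catches any single-bit error
    print("logic bug:", scrub_ok(buggy))    # True: wrong, but consistently checksummed
```

The third case is the one only a structural checker such as btrfs check could hope to notice, because nothing about the checksum is wrong.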

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Marc MERLIN
OK, first for the devs: I found the real trace that happened just before the
system went read-only.
My apologies for pasting the wrong one first.

I'll wipe/rebuild the FS tonight unless you ask me to wait one more day
and/or want data off it.

Please advise whether I should rebuild with 3.14.3 or 3.15rc4.

Thanks.


Details:
It looks like my corruption came from there.
I'm still not sure why it's apparently so severe that btrfs recovery cannot
open the FS now.

WARNING: CPU: 6 PID: 555 at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x359/0x712()
CPU: 6 PID: 555 Comm: btrfs-cleaner Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
Hardware name: LENOVO 20BECT0/20BECT0, BIOS GMET28WW (1.08 ) 09/18/2013
  8800cd9f1b38 8160a06d 
 8800cd9f1b70 81050025 812170f6 88013c9cbdf0
 fffe  01856000 8800cd9f1b80
Call Trace:
 [8160a06d] dump_stack+0x4e/0x7a
 [81050025] warn_slowpath_common+0x7f/0x98
 [812170f6] ? __btrfs_free_extent+0x359/0x712
 [810500ec] warn_slowpath_null+0x1a/0x1c
 [812170f6] __btrfs_free_extent+0x359/0x712
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8126518b] ? btrfs_check_delayed_seq+0x84/0x90
 [8121c262] __btrfs_run_delayed_refs+0xa94/0xbdf
 [8113fcf3] ? __cache_free.isra.39+0x1b4/0x1c3
 [8121df46] btrfs_run_delayed_refs+0x81/0x18f
 [8121ac3a] ? walk_up_tree+0x72/0xf9
 [8122af08] btrfs_should_end_transaction+0x52/0x5b
 [8121cba9] btrfs_drop_snapshot+0x36f/0x610
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8114020e] ? kfree+0x66/0x85
 [8122c73d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
 [81224f09] cleaner_kthread+0x103/0x136
 [81224e06] ? btrfs_alloc_root+0x26/0x26
 [8106bc62] kthread+0xae/0xb6
 [8106bbb4] ? __kthread_parkme+0x61/0x61
 [8161637c] ret_from_fork+0x7c/0xb0
 [8106bbb4] ? __kthread_parkme+0x61/0x61


On Fri, May 09, 2014 at 10:19:46AM -0600, Chris Murphy wrote:
 
 On May 9, 2014, at 4:35 AM, Marc MERLIN m...@merlins.org wrote:
 
  
  Howdy,
  
  I won't have the time to rebuild my laptop tonight, so I'll wait one more
  day to see if anyone would like data from that fs to see why it crashed and
  why btrfs recovery doesn't even seem able to open it.
 
 There's some underlying reason why it went read only, but we don't
 have those messages. The message we do have says the kernel is already
 tainted, so something (possibly entirely unrelated) happened earlier.
 
Oh, I missed that.
May  2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
This is weird because I don't use any 3rd party binary modules.

Right now, I do see:
legolas:~# cat /proc/sys/kernel/tainted
512

Mmmh, so I messed up and pasted the wrong error. I found the real one now, 
pasted above.

  Also I'm not sure if I should risk 3.15rc to rebuild the filesystem, and I'd
  love not to have to say during my talk that even almost-latest btrfs
  corrupts itself for no reason and without working recovery methods :-/
 
 Just because the reason isn't known or understood yet doesn't mean it's 
 happened without reason. And we also don't know whether it corrupted itself, 
 or had help earlier on. Neither is good, but depending on the cause of the 
 corruption, recovery may not even be realistic.

You're right that there is always a reason :)
(especially now that I see the real error, my fault for missing it the first 
time)

But I was fairly dismayed that btrfs recovery couldn't even open the filesystem.
I was somehow thinking maybe I gave it the wrong options.
 
 I'd probably consider 3.13.11 if I simply had work that needs to get done 
 rather than testing. If the problem happens there too then you've stumbled on 
 something that isn't likely a regression.

True, although most devs tell you to run the latest, or any problems or bugs 
are your fault :)
(loosely paraphrased :)

 If you've done any suspend/hibernate at all, I'd stop doing that until
 you're in a position to do a lot more rigorous testing. I say that

Thanks for warning me about that.
I only use S3 sleep, oh but you say that's bad too?
I've been using it for more than 10 years; is it now suddenly a cause of
kernel and/or filesystem corruption?

 because suspend and hibernate have become so completely unreliable
 for so many people I know doing testing, including myself, that it's
 worth avoiding. I've had lots of corruptions, not just Btrfs, related
 to suspend testing in particular (hibernate doesn't work either but
 it hasn't corrupted the file system). And there's a bunch of new work
 happening on suspend in 3.15 so things are probably about to change
 yet again.
 
 
 Chris Murphy

Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Chris Murphy

On May 9, 2014, at 4:36 PM, Marc MERLIN m...@merlins.org wrote:

 
 Details:
 It looks like my corruption came from there.
 I'm still not sure why it's apparently so severe that btrfs recovery cannot
 open the FS now.
 
 WARNING: CPU: 6 PID: 555 at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x359/0x712()
 CPU: 6 PID: 555 Comm: btrfs-cleaner Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
 Hardware name: LENOVO 20BECT0/20BECT0, BIOS GMET28WW (1.08 ) 09/18/2013
  8800cd9f1b38 8160a06d 
 8800cd9f1b70 81050025 812170f6 88013c9cbdf0
 fffe  01856000 8800cd9f1b80
 Call Trace:
 [8160a06d] dump_stack+0x4e/0x7a
 [81050025] warn_slowpath_common+0x7f/0x98
 [812170f6] ? __btrfs_free_extent+0x359/0x712
 [810500ec] warn_slowpath_null+0x1a/0x1c
 [812170f6] __btrfs_free_extent+0x359/0x712
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8126518b] ? btrfs_check_delayed_seq+0x84/0x90
 [8121c262] __btrfs_run_delayed_refs+0xa94/0xbdf
 [8113fcf3] ? __cache_free.isra.39+0x1b4/0x1c3
 [8121df46] btrfs_run_delayed_refs+0x81/0x18f
 [8121ac3a] ? walk_up_tree+0x72/0xf9
 [8122af08] btrfs_should_end_transaction+0x52/0x5b
 [8121cba9] btrfs_drop_snapshot+0x36f/0x610
 [8160f97b] ? _raw_spin_unlock+0x17/0x2a
 [8114020e] ? kfree+0x66/0x85
 [8122c73d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
 [81224f09] cleaner_kthread+0x103/0x136
 [81224e06] ? btrfs_alloc_root+0x26/0x26
 [8106bc62] kthread+0xae/0xb6
 [8106bbb4] ? __kthread_parkme+0x61/0x61
 [8161637c] ret_from_fork+0x7c/0xb0
 [8106bbb4] ? __kthread_parkme+0x61/0x61

Well, I'm sorta dense, so I only find a complete dmesg useful, because with 
storage problems it seems much of the trouble is due to some other problem 
that happened earlier. Maybe a fs developer would say "yeah, that's not good, 
but maybe we should do better at failing gracefully." Call traces don't mean 
much of anything to me; I think the real problem happened before this, unless 
it's strictly a Btrfs bug, in which case the evidence may be localized in just 
the trace.

Also you said it went read only overnight but I'm seeing a reference here to 
cleaning up a deleted snapshot? Are you running something that's taking and 
deleting snapshots on a schedule?
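One way to read a trace like the one quoted above is to drop the entries marked "?" (on x86 kernels of this era those are addresses found on the stack that the unwinder could not confirm as part of the call chain) and read the rest from the bottom up. The Python sketch below does that; the regex, Frame, parse_trace and call_chain are hypothetical helpers written for this thread's archived format, not a kernel tool.

```python
import re
from typing import List, NamedTuple

# One call-trace entry as it appears in this archive, e.g.
#   [812170f6] ? __btrfs_free_extent+0x359/0x712
# Raw dmesg shows the address as [<ffffffff812170f6>]; both forms match.
FRAME_RE = re.compile(
    r"\[<?(?P<addr>[0-9a-f]+)>?\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    func: str
    offset: int     # offset of the address into the function
    size: int       # total size of the function
    reliable: bool  # False for '?' entries the unwinder could not confirm


def parse_trace(text: str) -> List[Frame]:
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(Frame(m["func"], int(m["off"], 16),
                                int(m["size"], 16), m["guess"] is None))
    return frames


def call_chain(frames: List[Frame]) -> List[str]:
    # Traces list the innermost frame first; reverse to read caller -> callee.
    return [f.func for f in reversed(frames) if f.reliable]


# Excerpt of the trace posted in this thread.
TRACE = """\
 [8160a06d] dump_stack+0x4e/0x7a
 [81050025] warn_slowpath_common+0x7f/0x98
 [812170f6] ? __btrfs_free_extent+0x359/0x712
 [810500ec] warn_slowpath_null+0x1a/0x1c
 [812170f6] __btrfs_free_extent+0x359/0x712
 [8121c262] __btrfs_run_delayed_refs+0xa94/0xbdf
 [8113fcf3] ? __cache_free.isra.39+0x1b4/0x1c3
 [8121df46] btrfs_run_delayed_refs+0x81/0x18f
 [8122af08] btrfs_should_end_transaction+0x52/0x5b
 [8121cba9] btrfs_drop_snapshot+0x36f/0x610
 [8122c73d] btrfs_clean_one_deleted_snapshot+0x103/0x10f
 [81224f09] cleaner_kthread+0x103/0x136
 [8106bc62] kthread+0xae/0xb6
 [8161637c] ret_from_fork+0x7c/0xb0
 [8106bbb4] ? __kthread_parkme+0x61/0x61
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(parse_trace(TRACE))))
```

Read that way, the chain runs from cleaner_kthread through btrfs_clean_one_deleted_snapshot and btrfs_drop_snapshot down to the warning in __btrfs_free_extent, i.e. the warning fired while btrfs-cleaner was dropping a deleted snapshot.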
 
 
 On Fri, May 09, 2014 at 10:19:46AM -0600, Chris Murphy wrote:
 
 On May 9, 2014, at 4:35 AM, Marc MERLIN m...@merlins.org wrote:
 
 
 Howdy,
 
 I won't have the time to rebuild my laptop tonight, so I'll wait one more
 day to see if anyone would like data from that fs to see why it crashed and
 why btrfs recovery doesn't even seem able to open it.
 
 There's some underlying reason why it went read only, but we don't
 have those messages. The message we do have says the kernel is already
 tainted, so something (possibly entirely unrelated) happened earlier.
 
 Oh, I missed that.
 May  2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
 This is weird because I don't use any 3rd party binary modules.

The G means there's no proprietary driver involved. You'd have to go through a 
full dmesg to find out what's causing it, but the point of the tainted state 
notification is that the kernel is in a state that few, if any, other people 
are experiencing, so any subsequent problems are suspect. 

 
 Right now, I do see:
 legolas:~# cat /proc/sys/kernel/tainted
 512
 
 Mmmh, so I messed up and pasted the wrong error. I found the real one now, 
 pasted above.
 
 Also I'm not sure if I should risk 3.15rc to rebuild the filesystem, and I'd
 love not to have to say during my talk that even almost-latest btrfs
 corrupts itself for no reason and without working recovery methods :-/
 
 Just because the reason isn't known or understood yet doesn't mean it's 
 happened without reason. And we also don't know whether it corrupted itself, 
 or had help earlier on. Neither is good, but depending on the cause of the 
 corruption, recovery may not even be realistic.
 
 You're right that there is always a reason :)
 (especially now that I see the real error, my fault for missing it the first 
 time)
 
 But I was fairly dismayed that btrfs recovery couldn't even open the 
 filesystem.
 I was somehow thinking maybe I gave it the wrong options.

There are still ZFS corruptions from time to time. And they happen even on file 
systems that get pounded on mercilessly, like NTFS, XFS and HFS+. Almost always 
it's not the file system itself; something else instigated the problem. Still, 
even such mature file systems have bugs being found and fixed. So recovery not 
working doesn't by itself surprise me, since I don't even know what caused the 
problem.


 
 I'd probably consider 3.13.11 if I simply had work that needs to get done 
 rather than 

Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Chris Samuel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Marc,

On Fri, 9 May 2014 03:36:59 PM Marc MERLIN wrote:

 Oh, I missed that.
 May  2 14:23:06 legolas kernel: [283268.319035] CPU: 0 PID: 25726 Comm: watchdog/0 Tainted: G W 3.14.0-amd64-i915-preempt-20140216 #2
 This is weird because I don't use any 3rd party binary modules.

There are actually a bunch of reasons a kernel can be tainted.

 Right now, I do see:
 legolas:~# cat /proc/sys/kernel/tainted
 512

IIUC that's an array of bit flags, and that value means you've had a previous 
kernel warning at that point according to:

https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

# tainted:
#
# Non-zero if the kernel has been tainted.  Numeric values,
# which can be ORed together:
#
[...]
# 512 - A kernel warning has occurred.
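The bit arithmetic can be sketched in Python. The table copies the bit meanings listed in that kernel.txt for kernels around 3.14 (newer kernels define more bits, so treat the table as an assumption for any other version), the letters are the ones printed in the "Tainted:" field of an oops, and decode_tainted/taint_letters are made-up helper names. In that field the first column shows "P" when the value-1 bit is set and "G" otherwise, so a value of 512 decodes to just "G" plus the W (warning) flag.

```python
from typing import List

# (bit value, letter shown in "Tainted:", meaning); assumed from the
# Documentation/sysctl/kernel.txt list for ~3.14 kernels, newer kernels add bits.
TAINT_FLAGS = [
    (1, "P", "A module with a non-GPL license has been loaded"),
    (2, "F", "A module was force loaded"),
    (4, "S", "Unsafe SMP processors"),
    (8, "R", "A module was forcibly unloaded"),
    (16, "M", "A hardware machine check error occurred"),
    (32, "B", "A bad page was discovered"),
    (64, "U", "The user asked that the system be marked tainted"),
    (128, "D", "The system has died"),
    (256, "A", "The ACPI DSDT has been overridden"),
    (512, "W", "A kernel warning has occurred"),
    (1024, "C", "A module from drivers/staging was loaded"),
    (2048, "I", "Working around a severe firmware bug"),
    (4096, "O", "An out-of-tree module has been loaded"),
]


def decode_tainted(value: int) -> List[str]:
    """Meanings of every bit set in a /proc/sys/kernel/tainted value."""
    return [meaning for bit, _, meaning in TAINT_FLAGS if value & bit]


def taint_letters(value: int) -> str:
    """Compact form of the oops 'Tainted:' field ('G' unless bit 1 is set).

    The kernel prints a blank for every unset flag; those blanks are dropped here.
    """
    first = "P" if value & 1 else "G"
    rest = "".join(letter for bit, letter, _ in TAINT_FLAGS[1:] if value & bit)
    return first + rest


if __name__ == "__main__":
    value = 512  # the value read from /proc/sys/kernel/tainted in this thread
    print(value, taint_letters(value), decode_tainted(value))
```

On a live system the value would come from reading /proc/sys/kernel/tainted and passing int() of its contents to the same functions.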

Best of luck!
Chris
- -- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEVAwUBU21vOo1yjaOTJg85AQLJDwf/QOxRt0f5KqPbhknn8x0XyUQ5upC8PbzD
FoDHAkKV7tCUGQ6ZmCufBUKi0beNHNE3YKXlld8zLjlYpyV5lCZIgP3XvjQ/A4pZ
Vq+XKiqddaZHOFnjQuk9kseqXJaeH7Vr90xz2D92lcRb3NY6yoD2sdFMhAeN43vh
23stzC2Ybr79NFELWPCL3MTFL4qZrAY/4KFFKDQEZsNHMEJW2zJXX841lFsTXJwO
1Ggsi3WzNCJMo+GHRqH+9Gyb4ICk7u7FABHo+y/dShTGnxAh5/8zMnKidlSfCdzd
APKPMrydKEX+O+Fm3zDcKg8gER3FJtWKCyHXfW+zyORTMbxiH5QK5Q==
=q69d
-----END PGP SIGNATURE-----



Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Marc MERLIN
On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
  Right now, I do see:
  legolas:~# cat /proc/sys/kernel/tainted
  512
 
 IIUC that's an array of bit flags, and that value means you've had a previous 
 kernel warning at that point according to:
 
 https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Yep, I meant to say that I don't have the 'G' now.

It's likely that vbox caused the 'G' even though I didn't successfully start
it, and even if I haven't had problems with it until now, it's a possible
culprit (more details below).

Anyway, it sounds like the FS is toast and there isn't much useful that can
be gleaned from it, so I'll just wipe it and start over.
I think my biggest disappointment really is that no recovery tools seem
to be able to open the FS now, even though it was accessible and
seemingly working well enough while it was read-only, before I rebooted.

On Fri, May 09, 2014 at 06:00:50PM -0600, Chris Murphy wrote:
 Well I'm sorta dense, so I only find a complete dmesg useful because
 with storage problems it seems much is due to some other problem
 happening earlier. Maybe a fs developer would say yeah that's not

True, although I didn't find anything earlier that looked relevant.

 good, but we maybe should do better failing gracefully. Call traces
 don't mean much of anything to me, I think the real problem happened
 before this, unless it's strictly a Btrfs bug in which case the
 evidence may be localized in just the trace.

Sure, the corruption could have happened before the cleaner process
uncovered it and then turned my FS read-only.
But to be honest, before cleaner ran, the FS worked (I was using it);
after that, it was read-only, and upon reboot it became unmountable by
anything.
That seems suspect to me :-/
 
 Also you said it went read only overnight but I'm seeing a reference
 here to cleaning up a deleted snapshot? Are you running something
 that's taking and deleting snapshots on a schedule?

Yes, hourly snapshot rotations and hourly btrfs send/receive to my
secondary drive, which is still working and which I'm using to type
this.
(I'll format the SSD and copy things back tonight since I'm worried that
if anything happens to my HD, my laptop will be toast until I get home)
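Hourly rotation of this kind can be sketched as a pure function that picks the oldest snapshots to delete. This is only an illustration: the `<name>_hourly_YYYYMMDD_HHMMSS` naming scheme and the function names are hypothetical, not the script actually in use here. As far as I understand it, `btrfs subvolume delete` returns quickly and the actual freeing of a deleted snapshot's extents happens later in the background in btrfs-cleaner, which is the btrfs_clean_one_deleted_snapshot path visible in the trace above.

```python
from datetime import datetime
from typing import List

SEP = "_hourly_"          # hypothetical naming scheme: <name>_hourly_YYYYMMDD_HHMMSS
STAMP = "%Y%m%d_%H%M%S"


def snapshot_time(name: str) -> datetime:
    # Parse the timestamp suffix of a hypothetical hourly snapshot name.
    return datetime.strptime(name.split(SEP, 1)[1], STAMP)


def snapshots_to_delete(names: List[str], keep: int) -> List[str]:
    """Return the hourly snapshots to delete so only the `keep` newest remain."""
    if keep < 0:
        raise ValueError("keep must be >= 0")
    hourly = sorted((n for n in names if SEP in n), key=snapshot_time)
    # Everything except the newest `keep` entries, oldest first.
    return hourly[: max(len(hourly) - keep, 0)]


if __name__ == "__main__":
    snaps = [f"root_hourly_20140509_{h:02d}0000" for h in range(10, 15)]
    snaps.append("root_daily_20140509_000000")  # not hourly: left alone
    print(snapshots_to_delete(snaps, keep=3))
```

Each returned name would then be handed to `btrfs subvolume delete`; the sketch deliberately stops short of running any commands.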

 The G means it's not a proprietary driver involved. You'd have to go
 through a full dmesg to find out what's causing it, but the point of
 the tainted state notification is that the kernel is in a state likely
 no one, or very few, other people are experiencing and any subsequent
 problems are suspect.
 
Mmmh, I did try to start virtualbox, but it didn't start because the
driver was out of date. I haven't compiled and installed the new one yet,
nor actually used virtualbox.

 There are still ZFS corruptions from time to time. And they happen
 even on file systems that get pounded on mercilessly like NTFS, XFS
 and HFS+. Almost always it's not the file system itself, something
 else instigated the problem. Still such mature file systems have bugs
 being found and fixed. So recovery not working itself doesn't surprise
 me, I don't even know what caused the problem.

True. Never had this with ext2/3/4 in 15 years, but as you say, it's
possible.

 I think Btrfs in general is still buyer beware, but that's in the
 category of Not News because I think all free software distributions
 say the same thing, essentially. None of it comes with support or a
 warranty unless you've bought an SLA. If you really suspect a problem
 in 3.14.x that may not yet be fixed in 3.15rc or you don't want to
 run rc kernels is reasonable to run the kernel prior to the current
 one which is 3.13.11. The way kernel fixes work, a fix has to be
 demonstrated in

Right. I'd want to avoid 3.15rc unless someone tells me I really should
be running it.

 Well you think you've been using it successfully for 10 years. If
 you've had exactly 0 cases of any kind of fs corruption in 10 years,
 or can exclude suspend/resume from corruption incident by assurance
 there was a reboot in between the suspend/resume and the corruption,
 then maybe you haven't experienced a problem. But Google is full of
 users who have not merely immediate corruption on suspend/resume

Point taken, thanks.
But not suspending (S3 sleep) on my laptop isn't exactly practical
either :-/

Marc


Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Hugo Mills
On Fri, May 09, 2014 at 05:42:54PM -0700, Marc MERLIN wrote:
 On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
   Right now, I do see:
   legolas:~# cat /proc/sys/kernel/tainted
   512
  
  IIUC that's an array of bit flags, and that value means you've had a 
  previous 
  kernel warning at that point according to:
  
  https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
 
 Yep, I meant to say that I don't have the 'G' now.

   G is actually good, I think. IIRC, it means everything we've loaded up to
this point has been under a license where we have the source
available. It's when you load a proprietary module that you get the P
and the G goes away.

 It's likely that vbox did 'G' even if I didn't successfully start it,
 and even if I haven't had problems with it 'till now, it's a possible
 culprit (more details below)

   I think G is actually a default state, and is good.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- I write in C because using pointer arithmetic lets people ---
   know that you're virile. -- Matthew Garrett   




Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Hugo Mills
On Fri, May 09, 2014 at 06:00:50PM -0600, Chris Murphy wrote:
 Well I'm sorta dense, so I only find a complete dmesg useful because
 with storage problems it seems much is due to some other problem
 happening earlier. 

   Life would be so much easier if filesystems didn't store any
persistent state... :)

   The number of people who don't quite get that that's the function
and natural behaviour of a filesystem is... surprising. 

   As in, "Your filesystem got corruption as a result of a bug in some
earlier version. Upgrading to the new version isn't magically going to
make that corruption go away." (Not saying that's what's happened
here, but it's common, and commonly misunderstood).

   Hugo.



Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Chris Murphy

On May 9, 2014, at 7:05 PM, Hugo Mills h...@carfax.org.uk wrote:

 On Fri, May 09, 2014 at 05:42:54PM -0700, Marc MERLIN wrote:
 On Sat, May 10, 2014 at 10:13:43AM +1000, Chris Samuel wrote:
 Right now, I do see:
 legolas:~# cat /proc/sys/kernel/tainted
 512
 
 IIUC that's an array of bit flags, and that value means you've had a 
 previous 
 kernel warning at that point according to:
 
 https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
 
 Yep, I meant to say that I don't have the 'G' now.
 
   G is actually good, I think. IIRC, it's everything we've had to
 this point has been under a license where we have the source
 available. It's when you load a proprietary module that you get the P
 and the G goes away.
 
 It's likely that vbox did 'G' even if I didn't successfully start it,
 and even if I haven't had problems with it 'till now, it's a possible
 culprit (more details below)
 
   I think G is actually a default state, and is good.

The G just means it's not a proprietary kernel module, but it's still out of 
tree. So the kernel is in a state that we don't really know, without finding 
out what's causing it to be tainted. If it's a video or wireless driver (pretty 
likely) then it's probably sufficiently unrelated to fs to not matter.

However, I have a recent case in a VBox guest, with guest additions built. That 
causes the kernel to be tainted G because it's an out-of-tree kernel module for 
guest additions. I'm getting a bunch of Btrfs errors that aren't reproducible 
with an untainted kernel. So I'm not filing a bug against Btrfs; instead I've 
filed a bug against VirtualBox, because I'm also getting a pile of read/write 
errors with /dev/sda, which is backed by a VDI: a virtual device producing 
hardware read/write errors (as far as the Linux kernel is concerned), but only 
with guest additions loaded. And the sustained copy event that triggers it 
doesn't even involve sda. It's a shared folder copy as the source, to a raw 
device as destination. Yet I get dozens of read/write errors on sda, and ensuing 
Btrfs complaints as well. But in this case Btrfs is behaving exactly as I'd 
expect. What's unexpected is the virtual SATA device misbehaving, apparently 
only with guest additions loaded.

Chris Murphy



Re: btrfs cleaner failure - fs/btrfs/extent-tree.c:5748 (3.14.0)

2014-05-09 Thread Duncan
Hugo Mills posted on Sat, 10 May 2014 02:09:02 +0100 as excerpted:

 Life would be so much easier if filesystems didn't store any
 persistent state... :)
 
 The number of people who don't quite get that that's the function
 and natural behaviour of a filesystem is... surprising.
 
 As in, "Your filesystem got corruption as a result of a bug in some
 earlier version. Upgrading to the new version isn't magically going to
 make that corruption go away." (Not saying that's what's happened here,
 but it's common, and commonly misunderstood).

FWIW, this is why I'm currently doing a mkfs.btrfs and copying over from 
primary backup (an identically sized partition on the same set of 
physical devices, also btrfs, secondary backup is reiserfs on a different 
device, just in case) every few kernel cycles, perhaps twice a year or 
every eight months.

My thinking is that even if scrub/balance/btrfs-check report no problems:

a) There are new on-device filesystem features I can now take advantage 
of (at least, there have been in each of the two mkfs.btrfs cycles I've 
done so far).  And...

b) Recreating the filesystem and copying everything over anew limits the 
time-window I'm exposed to old and potentially latent bugs that may have 
in fact been fixed in new deployments without ever having triggered at 
the time, due to masking from some other bug or happenstance that may 
eventually go away, otherwise leaving me exposed to this strange corner-case 
bug from two years or whatever ago.

I'll probably continue to do that until btrfs is considered stable, or 
even past that (tho then likely at a rather lower frequency, say every 
year to year and a half), because it's relatively easy to do with the way 
I handle backups.
