Re: bad key ordering - repairable?

2018-01-27 Thread Claes Fransson
2018-01-27 18:32 GMT+01:00 Claes Fransson :
>
> Duncan Wed, 24 Jan 2018 15:18:25 -0800
>
> Claes Fransson posted on Wed, 24 Jan 2018 20:44:33 +0100 as excerpted:
>
> > So, I have now some results from the PassMark Memtest86! I let the
> > default automatic tests run for about 19 hours and 16 passes. It
> > reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable to
> > high frequency row hammer bit flips". If I understand it correctly,
> > it means that some errors were detected when the RAM was tested at
> > higher rates than guaranteed accurate by the vendors.
>
> From Wikipedia:
>
>> Row hammer (also written as rowhammer) is an unintended side effect in
>> dynamic random-access memory (DRAM) that causes memory cells to leak
>> their charges and interact electrically between themselves, possibly
>> altering the contents of nearby memory rows that were not addressed in
>> the original memory access. This circumvention of the isolation between
>> DRAM memory cells results from the high cell density in modern DRAM, and
>> can be triggered by specially crafted memory access patterns that rapidly
>> activate the same memory rows numerous times.[1][2][3]
>>
>> The row hammer effect has been used in some privilege escalation computer
>> security exploits.
>>
>> https://en.wikipedia.org/wiki/Row_hammer
>>
>> So it has nothing to do with (generic) testing the RAM at higher rates
>> than guaranteed by the vendors, but rather, with deliberate rapid
>> repeated access (at normal clock rates) of the same cell rows in order
>> to trigger a bitflip in nearby memory cells that could not normally be
>> accessed due to process separation and insufficient privileges.
>
>
Well, I was thinking of the specific error message by memtest86.
According to the PassMark website,
https://www.memtest86.com/troubleshooting.htm, "Why am I only getting
errors during Test 13 Hammer Test?", second paragraph.
Thanks for the Wikipedia explanation though.
>
>> IOW, it's unlikely to be accidentally tripped, and thus is exceedingly
>> unlikely to be relevant here, unless you're being hacked, of course.
>
>
Okay, thanks for your conclusion.
>
>>
> That said, and entirely unrelated to rowhammer, I know one of the
> problems of memory test false-negatives from experience.
>
> In my case, I was even running ECC RAM.  But the memory I had purchased
> (back in the day when memory was far more expensive and sub-GB memory was
> the norm) was cheap, and as it happened, marked as stable at slightly
> higher clock rates than it actually was.  But I couldn't afford more (or
> I'd have procured less dodgy RAM in the first place) and had little
> recourse but to live with it for a while.  A year or so later there was a
> BIOS update that added better memory clocking control, and I was able to
> declock the RAM slightly from its rating (IIRC to PC-3000 level, it was
> PC3200 rated, this was DDR1 era), after which it was /entirely/ stable,
> even after reducing some of the wait-state settings somewhat to try to
> claw back some of what I lost due to the underclocking.
>
> I run gentoo, and nearly all of my problems occurred when I was doing
> updates, building packages at 100% CPU with multiple cores accessing the
> same RAM.  FWIW, the most frequent /detected/ problem was bunzip checksum
> errors as it decompressed and verified the data in memory (before writing
> out)... that would move or go away if I tried again.  Occasionally I'd
> get machine-check errors (MCEs), but not frequently, and the ECC RAM
> subsystem /never/ reported errors.
>
My filesystem went readonly just after I did some updating of a lot of
packages (I think it was thousands of packages :) ), so massive
disk-IO for me, but possibly also some CPU and RAM usage...
>
>> But the memory tests gave that memory an all-clear.
>
>
>>> The problem with the memory tests in this case is that they tend to work
>>> on an otherwise unloaded system, and test the retention of the memory
>>> cells, /not/ so much the speed and reliability at which they are accessed
>>> under fully loaded system stress -- and how could they when memory speed
>>> is normally set by the BIOS and not something the memory tester has
>>> access to?
>>>
>>> But my memory problems weren't with the memory cells themselves -- they
>>> retained their data just fine and indeed it was ECC RAM so would have
>>> triggered ECC errors if they didn't -- but with the precision timing of
>>> memory IO -- it wasn't quite up to the specs it claimed to support and
>>> would occasionally produce in-transit errors (the ECC would have detected
>>> and possibly corrected errors in storage), and the memory testers simply
>>> didn't test that like a fully loaded system doing unpacks of sources and
>>> builds from them did.
>>>
>>> As mentioned, once I got a BIOS update that let me declock the RAM a bit,
>>> everything was fine, and it remained fine when I did upgrade the RAM some
>>> years later, after prices had fallen, as well.
>
>
Than

Re: bad key ordering - repairable?

2018-01-27 Thread Claes Fransson
2018-01-23 14:06 GMT+01:00 Claes Fransson :
> 2018-01-22 22:22 GMT+01:00 Hugo Mills :
>> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has become corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However, I have almost always run with the power cable plugged in
>>> over the last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>>
>>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>>> localhost:~ # uname -r
>>> 4.14.13-1-default
>>> localhost:~ # btrfs --version
>>> btrfs-progs v4.14.1
>>>
>>> localhost:~ # btrfs check -p /dev/sda12
>>> Checking filesystem on /dev/sda12
>>
>> [fixing up bad paste]
>>
>>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> bad key ordering 159 160 bad block 690436964352
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> checking free space cache [.]
>>> checking fs roots [o]
>>> checking csums
>>> bad key ordering 159 160
>>> Error looking up extent record -1
>>
>> [snip]
>>
>>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
>>> btrfs-progs v4.14.1
>>> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>>> .
>>> .
>>> .
>>> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51665 offset 0 count 1
>>> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51666 offset 0 count 1
>>> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
>>> btrfs(+0x365c6)[0x55bdfaada5c6]
>>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>>> Aborted (core dumped)
>>
>> Wow, I've never seen it do that before. It's the next thing I'd
>> have asked for, so it's good you've preempted it.
>>
>> The main thing is that bad key ordering is almost always due to RAM
>> corruption. That's either bad RAM, or dodgy power regulation -- the
>> latter could be the PSU, or capacitors on the motherboard. (In this
>> case, it might also be something funny with the battery).
>>
>> I would definitely recommend a long run of memtest86. At least 8
>> hours, preferably 24. If you get errors repeatedly in the same place,
>> it's the RAM. If they appear randomly, it's probably the power
>> regulation.
>>
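For anyone following along, here is a rough Python sketch (not btrfs-progs
code) of what "bad key ordering" refers to. Items in a leaf are expected to
be sorted by their (objectid, type, offset) key, compared field by field, and
a single flipped bit in one key is enough to break that order. The first
three keys are taken from the dump above; the fourth key, the flipped bit
position and the helper names are invented for illustration, and 168 is
assumed to be the numeric value of the EXTENT_ITEM key type.

```python
EXTENT_ITEM = 168  # assumed numeric value of the EXTENT_ITEM key type


def find_bad_key_order(keys):
    """Return (slot, slot + 1) pairs whose keys are not strictly increasing."""
    return [(i, i + 1) for i in range(len(keys) - 1) if keys[i] >= keys[i + 1]]


def flip_bit(key, bit):
    """Simulate a one-bit memory error in the objectid field of a key."""
    objectid, key_type, offset = key
    return (objectid ^ (1 << bit), key_type, offset)


# Keys are (objectid, type, offset); Python compares tuples field by field,
# which matches the ordering rule described above.
leaf = [
    (22732500992, EXTENT_ITEM, 16384),  # item 157 in the dump
    (22732517376, EXTENT_ITEM, 16384),  # item 158 in the dump
    (22732533760, EXTENT_ITEM, 16384),  # item 159 in the dump
    (22732550144, EXTENT_ITEM, 16384),  # hypothetical next item, not in the dump
]

# Flip bit 34 in the last key: its objectid drops below the previous one.
corrupted = leaf[:3] + [flip_bit(leaf[3], 34)]

print(find_bad_key_order(leaf))       # no bad pairs in the intact leaf
print(find_bad_key_order(corrupted))  # the last two slots are now out of order
```

This only models the ordering rule. It says nothing about where the flip
happened (RAM, bus, or storage), which is why the memory test is still the
next step.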
> Thanks for the suggestion, I will try to do this in the next days.
>
>> [snip]
>>
>>>
>>> The filesystem had become pretty full, I had planned to increase the
>>> Btrfs-partition size before it became corrupt.
>>>
>>> Active kernel when the filesystem went read only: OpenSUSE Linux
>>> 4.14.14-1.geef6178-default, from the
>>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>>> repository.
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>> Of all of the bad key order errors I've seen (dozens), I think
>> there were a whole two which turned out not to be obviously related to
>> corrupt RAM. I still say that it's most likely the hardware.
>
> Okay, thank you for sharing your experience with me.
>
>>
>>> Is 

Re: bad key ordering - repairable?

2018-01-25 Thread Austin S. Hemmelgarn

On 2018-01-24 18:54, Chris Murphy wrote:

> On Wed, Jan 24, 2018 at 5:30 AM, Austin S. Hemmelgarn
>  wrote:
>
>>> APFS is really vague on this front, it may be checksumming metadata,
>>> it's not checksumming data and with no option to. Apple proposes their
>>> branded storage devices do not return bogus data. OK so then why
>>> checksum the metadata?
>>
>> Even aside from the fact that it might be checksumming data, Apple's storage
>> engineers are still smoking something pretty damn strong if they think that
>> they can claim their storage devices _never_ return bogus data.  Either
>> they're running some kind of checksumming _and_ replication below the block
>> layer in the storage device itself (which actually might explain the insane
>> cost of at least one piece of their hardware), or they think they've come up
>> with some fail-safe way to detect corruption and return errors reliably, and
>> in either case things can still fail.  I smell a potential future lawsuit in
>> the works.



> I read somewhere the hardware (or more correctly their flash firmware)
> supposedly uses 128 bytes of checksum per 4KB data. That's a lot, I
> wonder if it's actually some kind of parity. But regardless, this kind
> of in-hardware checksumming won't account for things like misdirected
> or torn writes or literally any sort of corruption happening prior to
> the flash firmware computing those checksums.

It's most likely more generic erasure coding (parity as most people 
think of it in the storage sense (RAID5 and RAID6) is a special case of 
(n, n-1) or (n, n-2) erasure coding that happens to be optimal), so in 
theory they could correct up to 1024 bits of errors, which is all well 
and good, but as you say doesn't really protect against much (more 
specifically, it only protects reliably against cell discharges from 
various sources, or more generic read-disturb errors).
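
To make the "(n, n-1) erasure coding" remark concrete, here is a minimal
Python sketch of single XOR parity, the RAID5-style special case. The block
contents and the helper name are made up for illustration, and nothing here
reflects what any vendor's flash firmware actually does. It also spells out
the arithmetic behind the 1024-bit figure: 128 bytes per 4096 bytes of data
is 1024 bits, or a bit over 3% overhead.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks plus one parity block: an (n, n-1) code with n = 4.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose block 1 at a known position (an erasure) and rebuild it from the rest.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])

# Arithmetic for the figures quoted in this thread.
parity_bits = 128 * 8    # 128 bytes of redundancy expressed in bits
overhead = 128 / 4096    # redundancy relative to a 4096-byte data block
print(parity_bits, overhead)
```

Note that the full redundancy budget is only usable when the position of the
damage is known (erasures). Locating and correcting errors at unknown
positions generally costs more redundancy per corrected symbol, so the
1024-bit number is best read as an upper bound.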


> On flash storage, maybe they're just concerned about bit rot or even
> the most superficial bit flips, and having just enough information to
> detect and correct for 1 or 2 flips per 4KB, not totally dissimilar to
> ECC memory. But that they don't use ECC memory leaves them open to
> corruption in the storage stack happening outside the literal storage
> device.

They also don't appear to use T.10 DIF (or whatever the T.13 equivalent 
that I can never remember the name of is), which means even if they did 
use ECC RAM they would still have a period of time where the data is 
unprotected.



>> Actually, I forgot about the (newer) metadata checksumming feature in ext4,
>> and was just basing my statement on behavior the last time I used it for
>> anything serious.  Having just checked mkfs.ext4, it appears that the
>> metadata in the SB that tells the kernel what to do when it runs into an
>> error for the FS still defaults to continuing on as if nothing happens, even
>> if you enable metadata checksumming (which still seems to be disabled by
>> default).  Whether or not that actually is honored by modern kernels, I
>> don't know, but I've seen no evidence to suggest that it isn't.



> Depending on the corruption, Btrfs continues as well. If I corrupt a
> deadend leaf that contains file metadata (like names or security
> contexts), I just get some complaints of corruption. The file system
> remains rw mounted though. I don't know the metric by which metadata
> can be damaged and Btrfs says "whoooaa!!" and puts on the brakes by
> going read only. XFS certainly has its limits and goes read only when
> it detects certain metadata corruption via checksum fail. I'd guess
> ext4 will do the same thing, otherwise what's the point if it's going
> to knowingly eat itself alive?

I'm pretty sure the ext4 behavior is a hold-over from the original ext 
filesystem, and I think even as far back as the version of the MINIX 
filesystem that Linux originally used (which ext evolved out of).  At a 
minimum, all three error behaviors (panic, go read-only, or flag and 
ignore) have been around since the early days of ext2.


FWIW, there are some cases where it does make sense to just not care and 
ignore the errors.  As a pretty specific example, one of the last 
remaining places I still use ext4 is on top of compressed ramdisks when 
I need some quick ephemeral storage that I want to be more memory 
efficient than tmpfs.  In such cases, the FS gets mounted exactly once, 
and is usually used only for a very short period of time, and as a 
result, the 'on-disk' data doesn't really matter much, so there's not 
much point in worrying about it.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad key ordering - repairable?

2018-01-24 Thread Chris Murphy
On Wed, Jan 24, 2018 at 5:30 AM, Austin S. Hemmelgarn
 wrote:

>> APFS is really vague on this front, it may be checksumming metadata,
>> it's not checksumming data and with no option to. Apple proposes their
>> branded storage devices do not return bogus data. OK so then why
>> checksum the metadata?
>
> Even aside from the fact that it might be checksumming data, Apple's storage
> engineers are still smoking something pretty damn strong if they think that
> they can claim their storage devices _never_ return bogus data.  Either
> they're running some kind of checksumming _and_ replication below the block
> layer in the storage device itself (which actually might explain the insane
> cost of at least one piece of their hardware), or they think they've come up
> with some fail-safe way to detect corruption and return errors reliably, and
> in either case things can still fail.  I smell a potential future lawsuit in
> the works.


I read somewhere the hardware (or more correctly their flash firmware)
supposedly uses 128 bytes of checksum per 4KB data. That's a lot, I
wonder if it's actually some kind of parity. But regardless, this kind
of in-hardware checksumming won't account for things like misdirected
or torn writes or literally any sort of corruption happening prior to
the flash firmware computing those checksums.

On flash storage, maybe they're just concerned about bit rot or even
the most superficial bit flips, and having just enough information to
detect and correct for 1 or 2 flips per 4KB, not totally dissimilar to
ECC memory. But that they don't use ECC memory leaves them open to
corruption in the storage stack happening outside the literal storage
device.


> Actually, I forgot about the (newer) metadata checksumming feature in ext4,
> and was just basing my statement on behavior the last time I used it for
> anything serious.  Having just checked mkfs.ext4, it appears that the
> metadata in the SB that tells the kernel what to do when it runs into an
> error for the FS still defaults to continuing on as if nothing happens, even
> if you enable metadata checksumming (which still seems to be disabled by
> default).  Whether or not that actually is honored by modern kernels, I
> don't know, but I've seen no evidence to suggest that it isn't.


Depending on the corruption, Btrfs continues as well. If I corrupt a
deadend leaf that contains file metadata (like names or security
contexts), I just get some complaints of corruption. The file system
remains rw mounted though. I don't know the metric by which metadata
can be damaged and Btrfs says "whoooaa!!" and puts on the brakes by
going read only. XFS certainly has its limits and goes read only when
it detects certain metadata corruption via checksum fail. I'd guess
ext4 will do the same thing, otherwise what's the point if it's going
to knowingly eat itself alive?


-- 
Chris Murphy


Re: bad key ordering - repairable?

2018-01-24 Thread Duncan
Claes Fransson posted on Wed, 24 Jan 2018 20:44:33 +0100 as excerpted:

> So, I have now some results from the PassMark Memtest86! I let the
> default automatic tests run for about 19 hours and 16 passes. It
> reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable to
> high frequency row hammer bit flips". If I understand it correctly,
> it means that some errors were detected when the RAM was tested at
> higher rates than guaranteed accurate by the vendors.

From Wikipedia:

Row hammer (also written as rowhammer) is an unintended side effect in 
dynamic random-access memory (DRAM) that causes memory cells to leak 
their charges and interact electrically between themselves, possibly 
altering the contents of nearby memory rows that were not addressed in 
the original memory access. This circumvention of the isolation between 
DRAM memory cells results from the high cell density in modern DRAM, and 
can be triggered by specially crafted memory access patterns that rapidly 
activate the same memory rows numerous times.[1][2][3]

The row hammer effect has been used in some privilege escalation computer 
security exploits.

https://en.wikipedia.org/wiki/Row_hammer

So it has nothing to do with (generic) testing the RAM at higher rates 
than guaranteed by the vendors, but rather, with deliberate rapid 
repeated access (at normal clock rates) of the same cell rows in order 
to trigger a bitflip in nearby memory cells that could not normally be 
accessed due to process separation and insufficient privileges.

IOW, it's unlikely to be accidentally tripped, and thus is exceedingly 
unlikely to be relevant here, unless you're being hacked, of course.


That said, and entirely unrelated to rowhammer, I know one of the 
problems of memory test false-negatives from experience.

In my case, I was even running ECC RAM.  But the memory I had purchased 
(back in the day when memory was far more expensive and sub-GB memory was 
the norm) was cheap, and as it happened, marked as stable at slightly 
higher clock rates than it actually was.  But I couldn't afford more (or 
I'd have procured less dodgy RAM in the first place) and had little 
recourse but to live with it for a while.  A year or so later there was a 
BIOS update that added better memory clocking control, and I was able to 
declock the RAM slightly from its rating (IIRC to PC-3000 level, it was 
PC3200 rated, this was DDR1 era), after which it was /entirely/ stable, 
even after reducing some of the wait-state settings somewhat to try to 
claw back some of what I lost due to the underclocking.

I run gentoo, and nearly all of my problems occurred when I was doing 
updates, building packages at 100% CPU with multiple cores accessing the 
same RAM.  FWIW, the most frequent /detected/ problem was bunzip checksum 
errors as it decompressed and verified the data in memory (before writing 
out)... that would move or go away if I tried again.  Occasionally I'd 
get machine-check errors (MCEs), but not frequently, and the ECC RAM 
subsystem /never/ reported errors.

But the memory tests gave that memory an all-clear.

The problem with the memory tests in this case is that they tend to work 
on an otherwise unloaded system, and test the retention of the memory 
cells, /not/ so much the speed and reliability at which they are accessed 
under fully loaded system stress -- and how could they when memory speed 
is normally set by the BIOS and not something the memory tester has 
access to?

But my memory problems weren't with the memory cells themselves -- they 
retained their data just fine and indeed it was ECC RAM so would have 
triggered ECC errors if they didn't -- but with the precision timing of 
memory IO -- it wasn't quite up to the specs it claimed to support and 
would occasionally produce in-transit errors (the ECC would have detected 
and possibly corrected errors in storage), and the memory testers simply 
didn't test that like a fully loaded system doing unpacks of sources and 
builds from them did.

As mentioned, once I got a BIOS update that let me declock the RAM a bit, 
everything was fine, and it remained fine when I did upgrade the RAM some 
years later, after prices had fallen, as well.

(The system was first-gen AMD Opteron, on a server-grade Tyan board, that 
I ran from purchase in late 2003 for over eight years, maxing out the 
pair of CPUs to dual-core Opteron 290s and the RAM to 8 gigs, over time, 
until the board finally died in 2012 due to burst capacitors.  Which 
reminds me, I'm still running the replacement, a Gigabyte with an fx6100 
overclocked a bit to 3.9 GHz and 16 gig RAM, and it's now nearing six 
years old, so I suppose I better start planning for the next upgrade...  
I've spent that six years upgrading to big-screen TVs as monitors, with a 
65inch/165cm 4K as my primary now and a 48inch/122cm as a secondary to 
put youtube or whatever on fullscreen, and to now my second generation of 
ssds, a pair of 1 TB samsung evos, b

Re: bad key ordering - repairable?

2018-01-24 Thread Claes Fransson
On Jan 24, 2018 01:31, "Chris Murphy"  wrote:

> On Tue, Jan 23, 2018 at 11:13 AM, Claes Fransson
>  wrote:
>
>> I haven't noticed before that there are actually RAM modules from
>> different vendors in the laptop. One 8GB by Samsung, and one 4GB by
>> Kingston!
>
> If they have the correct tolerances, I don't think it's a problem.
> Some memory controllers use a kind of interleaving if the module sizes
> are the same, so worst case you might be leaving a bit of a
> performance improvement on the table by the fact they aren't the same
> size.
>
> If the memory testing doesn't pan out, you could go down a bit of a
> rabbit hole and run each module in production for twice the length of
> time you figure you should see a corruption appear.


So, I have now some results from the PassMark Memtest86! I let the
default automatic tests run for about 19 hours and 16 passes. It
reported zero "Errors", but 4 lines of "[Note] RAM may be vulnerable
to high frequency row hammer bit flips". If I understand it correctly,
it means that some errors were detected when the RAM was tested at
higher rates than guaranteed accurate by the vendors. I am not sure
what that may indicate regarding the performance of the RAM for my
Btrfs filesystem. I "only" got irreparable corruptions maybe once
every couple of months or half a year.

I also forgot that I have been trying using Zswap the last couple of
months with OpenSUSE on the Btrfs-filesystem (and also Fedora on the
Ext4-partition). Maybe that is a source for the last corruption (I am
pretty sure I was not using Zswap during previous corruptions, of
which I think at least one was reporting "transid verify failed" or
similar.) Sometimes, but not when the filesystem went readonly, the
computer has been freezing almost completely (mouse pointer moving
only extremely slowly) when running out of RAM the last months. I have
sometimes waited many hours for the operating system to swap out not
so important memory to the swap-partition, but end up having to force
a reboot. I suspect that it might be Zswap not working optimally,
maybe it also affects Btrfs? I have used pretty low swappiness values,
1 or 10.

I might try using only one of the RAM modules in the future if nothing
else works. I usually use most of my available 12 GB RAM though (and
often even more :) ) when using my laptop.



>> I also found that there indeed was a new firmware version for my
>> SSD-disk, so I have now updated its firmware to the newest version.
>> Unfortunately I couldn't find any information of what possible issues
>> it was supposed to fix. The laptop has already the latest BIOS version
>> provided by ASUS for the model.
>
> I don't know enough about the bad key ordering error and its cause. If
> that corruption can happen only in memory then the SSD firmware update
> may change nothing. If there's some possibility the corruption can be
> the result of SSD firmware bugs, then it might make sense to use DUP
> metadata in the short term, even on an SSD. Any memory corruption
> would affect both copies. Any SSD induced corruption *might* affect
> both copies, depending on whether the SSD deduplicates or colocates
> the two copies of metadata...but I'd like to think that there's at
> least a pretty decent chance one of the copies would be good in which
> case you'd get Btrfs self-healing for metadata only.
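
A toy Python model of why DUP metadata helps with one kind of corruption but
not the other. This is a sketch under stated assumptions, not how Btrfs is
implemented: zlib.crc32 stands in for the filesystem checksum (Btrfs defaults
to crc32c), a single checksum is shared by both copies for simplicity, and
write_dup/read_dup are hypothetical helper names.

```python
import zlib


def write_dup(block):
    """Store two copies of a block plus a checksum taken at write time."""
    return [bytearray(block), bytearray(block)], zlib.crc32(block)


def read_dup(copies, csum):
    """Return the first copy matching the checksum and repair the others."""
    for copy in copies:
        if zlib.crc32(copy) == csum:
            good = bytes(copy)
            for other in copies:
                other[:] = good  # overwrite any bad copy with the good data
            return good
    raise IOError("both copies fail the checksum")


original = b"metadata block"

# Case 1: the device damages one copy after it was written.
copies, csum = write_dup(original)
copies[0][0] ^= 0x01              # single bit flip in the first copy
healed = read_dup(copies, csum)   # falls back to the second copy
print(healed == original, bytes(copies[0]) == original)

# Case 2: a bit flips in RAM before the checksum is computed.
in_ram = bytearray(original)
in_ram[0] ^= 0x01
copies2, csum2 = write_dup(bytes(in_ram))
silently_wrong = read_dup(copies2, csum2)  # checksum matches the bad data
print(silently_wrong == original)
```

In the first case the second copy rescues the read; in the second case both
copies and the checksum agree on the wrong data, so a checksum alone cannot
flag it.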

Thanks, I might try metadata DUP in the future.

> Anyway, it's a tedious search.
>
> As for Btrfs getting better at handling these kinds of cases. Yeah
> it's a valid question. What we know about other file systems is they
> can become unrepairable because they don't detect corruption soon
> enough. Whereas Btrfs has detected a problem early on yet it's still
> damaged enough now that effectively you can no longer mount it rw.
> From a data integrity point of view, at least you can ro mount and get
> your data off the volume with a normal file copy operation, not
> something that's certain with other file systems.
>
> If you were to try another file system, I'd look at XFS, tools and
> kernels in the past couple of years support metadata checksumming with
> the V5 format.


Yes, XFS should also have deduplication as an experimental feature.
Don't know how stable it is yet, I might try it. In the future it is
also supposed to get a snapshot feature.

Thanks for all your tips and thoughts.

Claes




> --
> Chris Murphy


Re: bad key ordering - repairable?

2018-01-24 Thread Austin S. Hemmelgarn

On 2018-01-23 19:44, Chris Murphy wrote:

> On Tue, Jan 23, 2018 at 5:51 AM, Austin S. Hemmelgarn
>  wrote:
>
>> This is extremely important to understand.  BTRFS and ZFS are essentially
>> the only filesystems available on Linux that actually validate things enough
>> to notice this reliably (ReFS on Windows probably does, and I think whatever
>> Apple is calling their new FS does too).
>
> ReFS always checksums metadata, optionally can checksum data.

Good to know, I've not actually dealt with ReFS myself yet (we're mostly 
a Linux shop where I work, and the two Windows servers we do have aren't 
using ReFS simply because it wasn't beyond the technology preview level 
when we installed them and we don't want to screw anything up).


> APFS is really vague on this front, it may be checksumming metadata,
> it's not checksumming data and with no option to. Apple proposes their
> branded storage devices do not return bogus data. OK so then why
> checksum the metadata?

Even aside from the fact that it might be checksumming data, Apple's 
storage engineers are still smoking something pretty damn strong if they 
think that they can claim their storage devices _never_ return bogus 
data.  Either they're running some kind of checksumming _and_ 
replication below the block layer in the storage device itself (which 
actually might explain the insane cost of at least one piece of their 
hardware), or they think they've come up with some fail-safe way to 
detect corruption and return errors reliably, and in either case things 
can still fail.  I smell a potential future lawsuit in the works...



>> Even if ext4 did notice it, it
>> would just mark the filesystem for a check and then keep going without doing
>> anything else about it (seriously, the default behavior for internal errors
>> on ext4 is to just continue like nothing happened and mark the FS for fsck).
>
> I haven't used ext4 with metadata checksumming enabled, and have no
> idea how it behaves when it starts encountering checksum errors during
> normal use. For sure XFS will complain a lot and will go read only
> when it gets confused. I'd expect any file system going to the trouble
> of checksumming would have to have some means of bailing out, rather
> than just continuing on.

Actually, I forgot about the (newer) metadata checksumming feature in 
ext4, and was just basing my statement on behavior the last time I used 
it for anything serious.  Having just checked mkfs.ext4, it appears that 
the metadata in the SB that tells the kernel what to do when it runs 
into an error for the FS still defaults to continuing on as if nothing 
happens, even if you enable metadata checksumming (which still seems to 
be disabled by default).  Whether or not that actually is honored by 
modern kernels, I don't know, but I've seen no evidence to suggest that 
it isn't.


> Btrfs (and maybe ZFS) COW everything except supers. So ostensibly a
> future feature might let them continue on with a kind of
> integrated/single volume variation on seed/sprout device. I'd like to
> see something like this just for undoable and testable offline
> repairs, rather than offline repair only being predicated on
> overwriting metadata.

Agreed.



Re: bad key ordering - repairable?

2018-01-23 Thread Chris Murphy
On Tue, Jan 23, 2018 at 5:51 AM, Austin S. Hemmelgarn
 wrote:

> This is extremely important to understand.  BTRFS and ZFS are essentially
> the only filesystems available on Linux that actually validate things enough
> to notice this reliably (ReFS on Windows probably does, and I think whatever
> Apple is calling their new FS does too).

ReFS always checksums metadata, optionally can checksum data.

APFS is really vague on this front, it may be checksumming metadata,
it's not checksumming data and with no option to. Apple proposes their
branded storage devices do not return bogus data. OK so then why
checksum the metadata?

> Even if ext4 did notice it, it
> would just mark the filesystem for a check and then keep going without doing
> anything else about it (seriously, the default behavior for internal errors
> on ext4 is to just continue like nothing happened and mark the FS for fsck).

I haven't used ext4 with metadata checksumming enabled, and have no
idea how it behaves when it starts encountering checksum errors during
normal use. For sure XFS will complain a lot and will go read only
when it gets confused. I'd expect any file system going to the trouble
of checksumming would have to have some means of bailing out, rather
than just continuing on.

Btrfs (and maybe ZFS) COW everything except supers. So ostensibly a
future feature might let them continue on with a kind of
integrated/single volume variation on seed/sprout device. I'd like to
see something like this just for undoable and testable offline
repairs, rather than offline repair only being predicated on
overwriting metadata.

-- 
Chris Murphy


Re: bad key ordering - repairable?

2018-01-23 Thread Chris Murphy
On Tue, Jan 23, 2018 at 11:13 AM, Claes Fransson
 wrote:

> I haven't noticed before that there is actually RAM-modules from
> different vendors in the laptop. One 8GB by Samsung, and one 4GB by
> Kingston!

If they have the correct tolerances, I don't think it's a problem.
Some memory controllers use a kind of interleaving if the module sizes
are the same, so worst case you might be leaving a bit of a
performance improvement on the table by the fact they aren't the same
size.

If the memory testing doesn't pan out, you could go down a bit of a
rabbit hole and run each module in production for twice the length of
time you figure you should see a corruption appear.

> I also found that there indeed was a new firmware version for my
> SSD-disk, so I have now updated it's firmware to the newest version.
> Unfortunately I couldn't find any information of what possible issues
> it was supposed to fix. The laptop has already the latest BIOS version
> provided by ASUS for the model.

I don't know enough about the bad key ordering error and its cause. If
that corruption can happen only in memory then the SSD firmware update
may change nothing. If there's some possibility the corruption can be
the result of SSD firmware bugs, then it might make sense to use DUP
metadata in the short term, even on an SSD. Any memory corruption
would affect both copies. Any SSD induced corruption *might* affect
both copies, depending on whether the SSD deduplicates or colocates
the two copies of metadata...but I'd like to think that there's at
least a pretty decent chance one of the copies would be good in which
case you'd get Btrfs self-healing for metadata only.
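
The reasoning about DUP metadata above can be sketched in a few lines. This is only an illustration under stated assumptions, not how Btrfs is implemented: `write_block` and `read_with_repair` are hypothetical helpers, and `zlib.crc32` stands in for the real checksum. It shows why a second copy helps when the storage damages one copy after the write, and why it does not help when the data was already corrupted in RAM before the checksum was computed.

```python
import zlib

def write_block(payload: bytes) -> dict:
    """Store a payload with its checksum, modelling one on-disk copy (hypothetical)."""
    return {"csum": zlib.crc32(payload), "data": bytearray(payload)}

def copy_is_valid(copy: dict) -> bool:
    """A copy is valid when its stored checksum matches its current data."""
    return zlib.crc32(bytes(copy["data"])) == copy["csum"]

def read_with_repair(copies: list) -> bytes:
    """Return the first copy that verifies and rewrite failing copies from it."""
    good = next((c for c in copies if copy_is_valid(c)), None)
    if good is None:
        raise IOError("all copies fail checksum; nothing to repair from")
    for c in copies:
        if not copy_is_valid(c):
            c["data"][:] = good["data"]  # "self-heal" the bad copy
    return bytes(good["data"])

payload = b"metadata block contents"

# Case 1: storage-side corruption hits only one of the two copies.
copies = [write_block(payload), write_block(payload)]
copies[0]["data"][3] ^= 0x04
recovered = read_with_repair(copies)

# Case 2: the payload is corrupted in RAM before it is checksummed and written,
# so both copies carry the same bad data with a matching checksum.
in_ram = bytearray(payload)
in_ram[3] ^= 0x04
bad_copies = [write_block(bytes(in_ram)), write_block(bytes(in_ram))]
undetected = read_with_repair(bad_copies)
```

In the first case `recovered` equals the original payload and both copies verify again. In the second case the read succeeds but returns the corrupted bytes, which matches the point that memory corruption affects both copies.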

Anyway, it's a tedious search.

As for Btrfs getting better at handling these kinds of cases: yeah,
it's a valid question. What we know about other file systems is they
can become unrepairable because they don't detect corruption soon
enough. Whereas Btrfs has detected a problem early on, yet it's still
damaged enough now that effectively you can no longer mount it rw.
From a data integrity point of view, at least you can ro mount and get
your data off the volume with a normal file copy operation, not
something that's certain with other file systems.

If you were to try another file system, I'd look at XFS. Tools and
kernels from the past couple of years support metadata checksumming
with the V5 format.


-- 
Chris Murphy


Re: bad key ordering - repairable?

2018-01-23 Thread Claes Fransson
2018-01-23 14:06 GMT+01:00 Claes Fransson :
> 2018-01-22 22:22 GMT+01:00 Hugo Mills :
>> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has became corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>>
>>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>>> localhost:~ # uname -r
>>> 4.14.13-1-default
>>> localhost:~ # btrfs --version
>>> btrfs-progs v4.14.1
>>>
>>> localhost:~ # btrfs check -p /dev/sda12
>>> Checking filesystem on /dev/sda12
>>
>> [fixing up bad paste]
>>
>>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> bad key ordering 159 160 bad block 690436964352
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> checking free space cache [.]
>>> checking fs roots [o]
>>> checking csums
>>> bad key ordering 159 160
>>> Error looking up extent record -1
>>
>> [snip]
>>
>>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
>>> /dev/sda12
>>> btrfs-progs v4.14.1
>>> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>>> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>>> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>>> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>>> .
>>> .
>>> .
>>> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51665 offset 0 count 1
>>> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>>> refs 1 gen 821 flags DATA
>>> extent data backref root 287 objectid 51666 offset 0 count 1
>>> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` 
>>> triggered, value 1
>>> btrfs(+0x365c6)[0x55bdfaada5c6]
>>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>>> Aborted (core dumped)
>>
>>    Wow, I've never seen it do that before. It's the next thing I'd
>> have asked for, so it's good you've preempted it.
>>
>>    The main thing is that bad key ordering is almost always due to RAM
>> corruption. That's either bad RAM, or dodgy power regulation -- the
>> latter could be the PSU, or capacitors on the motherboard. (In this
>> case, it might also be something funny with the battery).
>>
>>    I would definitely recommend a long run of memtest86. At least 8
>> hours, preferably 24. If you get errors repeatedly in the same place,
>> it's the RAM. If they appear randomly, it's probably the power
>> regulation.
>>
> Thanks for the suggestion, I will try to do this in the next days.
>

I hadn't noticed before that there are actually RAM modules from
different vendors in the laptop: one 8GB by Samsung, and one 4GB by
Kingston! Maybe that is a source of the corruptions.
I also found that there was indeed a new firmware version for my
SSD, so I have now updated its firmware to the newest version.
Unfortunately I couldn't find any information on what possible issues
it was supposed to fix. The laptop already has the latest BIOS version
provided by ASUS for the model.
I have not yet run memtest86.

Claes

>> [snip]
>>
>>>
>>> The filesystem had become pretty full, I had planned to increase the
>>> Btrfs-partition size before it became corrupt.
>>>
>>> Active kernel when the filesystem went read only: OpenSUSE Linux
>>> 4.14.14-1.geef6178-default, from the
>>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>>> repository.
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the p

Re: bad key ordering - repairable?

2018-01-23 Thread Claes Fransson
2018-01-23 13:51 GMT+01:00 Austin S. Hemmelgarn :
> On 2018-01-22 21:35, Chris Murphy wrote:
>>
>> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
>>  wrote:
>>>
>>> Hi!
>>>
>>> I really like the features of BTRFS, especially deduplication,
>>> snapshotting and checksumming. However, when using it on my laptop the
>>> last couple of years, it has became corrupted a lot of times.
>>> Sometimes I have managed to fix the problems (at least so much that I
>>> can continue to use the filesystem) with check --repair, but several
>>> times I had to recreate the file system and reinstall the operating
>>> system.
>>>
>>> I am guessing the corruptions might be the results of unclean
>>> shutdowns, mostly after system hangs, but also because of running out
>>> of battery sometimes?
>>
>>
>> I think it's something else because I intentionally and
>> unintentionally do unclean shutdowns (I'm really impatient and I'm a
>> saboteur) on my laptop and I never get corruptions. In 18 months with
>> an HP Spectre which doesn't even have ECC memory, and has an NVMe
>> drive, *and* really remarkable for almost half this time I used the
>> discard mount option which pretty much instantly obliterates unused
>> roots, even when referenced in the super block as backup roots - and
>> yet still zero corruption. No complaints on mount, scrub, or readonly
>> checks. *shrug*
>>
>> Anyway I suspect hardware or power issue. Or even SSD firmware issue.
>
> I would tend to agree here, with one caveat, if it's a laptop that's less
> than 3 years old, you can probably rule out power issues.  Some more info on
> the particular system might help identify what's wrong.

Hi,

I bought the laptop new in July 2014, but I think I have had corruption
issues with btrfs for as long as I have been trying it, which is since
the end of 2014. You can find additional info about my laptop in my
original post; please let me know if you want some more info.

>>
>>
>>> Furthermore, the power-led has recently started blinking (also when
>>> the power-cable is plugged in), I guess because of an old and bad
>>> battery. Maybe the current corruption also can have something to do
>>> with this? However I almost always run with power cable plugged in in
>>> last year, only on battery a few seconds a few times when moving the
>>> laptop.
>>>
>>> Currently, I can only mount the filesystem readonly, it goes readonly
>>> automatically if I try to mount it normally.
>>
>>
>> Btrfs is confused and doesn't want to make the corruption worse.
>>>
>>>
>>> Fstab mount options: noatime,autodefrag (I have been using the option
>>> nossd with older kernels one period in the past on the filesystem).
>>>
>>> If it matters, I have been running duperemove many times on the
>>> filesystem since creation.
>>
>>
>> I don't think it's related.
>>
>>
>>>
>>> To test the RAM, I have been running mprime Blend-test for 24 hours
>>> after the corruption without any error or warning.
>>
>>
>> I'm not familiar with it, pretty sure you want this for UEFI:
>>
>> https://www.memtest86.com/download.htm
>>
>> Where you can use that or memtest86+ if the firmware is BIOS based.
>
> Do keep in mind that just because it passes memory checks does not mean it's
> not an issue with the RAM.  Memory testers rarely throw false positives, but
> it's pretty common to get false negatives from them.

Okay, thanks for telling me.

>>>
>>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>>> on the laptop, only on the Btrfs file systems.
>>
>>
>> NTFS and ext4 likely won't notice such corruptions either (although
>> new ext4 volumes any day now will have checksummed metadata by
>> default) as they're weren't designed with such detection in mind.
>
> This is extremely important to understand.  BTRFS and ZFS are essentially
> the only filesystems available on Linux that actually validate things enough
> to notice this reliably (ReFS on Windows probably does, and I think whatever
> Apple is calling their new FS does too). Even if ext4 did notice it, it
> would just mark the filesystem for a check and then keep going without doing
> anything else about it (seriously, the default behavior for internal errors
> on ext4 is to just continue like nothing happened and mark the FS for fsck).

Well, personally I think it would be great if I (optionally) could do
that with Btrfs too. Even if it notifies me of corruption and I might
even lose a few files, I think it would be good if I could continue to
use the filesystem with normal read/write capabilities, so I wouldn't
need to reinstall the operating system.

Best regards,

Claes


Re: bad key ordering - repairable?

2018-01-23 Thread Claes Fransson
2018-01-23 3:35 GMT+01:00 Chris Murphy :
> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
>  wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>
> I think it's something else because I intentionally and
> unintentionally do unclean shutdowns (I'm really impatient and I'm a
> saboteur) on my laptop and I never get corruptions. In 18 months with
> an HP Spectre which doesn't even have ECC memory, and has an NVMe
> drive, *and* really remarkable for almost half this time I used the
> discard mount option which pretty much instantly obliterates unused
> roots, even when referenced in the super block as backup roots - and
> yet still zero corruption. No complaints on mount, scrub, or readonly
> checks. *shrug*
>
Okay, thank you for sharing your experience.


> Anyway I suspect hardware or power issue. Or even SSD firmware issue.
>


>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>
> Btrfs is confused and doesn't want to make the corruption worse.
>
>
>
>
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>
> I don't think it's related.
>
>
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
> I'm not familiar with it, pretty sure you want this for UEFI:
>
> https://www.memtest86.com/download.htm
>
Thanks, I will try this within the next few days (I boot my laptop in UEFI mode).


> Where you can use that or memtest86+ if the firmware is BIOS based.
>
>
>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
>
> NTFS and ext4 likely won't notice such corruptions either (although
> new ext4 volumes any day now will have checksummed metadata by
> default) as they're weren't designed with such detection in mind.
>
>
> --
> Chris Murphy


Re: bad key ordering - repairable?

2018-01-23 Thread Claes Fransson
2018-01-22 22:22 GMT+01:00 Hugo Mills :
> On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>>
>> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
>> localhost:~ # uname -r
>> 4.14.13-1-default
>> localhost:~ # btrfs --version
>> btrfs-progs v4.14.1
>>
>> localhost:~ # btrfs check -p /dev/sda12
>> Checking filesystem on /dev/sda12
>
> [fixing up bad paste]
>
>> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>> bad key ordering 159 160 bad block 690436964352
>> ERROR: errors found in extent allocation tree or chunk allocation
>> checking free space cache [.]
>> checking fs roots [o]
>> checking csums
>> bad key ordering 159 160
>> Error looking up extent record -1
>
> [snip]
>
>> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
>> /dev/sda12
>> btrfs-progs v4.14.1
>> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
>> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
>> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
>> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
>> .
>> .
>> .
>> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
>> refs 1 gen 821 flags DATA
>> extent data backref root 287 objectid 51665 offset 0 count 1
>> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
>> refs 1 gen 821 flags DATA
>> extent data backref root 287 objectid 51666 offset 0 count 1
>> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
>> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` 
>> triggered, value 1
>> btrfs(+0x365c6)[0x55bdfaada5c6]
>> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
>> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
>> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
>> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
>> btrfs(main+0x7d)[0x55bdfaac7d4d]
>> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
>> btrfs(_start+0x2a)[0x55bdfaac7e5a]
>> Aborted (core dumped)
>
>    Wow, I've never seen it do that before. It's the next thing I'd
> have asked for, so it's good you've preempted it.
>
>    The main thing is that bad key ordering is almost always due to RAM
> corruption. That's either bad RAM, or dodgy power regulation -- the
> latter could be the PSU, or capacitors on the motherboard. (In this
> case, it might also be something funny with the battery).
>
>    I would definitely recommend a long run of memtest86. At least 8
> hours, preferably 24. If you get errors repeatedly in the same place,
> it's the RAM. If they appear randomly, it's probably the power
> regulation.
>
Thanks for the suggestion, I will try to do this in the next few days.

> [snip]
>
>>
>> The filesystem had become pretty full, I had planned to increase the
>> Btrfs-partition size before it became corrupt.
>>
>> Active kernel when the filesystem went read only: OpenSUSE Linux
>> 4.14.14-1.geef6178-default, from the
>> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
>> repository.
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
>Of all of the bad key order errors I've seen (dozens), I think
> there were a whole two which turned out not to be obviously related to
> corrupt RAM. I still say that it's most likely the hardware.

Okay, thank you for sharing your experience with me.

>
>> Is there a way I can try to repair this filesystem without the need to
>> recreate it and reinstall the operating system? A reinstall including
>> all currently installed packages

Re: bad key ordering - repairable?

2018-01-23 Thread Austin S. Hemmelgarn

On 2018-01-22 21:35, Chris Murphy wrote:
>
> On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
>  wrote:
>>
>> Hi!
>>
>> I really like the features of BTRFS, especially deduplication,
>> snapshotting and checksumming. However, when using it on my laptop the
>> last couple of years, it has became corrupted a lot of times.
>> Sometimes I have managed to fix the problems (at least so much that I
>> can continue to use the filesystem) with check --repair, but several
>> times I had to recreate the file system and reinstall the operating
>> system.
>>
>> I am guessing the corruptions might be the results of unclean
>> shutdowns, mostly after system hangs, but also because of running out
>> of battery sometimes?
>
>
> I think it's something else because I intentionally and
> unintentionally do unclean shutdowns (I'm really impatient and I'm a
> saboteur) on my laptop and I never get corruptions. In 18 months with
> an HP Spectre which doesn't even have ECC memory, and has an NVMe
> drive, *and* really remarkable for almost half this time I used the
> discard mount option which pretty much instantly obliterates unused
> roots, even when referenced in the super block as backup roots - and
> yet still zero corruption. No complaints on mount, scrub, or readonly
> checks. *shrug*
>
> Anyway I suspect hardware or power issue. Or even SSD firmware issue.

I would tend to agree here, with one caveat, if it's a laptop that's
less than 3 years old, you can probably rule out power issues.  Some
more info on the particular system might help identify what's wrong.

>
>
>> Furthermore, the power-led has recently started blinking (also when
>> the power-cable is plugged in), I guess because of an old and bad
>> battery. Maybe the current corruption also can have something to do
>> with this? However I almost always run with power cable plugged in in
>> last year, only on battery a few seconds a few times when moving the
>> laptop.
>>
>> Currently, I can only mount the filesystem readonly, it goes readonly
>> automatically if I try to mount it normally.
>
>
> Btrfs is confused and doesn't want to make the corruption worse.
>>
>>
>> Fstab mount options: noatime,autodefrag (I have been using the option
>> nossd with older kernels one period in the past on the filesystem).
>>
>> If it matters, I have been running duperemove many times on the
>> filesystem since creation.
>
>
> I don't think it's related.
>
>
>>
>> To test the RAM, I have been running mprime Blend-test for 24 hours
>> after the corruption without any error or warning.
>
>
> I'm not familiar with it, pretty sure you want this for UEFI:
>
> https://www.memtest86.com/download.htm
>
> Where you can use that or memtest86+ if the firmware is BIOS based.

Do keep in mind that just because it passes memory checks does not mean
it's not an issue with the RAM.  Memory testers rarely throw false
positives, but it's pretty common to get false negatives from them.

>> I have never noticed any corruptions on the NTFS and Ext4 file systems
>> on the laptop, only on the Btrfs file systems.
>
>
> NTFS and ext4 likely won't notice such corruptions either (although
> new ext4 volumes any day now will have checksummed metadata by
> default) as they're weren't designed with such detection in mind.

This is extremely important to understand.  BTRFS and ZFS are
essentially the only filesystems available on Linux that actually
validate things enough to notice this reliably (ReFS on Windows probably
does, and I think whatever Apple is calling their new FS does too).
Even if ext4 did notice it, it would just mark the filesystem for a
check and then keep going without doing anything else about it
(seriously, the default behavior for internal errors on ext4 is to just
continue like nothing happened and mark the FS for fsck).



Re: bad key ordering - repairable?

2018-01-22 Thread Chris Murphy
On Mon, Jan 22, 2018 at 2:06 PM, Claes Fransson
 wrote:
> Hi!
>
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has became corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
>
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?

I think it's something else because I intentionally and
unintentionally do unclean shutdowns (I'm really impatient and I'm a
saboteur) on my laptop and I never get corruptions. In 18 months with
an HP Spectre which doesn't even have ECC memory, and has an NVMe
drive, *and* really remarkable for almost half this time I used the
discard mount option which pretty much instantly obliterates unused
roots, even when referenced in the super block as backup roots - and
yet still zero corruption. No complaints on mount, scrub, or readonly
checks. *shrug*

Anyway I suspect hardware or power issue. Or even SSD firmware issue.

> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I almost always run with power cable plugged in in
> last year, only on battery a few seconds a few times when moving the
> laptop.
>
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.

Btrfs is confused and doesn't want to make the corruption worse.




>
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
>
> If it matters, I have been running duperemove many times on the
> filesystem since creation.

I don't think it's related.


>
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

I'm not familiar with it, pretty sure you want this for UEFI:

https://www.memtest86.com/download.htm

Where you can use that or memtest86+ if the firmware is BIOS based.


> I have never noticed any corruptions on the NTFS and Ext4 file systems
> on the laptop, only on the Btrfs file systems.

NTFS and ext4 likely won't notice such corruptions either (although
new ext4 volumes any day now will have checksummed metadata by
default) as they weren't designed with such detection in mind.


-- 
Chris Murphy


Re: bad key ordering - repairable?

2018-01-22 Thread Hugo Mills
On Mon, Jan 22, 2018 at 10:06:58PM +0100, Claes Fransson wrote:
> Hi!
> 
> I really like the features of BTRFS, especially deduplication,
> snapshotting and checksumming. However, when using it on my laptop the
> last couple of years, it has became corrupted a lot of times.
> Sometimes I have managed to fix the problems (at least so much that I
> can continue to use the filesystem) with check --repair, but several
> times I had to recreate the file system and reinstall the operating
> system.
> 
> I am guessing the corruptions might be the results of unclean
> shutdowns, mostly after system hangs, but also because of running out
> of battery sometimes?
> Furthermore, the power-led has recently started blinking (also when
> the power-cable is plugged in), I guess because of an old and bad
> battery. Maybe the current corruption also can have something to do
> with this? However I almost always run with power cable plugged in in
> last year, only on battery a few seconds a few times when moving the
> laptop.
> 
> Currently, I can only mount the filesystem readonly, it goes readonly
> automatically if I try to mount it normally.
> 
> When booting an OpenSUSE Tumbleweed-20180119 live-iso:
> localhost:~ # uname -r
> 4.14.13-1-default
> localhost:~ # btrfs --version
> btrfs-progs v4.14.1
> 
> localhost:~ # btrfs check -p /dev/sda12
> Checking filesystem on /dev/sda12

[fixing up bad paste]

> UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> bad key ordering 159 160 bad block 690436964352
> ERROR: errors found in extent allocation tree or chunk allocation
> checking free space cache [.]
> checking fs roots [o]
> checking csums
> bad key ordering 159 160
> Error looking up extent record -1

[snip]

> localhost:~ # btrfs inspect-internal dump-tree -b 690436964352
> /dev/sda12
> btrfs-progs v4.14.1
> leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
> leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
> fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
> chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
> .
> .
> .
> item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
> refs 1 gen 821 flags DATA
> extent data backref root 287 objectid 51665 offset 0 count 1
> item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
> refs 1 gen 821 flags DATA
> extent data backref root 287 objectid 51666 offset 0 count 1
> item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
> print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` 
> triggered, value 1
> btrfs(+0x365c6)[0x55bdfaada5c6]
> btrfs(print_extent_item+0x424)[0x55bdfaadb284]
> btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
> btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
> btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
> btrfs(main+0x7d)[0x55bdfaac7d4d]
> /lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
> btrfs(_start+0x2a)[0x55bdfaac7e5a]
> Aborted (core dumped)

   Wow, I've never seen it do that before. It's the next thing I'd
have asked for, so it's good you've preempted it.
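
For readers following the dump, here is a small sketch of how the odd item stands out. It is only an illustration: the three item lines are copied from the output quoted above, while the regular expression, the helper names and the 24-byte header size (three 64-bit fields: refs, generation, flags) are assumptions made for this example, not taken from btrfs-progs.

```python
import re

# Assumed size of the fixed extent item header: refs, generation and flags,
# each a 64-bit field. Treat this constant as an assumption of the sketch.
EXTENT_ITEM_HEADER_SIZE = 24

# Hypothetical pattern for the "item N key (...) itemoff X itemsize Y" lines.
ITEM_RE = re.compile(
    r"item (\d+) key \((\d+) (\w+) (\d+)\) itemoff (\d+) itemsize (\d+)"
)

DUMP = """\
item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
"""

def parse_items(text):
    """Extract slot, key type, itemoff and itemsize from dump-tree item lines."""
    return [
        {
            "slot": int(m.group(1)),
            "type": m.group(3),
            "itemoff": int(m.group(5)),
            "itemsize": int(m.group(6)),
        }
        for m in ITEM_RE.finditer(text)
    ]

def undersized_extent_items(items):
    """Return slots of EXTENT_ITEM entries too small to hold the fixed header."""
    return [
        it["slot"]
        for it in items
        if it["type"] == "EXTENT_ITEM" and it["itemsize"] < EXTENT_ITEM_HEADER_SIZE
    ]

print(undersized_extent_items(parse_items(DUMP)))  # prints [159]
```

Item 159 shares itemoff 6485 with item 158 and reports itemsize 0, which is the item the printing code aborts on.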

   The main thing is that bad key ordering is almost always due to RAM
corruption. That's either bad RAM, or dodgy power regulation -- the
latter could be the PSU, or capacitors on the motherboard. (In this
case, it might also be something funny with the battery).
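
To make the link between RAM corruption and this error concrete, here is a minimal sketch, assuming keys compare as (objectid, type, offset) tuples and that a leaf must hold them in strictly increasing order. The helper names are hypothetical, 168 is used as the EXTENT_ITEM type number, and the flipped bit is chosen purely for illustration; nothing here claims to be the actual corruption in this thread.

```python
EXTENT_ITEM = 168  # key type number used for EXTENT_ITEM in this sketch

def first_bad_slot(keys):
    """Return the first slot i where keys[i] >= keys[i + 1], else None."""
    for i in range(len(keys) - 1):
        if keys[i] >= keys[i + 1]:
            return i
    return None

def flip_bit(key, bit):
    """Return a copy of the key with one bit of its objectid flipped."""
    objectid, ktype, offset = key
    return (objectid ^ (1 << bit), ktype, offset)

# Four consecutive 16 KiB extents, starting at an address seen in the dump.
keys = [(22732500992 + n * 16384, EXTENT_ITEM, 16384) for n in range(4)]
before = first_bad_slot(keys)   # None: the keys are strictly ordered

# A single flipped bit in the last key's objectid breaks the ordering.
keys[3] = flip_bit(keys[3], 34)
after = first_bad_slot(keys)    # 2: slot 2 is no longer smaller than slot 3
```

A check of this kind is cheap, which is why a single flipped bit in a key tends to be noticed as an ordering violation between two neighbouring slots.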

   I would definitely recommend a long run of memtest86. At least 8
hours, preferably 24. If you get errors repeatedly in the same place,
it's the RAM. If they appear randomly, it's probably the power
regulation.

[snip]

> 
> The filesystem had become pretty full, I had planned to increase the
> Btrfs-partition size before it became corrupt.
> 
> Active kernel when the filesystem went read only: OpenSUSE Linux
> 4.14.14-1.geef6178-default, from the
> http://download.opensuse.org/repositories/Kernel:/stable/standard/stable
> repository.
> 
> Fstab mount options: noatime,autodefrag (I have been using the option
> nossd with older kernels one period in the past on the filesystem).
> 
> If it matters, I have been running duperemove many times on the
> filesystem since creation.
> 
> To test the RAM, I have been running mprime Blend-test for 24 hours
> after the corruption without any error or warning.

   Of all of the bad key order errors I've seen (dozens), I think
there were a whole two which turned out not to be obviously related to
corrupt RAM. I still say that it's most likely the hardware.

> Is there a way I can try to repair this filesystem without the need to
> recreate it and reinstall the operating system? A reinstall including
> all currently installed packages, and restoring all current system
> settings, would probably take some time for me to do.
> If it is currently not repairable, it would be nice if this kind of
> corruption could be repaired in the future, even if losing a few
> files. Or if the corruptions could be avoided in the fi

bad key ordering - repairable?

2018-01-22 Thread Claes Fransson
Hi!

I really like the features of BTRFS, especially deduplication,
snapshotting and checksumming. However, when using it on my laptop the
last couple of years, it has become corrupted a lot of times.
Sometimes I have managed to fix the problems (at least so much that I
can continue to use the filesystem) with check --repair, but several
times I had to recreate the file system and reinstall the operating
system.

I am guessing the corruptions might be the results of unclean
shutdowns, mostly after system hangs, but also because of running out
of battery sometimes?
Furthermore, the power-led has recently started blinking (also when
the power-cable is plugged in), I guess because of an old and bad
battery. Maybe the current corruption also has something to do
with this? However, over the last year I have almost always run with the
power cable plugged in, only on battery for a few seconds a few times
when moving the laptop.

Currently, I can only mount the filesystem readonly, it goes readonly
automatically if I try to mount it normally.

When booting an OpenSUSE Tumbleweed-20180119 live-iso:
localhost:~ # uname -r
4.14.13-1-default
localhost:~ # btrfs --version
btrfs-progs v4.14.1

localhost:~ # btrfs check -p /dev/sda12
Checking filesystem on /dev/sda12
UUID: d2819d5a-fd69-484b-bf34-f2b5692cbe1f
bad key ordering 159 160
bad block 690436964352
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [o]
checking csums
bad key ordering 159 160
Error looking up extent record -1
Right section didn't have a record
There are no extents for csum range 22732550144-24923615232
Csum exists for 16303538176-24923615232 but there is no extent record
ERROR: errors found in csum tree
found 344063430663 bytes used, error(s) found
total csum bytes: 0
total tree bytes: 453410816
total fs tree bytes: 0
total extent tree bytes: 452952064
btree space waste bytes: 140165932
file data blocks allocated: 108462080
 referenced 108462080

localhost:~ # btrfs inspect-internal dump-tree -b 690436964352 /dev/sda12
btrfs-progs v4.14.1
leaf 690436964352 items 170 free space 1811 generation 196864 owner 2
leaf 690436964352 flags 0x1(WRITTEN) backref revision 1
fs uuid d2819d5a-fd69-484b-bf34-f2b5692cbe1f
chunk uuid 52f81fe6-893b-4432-9336-895057ee81e1
.
.
.
item 157 key (22732500992 EXTENT_ITEM 16384) itemoff 6538 itemsize 53
refs 1 gen 821 flags DATA
extent data backref root 287 objectid 51665 offset 0 count 1
item 158 key (22732517376 EXTENT_ITEM 16384) itemoff 6485 itemsize 53
refs 1 gen 821 flags DATA
extent data backref root 287 objectid 51666 offset 0 count 1
item 159 key (22732533760 EXTENT_ITEM 16384) itemoff 6485 itemsize 0
print-tree.c:428: print_extent_item: BUG_ON `item_size != sizeof(*ei0)` triggered, value 1
btrfs(+0x365c6)[0x55bdfaada5c6]
btrfs(print_extent_item+0x424)[0x55bdfaadb284]
btrfs(btrfs_print_leaf+0x94e)[0x55bdfaadbc1e]
btrfs(btrfs_print_tree+0x295)[0x55bdfaadcf05]
btrfs(cmd_inspect_dump_tree+0x734)[0x55bdfab1b024]
btrfs(main+0x7d)[0x55bdfaac7d4d]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7ff42100ff4a]
btrfs(_start+0x2a)[0x55bdfaac7e5a]
Aborted (core dumped)
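
As a quick consistency check on the numbers in the dump above, here is
a small Python sketch (only arithmetic on the printed values, nothing
that reads the disk). It assumes the usual leaf layout, where each
item's data sits directly below the previous item's data, so that
itemoff + itemsize of one slot equals itemoff of the slot before it.
The helper names are made up for this example.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    slot: int
    start: int     # key objectid: extent start in bytes
    length: int    # key offset: extent length in bytes
    itemoff: int   # offset of the item data inside the leaf
    itemsize: int  # size of the item data


# Values copied from the dump-tree output for slots 157-159.
ITEMS = [
    Item(157, 22732500992, 16384, 6538, 53),
    Item(158, 22732517376, 16384, 6485, 53),
    Item(159, 22732533760, 16384, 6485, 0),
]


def extent_end(item: Item) -> int:
    """First byte after the extent described by this item."""
    return item.start + item.length


def layout_mismatches(items: List[Item]) -> List[int]:
    """Slots whose data does not end where the previous slot's begins."""
    return [cur.slot for prev, cur in zip(items, items[1:])
            if cur.itemoff + cur.itemsize != prev.itemoff]


def zero_size_slots(items: List[Item]) -> List[int]:
    """Slots with no item data at all."""
    return [it.slot for it in items if it.itemsize == 0]


print("extent 159 ends at", extent_end(ITEMS[-1]))
print("layout mismatches:", layout_mismatches(ITEMS))
print("zero-size slots:", zero_size_slots(ITEMS))
```

The three extents are back to back, and the end of the extent in slot
159 (22732550144) is exactly where the "no extents for csum range"
message starts. The item offsets are self-consistent, but slot 159 has
itemsize 0, which is the value print_extent_item aborts on.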


check --repair hangs after reporting "bad key ordering 159 160" with
no disk activity but constant high cpu usage.

localhost:~ # smartctl -a /dev/sda
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.14.13-1-default] (SUSE RPM)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SD8SB8U1T001122
Serial Number:    163076421231
LU WWN Device Id: 5 001b44 4a4dde388
Firmware Version: X414
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 22 15:28:46 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health