Re: unable to fixup (regular) error

2018-11-27 Thread Alexander Fieroch
Actually the data on the raid0 is not my own but my users', and they knew 
and accepted the risk of raid0. So in my case it should be ok - I don't 
know how important the affected files are. I just wanted to help find a 
possible bug and experiment with a broken btrfs filesystem before I 
recreate it.

For my own files I'd prefer raid5 or raid6...

But thanks, Duncan, for your explanation! Of course there are people out 
there who do not know the difference between the various raid levels, and 
they should be warned!


Best regards,
Alexander





Re: unable to fixup (regular) error

2018-11-26 Thread Duncan
Alexander Fieroch posted on Mon, 26 Nov 2018 11:23:00 +0100 as excerpted:

> Am 26.11.18 um 09:13 schrieb Qu Wenruo:
>> The corruption itself looks like some disk error, not some btrfs error
>> like transid error.
> 
> You're right! SMART shows an increased reallocated-sector count for one
> hard disk. Sorry, I failed to check this first...
> 
> I'll try to salvage my data...

FWIW, a general note about raid0, for when you redo your layout...

Because raid0 is less reliable than a single device (failure of any 
device in the raid0 is likely to take the whole array out, and failure of 
any one of N devices is more likely than failure of any specific single 
device), admins generally consider it useful only for "throw-away" data.  
That is, data that can be lost without issue, either because it really 
/is/ throw-away (internet cache being a common example), or because it is 
a throw-away copy of the "real" data stored elsewhere.  That "real" copy 
may be the working copy, with the raid0 simply a faster cache of it; or 
the raid0 may be the working copy, but with backups updated frequently 
enough that if the raid0 goes, it takes nothing of value with it.  Read 
that as: the effort to replace any lost data is reasonably trivial -- 
likely a few minutes or hours, at worst perhaps a day's worth of work, 
depending on how many people's work is involved and how much their time 
is considered to be worth.

So if it's raid0, you shouldn't need to worry about recovering what's on 
it, and probably shouldn't even be running a btrfs check on it at all, as 
that's likely to be more trouble and take more time than the throw-away 
data on it is worth.  If something goes wrong with a raid0, just declare 
it lost, blow it away, recreate it fresh, and restore from the "real" 
copy if necessary.  Because for an admin, really with any data but 
particularly with a raid0, it's a matter of when it'll die, not if.

If that's inappropriate for the value of the data and status of the 
backups/real-copies, then you should really be reconsidering whether 
raid0 of any sort is appropriate, because it almost certainly is not.


For btrfs, what you might try instead of raid0 is raid1 metadata at 
least, with raid0 or single-mode data if there's not room enough to do 
raid1 data as well.  Raid1 metadata would very likely have saved the 
filesystem in this case, with some loss of files possible depending on 
where the damage is: the second copy of the metadata on the good device 
would be used to fill in for, and attempt to repair, any metadata damage 
on the bad device (though if the bad device is actively getting worse, 
that can be a losing battle).  That gives you a far better chance of 
saving the filesystem as a whole.
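
For anyone wanting to follow this advice, the commands look roughly like 
this (a sketch only; the device names and mount point are placeholders 
for your own setup):

```shell
# New filesystem: raid1 metadata, raid0 data, across two devices.
# /dev/sdX and /dev/sdY are placeholders.
mkfs.btrfs -m raid1 -d raid0 /dev/sdX /dev/sdY

# Existing filesystem: convert metadata to raid1 in place, while
# mounted, using a filtered balance (mount point is a placeholder).
btrfs balance start -mconvert=raid1 /mnt/data
```

The balance rewrites every metadata chunk, so on a large filesystem it 
can take a while, but it can be done online.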

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: unable to fixup (regular) error

2018-11-26 Thread Alexander Fieroch

Am 26.11.18 um 09:13 schrieb Qu Wenruo:

The corruption itself looks like some disk error, not some btrfs error
like transid error.


You're right! SMART shows an increased reallocated-sector count for one 
hard disk. Sorry, I failed to check this first...


I'll try to salvage my data...
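
(For salvage, one option is btrfs restore, which copies readable files 
out without writing to the damaged filesystem; the rescue path below is 
a placeholder:

```shell
# Copy whatever is readable from the damaged fs to a rescue disk.
# -v: verbose, -i: ignore errors and keep going,
# -m: restore owner/mode/timestamp metadata too.
btrfs restore -v -i -m /dev/sdc /mnt/rescue
```
)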

Thanks!

Best,
Alexander





Re: unable to fixup (regular) error

2018-11-26 Thread Qu Wenruo


On 2018/11/26 3:19 PM, Alexander Fieroch wrote:
> Hi,
> 
> My data partition with btrfs RAID 0 (/dev/sdc0 and /dev/sdd0) shows
> errors in syslog:
> 
> BTRFS error (device sdc): cleaner transaction attach returned -30
> BTRFS info (device sdc): disk space caching is enabled
> BTRFS info (device sdc): has skinny extents
> BTRFS info (device sdc): bdev /dev/sdc errs: wr 0, rd 0, flush 0,
> corrupt 3, gen 1
> BTRFS info (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 6, gen 2

Generation mismatch means something more serious.

> 
> 
> BTRFS error (device sdc): scrub: tree block 858803990528 spanning
> stripes, ignored. logical=858803929088

The spanning-stripes errors only mean the scrub code can't check those
tree blocks, since they cross a stripe boundary.

It's normally nothing to worry about, and is normally caused by an older
kernel.  Newer kernels avoid creating such tree blocks in the first
place; for existing ones, scrub will just skip them.

> BTRFS error (device sdc): scrub: tree block 858803990528 spanning
> stripes, ignored. logical=858803994624
> BTRFS warning (device sdc): checksum error at logical 858803961856 on
> dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7
> BTRFS warning (device sdc): checksum error at logical 858803961856 on
> dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7

This means some csum tree blocks are corrupted.

> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 4, gen 1
> BTRFS error (device sdc): scrub: tree block 858820505600 spanning
> stripes, ignored. logical=858820444160
> BTRFS error (device sdc): scrub: tree block 858820505600 spanning
> stripes, ignored. logical=858820509696
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858803961856 on dev /dev/sdd
> BTRFS error (device sdc): scrub: tree block 858821292032 spanning
> stripes, ignored. logical=858821230592
> BTRFS error (device sdc): scrub: tree block 858821292032 spanning
> stripes, ignored. logical=858821296128
> BTRFS warning (device sdc): checksum error at logical 858821263360 on
> dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
> BTRFS warning (device sdc): checksum error at logical 858821263360 on
> dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 5, gen 1
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858821263360 on dev /dev/sdd
> BTRFS warning (device sdc): checksum/header error at logical
> 858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
> (level 0) in tree 7
> BTRFS warning (device sdc): checksum/header error at logical
> 858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
> (level 0) in tree 7
> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 5, gen 2
> BTRFS warning (device sdc): checksum error at logical 858820489216 on
> dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2
> BTRFS warning (device sdc): checksum error at logical 858820489216 on
> dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2

This is an error in the extent tree, and I'd say it's a serious problem
that may affect later write operations.

> BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
> corrupt 6, gen 2
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858820476928 on dev /dev/sdd
> BTRFS error (device sdc): unable to fixup (regular) error at logical
> 858820489216 on dev /dev/sdd0
> 
> 
> $ btrfs filesystem show /mnt/data/
> Label: none  uuid: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
>   Total devices 2 FS bytes used 10.17TiB
>   devid    1 size 5.46TiB used 5.43TiB path /dev/sdc
>   devid    2 size 5.46TiB used 5.43TiB path /dev/sdd
> 
> $ btrfs --version
> btrfs-progs v4.15.1
> 
> $ uname -a
> Linux gpur1 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC
> 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> $ btrfs dev stats /dev/sdc
> [/dev/sdc].write_io_errs    0
> [/dev/sdc].read_io_errs 0
> [/dev/sdc].flush_io_errs    0
> [/dev/sdc].corruption_errs  3
> [/dev/sdc].generation_errs  1
> 
> $ btrfs dev stats /dev/sdd
> [/dev/sdd].write_io_errs    0
> [/dev/sdd].read_io_errs 0
> [/dev/sdd].flush_io_errs    0
> [/dev/sdd].corruption_errs  3
> [/dev/sdd].generation_errs  1
> 
> $ btrfs fi show
> Label: 'system'  uuid: ae121e8e-d483-45f4-8568-2817f5c5d497
>     Total devices 1 FS bytes used 194.05GiB
>     devid    1 size 228.66GiB used 199.03GiB path /dev/sda3
> Label: none  uuid: 5e6506b0-bf15-4b2e-b

unable to fixup (regular) error

2018-11-25 Thread Alexander Fieroch

Hi,

My data partition with btrfs RAID 0 (/dev/sdc0 and /dev/sdd0) shows
errors in syslog:

BTRFS error (device sdc): cleaner transaction attach returned -30
BTRFS info (device sdc): disk space caching is enabled
BTRFS info (device sdc): has skinny extents
BTRFS info (device sdc): bdev /dev/sdc errs: wr 0, rd 0, flush 0,
corrupt 3, gen 1
BTRFS info (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
corrupt 6, gen 2


BTRFS error (device sdc): scrub: tree block 858803990528 spanning
stripes, ignored. logical=858803929088
BTRFS error (device sdc): scrub: tree block 858803990528 spanning
stripes, ignored. logical=858803994624
BTRFS warning (device sdc): checksum error at logical 858803961856 on
dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7
BTRFS warning (device sdc): checksum error at logical 858803961856 on
dev /dev/sdd, physical 385263894528: metadata leaf (level 0) in tree 7
BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
corrupt 4, gen 1
BTRFS error (device sdc): scrub: tree block 858820505600 spanning
stripes, ignored. logical=858820444160
BTRFS error (device sdc): scrub: tree block 858820505600 spanning
stripes, ignored. logical=858820509696
BTRFS error (device sdc): unable to fixup (regular) error at logical
858803961856 on dev /dev/sdd
BTRFS error (device sdc): scrub: tree block 858821292032 spanning
stripes, ignored. logical=858821230592
BTRFS error (device sdc): scrub: tree block 858821292032 spanning
stripes, ignored. logical=858821296128
BTRFS warning (device sdc): checksum error at logical 858821263360 on
dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
BTRFS warning (device sdc): checksum error at logical 858821263360 on
dev /dev/sdd, physical 385281196032: metadata leaf (level 0) in tree 7
BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
corrupt 5, gen 1
BTRFS error (device sdc): unable to fixup (regular) error at logical
858821263360 on dev /dev/sdd
BTRFS warning (device sdc): checksum/header error at logical
858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
(level 0) in tree 7
BTRFS warning (device sdc): checksum/header error at logical
858820476928 on dev /dev/sdd, physical 385280409600: metadata leaf
(level 0) in tree 7
BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
corrupt 5, gen 2
BTRFS warning (device sdc): checksum error at logical 858820489216 on
dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2
BTRFS warning (device sdc): checksum error at logical 858820489216 on
dev /dev/sdd, physical 385280421888: metadata leaf (level 0) in tree 2
BTRFS error (device sdc): bdev /dev/sdd errs: wr 0, rd 0, flush 0,
corrupt 6, gen 2
BTRFS error (device sdc): unable to fixup (regular) error at logical
858820476928 on dev /dev/sdd
BTRFS error (device sdc): unable to fixup (regular) error at logical
858820489216 on dev /dev/sdd0


$ btrfs filesystem show /mnt/data/
Label: none  uuid: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
  Total devices 2 FS bytes used 10.17TiB
  devid1 size 5.46TiB used 5.43TiB path /dev/sdc
  devid2 size 5.46TiB used 5.43TiB path /dev/sdd

$ btrfs --version
btrfs-progs v4.15.1

$ uname -a
Linux gpur1 4.15.0-39-generic #42-Ubuntu SMP Tue Oct 23 15:48:01 UTC
2018 x86_64 x86_64 x86_64 GNU/Linux


$ btrfs dev stats /dev/sdc
[/dev/sdc].write_io_errs0
[/dev/sdc].read_io_errs 0
[/dev/sdc].flush_io_errs0
[/dev/sdc].corruption_errs  3
[/dev/sdc].generation_errs  1

$ btrfs dev stats /dev/sdd
[/dev/sdd].write_io_errs0
[/dev/sdd].read_io_errs 0
[/dev/sdd].flush_io_errs0
[/dev/sdd].corruption_errs  3
[/dev/sdd].generation_errs  1

$ btrfs fi show
Label: 'system'  uuid: ae121e8e-d483-45f4-8568-2817f5c5d497
Total devices 1 FS bytes used 194.05GiB
devid1 size 228.66GiB used 199.03GiB path /dev/sda3
Label: none  uuid: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
Total devices 2 FS bytes used 10.17TiB
devid1 size 5.46TiB used 5.43TiB path /dev/sdc
devid2 size 5.46TiB used 5.43TiB path /dev/sdd

$ btrfs fi df /mnt/data/
Data, RAID0: total=10.84TiB, used=10.15TiB
System, RAID1: total=8.00MiB, used=896.00KiB
Metadata, RAID1: total=15.00GiB, used=13.28GiB
GlobalReserve, single: total=512.00MiB, used=0.00

$ btrfs scrub start -B /dev/sdc
ERROR: scrubbing /dev/sdc failed for device id 1: ret=-1, errno=5 (Input/output error)

scrub canceled for 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
        scrub started at Thu Nov 22 07:43:45 2018 and was aborted after 02:31:49
        total bytes scrubbed: 1.58TiB with 10 errors
        error details: verify=1 csum=3
        corrected errors: 0, uncorrectable errors: 10, unverified errors: 0




I've tried
$ btrfs check /dev/sdc
Checking filesystem on /dev/sdc
UUID: 5e6506b0-bf15-4b2e-b5f4-322c44b89db6
$ btrfs check --repair
checking extents
  ERROR: add_tree_backref failed (extent items shared bloc

Re: Unable to fixup (regular) error in RAID1 fs

2014-10-29 Thread Chris Murphy

On Oct 29, 2014, at 2:08 AM, Juan Orti  wrote:

> On 2014-10-29 04:02, Duncan wrote:
>> Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:
>>> [ 3713.086292] BTRFS: unable to fixup (regular) error at logical
>>> 483011874816 on dev /dev/sdb2
>>> [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev
>>> /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset
>>> 4059963392, length 4096, links 1 (path:
>>> juan/.local/share/gnome-boxes/images/boxes-unknown)
>>> [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt
>>> 38, gen 0
>>> [ 3713.093035] BTRFS: unable to fixup (regular) error at logical
>>> 483011948544 on dev /dev/sdb2
>>> Why can't it fix the errors? a bad device? smartctl says the disk is ok.
>>> I'm currently running a full scrub to see if it finds more errors. What
>>> should I do?
>> Btrfs raid1, and I see you have it for both data and metadata.
>> During normal operation, when btrfs comes across a block that doesn't
>> match its checksum, it will look to see if there's another copy (which
>> there is with raid1, which has exactly two copies) of that block and will
>> try to use it instead if so.  If the second copy matches the checksum,
>> all is fine and btrfs will in fact attempt to rewrite the bad copy using
>> the good copy, as well as returning the good copy to whatever was
>> reading it.
>> Those corruption errors seem to indicate that it can't find a good
>> copy to update the bad copy with -- both copies ended up bad.  Either
>> that or it found the good copy and returned it to whatever was reading,
>> but couldn't rewrite the bad copy, for some reason.
>> I'm not sure which of those interpretations is correct, but given
>> that you didn't see anything else bad happening, no apps returning
>> errors due to read error, etc, I'd guess the second.  Because
>> otherwise whatever was doing the read should have returned an
>> error.
> 
> When this error happened, I was editing some text files with vi, and it was 
> painfully slow; it took 30 seconds to open a 20-line file, so something 
> weird was going on. Anyway, no visible user-space error could be seen.

Anything in dmesg prior to the previously reported errors?

Either with syslog messages or journalctl, filter by btrfs and see what you get 
for the past couple of days. And then also find out what ata port the two 
drives are on and filter by those; usually in the form ataX.00. You could also 
search for "exception Emask" and see if anything comes up. This would account 
for either controller or drive hardware error messages.
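
Concretely, the filtering described above might look like this (the time
window and ata port pattern are examples to adapt):

```shell
# btrfs messages from the last two days
journalctl --since "-2 days" | grep -i btrfs

# libata messages for specific ports, plus controller/drive exceptions
journalctl --since "-2 days" | grep -E 'ata[0-9]+\.[0-9]+'
dmesg | grep -i 'exception Emask'

# Find which ata port a given drive sits on (the path contains ataX)
readlink -f /sys/block/sdb
```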


Chris Murphy

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Unable to fixup (regular) error in RAID1 fs

2014-10-29 Thread Juan Orti

On 2014-10-29 04:02, Duncan wrote:
> Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:
> 
>> [ 3713.086292] BTRFS: unable to fixup (regular) error at logical
>> 483011874816 on dev /dev/sdb2
>> [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev
>> /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset
>> 4059963392, length 4096, links 1 (path:
>> juan/.local/share/gnome-boxes/images/boxes-unknown)
>> [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt
>> 38, gen 0
>> [ 3713.093035] BTRFS: unable to fixup (regular) error at logical
>> 483011948544 on dev /dev/sdb2
>>
>> Why can't it fix the errors? a bad device? smartctl says the disk is ok.
>> I'm currently running a full scrub to see if it finds more errors. What
>> should I do?
> 
> Btrfs raid1, and I see you have it for both data and metadata.
> 
> During normal operation, when btrfs comes across a block that doesn't
> match its checksum, it will look to see if there's another copy (which
> there is with raid1, which has exactly two copies) of that block and will
> try to use it instead if so.  If the second copy matches the checksum,
> all is fine and btrfs will in fact attempt to rewrite the bad copy using
> the good copy, as well as returning the good copy to whatever was
> reading it.
> 
> Those corruption errors seem to indicate that it can't find a good
> copy to update the bad copy with -- both copies ended up bad.  Either
> that or it found the good copy and returned it to whatever was reading,
> but couldn't rewrite the bad copy, for some reason.
> 
> I'm not sure which of those interpretations is correct, but given
> that you didn't see anything else bad happening, no apps returning
> errors due to read error, etc, I'd guess the second.  Because
> otherwise whatever was doing the read should have returned an
> error.

When this error happened, I was editing some text files with vi, and it
was painfully slow; it took 30 seconds to open a 20-line file, so
something weird was going on. Anyway, no visible user-space error could
be seen.

> Doing a scrub, as you already did, is the first thing I'd try here,
> since normal operation won't catch all the errors.
> 
> BUT, you report that the scrub found no errors, which is weird.
> You have the log saying there's corruption errors, but scrub
> saying there's not.
> 
> The easiest explanation for something like that, is that the errors
> were temporary.  If it happens again or regularly, consider running
> memcheck or the like, as it could be bad memory.  Do you have ECC RAM?

I don't have ECC RAM, it's a regular desktop PC. Some RAM checks in the
past have shown no errors; I'll check it again.

> Another question.  Do you have skinny metadata on that btrfs?  If you
> do, btrfs should mention "skinny extents" when mounting the filesystem.

No skinny metadata. I made the fs with the standard options, just with
raid1 for data and metadata.

> The reason I'm asking this is that if I'm reading the patch descriptions
> correctly, a recently posted patch deals with a specific skinny-metadata
> bug where wrong results would occasionally be returned, resulting in
> errors.  Not being a dev I don't have the technical ability to know for
> sure whether this could be connected to that or not, but it sounds like
> the sort of thing I might expect from a bug that intermittently returned
> bad data -- odd apparent corruption errors in normal use that scrub
> can't see, even tho it's designed to catch and fix if possible exactly
> that sort of corruption error.
> 
> Anyway, if scrub says no corruption, for a potential corruption error
> I'd be inclined to trust scrub, so I think the filesystem is fine.
> But if so, I'm worried about what might be triggering these
> intermittent errors.  Certainly watch for more of them, and if you're
> running skinny-metadata, consider finding and applying that patch.
> If not or in general, also be on the lookout for more possible hints
> of failing memory and/or run a good memory checker for a few hours
> and see if it reports all is well.
> 
> But as they say about some kinds of potential cancer reports at times,
> sometimes watchful waiting is the best you can do, hoping no further
> symptoms show up, but being alert in case they do, to try something
> more drastic, that isn't warranted /unless/ they do.

That's what I'll do, I'll wait and see.

Thank you for your explanation.

--
Juan Orti
https://miceliux.com



Re: Unable to fixup (regular) error in RAID1 fs

2014-10-28 Thread Duncan
Juan Orti posted on Tue, 28 Oct 2014 16:54:19 +0100 as excerpted:

> [ 3713.086292] BTRFS: unable to fixup (regular) error at logical 
> 483011874816 on dev /dev/sdb2
> [ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev 
> /dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 
> 4059963392, length 4096, links 1 (path: 
> juan/.local/share/gnome-boxes/images/boxes-unknown)
> [ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
> 38, gen 0
> [ 3713.093035] BTRFS: unable to fixup (regular) error at logical 
> 483011948544 on dev /dev/sdb2
> 
> Why can't it fix the errors? a bad device? smartctl says the disk is ok. 
> I'm currently running a full scrub to see if it finds more errors. What 
> should I do?

Btrfs raid1, and I see you have it for both data and metadata.

During normal operation, when btrfs comes across a block that doesn't
match its checksum, it will look to see if there's another copy (which
there is with raid1, which has exactly two copies) of that block and will
try to use it instead if so.  If the second copy matches the checksum,
all is fine and btrfs will in fact attempt to rewrite the bad copy using
the good copy, as well as returning the good copy to whatever was
reading it.

Those corruption errors seem to indicate that it can't find a good
copy to update the bad copy with -- both copies ended up bad.  Either
that or it found the good copy and returned it to whatever was reading,
but couldn't rewrite the bad copy, for some reason.

I'm not sure which of those interpretations is correct, but given
that you didn't see anything else bad happening, no apps returning
errors due to read error, etc, I'd guess the second.  Because
otherwise whatever was doing the read should have returned an
error.

Doing a scrub, as you already did, is the first thing I'd try here,
since normal operation won't catch all the errors.

BUT, you report that the scrub found no errors, which is weird.
You have the log saying there's corruption errors, but scrub
saying there's not.

The easiest explanation for something like that, is that the errors
were temporary.  If it happens again or regularly, consider running
memcheck or the like, as it could be bad memory.  Do you have ECC RAM?
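
(If you do want to test the RAM, one common userspace option is
memtester, a separate package; the size and pass count below are just
examples:

```shell
# Test 1 GiB of RAM for 3 passes, no reboot needed (run as root so
# the memory can be locked)
memtester 1024M 3
```

The standalone memtest86+ boot entry most distros ship is more thorough,
since it can test nearly all of the installed RAM.)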

Another question.  Do you have skinny metadata on that btrfs?  If you
do, btrfs should mention "skinny extents" when mounting the filesystem.
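
(With reasonably current btrfs-progs you can also check the superblock
directly; SKINNY_METADATA should appear among the incompat flags if the
feature is enabled. The device name is an example:

```shell
btrfs inspect-internal dump-super /dev/sdb2 | grep -i incompat
```

Older progs shipped this as the separate btrfs-show-super tool.)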

The reason I'm asking this is that if I'm reading the patch descriptions
correctly, a recently posted patch deals with a specific skinny-metadata
bug where wrong results would occasionally be returned, resulting in
errors.  Not being a dev I don't have the technical ability to know for
sure whether this could be connected to that or not, but it sounds like
the sort of thing I might expect from a bug that intermittently returned
bad data -- odd apparent corruption errors in normal use that scrub
can't see, even tho it's designed to catch and fix if possible exactly
that sort of corruption error.

Anyway, if scrub says no corruption, for a potential corruption error
I'd be inclined to trust scrub, so I think the filesystem is fine.
But if so, I'm worried about what might be triggering these
intermittent errors.  Certainly watch for more of them, and if you're
running skinny-metadata, consider finding and applying that patch.
If not or in general, also be on the lookout for more possible hints
of failing memory and/or run a good memory checker for a few hours
and see if it reports all is well.

But as they say about some kinds of potential cancer reports at times,
sometimes watchful waiting is the best you can do, hoping no further
symptoms show up, but being alert in case they do, to try something
more drastic, that isn't warranted /unless/ they do.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Unable to fixup (regular) error in RAID1 fs

2014-10-28 Thread Juan Orti
On Tue, 2014-10-28 at 16:54 +0100, Juan Orti wrote:
> I'm seeing these errors in a RAID1 fs:
> (...)
> Why can't it fix the errors? a bad device? smartctl says the disk is ok. 
> I'm currently running a full scrub to see if it finds more errors. What 
> should I do?
> 

Well, the scrub has finished without errors. Should I worry or not?


-- 
Juan Orti
https://miceliux.com





Unable to fixup (regular) error in RAID1 fs

2014-10-28 Thread Juan Orti

I'm seeing these errors in a RAID1 fs:

[ 3565.073223] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
30, gen 0
[ 3565.073472] BTRFS: unable to fixup (regular) error at logical 
460632743936 on dev /dev/sdb2
[ 3566.605419] BTRFS: checksum error at logical 461883383808 on dev 
/dev/sdb2, sector 600109712, root 2500, inode 1436631, offset 
6134886400, length 4096, links 1 (path: 
juan/.local/share/gnome-boxes/images/boxes-unknown)
[ 3566.605429] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
31, gen 0
[ 3566.629207] BTRFS: unable to fixup (regular) error at logical 
461883383808 on dev /dev/sdb2
[ 3569.459460] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
32, gen 0
[ 3569.478667] BTRFS: unable to fixup (regular) error at logical 
462282203136 on dev /dev/sdb2
[ 3569.479163] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
33, gen 0
[ 3569.479531] BTRFS: unable to fixup (regular) error at logical 
462282207232 on dev /dev/sdb2
[ 3569.479970] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
34, gen 0
[ 3569.480102] BTRFS: unable to fixup (regular) error at logical 
462282211328 on dev /dev/sdb2
[ 3569.494522] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
35, gen 0
[ 3569.494709] BTRFS: unable to fixup (regular) error at logical 
462282215424 on dev /dev/sdb2
[ 3569.495148] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
36, gen 0
[ 3713.075962] BTRFS: checksum error at logical 483011874816 on dev 
/dev/sdb2, sector 628793384, root 2500, inode 1436631, offset 
3997003776, length 4096, links 1 (path: 
juan/.local/share/gnome-boxes/images/boxes-unknown)
[ 3713.075987] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
37, gen 0
[ 3713.086292] BTRFS: unable to fixup (regular) error at logical 
483011874816 on dev /dev/sdb2
[ 3713.092577] BTRFS: checksum error at logical 483011948544 on dev 
/dev/sdb2, sector 628793528, root 2500, inode 1436631, offset 
4059963392, length 4096, links 1 (path: 
juan/.local/share/gnome-boxes/images/boxes-unknown)
[ 3713.092584] BTRFS: bdev /dev/sdb2 errs: wr 0, rd 0, flush 0, corrupt 
38, gen 0
[ 3713.093035] BTRFS: unable to fixup (regular) error at logical 
483011948544 on dev /dev/sdb2


Why can't it fix the errors? a bad device? smartctl says the disk is ok. 
I'm currently running a full scrub to see if it finds more errors. What 
should I do?


Versions used:
kernel-3.16.6-200.fc20.x86_64
btrfs-progs-3.16.2-1.fc20.x86_64

Full dmesg: http://ur1.ca/ikxxl

# btrfs fi show
Label: 'fedora_xenon'  uuid: f1c013ff-9bd4-48fe-828e-d0b7b9d91af1
Total devices 1 FS bytes used 13.85GiB
devid1 size 103.22GiB used 17.04GiB path /dev/sda4

Label: 'btrfs_raid1'  uuid: 7721c28b-8ae6-432d-bfe1-0f98fb4043e0
Total devices 3 FS bytes used 1.50TiB
devid1 size 1.81TiB used 1.08TiB path /dev/sdb2
devid2 size 1.81TiB used 1.08TiB path /dev/sdc2
devid3 size 1.81TiB used 1.08TiB path /dev/sdd2

# btrfs fi df /mnt/btrfs_raid1/
Data, RAID1: total=1.60TiB, used=1.49TiB
System, RAID1: total=32.00MiB, used=256.00KiB
Metadata, RAID1: total=10.00GiB, used=5.75GiB
GlobalReserve, single: total=512.00MiB, used=0.00

# btrfs fi df /mnt/btrfs_ssd/
Data, single: total=15.01GiB, used=13.13GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=2.00GiB, used=739.86MiB
GlobalReserve, single: total=256.00MiB, used=0.00

--
Juan Orti
https://miceliux.com



unable to fixup (regular) error

2013-08-14 Thread Cameron Berkenpas

Hello,

I hope this is the correct mailing list.

I have btrfs running on a 6TB (5.5ish TiB) raid10 array on a 3ware 
9750-4i controller. I decided to run a script, and got 5 checksum 
errors for the same file (errors from dmesg below).


I deleted the file without any issues, reran scrub, and now I don't see 
any errors. The file itself was unimportant as it was from a backup of 
another box that I already have multiple backups of (and the box itself 
is still fine). The disks in the array appear to all be fine and the 
array is also healthy. I also run "verify" regularly. Verify appears to 
be the controller's equivalent to scrub.


Additionally, according to smartctl, things are healthy, although it 
seems error logging isn't supported:

Vendor:   LSI
Product:  9750-4iDISK
Revision: 5.12
User Capacity:5,999,977,037,824 bytes [5.99 TB]
Logical block size:   512 bytes
Logical Unit id:  0x600050e016538a0045671197
Serial number:9XK0C13D16538A004567
Device type:  disk
Local Time is:Wed Aug 14 15:15:42 2013 PDT
Device supports SMART and is Disabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Error Counter logging not supported
Device does not support Self Test logging
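
(smartctl can usually reach the individual disks behind a 3ware
controller if you pass the port number with -d; the device node and port
numbers below are examples, and 9750-series cards may expose /dev/twl0
rather than /dev/twa0 depending on the driver:

```shell
# Query per-disk SMART data behind the RAID controller, one port at a time
smartctl -a -d 3ware,0 /dev/twl0
smartctl -a -d 3ware,1 /dev/twl0
```
)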

Any idea what may have happened here? Is this something to worry about?

Thanks,

-Cameron

[101511.280510] btrfs: checksum error at logical 1590664605696 on dev 
/dev/sda3, sector 3119366104, root 681, inode 1668516, offset 3473408, 
length 4096, links 1 (path: path/to/some/file)
[101511.288676] btrfs: bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 
1, gen 0
[101511.291611] btrfs: unable to fixup (regular) error at logical 
1590664605696 on dev /dev/sda3
[101511.390081] btrfs: checksum error at logical 1590664609792 on dev 
/dev/sda3, sector 3119366112, root 681, inode 1668516, offset 3477504, 
length 4096, links 1 (path: path/to/some/file)
[101511.399321] btrfs: bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 
2, gen 0
[101511.402552] btrfs: unable to fixup (regular) error at logical 
1590664609792 on dev /dev/sda3
[101511.406038] btrfs: checksum error at logical 1590664613888 on dev 
/dev/sda3, sector 3119366120, root 681, inode 1668516, offset 3481600, 
length 4096, links 1 (path: path/to/some/file)
[101511.416438] btrfs: bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 
3, gen 0
[101511.420050] btrfs: unable to fixup (regular) error at logical 
1590664613888 on dev /dev/sda3
[101511.424238] btrfs: checksum error at logical 1590664617984 on dev 
/dev/sda3, sector 3119366128, root 681, inode 1668516, offset 3485696, 
length 4096, links 1 (path: path/to/some/file)
[101511.435928] btrfs: bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 
4, gen 0
[101511.440241] btrfs: unable to fixup (regular) error at logical 
1590664617984 on dev /dev/sda3
[101511.523988] btrfs: checksum error at logical 1590664622080 on dev 
/dev/sda3, sector 3119366136, root 681, inode 1668516, offset 3489792, 
length 4096, links 1 (path: path/to/some/file)
[101511.537097] btrfs: bdev /dev/sda3 errs: wr 0, rd 0, flush 0, corrupt 
5, gen 0
[101511.541636] btrfs: unable to fixup (regular) error at logical 
1590664622080 on dev /dev/sda3



