Possible deadlock when writing

2018-11-26 Thread Larkin Lowrey
I started having a host freeze randomly when running a 4.18 kernel. The 
host was stable when running 4.17.12.


At first, it appeared that only IO was frozen, since I could run common 
commands that were likely cached in RAM and did not touch storage. 
Anything that did touch storage would freeze and I would not be able to 
Ctrl-C it.


I noticed today, when it happened with kernel 4.19.2, that backups were 
still running and that the backup app could still read from the backup 
snapshot subvol. It's possible that the backups are still able to 
proceed because the accesses are all read-only and the snapshot was 
mounted with noatime so the backup process never triggers a write.


There are never any errors output to the console when this happens, and 
nothing is logged. When I first encountered this back in September, I 
managed to record a few sysrq dumps and attached them to a Red Hat 
ticket. See the links below.


https://bugzilla.redhat.com/show_bug.cgi?id=1627288
https://bugzilla.redhat.com/attachment.cgi?id=1482177
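
For reference, a minimal sketch of how sysrq dumps like those are
captured (the 'echo t' form is also used later in these threads):

# echo t > /proc/sysrq-trigger    (dump stack traces of all tasks to the kernel log)
# echo w > /proc/sysrq-trigger    (dump only tasks blocked in uninterruptible sleep)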

I do have several VMs running that have their image files nocow'd. 
Interestingly, all the VMs except one seem to be able to write just 
fine. The one that can't has frozen completely, and it is the one that 
regularly generates the most IO.


Any ideas on how to debug this further?

--Larkin


Re: Scrub aborts due to corrupt leaf

2018-10-10 Thread Larkin Lowrey

On 10/10/2018 10:51 PM, Chris Murphy wrote:

On Wed, Oct 10, 2018 at 8:12 PM, Larkin Lowrey  wrote:

On 10/10/2018 7:55 PM, Hans van Kranenburg wrote:

On 10/10/2018 07:44 PM, Chris Murphy wrote:


I'm pretty sure you have to umount, and then clear the space_cache
with 'btrfs check --clear-space-cache=v1' and then do a one-time mount
with -o space_cache=v2.

The --clear-space-cache=v1 step is optional, but recommended if you are
someone who does not like to keep accumulated cruft.

The v2 mount (rw mount!!!) does not remove the v1 cache. If you just
mount with v2, the v1 data stays there, doing nothing any more.
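
A minimal sketch of that sequence, using the device and mount point
from this thread (the filesystem must be unmounted for the clear step):

# umount /backups
# btrfs check --clear-space-cache=v1 /dev/mapper/Cached-Backups
# mount -o space_cache=v2 /dev/mapper/Cached-Backups /backups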


Theoretically I have the v2 space_cache enabled. After a clean umount...

# mount -onospace_cache /backups
[  391.243175] BTRFS info (device dm-3): disabling free space tree
[  391.249213] BTRFS error (device dm-3): cannot disable free space tree
[  391.255884] BTRFS error (device dm-3): open_ctree failed

"free space tree" is the v2 space cache, and once enabled it cannot be
disabled with the nospace_cache mount option. If you want to run with
nospace_cache, you'll need to clear it first.



# mount -ospace_cache=v1 /backups/
mount: /backups: wrong fs type, bad option, bad superblock on
/dev/mapper/Cached-Backups, missing codepage or helper program, or other
error
[  983.501874] BTRFS info (device dm-3): enabling disk space caching
[  983.508052] BTRFS error (device dm-3): cannot disable free space tree
[  983.514633] BTRFS error (device dm-3): open_ctree failed

You cannot go back and forth between v1 and v2. Once v2 is enabled,
it's always used regardless of any mount option. You'll need to use
btrfs check to clear the v2 cache if you want to use the v1 cache.
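
A minimal sketch of that direction, with the filesystem unmounted
(device paths are the ones used elsewhere in this thread):

# btrfs check --clear-space-cache=v2 /dev/Cached/Backups
# mount -o space_cache=v1 /dev/mapper/Cached-Backups /backups
(or -o nospace_cache to run with no cache at all)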



# btrfs check --clear-space-cache v1 /dev/Cached/Backups
Opening filesystem to check...
couldn't open RDWR because of unsupported option features (3).
ERROR: cannot open file system

You're missing the '=' symbol for the clear option; that's why it fails.
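
That is, the cache version must follow an equals sign, e.g.:

# btrfs check --clear-space-cache=v1 /dev/Cached/Backups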



# btrfs check --clear-space-cache=v2 /dev/Cached/Backups
Opening filesystem to check...
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Clear free space cache v2
Segmentation fault (core dumped)

[  109.686188] btrfs[2429]: segfault at 68 ip 555ff6394b1c sp 
7ffcc4733ab0 error 4 in btrfs[555ff637c000+ca000]
[  109.696732] Code: ff e8 68 ed ff ff 8b 4c 24 58 4d 8b 8f c7 01 00 00 
4c 89 fe 85 c0 0f 44 44 24 40 45 31 c0 89 44 24 40 48 8b 84 24 90 00 00 
00 <8b> 40 68 49 29 87 d0 00 00 00 6a 00 55 48 8b 54 24 18 48 8b 7c 24


That's btrfs-progs v4.17.1 on 4.18.12-200.fc28.x86_64.

I appreciate the help and advice from everyone who has contributed to 
this thread. At this point, unless there is something for the project to 
gain from tracking down this trouble, I'm just going to nuke the fs and 
start over.


--Larkin



Re: Scrub aborts due to corrupt leaf

2018-10-10 Thread Larkin Lowrey

On 10/10/2018 7:55 PM, Hans van Kranenburg wrote:

On 10/10/2018 07:44 PM, Chris Murphy wrote:


I'm pretty sure you have to umount, and then clear the space_cache
with 'btrfs check --clear-space-cache=v1' and then do a one-time mount
with -o space_cache=v2.

The --clear-space-cache=v1 step is optional, but recommended if you are
someone who does not like to keep accumulated cruft.

The v2 mount (rw mount!!!) does not remove the v1 cache. If you just
mount with v2, the v1 data stays there, doing nothing any more.


Theoretically I have the v2 space_cache enabled. After a clean umount...

# mount -onospace_cache /backups
[  391.243175] BTRFS info (device dm-3): disabling free space tree
[  391.249213] BTRFS error (device dm-3): cannot disable free space tree
[  391.255884] BTRFS error (device dm-3): open_ctree failed

# mount -ospace_cache=v1 /backups/
mount: /backups: wrong fs type, bad option, bad superblock on 
/dev/mapper/Cached-Backups, missing codepage or helper program, or other 
error

[  983.501874] BTRFS info (device dm-3): enabling disk space caching
[  983.508052] BTRFS error (device dm-3): cannot disable free space tree
[  983.514633] BTRFS error (device dm-3): open_ctree failed

# btrfs check --clear-space-cache v1 /dev/Cached/Backups
Opening filesystem to check...
couldn't open RDWR because of unsupported option features (3).
ERROR: cannot open file system

# btrfs --version
btrfs-progs v4.17.1

# mount /backups/
[ 1036.840637] BTRFS info (device dm-3): using free space tree
[ 1036.846272] BTRFS info (device dm-3): has skinny extents
[ 1036.999456] BTRFS info (device dm-3): bdev /dev/mapper/Cached-Backups 
errs: wr 0, rd 0, flush 0, corrupt 666, gen 25

[ 1043.025076] BTRFS info (device dm-3): enabling ssd optimizations

Backups will run tonight and will beat on the FS. Perhaps if something 
interesting happens I'll have more log data.


--Larkin


Re: Scrub aborts due to corrupt leaf

2018-10-10 Thread Larkin Lowrey

On 10/10/2018 2:20 PM, Holger Hoffstätte wrote:

On 10/10/18 19:25, Larkin Lowrey wrote:

On 10/10/2018 12:04 PM, Holger Hoffstätte wrote:

On 10/10/18 17:44, Larkin Lowrey wrote:
(..)

About once a week or so, I'm running into the above situation where
the FS seems to deadlock. All IO to the FS blocks; there is no IO
activity at all. I have to hard-reboot the system to recover. There
are no error indications except for the following, which occurs well
before the FS freezes up:

BTRFS warning (device dm-3): block group 78691883286528 has wrong 
amount of free space
BTRFS warning (device dm-3): failed to load free space cache for 
block group 78691883286528, rebuilding it now


Do I have any options other than nuking the FS and starting over?


Unmount cleanly & mount again with -o space_cache=v2.


It froze while unmounting. The attached zip is a stack dump captured
via 'echo t > /proc/sysrq-trigger'. A second attempt after a hard
reboot worked.


Trace says the free space cache writeout failed midway while the SCSI device
was resetting itself and then went rrrghh. Probably managed to hit
different blocks on the second attempt. So chances are your controller,
disk, or something else is broken, dying, or both.
When things have settled and you have verified that r/o mounting works
and is stable, try rescuing the data (when necessary) before scrubbing,
dm-device-checking or whatever you have set up.


Interesting, because I do not see any indications of any other errors. 
The fs is backed by an mdraid array and the raid checks always pass with 
no mismatches, edac-util doesn't report any ECC errors, smartd doesn't 
report any SMART errors, and I never see any raid controller errors. I 
have the console connected through serial to a logging console server so 
if there were errors reported I would have seen them.


--Larkin


Re: Scrub aborts due to corrupt leaf

2018-10-10 Thread Larkin Lowrey

On 10/10/2018 12:04 PM, Holger Hoffstätte wrote:

On 10/10/18 17:44, Larkin Lowrey wrote:
(..)

About once a week or so, I'm running into the above situation where
the FS seems to deadlock. All IO to the FS blocks; there is no IO
activity at all. I have to hard-reboot the system to recover. There
are no error indications except for the following, which occurs well
before the FS freezes up:

BTRFS warning (device dm-3): block group 78691883286528 has wrong 
amount of free space
BTRFS warning (device dm-3): failed to load free space cache for 
block group 78691883286528, rebuilding it now


Do I have any options other than nuking the FS and starting over?


Unmount cleanly & mount again with -o space_cache=v2.
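
After the remount, one quick way to confirm the v2 cache is active is to
check the mount messages; the "using free space tree" line quoted
elsewhere in this thread is what a successful v2 mount logs:

# dmesg | grep -i 'free space tree'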


It froze while unmounting. The attached zip is a stack dump captured via 
'echo t > /proc/sysrq-trigger'. A second attempt after a hard reboot worked.


--Larkin
<>


Re: Scrub aborts due to corrupt leaf

2018-10-10 Thread Larkin Lowrey

On 9/11/2018 11:23 AM, Larkin Lowrey wrote:

On 8/29/2018 1:32 AM, Qu Wenruo wrote:


On 2018/8/28 9:56 PM, Chris Murphy wrote:
On Tue, Aug 28, 2018 at 7:42 AM, Qu Wenruo  wrote:


On 2018/8/28 9:29 PM, Larkin Lowrey wrote:

On 8/27/2018 10:12 PM, Larkin Lowrey wrote:

On 8/27/2018 12:46 AM, Qu Wenruo wrote:
The system uses ECC memory and edac-util has not reported any 
errors.

However, I will run a memtest anyway.

So it should not be a memory problem.

BTW, what's the current generation of the fs?

# btrfs inspect dump-super  | grep generation

The corrupted leaf has generation 2862; I'm not sure how recently the
corruption happened.

generation  358392
chunk_root_generation   357256
cache_generation    358392
uuid_tree_generation    358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been
more than a year.

I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
everything up.

No such luck:

Creating a new CRC tree
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
csum result is 0 for block 2412149436416
extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, 
value -28
It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
blocks.

Seems bogus, there's >4TiB unallocated.

What a shame.
Btrfs won't try to allocate a new chunk if we're allocating new tree
blocks for metadata trees (extent, csum, etc.).

One quick (and dirty) way to avoid this limitation is to use the
following patch



<>


No luck.

# ./btrfs check --init-csum-tree /dev/Cached/Backups
Creating a new CRC tree
Opening filesystem to check...
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
Segmentation fault (core dumped)

 btrfs[16575]: segfault at 7ffc4f74ef60 ip 0040d4c3 sp 
7ffc4f74ef50 error 6 in btrfs[40+bf000]


# ./btrfs --version
btrfs-progs v4.17.1

I cloned  btrfs-progs from git and applied your patch.

BTW, I've been having tons of trouble with two hosts after updating 
from kernel 4.17.12 to 4.17.14 and beyond. The fs will become 
unresponsive and all processes will end up stuck waiting on IO. The 
system will end up totally idle but unable to perform any IO on the 
filesystem. So far things have been stable after reverting back to 
4.17.12. It looks like there was a btrfs change in 4.17.13. Could that 
be related to this csum tree corruption?


About once a week or so, I'm running into the above situation where the 
FS seems to deadlock. All IO to the FS blocks; there is no IO activity at 
all. I have to hard-reboot the system to recover. There are no error 
indications except for the following, which occurs well before the FS 
freezes up:


BTRFS warning (device dm-3): block group 78691883286528 has wrong amount 
of free space
BTRFS warning (device dm-3): failed to load free space cache for block 
group 78691883286528, rebuilding it now


Do I have any options other than nuking the FS and starting over?

--Larkin


Re: Scrub aborts due to corrupt leaf

2018-09-11 Thread Larkin Lowrey

On 8/29/2018 1:32 AM, Qu Wenruo wrote:


On 2018/8/28 9:56 PM, Chris Murphy wrote:

On Tue, Aug 28, 2018 at 7:42 AM, Qu Wenruo  wrote:


On 2018/8/28 9:29 PM, Larkin Lowrey wrote:

On 8/27/2018 10:12 PM, Larkin Lowrey wrote:

On 8/27/2018 12:46 AM, Qu Wenruo wrote:

The system uses ECC memory and edac-util has not reported any errors.
However, I will run a memtest anyway.

So it should not be a memory problem.

BTW, what's the current generation of the fs?

# btrfs inspect dump-super  | grep generation

The corrupted leaf has generation 2862; I'm not sure how recently the
corruption happened.

generation  358392
chunk_root_generation   357256
cache_generation358392
uuid_tree_generation358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been
more than a year.

I am running 'btrfs check --init-csum-tree' now. Hopefully that clears
everything up.

No such luck:

Creating a new CRC tree
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
csum result is 0 for block 2412149436416
extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28

It's ENOSPC, meaning btrfs can't find enough space for the new csum tree
blocks.

Seems bogus, there's >4TiB unallocated.

What a shame.
Btrfs won't try to allocate a new chunk if we're allocating new tree
blocks for metadata trees (extent, csum, etc.).

One quick (and dirty) way to avoid this limitation is to use the
following patch



<>


No luck.

# ./btrfs check --init-csum-tree /dev/Cached/Backups
Creating a new CRC tree
Opening filesystem to check...
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
Segmentation fault (core dumped)

 btrfs[16575]: segfault at 7ffc4f74ef60 ip 0040d4c3 sp 
7ffc4f74ef50 error 6 in btrfs[40+bf000]


# ./btrfs --version
btrfs-progs v4.17.1

I cloned  btrfs-progs from git and applied your patch.

BTW, I've been having tons of trouble with two hosts after updating from 
kernel 4.17.12 to 4.17.14 and beyond. The fs will become unresponsive 
and all processes will end up stuck waiting on IO. The system will end 
up totally idle but unable to perform any IO on the filesystem. So far 
things have been stable after reverting back to 4.17.12. It looks like 
there was a btrfs change in 4.17.13. Could that be related to this csum 
tree corruption?


--Larkin



Re: Scrub aborts due to corrupt leaf

2018-08-28 Thread Larkin Lowrey

On 8/27/2018 10:12 PM, Larkin Lowrey wrote:

On 8/27/2018 12:46 AM, Qu Wenruo wrote:



The system uses ECC memory and edac-util has not reported any errors.
However, I will run a memtest anyway.

So it should not be a memory problem.

BTW, what's the current generation of the fs?

# btrfs inspect dump-super  | grep generation

The corrupted leaf has generation 2862; I'm not sure how recently the
corruption happened.


generation  358392
chunk_root_generation   357256
cache_generation    358392
uuid_tree_generation    358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been 
more than a year.


I am running 'btrfs check --init-csum-tree' now. Hopefully that clears 
everything up.


No such luck:

Creating a new CRC tree
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Reinitialize checksum tree
csum result is 0 for block 2412149436416
extent-tree.c:2764: alloc_tree_block: BUG_ON `ret` triggered, value -28
btrfs(+0x1da16)[0x55cc43796a16]
btrfs(btrfs_alloc_free_block+0x207)[0x55cc4379c177]
btrfs(+0x1602f)[0x55cc4378f02f]
btrfs(btrfs_search_slot+0xed2)[0x55cc43790be2]
btrfs(btrfs_csum_file_block+0x48f)[0x55cc437a213f]
btrfs(+0x55cef)[0x55cc437cecef]
btrfs(cmd_check+0xd49)[0x55cc437ddbc9]
btrfs(main+0x81)[0x55cc4378b4d1]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7f4717e6324b]
btrfs(_start+0x2a)[0x55cc4378b5ea]
Aborted (core dumped)

--Larkin


Re: Scrub aborts due to corrupt leaf

2018-08-27 Thread Larkin Lowrey

On 8/27/2018 12:46 AM, Qu Wenruo wrote:



The system uses ECC memory and edac-util has not reported any errors.
However, I will run a memtest anyway.

So it should not be a memory problem.

BTW, what's the current generation of the fs?

# btrfs inspect dump-super  | grep generation

The corrupted leaf has generation 2862; I'm not sure how recently the
corruption happened.


generation  358392
chunk_root_generation   357256
cache_generation    358392
uuid_tree_generation    358392
dev_item.generation 0

I don't recall the last time I ran a scrub but I doubt it has been more 
than a year.


I am running 'btrfs check --init-csum-tree' now. Hopefully that clears 
everything up.


Thank you for your help and advice,

--Larkin


Re: Scrub aborts due to corrupt leaf

2018-08-26 Thread Larkin Lowrey

On 8/26/2018 8:16 PM, Qu Wenruo wrote:

The corrupted tree block bytenr matches the number reported by the kernel.
You could provide the tree block dump for bytenr 7687860535296, and
maybe we could find out what's going wrong and fix it manually.

# btrfs ins dump-tree -b 7687860535296 


Thank you for your reply.

# btrfs ins dump-tree -b 7687860535296 /dev/Cached/Backups
btrfs-progs v4.15.1
leaf free space ret -2002721201, leaf data size 16283, used 2002737484 
nritems 319

leaf 7687860535296 items 319 free space -2002721201 generation 2862 owner 7
leaf 7687860535296 flags 0x1(WRITTEN) backref revision 1
fs uuid acff5096-1128-4b24-a15e-4ba04261edc3
chunk uuid 0d2fdb5d-00c0-41b3-b2ed-39a5e3bf98aa
    item 0 key (18446744073650847734 EXTENT_CSUM 8487178285056) 
itemoff 13211 itemsize 3072

    range start 8487178285056 end 8487181430784 length 3145728
    item 1 key (18446744073650880502 EXTENT_CSUM 8487174090752) 
itemoff 10139 itemsize 3072

    range start 8487174090752 end 8487177236480 length 3145728
    item 2 key (18446744073650913270 EXTENT_CSUM 8487167782912) 
itemoff 3251 itemsize 6888

    range start 8487167782912 end 8487174836224 length 7053312
    item 3 key (18446744073651011574 EXTENT_CSUM 8487166103552) 
itemoff 187 itemsize 3064

    range start 8487166103552 end 8487169241088 length 3137536
    item 4 key (58523648 UNKNOWN.0 4115587072) itemoff 0 itemsize 0
    item 5 key (58523648 UNKNOWN.0 4115058688) itemoff 0 itemsize 0
    item 6 key (58392576 UNKNOWN.0 4115050496) itemoff 0 itemsize 0
    item 7 key (58392576 UNKNOWN.0 9160800976331685888) itemoff 
1325803612 itemsize 1549669347
    item 8 key (15706350841398176100 UNKNOWN.160 
9836230374950416562) itemoff -507102832 itemsize -1565142843
    item 9 key (16420776794030147775 UNKNOWN.139 
1413404178631177347) itemoff 319666572 itemsize -2033238481
    item 10 key (12490357187492557094 UNKNOWN.100 
8703020161114007581) itemoff 1698374107 itemsize 427239449
    item 11 key (10238910558655956878 UNKNOWN.145 
13172984620675614213) itemoff -1386707845 itemsize -2094889124
    item 12 key (14429452134272870167 UNKNOWN.47 
5095274587264087555) itemoff -385621303 itemsize -1014793681
    item 13 key (12392706351935785292 TREE_BLOCK_REF 
17075682359779944300) itemoff 467435242 itemsize -1974352848

    tree block backref
    item 14 key (9030638330689148475 UNKNOWN.146 
16510052416438219760) itemoff -1329727247 itemsize -989772882
    item 15 key (2557232588403612193 UNKNOWN.89 
11359249297629415033) itemoff -1393664382 itemsize -222178533
    item 16 key (16832668804185527807 UNKNOWN.190 
12813564574805698827) itemoff -824350641 itemsize 113587270
    item 17 key (17721977661761488041 UNKNOWN.133 
65181195353232031) itemoff 1165455420 itemsize -11248999
    item 18 key (17041494636387836535 UNKNOWN.146 
659630272632027956) itemoff 1646352770 itemsize 188954807
    item 19 key (4813797791329885851 UNKNOWN.147 
2988230942665281926) itemoff 2034137186 itemsize 429359084
    item 20 key (11925872190557602809 UNKNOWN.28 
10017979389672184473) itemoff 198274722 itemsize 1654501802
    item 21 key (18089916911465221293 UNKNOWN.215 
130744227189807288) itemoff -938569572 itemsize -322594079
    item 22 key (17582525817082834821 UNKNOWN.133 
14298100207216235213) itemoff 997305640 itemsize 380205383
    item 23 key (2509730330338250179 ORPHAN_ITEM 
8415032273173690331) itemoff 1213495256 itemsize -1813460706

    orphan item
    item 24 key (17657358590741059587 UNKNOWN.5 
4198714773705203243) itemoff -690501330 itemsize -237182892
    item 25 key (14784171376049469241 UNKNOWN.139 
15453005915765327150) itemoff 1543890422 itemsize 2093403168
    item 26 key (8296048569161577100 UNKNOWN.58 
12559616442258240580) itemoff 927535366 itemsize -620630864
    item 27 key (14738413134752477244 SHARED_BLOCK_REF 
90867799437527556) itemoff -629160915 itemsize 1418942359

    shared block backref
    item 28 key (17386064595326971933 SHARED_BLOCK_REF 
1813311842215708701) itemoff 1401681450 itemsize -2016124808

    shared block backref
    item 29 key (12068018374989506977 UNKNOWN.160 
1560146733122974605) itemoff -1145774613 itemsize -490403576
    item 30 key (5611751644962296316 QGROUP_LIMIT 
19245/207762978715732) itemoff -433607332 itemsize -854595036

Segmentation fault (core dumped)

Can I simply rebuild the csum tree (btrfs check --init-csum-tree)? The 
entire contents of the fs are backup files that are hashed, so I can 
verify that the files are correct.



Please note that this corruption could be caused by bad RAM or some old
kernel bug.
It's recommended to run a memtest if possible.


The system uses ECC memory and edac-util has not reported any errors. 
However, I will run a memtest anyway.


Thank you,

--Larkin


Scrub aborts due to corrupt leaf

2018-08-26 Thread Larkin Lowrey

When I do a scrub it aborts about 10% of the way in due to:

corrupt leaf: root=7 block=7687860535296 slot=0, invalid key objectid 
for csum item, have 18446744073650847734 expect 18446744073709551606


The filesystem in question stores my backups, and I have verified all of 
the backups, so I know all files that are supposed to be there are there 
and their hashes match. Backups run normally and everything seems to 
work fine; it's just the scrub that doesn't.
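
For reference, a minimal sketch of that kind of hash verification (the
manifest name is hypothetical; SHA1 is the hash mentioned elsewhere in
these threads):

# cd /backups && sha1sum --quiet -c manifest.sha1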


I tried:

# btrfs check --repair /dev/Cached/Backups
enabling repair mode
Checking filesystem on /dev/Cached/Backups
UUID: acff5096-1128-4b24-a15e-4ba04261edc3
Fixed 0 roots.
checking extents
leaf free space ret -2002721201, leaf data size 16283, used 2002737484 
nritems 319
leaf free space ret -2002721201, leaf data size 16283, used 2002737484 
nritems 319

leaf free space incorrect 7687860535296 -2002721201
bad block 7687860535296
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
block group 34028518375424 has wrong amount of free space
failed to load free space cache for block group 34028518375424
checking fs roots
root 5 inode 6784890 errors 1000, some csum missing
checking csums
there are no extents for csum range 6447630387207159216-6447630390115868080
csum exists for 6447630387207159216-6447630390115868080 but there is no 
extent record

there are no extents for csum range 763548178418734000-763548181428650928
csum exists for 763548178418734000-763548181428650928 but there is no 
extent record
there are no extents for csum range 
10574442573086800664-10574442573732416280
csum exists for 10574442573086800664-10574442573732416280 but there is 
no extent record

ERROR: errors found in csum tree
found 73238589853696 bytes used, error(s) found
total csum bytes: 8117840900
total tree bytes: 34106834944
total fs tree bytes: 23289413632
total extent tree bytes: 1659682816
btree space waste bytes: 6020692848
file data blocks allocated: 73136347418624
 referenced 73135917441024

Nothing changes because when I run the above command again the output is 
identical.


I had been using space_cache v2 but reverted to nospace_cache to run the 
above.


Is there any way to clean this up?

kernel 4.17.14-202.fc28.x86_64
btrfs-progs v4.15.1

Label: none  uuid: acff5096-1128-4b24-a15e-4ba04261edc3
    Total devices 1 FS bytes used 66.61TiB
    devid    1 size 72.77TiB used 68.03TiB path 
/dev/mapper/Cached-Backups


Data, single: total=67.80TiB, used=66.52TiB
System, DUP: total=40.00MiB, used=7.41MiB
Metadata, DUP: total=98.50GiB, used=95.21GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

BTRFS info (device dm-3): disk space caching is enabled
BTRFS info (device dm-3): has skinny extents
BTRFS info (device dm-3): bdev /dev/mapper/Cached-Backups errs: wr 0, rd 
0, flush 0, corrupt 666, gen 25

BTRFS info (device dm-3): enabling ssd optimizations





Unmountable fs. No root for superblock generation

2017-10-18 Thread Larkin Lowrey
I am unable to mount one of my filesystems. The superblock thinks the 
latest generation is 2220927, but I can't seem to find a root with that 
number. I can find 2220926 and 2220928, but not 2220927. Is there 
anything that I can do to recover this FS?


# btrfs check /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs-find-root -g 2220927 /dev/Cached/Backups
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 2220927
Superblock thinks the level is 2

Found tree root at 159057884577792 gen 2220927 level 2
Well block 101489031790592(gen: 2220928 level: 2) seems good, but 
generation/level doesn't match, want gen: 2220927 level: 2


# btrfs check --tree-root 159057884577792  /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs check --tree-root 101489031790592 /dev/Cached/Backups
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
Ignoring transid failure
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
Ignoring transid failure
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
Ignoring transid failure
Checking filesystem on /dev/Cached/Backups
UUID: 1b213dfd-6486-47d8-8459-bc5825882023
checking extents
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116480660914176 wanted 

Unmountable fs - missing generation?

2017-10-16 Thread Larkin Lowrey
I am unable to mount one of my filesystems. The superblock thinks the 
latest generation is 2220927, but I can't seem to find a root with that 
number. I can find 2220926 and 2220928, but not 2220927. Is there 
anything that I can do to recover this FS?


# btrfs check /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs-find-root -g 2220927 /dev/Cached/Backups
Couldn't setup extent tree
Couldn't setup device tree
Superblock thinks the generation is 2220927
Superblock thinks the level is 2

Found tree root at 159057884577792 gen 2220927 level 2
Well block 101489031790592(gen: 2220928 level: 2) seems good, but 
generation/level doesn't match, want gen: 2220927 level: 2


# btrfs check --tree-root 159057884577792  /dev/Cached/Backups
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 15284E33 wanted C8C5B54E
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
checksum verify failed on 159057884594176 found 472037C9 wanted 9ACDCCB4
Csum didn't match
Couldn't setup extent tree
Couldn't open file system

# btrfs check --tree-root 101489031790592 /dev/Cached/Backups
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
parent transid verify failed on 101489031790592 wanted 2220927 found 2220928
Ignoring transid failure
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
parent transid verify failed on 159057595138048 wanted 2220927 found 2220920
Ignoring transid failure
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
parent transid verify failed on 158652658122752 wanted 2220927 found 2220911
Ignoring transid failure
Checking filesystem on /dev/Cached/Backups
UUID: 1b213dfd-6486-47d8-8459-bc5825882023
checking extents
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
parent transid verify failed on 116329711550464 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
parent transid verify failed on 116325928206336 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
parent transid verify failed on 116329892970496 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
parent transid verify failed on 116325929943040 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
parent transid verify failed on 116325932679168 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
parent transid verify failed on 116010673373184 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
parent transid verify failed on 116329479405568 wanted 2220928 found 2220921
Ignoring transid failure
parent transid verify failed on 116480660914176 wanted 

Re: Heavy nocow'd VM image fragmentation

2014-10-26 Thread Larkin Lowrey
On 10/24/2014 10:28 PM, Duncan wrote:
 Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:

 On 10/24/2014 04:49 AM, Marc MERLIN wrote:
 On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
 I have a 240GB VirtualBox vdi image that is showing heavy
 fragmentation (filefrag). The file was created in a dir that was
 chattr +C'd, the file was created via fallocate and the contents of
 the original image were copied into the file via dd. I verified that
 the image was +C.
 To be honest, I have the same problem, and it's vexing:
 If I understand correctly, when you take a snapshot the file goes into
 what I call 1COW mode.
 Yes, but the OP said he hadn't snapshotted since creating the file, and 
 MM's a regular who actually wrote much of the wiki documentation on 
 raid56 modes, so he'd better know about the snapshotting problem too.

 So that can't be it.  There's apparently a bug in some recent code, and 
 it's not honoring the NOCOW even in normal operation, when it should be.

 (FWIW I'm not running any VMs or large DBs here, so don't have nocow set 
 on anything and can and do use autodefrag on all my btrfs.  So I can't 
 say one way or the other, personally.)


Correct, there were no snapshots during VM usage when the fragmentation
occurred.

One unusual property of my setup is that I have my fs on top of bcache. More
specifically, the stack is md raid6 - bcache - lvm - btrfs. When the
fs mounts, it gets the 'ssd' mount option because bcache sets
/sys/block/bcache0/queue/rotational to 0.

Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?
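
For reference, a minimal way to see the heuristic and to override it
(device and mount point are placeholders; 'nossd' is the btrfs mount
option that disables the ssd behavior):

# cat /sys/block/bcache0/queue/rotational    (0 here makes btrfs enable 'ssd' automatically)
# mount -o nossd /dev/mapper/VG-LV /mnt      (mounts with the ssd optimizations forced off)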

--Larkin


Heavy nocow'd VM image fragmentation

2014-10-23 Thread Larkin Lowrey
I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
(filefrag). The file was created in a dir that was chattr +C'd, it was
created via fallocate, and the contents of the original image were
copied into it via dd. I verified that the image was +C.
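
For reference, the two checks described there can be repeated with
something like this (using the file name from the commands below):

# lsattr new.vdi      (a 'C' in the attribute flags confirms the file is NOCOW)
# filefrag new.vdi    (reports the extent count discussed below)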

After initial creation there were about 2800 fragments, according to
filefrag. That doesn't surprise me because this image took up about 60%
of the free space. After an hour of light use the filefrag count was the
same. But, after a day of heavy use, the count is now well over 600,000.

There were no snapshots during the period of use. The fs does not have
compression enabled. These usual suspects don't apply in my case.

The process I used to copy the image to a noCOW image was:

fallocate -n -l $(stat --format %s old.vdi) new.vdi
dd if=old.vdi of=new.vdi conv=notrunc oflag=append bs=1M

Performance does seem much worse in the VM but could it be that the
image isn't actually severely fragmented and I'm just misunderstanding
the output from filefrag?

Is there a problem with how I copied over the old image file?

--Larkin



btrfsck check infinite loop

2014-09-24 Thread Larkin Lowrey
I ran 'btrfs check --repair --init-extent-tree' and appear to be in an
infinite loop. It performed heavy IO for about 1.5 hours then the IO
stopped and the CPU stayed at 100%. It's been like that for more than 12
hours now.

I made a hardware change last week that resulted in unstable RAM so I
suspect some corrupt data was written to disk. I tried mounting with
-orecovery,clear_cache,nospace_cache but I would get a panic shortly
thereafter. I tried 'btrfs check --repair' but also got a panic. I
finally tried 'btrfs check --repair --init-extent-tree' and hit an
assertion failed error with btrfs-progs 3.16.

After noticing some promising commits, I built from the integration repo
(kdave), re-ran (v3.16.1) and got further (2hrs) but then got stuck in
this infinite loop.

Here's the backtrace of where it is now and has been for hours:

#0  0x00438f01 in free_some_buffers (tree=0xda3078) at
extent_io.c:553
#1  __alloc_extent_buffer (blocksize=4096, bytenr=optimized out,
tree=0xda3078) at extent_io.c:592
#2  alloc_extent_buffer (tree=0xda3078, bytenr=optimized out,
blocksize=4096) at extent_io.c:671
#3  0x0042be29 in btrfs_find_create_tree_block
(root=root@entry=0xda34a0, bytenr=optimized out, blocksize=optimized
out) at disk-io.c:133
#4  0x0042d683 in read_tree_block (root=0xda34a0,
bytenr=optimized out, blocksize=optimized out,
parent_transid=161580) at disk-io.c:260
#5  0x00427c58 in read_node_slot (root=root@entry=0xda34a0,
parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
#6  0x00428558 in push_leaf_right (trans=trans@entry=0xe709b0,
root=root@entry=0xda34a0, path=path@entry=0xde317a0,
data_size=data_size@entry=67, empty=empty@entry=0)
at ctree.c:1608
#7  0x00428e4c in split_leaf (trans=trans@entry=0xe709b0,
root=root@entry=0xda34a0, ins_key=ins_key@entry=0x7fff24da24b0,
path=path@entry=0xde317a0,
data_size=data_size@entry=67, extend=extend@entry=0) at ctree.c:1977
#8  0x0042aa54 in btrfs_search_slot (trans=0xe709b0,
root=root@entry=0xda34a0, key=key@entry=0x7fff24da24b0,
p=p@entry=0xde317a0, ins_len=ins_len@entry=67,
cow=cow@entry=1) at ctree.c:1120
#9  0x0042af51 in btrfs_insert_empty_items
(trans=trans@entry=0xe709b0, root=root@entry=0xda34a0,
path=path@entry=0xde317a0, cpu_key=cpu_key@entry=0x7fff24da24b0,
data_size=data_size@entry=0x7fff24da24a0, nr=nr@entry=1) at ctree.c:2412
#10 0x004175f6 in btrfs_insert_empty_item (data_size=42,
key=0x7fff24da24b0, path=0xde317a0, root=0xda34a0, trans=0xe709b0) at
ctree.h:2312
#11 record_extent (flags=0, allocated=optimized out, back=0x95cb3d90,
rec=0x95cb3cc0, path=0xde317a0, info=0xda3010, trans=0xe709b0) at
cmds-check.c:4438
#12 fixup_extent_refs (trans=trans@entry=0xe709b0, info=optimized out,
extent_cache=extent_cache@entry=0x7fff24da2970,
rec=rec@entry=0x95cb3cc0) at cmds-check.c:5287
#13 0x0041ac01 in check_extent_refs
(extent_cache=0x7fff24da2970, root=optimized out, trans=optimized
out) at cmds-check.c:5511
#14 check_chunks_and_extents (root=root@entry=0xfa7c70) at cmds-check.c:5978
#15 0x0041bdd9 in cmd_check (argc=optimized out,
argv=optimized out) at cmds-check.c:6723
#16 0x00404481 in main (argc=4, argv=0x7fff24da2fe0) at btrfs.c:247

I checked node, node->next, node->next->next, node->next->prev, etc. and
saw no obvious loop, at least not in the immediate vicinity of node. The
value of node is different each time I check it.

I'll periodically see the following backtrace:

#0  __list_del (next=0x1326fe820, prev=0xda3088) at list.h:113
#1  list_move_tail (head=0xda3088, list=0x1514b40f0) at list.h:183
#2  free_some_buffers (tree=0xda3078) at extent_io.c:560
#3  __alloc_extent_buffer (blocksize=4096, bytenr=optimized out,
tree=0xda3078) at extent_io.c:592
#4  alloc_extent_buffer (tree=0xda3078, bytenr=optimized out,
blocksize=4096) at extent_io.c:671
#5  0x0042be29 in btrfs_find_create_tree_block
(root=root@entry=0xda34a0, bytenr=optimized out, blocksize=optimized
out) at disk-io.c:133
#6  0x0042d683 in read_tree_block (root=0xda34a0,
bytenr=optimized out, blocksize=optimized out,
parent_transid=161580) at disk-io.c:260
#7  0x00427c58 in read_node_slot (root=root@entry=0xda34a0,
parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
#8  0x00428558 in push_leaf_right (trans=trans@entry=0xe709b0,
root=root@entry=0xda34a0, path=path@entry=0xde317a0,
data_size=data_size@entry=67, empty=empty@entry=0)
at ctree.c:1608
#9  0x00428e4c in split_leaf (trans=trans@entry=0xe709b0,
root=root@entry=0xda34a0, ins_key=ins_key@entry=0x7fff24da24b0,
path=path@entry=0xde317a0,
data_size=data_size@entry=67, extend=extend@entry=0) at ctree.c:1977
#10 0x0042aa54 in btrfs_search_slot (trans=0xe709b0,
root=root@entry=0xda34a0, key=key@entry=0x7fff24da24b0,
p=p@entry=0xde317a0, ins_len=ins_len@entry=67,
cow=cow@entry=1) at ctree.c:1120
#11 0x0042af51 in 

Re: btrfsck check infinite loop

2014-09-24 Thread Larkin Lowrey
I noticed the following:

(gdb) print nrscan
$19 = 1680726970
(gdb) print tree->cache_size
$20 = 1073741824
(gdb) print cache_hard_max
$21 = 1073741824

It appears that cache_size cannot shrink below cache_hard_max, so we
never end up breaking out of the loop. The FS in question is 30TB with
~26TB in use. Perhaps cache_hard_max (1GB) is too small for this size of
FS? I just bumped it to 2GB and am re-running to see if that helps.

--Larkin

On 9/24/2014 9:27 AM, Larkin Lowrey wrote:
 I ran 'btrfs check --repair --init-extent-tree' and appear to be in an
 infinite loop. It performed heavy IO for about 1.5 hours then the IO
 stopped and the CPU stayed at 100%. It's been like that for more than 12
 hours now.

 I made a hardware change last week that resulted in unstable RAM so I
 suspect some corrupt data was written to disk. I tried mounting with
 -orecovery,clear_cache,nospace_cache but I would get a panic shortly
 thereafter. I tried 'btrfs check --repair' but also got a panic. I
 finally tried 'btrfs check --repair --init-extent-tree' and hit an
 assertion failed error with btrfs-progs 3.16.

 After noticing some promising commits, I built from the integration repo
 (kdave), re-ran (v3.16.1) and got further (2hrs) but then got stuck in
 this infinite loop.

 Here's the backtrace of where it is now and has been for hours:

 #0  0x00438f01 in free_some_buffers (tree=0xda3078) at
 extent_io.c:553
 #1  __alloc_extent_buffer (blocksize=4096, bytenr=optimized out,
 tree=0xda3078) at extent_io.c:592
 #2  alloc_extent_buffer (tree=0xda3078, bytenr=optimized out,
 blocksize=4096) at extent_io.c:671
 #3  0x0042be29 in btrfs_find_create_tree_block
 (root=root@entry=0xda34a0, bytenr=optimized out, blocksize=optimized
 out) at disk-io.c:133
 #4  0x0042d683 in read_tree_block (root=0xda34a0,
 bytenr=optimized out, blocksize=optimized out,
 parent_transid=161580) at disk-io.c:260
 #5  0x00427c58 in read_node_slot (root=root@entry=0xda34a0,
 parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
 #6  0x00428558 in push_leaf_right (trans=trans@entry=0xe709b0,
 root=root@entry=0xda34a0, path=path@entry=0xde317a0,
 data_size=data_size@entry=67, empty=empty@entry=0)
 at ctree.c:1608
 #7  0x00428e4c in split_leaf (trans=trans@entry=0xe709b0,
 root=root@entry=0xda34a0, ins_key=ins_key@entry=0x7fff24da24b0,
 path=path@entry=0xde317a0,
 data_size=data_size@entry=67, extend=extend@entry=0) at ctree.c:1977
 #8  0x0042aa54 in btrfs_search_slot (trans=0xe709b0,
 root=root@entry=0xda34a0, key=key@entry=0x7fff24da24b0,
 p=p@entry=0xde317a0, ins_len=ins_len@entry=67,
 cow=cow@entry=1) at ctree.c:1120
 #9  0x0042af51 in btrfs_insert_empty_items
 (trans=trans@entry=0xe709b0, root=root@entry=0xda34a0,
 path=path@entry=0xde317a0, cpu_key=cpu_key@entry=0x7fff24da24b0,
 data_size=data_size@entry=0x7fff24da24a0, nr=nr@entry=1) at ctree.c:2412
 #10 0x004175f6 in btrfs_insert_empty_item (data_size=42,
 key=0x7fff24da24b0, path=0xde317a0, root=0xda34a0, trans=0xe709b0) at
 ctree.h:2312
 #11 record_extent (flags=0, allocated=optimized out, back=0x95cb3d90,
 rec=0x95cb3cc0, path=0xde317a0, info=0xda3010, trans=0xe709b0) at
 cmds-check.c:4438
 #12 fixup_extent_refs (trans=trans@entry=0xe709b0, info=optimized out,
 extent_cache=extent_cache@entry=0x7fff24da2970,
 rec=rec@entry=0x95cb3cc0) at cmds-check.c:5287
 #13 0x0041ac01 in check_extent_refs
 (extent_cache=0x7fff24da2970, root=optimized out, trans=optimized
 out) at cmds-check.c:5511
 #14 check_chunks_and_extents (root=root@entry=0xfa7c70) at cmds-check.c:5978
 #15 0x0041bdd9 in cmd_check (argc=optimized out,
 argv=optimized out) at cmds-check.c:6723
 #16 0x00404481 in main (argc=4, argv=0x7fff24da2fe0) at btrfs.c:247

 I checked node, node->next, node->next->next, node->next->prev, etc. and
 saw no obvious loop, at least not in the immediate vicinity of node. The
 value of node is different each time I check it.

 I'll periodically see the following backtrace:

 #0  __list_del (next=0x1326fe820, prev=0xda3088) at list.h:113
 #1  list_move_tail (head=0xda3088, list=0x1514b40f0) at list.h:183
 #2  free_some_buffers (tree=0xda3078) at extent_io.c:560
 #3  __alloc_extent_buffer (blocksize=4096, bytenr=optimized out,
 tree=0xda3078) at extent_io.c:592
 #4  alloc_extent_buffer (tree=0xda3078, bytenr=optimized out,
 blocksize=4096) at extent_io.c:671
 #5  0x0042be29 in btrfs_find_create_tree_block
 (root=root@entry=0xda34a0, bytenr=optimized out, blocksize=optimized
 out) at disk-io.c:133
 #6  0x0042d683 in read_tree_block (root=0xda34a0,
 bytenr=optimized out, blocksize=optimized out,
 parent_transid=161580) at disk-io.c:260
 #7  0x00427c58 in read_node_slot (root=root@entry=0xda34a0,
 parent=parent@entry=0x165ab88c0, slot=slot@entry=43) at ctree.c:634
 #8  0x00428558 in push_leaf_right (trans=trans@entry=0xe709b0,
 root=root@entry=0xda34a0, path=path

Re: btrfs on bcache

2014-07-30 Thread Larkin Lowrey
I've been running two backup servers, with 25T and 20T of data, using
btrfs on bcache (writeback) for about 7 months. I periodically run btrfs
scrubs and backup verifies (SHA1 hashes) and have never had a corruption
issue.

My use of btrfs is simple, though, with no subvolumes and no btrfs level
raid. My bcache backing devices are LVM volumes that span multiple md
raid6 arrays. So, either the bug has been fixed or my configuration is
not susceptible.

I'm running kernel 3.15.5-200.fc20.x86_64.

--Larkin

On 7/30/2014 5:04 PM, dptr...@arcor.de wrote:
 Concerning http://thread.gmane.org/gmane.comp.file-systems.btrfs/31018, does 
 this bug still exist?

 Kernel 3.14
 B: 2x HDD 1 TB
 C: 1x SSD 256 GB

 # make-bcache -B /dev/sda /dev/sdb -C /dev/sdc --cache_replacement_policy=lru
 # mkfs.btrfs -d raid1 -m raid1 -L BTRFS_RAID /dev/bcache0 /dev/bcache1

 I still have no "incomplete page write" messages in dmesg | grep btrfs, and 
 the checksums of some manually reviewed files are okay.

 Who has more experience with this?

 Thanks,

 - dp
