Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sun, Oct 28, 2018 at 07:27:22AM +0800, Qu Wenruo wrote:
> > I can't drop all the snapshots since at least two are used for btrfs
> > send/receive backups.
> > However, if I delete more snapshots, and do a full balance, you think
> > it'll free up more space?
> 
> No.
> 
> You're already too worried about a non-existent problem.
> Your fs looks pretty healthy.

Thanks both for the answers. I'll go back and read them more carefully
later to see how I can adjust my monitoring, but basically I hit my 90%
space used alert, which is based on df. I know that once I get close to
full, or completely full, very bad things happen with btrfs, sometimes
making the system so unusable that it's very hard to reclaim space and
fix the issue. (Not counting that if you have btrfs send snapshots,
you're forced to break the snapshot relationship and start over, since
deleting data does not reclaim blocks that are obviously still marked as
used by the last snapshot that was sent to the backup server.)

Long story short, I try very hard to not ever hit this problem again :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Sat, Oct 27, 2018 at 02:12:02PM -0400, Remi Gauvin wrote:
> On 2018-10-27 01:42 PM, Marc MERLIN wrote:
> 
> > 
> > I've been using btrfs for a long time now but I've never had a
> > filesystem where I had 15GB apparently unusable (7%) after a balance.
> > 
> 
> The space isn't unusable.  It's just allocated. (It's used in the sense
> that it's reserved for data chunks.)  Start writing data to the drive,
> and the data will fill that space before more gets allocated. (Unless
> you are using an older kernel and the filesystem gets mounted with the
> ssd option, in which case you'll want to add the nossd option to prevent
> that behaviour.)
> 
> You can use btrfs fi usage to display that more clearly.
 
Got it. I have disk space free alerts based on df, which I know doesn't
mean that much on btrfs. Maybe I'll just need to change that alert code
to make it btrfs aware.
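
For what it's worth, a btrfs-aware version of that alert could key off the
"Free (estimated)" line of `btrfs fi usage` instead of df. A minimal sketch
(the mount point and threshold are made up; the parsing assumes the output
format shown later in this thread):

```shell
#!/bin/sh
# Parse "Free (estimated)" out of `btrfs fi usage` output and compare it
# against a minimum-free threshold. Parsing is a separate function so it
# can be exercised against canned output.

# free_estimate_gib: read `btrfs fi usage` text on stdin, print the
# estimated free space in GiB (the number before "GiB" on the
# "Free (estimated):" line).
free_estimate_gib() {
    awk '/Free \(estimated\):/ { sub(/GiB/, "", $3); print $3 }'
}

# Hypothetical wiring for an alert (mount point and 20GiB floor invented):
#   free=$(btrfs filesystem usage /mnt/btrfs_pool1 | free_estimate_gib)
#   awk -v f="$free" 'BEGIN { exit !(f < 20) }' && echo "ALERT: low space"
```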
 
> > I can try a defrag next, but since I have COW for snapshots, it's not
> > going to help much, correct?
> 
> The defrag will end up using more space, as the fragmented parts of
> files will get duplicated.  That being said, if you have the luxury to
> defrag *before* taking new snapshots, that would be the time to do it.

Thanks for confirming. Because I always have snapshots for btrfs
send/receive, defrag will duplicate as you say, but once the older
snapshots get freed up, the duplicate blocks should go away, correct?

Back to usage, thanks for pointing out that command:
saruman:/mnt/btrfs_pool1# btrfs fi usage .
Overall:
    Device size:                 228.67GiB
    Device allocated:            203.54GiB
    Device unallocated:           25.13GiB
    Device missing:                  0.00B
    Used:                        192.01GiB
    Free (estimated):             32.44GiB  (min: 19.88GiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB  (used: 0.00B)

Data,single: Size:192.48GiB, Used:185.16GiB
   /dev/mapper/pool1     192.48GiB

Metadata,DUP: Size:5.50GiB, Used:3.42GiB
   /dev/mapper/pool1      11.00GiB

System,DUP: Size:32.00MiB, Used:48.00KiB
   /dev/mapper/pool1      64.00MiB

Unallocated:
   /dev/mapper/pool1      25.13GiB


I'm still seeing that I'm using 192GB, but 203GB allocated.
Do I have 25GB usable:
Device unallocated:   25.13GiB

Or 36GB usable?
Device size:  228.67GiB
  -
Used:         192.01GiB
  = 36.66GiB?

Yes, I know that I shouldn't get close to filling up the device; I'm just
trying to clear up whether I should stay below 25GB or below 36GB.
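
Both candidate numbers can be reproduced from the usage output above. The
unallocated figure is what new chunk allocation can draw on, while size
minus used also counts the slack inside already-allocated data chunks:

```shell
# Reproduce both "free" figures from the `btrfs fi usage` numbers above.
# "unallocated" is space not yet assigned to any chunk; "size - used"
# additionally counts slack inside chunks that are already allocated.
awk 'BEGIN {
    size = 228.67; used = 192.01; unallocated = 25.13   # GiB, from the output
    printf "unallocated:  %.2f GiB\n", unallocated
    printf "size - used:  %.2f GiB\n", size - used
    printf "chunk slack:  %.2f GiB\n", (size - used) - unallocated
}'
```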

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Have 15GB missing in btrfs filesystem.

2018-10-27 Thread Marc MERLIN
On Wed, Oct 24, 2018 at 01:07:25PM +0800, Qu Wenruo wrote:
> > saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
> > Dumping filters: flags 0x6, state 0x0, force is off
> >   METADATA (flags 0x2): balancing, usage=80
> >   SYSTEM (flags 0x2): balancing, usage=80
> > Done, had to relocate 5 out of 202 chunks
> > saruman:/mnt/btrfs_pool1# btrfs fi show .
> > Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
> > Total devices 1 FS bytes used 188.24GiB
> > devid1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1
> > 
> > and it's back to 15GB :-/
> > 
> > How can I get 188.24 and 203.54 to converge further? Where is all that
> > space gone?
> 
> Your original chunks are already pretty compact.
> Thus really no need to do extra balance.
> 
> You may get some extra space by doing full system balance (no usage=
> filter), but that's really not worthy in my opinion.
> 
> Maybe you could try defrag to free some space wasted by CoW instead?
> (If you're not using many snapshots)

Thanks for the reply.

So right now, I have:
saruman:~# btrfs fi show /mnt/btrfs_pool1/
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 188.25GiB
devid1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

saruman:~# btrfs fi df /mnt/btrfs_pool1/
Data, single: total=192.48GiB, used=184.87GiB
System, DUP: total=32.00MiB, used=48.00KiB
Metadata, DUP: total=5.50GiB, used=3.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

I've been using btrfs for a long time now but I've never had a
filesystem where I had 15GB apparently unusable (7%) after a balance.

I can't drop all the snapshots since at least two are used for btrfs
send/receive backups.
However, if I delete more snapshots, and do a full balance, you think
it'll free up more space?
I can try a defrag next, but since I have COW for snapshots, it's not
going to help much, correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Have 15GB missing in btrfs filesystem.

2018-10-23 Thread Marc MERLIN
Normally btrfs fi show will show lost space when your trees aren't
balanced. Balance usually reclaims that space, or most of it.
In this case, not so much.

kernel 4.17.6:

saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 186.89GiB
devid1 size 228.67GiB used 207.60GiB path /dev/mapper/pool1

Ok, I have a 21GB delta between space used by the FS and space used at the block layer.

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=40 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=40
Done, had to relocate 1 out of 210 chunks
saruman:/mnt/btrfs_pool1# btrfs balance start -musage=60 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=60
  SYSTEM (flags 0x2): balancing, usage=60
Done, had to relocate 4 out of 209 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 186.91GiB
devid1 size 228.67GiB used 205.60GiB path /dev/mapper/pool1

That didn't help much, delta is now 19GB

saruman:/mnt/btrfs_pool1# btrfs balance start -dusage=80 -v .
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=80
Done, had to relocate 8 out of 207 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 187.03GiB
devid1 size 228.67GiB used 201.54GiB path /dev/mapper/pool1

Ok, now delta is 14GB

saruman:/mnt/btrfs_pool1# btrfs balance start -musage=80 -v .
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=80
  SYSTEM (flags 0x2): balancing, usage=80
Done, had to relocate 5 out of 202 chunks
saruman:/mnt/btrfs_pool1# btrfs fi show .
Label: 'btrfs_pool1'  uuid: fda628bc-1ca4-49c5-91c2-4260fe967a23
Total devices 1 FS bytes used 188.24GiB
devid1 size 228.67GiB used 203.54GiB path /dev/mapper/pool1

and it's back to 15GB :-/

How can I get 188.24 and 203.54 to converge further? Where is all that
space gone?
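
The sequence of usage-filtered balances above can also be wrapped in a
loop that raises the threshold step by step. This is only a sketch
(`balance_progressive` is an invented helper, not a btrfs command); each
pass only relocates chunks whose utilization is below the filter, so the
early passes are cheap and the later ones optional:

```shell
# Progressive balance: relocate the emptiest chunks first, raising the
# usage filter each pass. The command runner is injectable so the loop
# logic can be tested without touching a real filesystem.
balance_progressive() {
    mnt=$1
    runner=${2:-btrfs}   # pass e.g. "echo" for a dry run
    for pct in 10 25 50 75; do
        "$runner" balance start -dusage="$pct" -musage="$pct" "$mnt" || return 1
    done
}

# balance_progressive /mnt/btrfs_pool1          # real run (assumed mount point)
# balance_progressive /mnt/btrfs_pool1 echo     # dry run, just prints commands
```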

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-18 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 10:42:21PM +0300, Andrei Borzenkov wrote:
> > Any help from other experienced developers would definitely help to
> > solve why memory of 'btrfs check' is not swapped out or why OOM killer
> > is not triggered.
> 
> Almost all used memory is marked as "active" and active pages are not
> swapped. Page is active if it was accessed recently. Is it possible that
> btrfs logic does frequent scans across all allocated memory?
> >>
> >> Active: 30381404 kB
> >> Inactive: 585952 kB

That is a very good find.

Yes, the linux kernel VM may be smart enough not to swap pages that got
used recently, and when btrfs check slurps all the extents to cross-check
everything, I think it does cross-reference them all many times.
This is why it can run in a few hours when btrfs check lowmem requires
days to run in a similar situation.

I'm not sure if there is a good way around this, but it's good to know that
btrfs repair can effectively abuse the linux VM in a way that it'll take
everything down without OOM having a chance to trigger.
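
The "almost everything is active" observation can be put as a number using
the meminfo figures quoted above (Active 30381404 kB of 32643788 kB total):

```shell
# Fraction of RAM sitting on the active LRU list, which the kernel will
# not swap out. Figures taken from the meminfo dump quoted in this thread.
awk 'BEGIN {
    active    = 30381404   # kB, Active
    mem_total = 32643788   # kB, MemTotal
    printf "active: %.1f%% of RAM\n", 100 * active / mem_total
}'
```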

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
> No OOM triggers? That's a little strange.
> Maybe it's related to how kernel handles memory over-commit?
 
Yes, I think you are correct.

> And for the hang, I think it's related to some memory allocation failure
> and error handler just didn't handle it well, so it's causing deadlock
> for certain page.

That indeed matches what I'm seeing.

> ENOMEM handling is pretty common but hardly verified, so it's not that
> strange, but we must locate the problem.

I seem to be getting deadlocks in the kernel, so I'm hoping that at least
it's checked there, but maybe not?

> In my system, at least I'm not using btrfs as root fs, and for the
> memory eating program I normally ensure it's eating all the memory +
> swap, so OOM killer is always triggered, maybe that's the cause.
> 
> So in your case, maybe it's btrfs not really taking up all memory, thus
> OOM killer not triggered.

Correct, the swap is not used.

> Any kernel dmesg about OOM killer triggered?
 
Nothing at all. It never gets triggered.

> > Here is my system when it virtually died:
> > ER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
> > root 31006 21.2 90.7 29639020 29623180 pts/19 D+ 13:49   1:35 ./btrfs 
> > check /dev/mapper/dshelf2

See how btrfs was taking 29GB in that ps output (that's before it takes
everything and I can't even type ps anymore).
Note that VSZ is almost equal to RSS. Nothing gets swapped.

Then see free output:

> >      total       used       free     shared    buffers     cached
> > Mem:  32643788   32180100 463688  0  44664 119508
> > -/+ buffers/cache:   32015928 627860
> > Swap: 15616764 443676   15173088
> 
> For swap, it looks like only some other program's memory is swapped out,
> not btrfs'.

That's exactly correct. btrfs check never goes to swap, I'm not sure why,
and because there is virtual memory free, maybe that's why OOM does not
trigger?
So I guess I can probably "fix" my problem by removing swap, but ultimately
it would be useful to know why memory taken by btrfs check does not end up
in swap.
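
One way to confirm from userspace that a given process's pages aren't
going to swap is the VmSwap field of /proc/<pid>/status. A small sketch
(the pidof wiring at the bottom is just an example):

```shell
# Print resident vs swapped memory for a process from /proc/<pid>/status.
# VmRSS is resident memory; VmSwap is how much of the process currently
# sits in swap. A VmSwap of 0 kB on a huge-RSS process would match the
# observation that btrfs check is never swapped out.
proc_swap_usage() {
    pid=$1
    awk '/^VmRSS:|^VmSwap:/ { print $1, $2, $3 }' "/proc/$pid/status"
}

# Example (hypothetical): proc_swap_usage "$(pidof btrfs)"
```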

> And unfortunately, I'm not so familiar with OOM/MM code outside of
> filesystem.
> Any help from other experienced developers would definitely help to
> solve why memory of 'btrfs check' is not swapped out or why OOM killer
> is not triggered.

Do you have someone from linux-mm you might be able to ask, or should we Cc
this thread there?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
Ok, I did more testing. Qu is right that btrfs check does not crash the
kernel. It just takes all the memory until linux hangs everywhere, and
somehow (no idea why) the OOM killer never triggers.
Details below:

On Tue, Jul 17, 2018 at 01:32:57PM -0700, Marc MERLIN wrote:
> Here is what I got when the system was not doing well (it took minutes to 
> run):
> 
>              total       used       free     shared    buffers     cached
> Mem:      32643788   32070952     572836          0     102160    4378772
> -/+ buffers/cache:   27590020    5053768
> Swap: 15616764 973596   14643168

ok, the reason it was not that close to 0 was due to /dev/shm it seems.
I cleared that, and now I can get it to go to near 0 again.
I'm wrong about the system being fully crashed, it's not, it's just very
close to being hung.
I can type killall -9 btrfs in the serial console and wait a few minutes.
The system eventually recovers, but it's impossible to fix anything via ssh 
apparently because networking does not get to run when I'm in this state.

I'm not sure why my system reproduces this easily while Qu's system does
not, but Qu was right that the kernel is not dead and that it's merely a
problem of userspace taking all the RAM and somehow not being killed by
OOM.

I checked the PID and don't see why it's not being killed:
gargamel:/proc/31006# grep . oom*
oom_adj:0
oom_score:221   << this increases a lot, but OOM never kills it
oom_score_adj:0
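
Since oom_score climbs but the kill never comes, one lever worth noting
is oom_score_adj (range -1000..1000; higher means killed sooner). It may
well not help here, where the box wedges before the killer ever runs,
but for the record, a sketch (the pidof wiring is just an example):

```shell
# Make the OOM killer prefer a given process by raising its
# oom_score_adj. Raising (not lowering) the value on your own processes
# does not require privileges.
oom_adj_path() {
    # path construction split out so it is testable on its own
    printf '/proc/%s/oom_score_adj' "$1"
}

make_oom_preferred() {
    echo 800 > "$(oom_adj_path "$1")"
}

# Example (hypothetical): make_oom_preferred "$(pidof btrfs)"
```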

I have these variables:
/proc/sys/vm/oom_dump_tasks:1
/proc/sys/vm/oom_kill_allocating_task:0
/proc/sys/vm/overcommit_kbytes:0
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50  << is this bad? (seems to be the default)
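
Those settings tie together: the CommitLimit reported in /proc/meminfo is
swap plus overcommit_ratio percent of RAM (though it's only enforced when
overcommit_memory=2, not in the default heuristic mode 0). Plugging in
the figures from the dump below lands within a couple of kB of the
reported CommitLimit of 31938656 kB:

```shell
# CommitLimit as derived from the overcommit knobs:
#   CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
# Using the numbers from this system (all in kB).
awk 'BEGIN {
    mem_total  = 32643788   # MemTotal
    swap_total = 15616764   # SwapTotal
    ratio      = 50         # /proc/sys/vm/overcommit_ratio
    printf "CommitLimit = %d kB\n", swap_total + mem_total * ratio / 100
}'
```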

Here is my system when it virtually died:
ER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root 31006 21.2 90.7 29639020 29623180 pts/19 D+ 13:49   1:35 ./btrfs check 
/dev/mapper/dshelf2

             total       used       free     shared    buffers     cached
Mem:  32643788   32180100 463688  0  44664 119508
-/+ buffers/cache:   32015928 627860
Swap: 15616764 443676   15173088

MemTotal:   32643788 kB
MemFree:  463440 kB
MemAvailable:  44864 kB
Buffers:   44664 kB
Cached:   120360 kB
SwapCached:87064 kB
Active: 30381404 kB
Inactive: 585952 kB
Active(anon):   30334696 kB
Inactive(anon):   474624 kB
Active(file):  46708 kB
Inactive(file):   111328 kB
Unevictable:5616 kB
Mlocked:5616 kB
SwapTotal:  15616764 kB
SwapFree:   15173088 kB
Dirty:  1636 kB
Writeback: 4 kB
AnonPages:  30734240 kB
Mapped:67236 kB
Shmem:  3036 kB
Slab: 267884 kB
SReclaimable:  51528 kB
SUnreclaim:   216356 kB
KernelStack:   10144 kB
PageTables:69284 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938656 kB
Committed_AS:   32865492 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k:  560404 kB
DirectMap2M:32692224 kB


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


btrfs check (not lowmem) and OOM-like hangs (4.17.6)

2018-07-17 Thread Marc MERLIN
On Tue, Jul 17, 2018 at 10:50:32AM -0700, Marc MERLIN wrote:
> I got the following on 4.17.6 while running btrfs check --repair on an
> unmounted filesystem (not the lowmem version)
> 
> I understand that btrfs check is userland only, although it seems that
> it caused these FS hangs on a different filesystem (the trace of course
> does not provide info on which FS)
> 
> Any idea what happened here?
> I'm going to wait a few hours without running btrfs check to see if it
> happens again and then if running btrfs check will re-create this issue,
> but other suggestions (if any), are welcome:

Hi Qu, I know we were talking about this last week and then, btrfs check
just worked for me so I wasn't able to reproduce.
Now I'm able to reproduce again.

I tried again, it's definitely triggered by btrfs check --repair

I tried to capture what happens, and memory didn't dip to 0, but the system
got very slow and things started failing.
btrfs was never killed though while ssh was.
Is there a chance that maybe btrfs is in some kernel OOM exclude list?

Here is what I got when the system was not doing well (it took minutes to run):

             total       used       free     shared    buffers     cached
Mem:      32643788   32070952     572836          0     102160    4378772
-/+ buffers/cache:   27590020    5053768
Swap: 15616764 973596   14643168

gargamel:~# cat /proc/meminfo
MemTotal:   32643788 kB
MemFree: 2726276 kB
MemAvailable:2502200 kB
Buffers:   12360 kB
Cached:  1676388 kB
SwapCached: 11048580 kB
Active: 16443004 kB
Inactive:   12010456 kB
Active(anon):   16287780 kB
Inactive(anon): 11651692 kB
Active(file): 155224 kB
Inactive(file):   358764 kB
Unevictable:5776 kB
Mlocked:5776 kB
SwapTotal:  15616764 kB
SwapFree: 294592 kB
Dirty:  3032 kB
Writeback: 76064 kB
AnonPages:  15723272 kB
Mapped:   612124 kB
Shmem:   1171032 kB
Slab: 399824 kB
SReclaimable:  84568 kB
SUnreclaim:   315256 kB
KernelStack:   20576 kB
PageTables:94268 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938656 kB
Committed_AS:   37909452 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 98304 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k:  355604 kB
DirectMap2M:32897024 kB

and console:
[ 9184.345329] INFO: task zmtrigger.pl:9981 blocked for more than 120 seconds.
[ 9184.366258]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 9184.385323] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 9184.408803] zmtrigger.plD0  9981   9804 0x20020080
[ 9184.425249] Call Trace:
[ 9184.432580]  ? __schedule+0x53e/0x59b
[ 9184.443551]  schedule+0x7f/0x98
[ 9184.452960]  io_schedule+0x16/0x38
[ 9184.463154]  wait_on_page_bit_common+0x10c/0x199
[ 9184.476996]  ? file_check_and_advance_wb_err+0xd7/0xd7
[ 9184.493339]  shmem_getpage_gfp+0x2dd/0x975
[ 9184.506558]  shmem_fault+0x188/0x1c3
[ 9184.518199]  ? filemap_map_pages+0x6f/0x295
[ 9184.531680]  __do_fault+0x1d/0x6e
[ 9184.542505]  __handle_mm_fault+0x675/0xa61
[ 9184.555653]  ? list_move+0x21/0x3a
[ 9184.566737]  handle_mm_fault+0x11c/0x16b
[ 9184.579355]  __do_page_fault+0x324/0x41c
[ 9184.591996]  ? page_fault+0x8/0x30
[ 9184.603059]  page_fault+0x1e/0x30
[ 9184.613846] RIP: 0023:0xf7d2d022
[ 9184.624366] RSP: 002b:ffeb9fe8 EFLAGS: 00010202
[ 9184.640868] RAX: f7eed000 RBX: 567e6000 RCX: 0004
[ 9184.663095] RDX: 587fecb0 RSI: 5876538c RDI: 0004
[ 9184.685308] RBP: 58185160 R08:  R09: 
[ 9184.707524] R10:  R11: 0286 R12: 
[ 9184.729757] R13:  R14:  R15: 
[ 9184.751988] INFO: task /usr/sbin/apach:11868 blocked for more than 120 
seconds.
[ 9184.775106]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 9184.795072] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 9184.819423] /usr/sbin/apach D0 11868  11311 0x20020080
[ 9184.836748] Call Trace:
[ 9184.844926]  ? __schedule+0x53e/0x59b
[ 9184.856811]  schedule+0x7f/0x98
[ 9184.867075]  io_schedule+0x16/0x38
[ 9184.878114]  wait_on_page_bit_common+0x10c/0x199
[ 9184.892807]  ? file_check_and_advance_wb_err+0xd7/0xd7
[ 9184.909036]  shmem_getpage_gfp+0x2dd/0x975
[ 9184.922157]  shmem_fault+0x188/0x1c3
[ 9184.933667]  ? filemap_map_pages+0x6f/0x295
[ 9184.947504]  __do_fa

task btrfs-transacti:921 blocked for more than 120 seconds during check repair

2018-07-17 Thread Marc MERLIN
I got the following on 4.17.6 while running btrfs check --repair on an
unmounted filesystem (not the lowmem version)

I understand that btrfs check is userland only, although it seems that
it caused these FS hangs on a different filesystem (the trace of course
does not provide info on which FS)

Any idea what happened here?
I'm going to wait a few hours without running btrfs check to see if it
happens again and then if running btrfs check will re-create this issue,
but other suggestions (if any), are welcome:

[ 2538.566952] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 2538.616484] Call Trace:
[ 2538.623828]  ? __schedule+0x53e/0x59b
[ 2538.634802]  schedule+0x7f/0x98
[ 2538.644214]  wait_current_trans+0x9b/0xd8
[ 2538.656229]  ? add_wait_queue+0x3a/0x3a
[ 2538.668239]  start_transaction+0x1ce/0x325
[ 2538.680556]  btrfs_finish_ordered_io+0x240/0x5d3
[ 2538.694414]  normal_work_helper+0x118/0x277
[ 2538.706984]  process_one_work+0x19c/0x281
[ 2538.719036]  ? rescuer_thread+0x279/0x279
[ 2538.731064]  worker_thread+0x197/0x246
[ 2538.742322]  kthread+0xeb/0xf0
[ 2538.751492]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2538.76]  ret_from_fork+0x35/0x40
[ 2538.777403] INFO: task kworker/u16:11:369 blocked for more than 120 seconds.
[ 2538.799025]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 2538.818109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2538.841640] kworker/u16:11  D0   369  2 0x8000
[ 2538.858112] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[ 2538.876401] Call Trace:
[ 2538.883770]  ? __schedule+0x53e/0x59b
[ 2538.894760]  schedule+0x7f/0x98
[ 2538.904192]  wait_current_trans+0x9b/0xd8
[ 2538.916242]  ? add_wait_queue+0x3a/0x3a
[ 2538.927772]  start_transaction+0x1ce/0x325
[ 2538.940081]  btrfs_finish_ordered_io+0x240/0x5d3
[ 2538.953973]  normal_work_helper+0x118/0x277
[ 2538.966523]  process_one_work+0x19c/0x281
[ 2538.978546]  ? rescuer_thread+0x279/0x279
[ 2538.990560]  worker_thread+0x197/0x246
[ 2539.001797]  kthread+0xeb/0xf0
[ 2539.010986]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2539.026137]  ret_from_fork+0x35/0x40
[ 2539.037666] INFO: task btrfs-transacti:921 blocked for more than 120 seconds.
[ 2539.059851]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4
[ 2539.079733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2539.104007] btrfs-transacti D0   921  2 0x8000
[ 2539.121257] Call Trace:
[ 2539.129377]  ? __schedule+0x53e/0x59b
[ 2539.141171]  schedule+0x7f/0x98
[ 2539.151370]  btrfs_tree_lock+0xa6/0x19d
[ 2539.163621]  ? add_wait_queue+0x3a/0x3a
[ 2539.175876]  btrfs_search_slot+0x5aa/0x756
[ 2539.188899]  lookup_inline_extent_backref+0x11a/0x485
[ 2539.204781]  ? fixup_slab_list.isra.43+0x1b/0x72
[ 2539.219360]  __btrfs_free_extent+0xf1/0xa72
[ 2539.232597]  ? btrfs_merge_delayed_refs+0x18b/0x1a7
[ 2539.247922]  ? __mutex_trylock_or_owner+0x43/0x54
[ 2539.262708]  __btrfs_run_delayed_refs+0xad8/0xc40
[ 2539.277504]  btrfs_run_delayed_refs+0x6e/0x16a
[ 2539.291519]  btrfs_commit_transaction+0x42/0x710
[ 2539.306043]  ? start_transaction+0x295/0x325
[ 2539.319516]  transaction_kthread+0xc9/0x135
[ 2539.332757]  ? btrfs_cleanup_transaction+0x3ee/0x3ee
[ 2539.348327]  kthread+0xeb/0xf0
[ 2539.358155]  ? kthread_create_worker_on_cpu+0x66/0x66
[ 2539.373977]  ret_from_fork+0x35/0x40
[ 2539.385394] INFO: task vnstatd:6338 blocked for more than 120 seconds.
[ 2539.405667]   Not tainted 4.17.6-amd64-preempt-sysrq-20180818 #4

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Why original mode doesn't use swap? (Original: Re: btrfs check lowmem, take 2)

2018-07-12 Thread Marc MERLIN
On Thu, Jul 12, 2018 at 01:26:41PM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月12日 01:09, Chris Murphy wrote:
> > On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
> >> Thanks to Su and Qu, I was able to get my filesystem to a point that
> >> it's mountable.
> >> I then deleted loads of snapshots and I'm down to 26.
> >>
> >> IT now looks like this:
> >> gargamel:~# btrfs fi show /mnt/mnt
> >> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> >> Total devices 1 FS bytes used 12.30TiB
> >> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >>
> >> gargamel:~# btrfs fi df /mnt/mnt
> >> Data, single: total=13.57TiB, used=12.19TiB
> >> System, DUP: total=32.00MiB, used=1.55MiB
> >> Metadata, DUP: total=124.50GiB, used=115.62GiB
> >> Metadata, single: total=216.00MiB, used=0.00B
> >> GlobalReserve, single: total=512.00MiB, used=0.00B
> >>
> >>
> >> Problems
> >> 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> >> server, despite my deleting lots of snapshots.
> >> Is it because I have too many files then?
> > 
> > I think original mode needs most of the metadata in memory.
> > 
> > I'm not understanding why btrfs check won't use swap like at least
> > xfs_repair and pretty sure e2fsck will as well.
> 
> I don't understand either.
> 
> Isn't memory from malloc() swappable?

I never looked at the code and why/how it crashes, but my guess was
that it somehow causes the kernel to grab a lot of memory in the btrfs
driver, and that is what is crashing the system.
If it were just malloc() in the btrfs user space tool, it should be both
swappable like you said, and should also get OOM'ed.

I suppose I can still be completely wrong, but I can't find another
logical explanation.

I just tried running it again to trigger the problem, but because I
freed a lot of snapshots, btrfs check --repair goes back to only using
10GB instead of 32GB, so I wasn't able to replicate OOM for you.

Incidently, it died with:
gargamel:~# btrfs check --repair /dev/mapper/dshelf2
enabling repair mode
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) 
compared to the found
 root node (139061)
ERROR: failed to repair root items: Invalid argument

That said, when it was using a fair amount of RAM, I captured this:
USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
root  1376  1.4 25.2 8256368 8240392 pts/18 R+  14:52   1:07 btrfs check 
--repair /dev/mapper/dshelf2

I don't know how to read /proc/meminfo, but that's what it said:
MemTotal:   32643792 kB
MemFree: 1367516 kB
MemAvailable:   15554836 kB
Buffers: 3491672 kB
Cached: 15900320 kB
SwapCached: 2092 kB
Active: 14577228 kB
Inactive:   15028608 kB
Active(anon):   12122180 kB
Inactive(anon):  2643176 kB
Active(file):2455048 kB
Inactive(file): 12385432 kB
Unevictable:8068 kB
Mlocked:8068 kB
SwapTotal:  15616764 kB   << swap was totally unused and stays unused when I get the system to crash
SwapFree:   15578020 kB
Dirty: 71956 kB
Writeback:64 kB
AnonPages:  10219976 kB
Mapped:  4033568 kB
Shmem:   4545552 kB
Slab: 713300 kB
SReclaimable: 395508 kB
SUnreclaim:   317792 kB
KernelStack:   11788 kB
PageTables:52592 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:31938660 kB
Committed_AS:   20070736 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages:0 kB
ShmemPmdMapped:0 kB
CmaTotal:  16384 kB
CmaFree:   0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
Hugetlb:   0 kB
DirectMap4k: 1207572 kB
DirectMap2M:32045056 kB

Does it help figure out where the memory was going and whether kernel
memory was being used?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check mode normal still hard crash-hanging systems

2018-07-11 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 11:09:56AM -0600, Chris Murphy wrote:
> On Tue, Jul 10, 2018 at 12:09 PM, Marc MERLIN  wrote:
> > Thanks to Su and Qu, I was able to get my filesystem to a point that
> > it's mountable.
> > I then deleted loads of snapshots and I'm down to 26.
> >
> > IT now looks like this:
> > gargamel:~# btrfs fi show /mnt/mnt
> > Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> > Total devices 1 FS bytes used 12.30TiB
> > devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> >
> > gargamel:~# btrfs fi df /mnt/mnt
> > Data, single: total=13.57TiB, used=12.19TiB
> > System, DUP: total=32.00MiB, used=1.55MiB
> > Metadata, DUP: total=124.50GiB, used=115.62GiB
> > Metadata, single: total=216.00MiB, used=0.00B
> > GlobalReserve, single: total=512.00MiB, used=0.00B
> >
> >
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
> 
> I think original mode needs most of the metadata in memory.
> 
> I'm not understanding why btrfs check won't use swap like at least
> xfs_repair and pretty sure e2fsck will as well.
> 
> Using 128G swap on nvme with original check is still gonna be faster
> than lowmem mode.

Yeah, that's also been a concern/question of mine all these years, even
if Su isn't working on that code and is likely the wrong person to ask.
Personally, my take is that if btrfs wants to be taken seriously, at the
very least its fsck tool should not hard crash a system you run it on
(and it really does the worst kind of hard crash I've ever seen: OOM
can't trigger fast enough, linux doesn't panic so it can't self-reboot
either, it just hard dies and hangs).

Maybe David knows?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 12:07:05PM +0800, Su Yue wrote:
> > So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
> > I'm running it without the extra options you added with hardcoded stuff:
> > gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem 
> > --repair /dev/mapper/dshelf2
> > 
> This is okay. Let's wait to see the result.

Sadly, it crashes quickly:

Starting program: /var/local/src/btrfs-progs.sy-test/btrfs check --mode=lowmem 
--repair /dev/mapper/dshelf2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
enabling repair mode
WARNING: low-memory mode repair support is only partial
 Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
checking extents

Program received signal SIGSEGV, Segmentation fault.
check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, 
root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, 
level=level@entry=1)
at check/mode-lowmem.c:3744
3744        if (btrfs_header_bytenr(node) != bytenr) {
(gdb)  bt
#0  check_tree_block_backref (fs_info=fs_info@entry=0x55825e10, 
root_id=root_id@entry=18446744073709551607, bytenr=bytenr@entry=655589376, 
level=level@entry=1)
at check/mode-lowmem.c:3744
#1  0x555cb1f9 in check_extent_item 
(fs_info=fs_info@entry=0x55825e10, 
path=path@entry=0x7fffdc60) at check/mode-lowmem.c:4194
#2  0x555d06e9 in check_leaf_items (account_bytes=1, 
nrefs=0x7fffdb80, 
path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4654
#3  walk_down_tree (check_all=1, nrefs=0x7fffdb80, level=, 
path=0x7fffdc60, root=0x558262f0) at check/mode-lowmem.c:4790
#4  check_btrfs_root (root=root@entry=0x558262f0, 
check_all=check_all@entry=1)
at check/mode-lowmem.c:5114
#5  0x555d144f in check_chunks_and_extents_lowmem 
(fs_info=fs_info@entry=0x55825e10)
at check/mode-lowmem.c:5475
#6  0x555b44b1 in do_check_chunks_and_extents (fs_info=0x55825e10) 
at check/main.c:8369
#7  cmd_check (argc=, argv=) at check/main.c:9899
#8  0x55567510 in main (argc=4, argv=0x7fffe390) at btrfs.c:302


Would you like anything off gdb? (feel free to Email me directly or
point me to an online chat platform you have access to)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:58:36AM +0800, Su Yue wrote:
> 
> 
> On 07/11/2018 09:44 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
> > > 
> > > 
> > > On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > > > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > > > Problems
> > > > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes 
> > > > > > the
> > > > > > server, despite my deleting lots of snapshots.
> > > > > > Is it because I have too many files then?
> > > > > > 
> > > > > Yes. Original check first gathers all information about the extent
> > > > > tree and your files in RAM, then processes them one by one.
> > > > > But deleting still counts; it does speed the lowmem check up.
> > > > 
> > > > Understood.
> > > > 
> > > > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > > > Oh..No... My master branch is still 4.14. The true master branch is
> > > > > David's here:
> > > > > https://github.com/kdave/btrfs-progs
> > > > > But the master branch has a known bug which I fixed yesterday, please 
> > > > > see
> > > > > the mail.
> > > > 
> > > > So, if I git sync it now, it should have your fix, and I can run it,
> > > > correct?
> > > > 
> > > Yes, please.
> > 
> > Ok, I am now running
> > gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
> > using git master from https://github.com/kdave/btrfs-progs
> > 
> Please stop the check, please.
> 
> The branch 'it' which I mean is
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Ok, sorry, I thought you said you had pushed your changes to 
https://github.com/kdave/btrfs-progs
yesterday.

So, I went back to https://github.com/Damenly/btrfs-progs.git/tmp1 and
I'm running it without the extra options you added with hardcoded stuff:
gargamel:/var/local/src/btrfs-progs.sy-test# ./btrfsck --mode=lowmem --repair 
/dev/mapper/dshelf2

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 09:08:40AM +0800, Su Yue wrote:
> 
> 
> On 07/11/2018 08:58 AM, Marc MERLIN wrote:
> > On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > > > Problems
> > > > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > > > server, despite my deleting lots of snapshots.
> > > > Is it because I have too many files then?
> > > > 
> > > Yes. Original check first gathers all information about the extent
> > > tree and your files in RAM, then processes them one by one.
> > > But deleting still counts; it does speed the lowmem check up.
> > 
> > Understood.
> > 
> > > > 2) I tried Su's master git branch for btrfs-progs to try and see how
> > > Oh..No... My master branch is still 4.14. The true master branch is
> > > David's here:
> > > https://github.com/kdave/btrfs-progs
> > > But the master branch has a known bug which I fixed yesterday, please see
> > > the mail.
> > 
> > So, if I git sync it now, it should have your fix, and I can run it,
> > correct?
> > 
> Yes, please.

Ok, I am now running
gargamel:~# time btrfs check --mode=lowmem --repair /dev/mapper/dshelf2
using git master from https://github.com/kdave/btrfs-progs

I will report back how long it takes with extent tree check and whether
it returns clean, or not.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
On Wed, Jul 11, 2018 at 08:53:58AM +0800, Su Yue wrote:
> > Problems
> > 1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
> > server, despite my deleting lots of snapshots.
> > Is it because I have too many files then?
> > 
> Yes. Original check first gathers all information about the extent tree
> and your files in RAM, then processes them one by one.
> But deleting still counts; it does speed the lowmem check up.

Understood.

> > 2) I tried Su's master git branch for btrfs-progs to try and see how
> Oh..No... My master branch is still 4.14. The true master branch is
> David's here:
> https://github.com/kdave/btrfs-progs
> But the master branch has a known bug which I fixed yesterday, please see
> the mail.

So, if I git sync it now, it should have your fix, and I can run it,
correct?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


btrfs check lowmem, take 2

2018-07-10 Thread Marc MERLIN
Thanks to Su and Qu, I was able to get my filesystem to a point that
it's mountable.
I then deleted loads of snapshots and I'm down to 26.

It now looks like this:
gargamel:~# btrfs fi show /mnt/mnt
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.30TiB
devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:~# btrfs fi df /mnt/mnt
Data, single: total=13.57TiB, used=12.19TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=124.50GiB, used=115.62GiB
Metadata, single: total=216.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B


Problems
1) btrfs check --repair _still_ takes all 32GB of RAM and crashes the
server, despite my deleting lots of snapshots.
Is it because I have too many files then?

2) I tried Su's master git branch for btrfs-progs to try and see how a
normal check would go, and I'm stuck on this:
gargamel:/var/local/src/btrfs-progs.sy# time ./btrfsck --mode=lowmem --repair 
/dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
root 18446744073709551607 has a root item with a more recent gen (143376) 
compared to the found root node (139061)
ERROR: failed to repair root items: Invalid argument

real    75m8.046s
user    0m14.591s
sys     0m52.431s

I understand what the message means; I just need to switch to the newer root,
but honestly I'm not quite sure how to do this from the btrfs-check man page.

This didn't work:
time ./btrfsck --mode=lowmem --repair --chunk-root=18446744073709551607  
/dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
WARNING: chunk_root_bytenr 18446744073709551607 is unaligned to 4096, ignore it

How do I address the error above?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
To fill in for the spectators on the list :)
Su gave me a modified version of btrfsck lowmem that was able to clean
most of my filesystem.
It's not a general case solution since it had some hardcoding specific
to my filesystem problems, but still a great success.
Email quoted below, along with responses to Qu

On Tue, Jul 10, 2018 at 09:09:33AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月10日 01:48, Marc MERLIN wrote:
> > Success!
> > Well done Su, this is a huge improvement to the lowmem code. It went from 
> > days to less than 3 hours.
> 
> Awesome work!
> 
> > I'll paste the logs below.
> > 
> > Questions:
> > 1) I assume I first need to delete a lot of snapshots. What is the limit in 
> > your opinion?
> > 100? 150? other?
> 
> My personal recommendation is just 20. Not 150, not even 100.
 
I see. Then, I may be forced to recreate multiple filesystems anyway.
I have about 25 btrfs send/receive relationships and I have around 10
historical snapshots for each.

In the future, can't we segment extents/snapshots per subvolume, making
subvolumes mini filesystems within the bigger filesystem?

> But snapshot deletion will take time (and it's delayed, you won't know
> if something wrong happened just after "btrfs subv delete") and even
> require a healthy extent tree.
> If all extent tree errors are just false alert, that should not be a big
> problem at all.
> 
> > 
> > 2) my filesystem is somewhat misbalanced. Which balance options do you 
> > think are safe to use?
> 
> I would recommend to manually check the extent tree for BLOCK_GROUP_ITEM,
> which will tell how big a block group is and how much space is used.
> That gives you an idea of which block groups can be relocated.
> Then use vrange= to specify exact block group to relocation.
> 
> One example would be:
> 
> # btrfs ins dump-tree -t extent <device> | grep -A1 BLOCK_GROUP_ITEM |\
>   tee block_group_dump
> 
> Then the output contains:
>   item 1 key (13631488 BLOCK_GROUP_ITEM 8388608) itemoff 16206 itemsize 24
>   block group used 262144 chunk_objectid 256 flags DATA
> 
> The "13631488" is the bytenr of the block group.
> The "8388608" is the length of the block group.
> The "262144" is the used bytes of the block group.
> 
> The less used space the higher priority it should be relocated. (and
> faster to relocate).
> You could write a small script to do it, or there should be some tool to
> do the calculation for you.
 
I usually use something simpler:
Label: 'btrfs_boot'  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
Total devices 1 FS bytes used 30.19GiB
devid1 size 79.93GiB used 78.01GiB path /dev/mapper/cryptroot

This is bad: I have 30GB of data, but 78 out of 80GB is allocated.
That's bad news and suggests a balance, correct?
If so, I always struggle as to what value I should give to dusage and
musage...
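One common convention for the dusage/musage question (an assumption on my part, not advice given in this thread) is to raise the usage filter in passes, so each pass only rewrites mostly-empty chunks and stays short:

```shell
# incremental_balance: balance in passes with a rising usage filter.
# -dusage=/-musage= are real balance filters (only chunks at or below
# the given percent used are rewritten); the 0/10/25/50 ladder is just
# one common choice, not a recommendation from this thread.
incremental_balance() {
    mnt=$1
    for pct in 0 10 25 50; do
        echo "pass: rewriting chunks <= ${pct}% used"
        btrfs balance start -dusage="$pct" -musage="$pct" "$mnt" || return 1
    done
}
# usage: incremental_balance /mnt/mnt
```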

> And only relocate one block group each time, to avoid possible problem.
> 
> The last but not the least, it's highly recommend to do the relocation
> only after unused snapshots are completely deleted.
> (Or it would be super super slow to relocate)

Thank you for the advice. Hopefully this helps someone else too, and
maybe someone can write a relocation helper tool if I don't have the
time to do it myself.

> > 3) Should I start a scrub now (takes about 1 day) or anything else to
> > check that the filesystem is hopefully not damaged anymore?
> 
> I would normally recommend to use btrfs check, but neither mode really
> works here.
> And scrub only checks csum, doesn't check the internal cross reference
> (like content of extent tree).
> 
> Maybe Su could skip the whole extent tree check and let lowmem check
> the fs tree only; with --check-data-csum it should do a better job
> than scrub.

I will wait to hear back from Su, but I think the current situation is
that I still have some problems on my FS, they are just
1) not important enough to block mount rw (now it works again)
2) currently ignored by the modified btrfsck I have, but would cause
problems if I used real btrfsck.

Correct?

> > 
> > 4) should btrfs check reset the corrupt counter?
> > bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > for now, should I reset it manually?
> 
> It could be pretty easy to implement if not already implemented.

Seems like it's not, given that Su's btrfsck --repair ran to completion
and I still have corrupt set to '2' :)
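As far as I know, btrfs-progs already ships the reset Qu describes: `btrfs device stats -z` prints the per-device counters (wr/rd/flush/corrupt/gen) and zeroes them. A minimal wrapper, with the device path taken from this thread:

```shell
# reset_btrfs_dev_stats: print and reset the per-device error counters.
# The -z (--reset) option is a real btrfs-progs flag; this wrapper is
# just a sketch around it.
reset_btrfs_dev_stats() {
    btrfs device stats -z "$1"
}
# usage: reset_btrfs_dev_stats /dev/mapper/dshelf2
```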

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-09 Thread Marc MERLIN
On Tue, Jul 10, 2018 at 09:34:36AM +0800, Qu Wenruo wrote:
>  Ok, this is where I am now:
>  WARNING: debug: end of checking extent item[18457780273152 169 1]
>  type: 176 offset: 2
>  checking extent items [18457780273152/18457780273152]
>  ERROR: errors found in extent allocation tree or chunk allocation
>  checking fs roots
>  ERROR: root 17592 EXTENT_DATA[25937109 4096] gap exists, expected:
>  EXTENT_DATA[25937109 4033]
> 
> The expected end is not even aligned to sectorsize.
> 
> I think there is something wrong.
> Dump tree on this INODE would definitely help in this case.
> 
> Marc, would you please try dump using the following command?
> 
> # btrfs ins dump-tree -t 17592 <device> | grep -C 40 25937109
 
Sure, there you go:
gargamel:~# btrfs ins dump-tree -t 17592 /dev/mapper/dshelf2  | grep -C 40 
25937109
extent data disk byte 3259370151936 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 144 key (2009526 EXTENT_DATA 1179648) itemoff 7931 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370266624 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 145 key (2009526 EXTENT_DATA 1310720) itemoff 7878 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370385408 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 146 key (2009526 EXTENT_DATA 1441792) itemoff 7825 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370504192 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 147 key (2009526 EXTENT_DATA 1572864) itemoff 7772 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370622976 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 148 key (2009526 EXTENT_DATA 1703936) itemoff 7719 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370737664 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 149 key (2009526 EXTENT_DATA 1835008) itemoff 7666 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370856448 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 150 key (2009526 EXTENT_DATA 1966080) itemoff 7613 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259370975232 nr 118784
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 151 key (2009526 EXTENT_DATA 2097152) itemoff 7560 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371094016 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 152 key (2009526 EXTENT_DATA 2228224) itemoff 7507 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371208704 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 153 key (2009526 EXTENT_DATA 2359296) itemoff 7454 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371323392 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 154 key (2009526 EXTENT_DATA 2490368) itemoff 7401 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371433984 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 155 key (2009526 EXTENT_DATA 2621440) itemoff 7348 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371548672 nr 110592
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 156 key (2009526 EXTENT_DATA 2752512) itemoff 7295 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371659264 nr 114688
extent data offset 0 nr 131072 ram 131072
extent compression 1 (zlib)
item 157 key (2009526 EXTENT_DATA 2883584) itemoff 7242 itemsize 53
generation 18462 type 1 (regular)
extent data disk byte 3259371773952 nr 106496
extent 

Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:46:59PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:50 AM, Qu Wenruo  wrote:
> >
> >
> > There must be something wrong, however due to the size of the fs, and
> > the complexity of extent tree, I can't tell.
> 
> Right, which is why I'm asking if any of the metadata integrity
> checker mask options might reveal what's going wrong?
> 
> I guess the big issues are:
> a. compile kernel with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y is necessary
> b. it can come with a high resource burden depending on the mask and
> where the log is being written (write system logs to a different file
> system for sure)
> c. the granularity offered in the integrity checker might not be enough.
> d. it might take a while after corruptions are injected before the
> corruption is noticed and flagged.

Back to where I'm at right now. I'm going to delete this filesystem and
start over very soon. Tomorrow or the day after.
I'm happy to get more data off it if someone wants it for posterity, but
I indeed need to recover soon since being with a dead backup server is
not a good place to be in :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 03:34:45PM -0600, Chris Murphy wrote:
> On Tue, Jul 3, 2018 at 2:34 AM, Su Yue  wrote:
> 
> > Yes, extent tree is the hardest part for lowmem mode. I'm quite
> > confident the tool can deal well with file trees(which records metadata
> > about file and directory name, relationships).
> > As for extent tree, I have few confidence due to its complexity.
> 
> I have to ask again if there's some metadata integrity mask option Marc
> should use to try to catch the corruption cause in the first place?
> 
> His use case really can't afford either mode of btrfs check. And also
> check is only backward looking, it doesn't show what was happening at
> the time. And for big file systems, check rapidly doesn't scale at all
> anyway.
> 
> And now he's modifying his layout to avoid the problem from happening
> again which makes it less likely to catch the cause, and get it fixed.
> I think if he's willing to build a kernel with integrity checker
> enabled, it should be considered but only if it's likely to reveal why
> the problem is happening, even if it can't repair the problem once
> it's happened. He's already in that situation so masked integrity
> checking is no worse, at least it gives a chance to improve Btrfs
> rather than it being a mystery how it got corrupt.

Yeah, I'm fine waiting a few more days with this down and gathering data
if that helps.
But due to the size, a full btrfs image may be a bit larger than we
want, not counting some confidential data in some filenames.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-03 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:50:48PM +0800, Qu Wenruo wrote:
> > It sounds like there may not be a fix to this problem with the filesystem's
> > design, outside of "do not get there, or else".
> > It would even be useful for btrfs tools to start computing heuristics and
> > output warnings like "you have more than 100 snapshots on this filesystem,
> > this is not recommended, please read http://url/"
> 
> This looks pretty doable, but maybe it's better to add some warning at
> btrfs progs (both "subvolume snapshot" and "receive").

This is what I meant to say, correct.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 04:26:37AM +, Paul Jones wrote:
> I don't have any experience with this, but since it's the internet let me 
> tell you how I'd do it anyway 

That's the spirit :)

> raid5
> dm-crypt
> lvm (using thin provisioning + cache)
> btrfs
> 
> The cache mode on lvm requires you to set up all your volumes first, then
> add caching to those volumes last. If you need to modify the volume then
> you have to remove the cache, make your changes, then re-add the cache. It
> sounds like a pain, but having the cache separate from the data is quite
> handy.

I'm ok enough with that.

> Given you are running a backup server I don't think the cache would
> really do much unless you enable writeback mode. If you can split up your
> filesystem a bit to the point that btrfs check doesn't OOM that will
> seriously help performance as well. Rsync might be feasible again.

I'm a bit wary of write caching with the issues I've had. I may do
write-through, but not writeback :)

But caching helps indeed for my older filesystems that are still backed up
via rsync because the source fs is ext4 and not btrfs.

Thanks for the suggestions
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 06:31:43PM -0600, Chris Murphy wrote:
> So the idea behind journaled file systems is that journal replay
> enabled mount time "repair" that's faster than an fsck. Already Btrfs
> use cases with big, but not huge, file systems make btrfs check a
> problem. Either running out of memory or it takes too long. So already
> it isn't scaling as well as ext4 or XFS in this regard.
> 
> So what's the future hold? It seems like the goal is that the problems
> must be avoided in the first place rather than to repair them after
> the fact.
> 
> Are the problem's Marc is running into understood well enough that
> there can eventually be a fix, maybe even an on-disk format change,
> that prevents such problems from happening in the first place?
> 
> Or does it make sense for him to be running with btrfs debug or some
> subset of btrfs integrity checking mask to try to catch the problems
> in the act of them happening?

Those are all good questions.
To be fair, I cannot claim that btrfs was at fault for whatever filesystem
damage I ended up with. It's very possible that it happened due to a flaky
SATA card that kicked drives off the bus when it shouldn't have.
Sure in theory a journaling filesystem can recover from unexpected power
loss and drives dropping off at bad times, but I'm going to guess that
btrfs' complexity also means that it has data structures (extent tree?) that
need to be updated completely "or else".

I'm obviously ok with a filesystem check being necessary to recover in cases
like this, afterall I still occasionally have to run e2fsck on ext4 too, but
I'm a lot less thrilled with the btrfs situation where basically the repair
tools can either completely crash your kernel, or take days and then either
get stuck in an infinite loop or hit an algorithm that can't scale if you
have too many hardlinks/snapshots.

It sounds like there may not be a fix to this problem with the filesystem's
design, outside of "do not get there, or else".
It would even be useful for btrfs tools to start computing heuristics and
output warnings like "you have more than 100 snapshots on this filesystem,
this is not recommended, please read http://url/"

Qu, Su, does that sound both reasonable and doable?
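Until such a warning lands in btrfs-progs itself, the heuristic is easy to approximate externally. A sketch, assuming `btrfs subvolume list -s` (which lists only snapshots) and using the 100-snapshot figure from this thread as the threshold:

```shell
# warn_many_snapshots: warn when a filesystem holds more snapshots than
# a given threshold.  Sketch only, not a btrfs-progs feature; the
# default limit of 100 is just the number discussed in this thread.
warn_many_snapshots() {
    mnt=$1
    limit=${2:-100}
    # -s restricts the subvolume listing to snapshots
    count=$(btrfs subvolume list -s "$mnt" | wc -l | tr -d ' ')
    if [ "$count" -gt "$limit" ]; then
        echo "WARNING: $count snapshots on $mnt (> $limit); consider pruning before balance/check" >&2
    fi
}
# usage: warn_many_snapshots /mnt/mnt 100
```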

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 09:37:47AM +0800, Qu Wenruo wrote:
> > If I do this, I would have
> > software raid 5 < dmcrypt < bcache < lvm < btrfs
> > That's a lot of layers, and that's also starting to make me nervous :)
> 
> If you could keep the number of snapshots to minimal (less than 10) for
> each btrfs (and the number of send source is less than 5), one big btrfs
> may work in that case.
 
Well, we kind of discussed this already. If btrfs falls over when you reach
100 snapshots or so (and it sure seems to in my case), I won't be much better
off.
Having btrfs check --repair fail because 32GB of RAM is not enough, and it's
unable to use swap, is a big deal in my case. You also confirmed that btrfs
check lowmem does not scale to filesystems like mine, so this translates
into "if regular btrfs check repair can't fit in 32GB, I am completely out
of luck if anything happens to the filesystem"

You're correct that I could tweak my backups and snapshot rotation to get
from 250 or so down to 100, but it seems that I'll just be hoping to avoid
the problem by being just under the limit, until I'm not, again, and it'll
be too late to do anything it next time I'm in trouble again, putting me
back right in the same spot I'm in now.
Is all this fair to say, or did I misunderstand?

> BTW, IMHO the bcache is not really helping for backup system, which is
> more write oriented.

That's a good point. So, what I didn't explain is that I still have some old
filesystems that get backed up with rsync instead of btrfs send (going
into the same filesystem, but not same subvolume).
Because rsync is so painfully slow when it needs to scan both sides before
it'll even start doing any work, bcache helps there.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Tue, Jul 03, 2018 at 12:51:30AM +, Paul Jones wrote:
> You could combine bcache and lvm if you are happy to use dm-cache instead 
> (which lvm uses).
> I use it myself (but without thin provisioning) and it works well.

Interesting point. So, I used to use lvm and then lvm2 many years ago until
I got tired of its performance, especially as soon as I took even a
single snapshot.
But that was a long time ago now, just saying that I'm a bit rusty on LVM
itself.

That being said, if I have
raid5
dm-cache
dm-crypt
dm-thin

That's still 4 block layers under btrfs.
Am I any better off using dm-cache instead of bcache, my understanding is
that it only replaces one block layer with another one and one codebase with
another.

Mmmh, a bit of reading shows that dm-cache is now used as lvmcache, which
might change things, or not.
I'll admit that setting up and maintaining bcache is a bit of a pain, I only
used it at the time because it seemed more ready then, but we're a few years
later now.

So, what do you recommend nowadays, assuming you've used both?
(given that it's literally going to take days to recreate my array, I'd
rather do it once and the right way the first time :) )

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:35:19PM -0400, Austin S. Hemmelgarn wrote:
> >I kind of liked the thin provisioning idea because it's hands off,
> >which is appealing. Any reason against it?
> No, not currently, except that it adds a whole lot more stuff between 
> BTRFS and whatever layer is below it.  That increase in what's being 
> done adds some overhead (it's noticeable on 7200 RPM consumer SATA 
> drives, but not on decent consumer SATA SSD's).
> 
> There used to be issues running BTRFS on top of LVM thin targets which 
> had zero mode turned off, but AFAIK, all of those problems were fixed 
> long ago (before 4.0).

I see, thanks for the heads up.

> >Does LVM do built in raid5 now? Is it as good/trustworthy as mdadm
> >raid5?
> Actually, it uses MD's RAID5 implementation as a back-end.  Same for 
> RAID6, and optionally for RAID0, RAID1, and RAID10.
 
Ok, that makes me feel a bit better :)

> >But yeah, if it's incompatible with thin provisioning, it's not that
> >useful.
> It's technically not incompatible, just a bit of a pain.  Last time I 
> tried to use it, you had to jump through hoops to repair a damaged RAID 
> volume that was serving as an underlying volume in a thin pool, and it 
> required keeping the thin pool offline for the entire duration of the 
> rebuild.

Argh, not good :( / thanks for the heads up.

> If you do go with thin provisioning, I would encourage you to make 
> certain to call fstrim on the BTRFS volumes on a semi regular basis so 
> that the thin pool doesn't get filled up with old unused blocks, 

That's a very good point/reminder, thanks for that. I guess it's like
running on an ssd :)

> preferably when you are 100% certain that there are no ongoing writes on 
> them (trimming blocks on BTRFS gets rid of old root trees, so it's a bit 
> dangerous to do it while writes are happening).
 
Argh, that will be harder, but I'll try.
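If there is a predictably quiet window (say, before the nightly backup runs
kick off), a weekly cron entry is a low-effort way to do this; the mount
point, log path, and schedule below are made-up examples:

```crontab
# /etc/cron.d/fstrim-thin (hypothetical): trim each btrfs mount weekly,
# during a quiet window, so unused blocks flow back to the thin pool.
30 4 * * 0  root  /sbin/fstrim -v /mnt/btrfs_pool2 >> /var/log/fstrim.log 2>&1
```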

Given what you said, it sounds like I'll still be best off with separate
layers to avoid the rebuild problem you mentioned.
So it'll be
swraid5 / dmcrypt / bcache / lvm dm thin / btrfs

Hopefully that will work well enough.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:33:09PM +0500, Roman Mamedov wrote:
> On Mon, 2 Jul 2018 08:19:03 -0700
> Marc MERLIN  wrote:
> 
> > I actually have fewer snapshots than this per filesystem, but I backup
> > more than 10 filesystems.
> > If I used as many snapshots as you recommend, that would already be 230
> > snapshots for 10 filesystems :)
> 
> (...once again me with my rsync :)
> 
> If you didn't use send/receive, you wouldn't be required to keep a separate
> snapshot trail per filesystem backed up, one trail of snapshots for the entire
> backup server would be enough. Rsync everything to subdirs within one
> subvolume, then do timed or event-based snapshots of it. You only need more
> than one trail if you want different retention policies for different datasets
> (e.g. in my case I have 91 and 31 days).

This is exactly how I used to do backups before btrfs.
I did 

cp -al backup.olddate backup.newdate
rsync -avSH src/ backup.newdate/

You don't even need snapshots or btrfs anymore.
Also, sorry to say, but I have different data retention needs for
different backups. Some need to rotate more quickly than others, but if
you're using rsync, the method I gave above works fine at any rotation
interval you need.

It is almost as efficient as btrfs on space, but as I said, the time
penalty on all those stats for many files was what killed it for me.
If I go back to rsync backups (and I'm really unlikely to), then I'd
also go back to ext4. There would be no point in dealing with the
complexity and fragility of btrfs anymore.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 12:59:02PM -0400, Austin S. Hemmelgarn wrote:
> > Am I supposed to put LVM thin volumes underneath so that I can share
> > the same single 10TB raid5?
>
> Actually, because of the online resize ability in BTRFS, you don't
> technically _need_ to use thin provisioning here.  It makes the maintenance
> a bit easier, but it also adds a much more complicated layer of indirection
> than just doing regular volumes.

You're right that I can use btrfs resize, but then I still need an LVM
device underneath, correct?
So, if I have 10 backup targets, I need 10 LVM LVs, I give them 10%
each of the full size available (as a guess), and then I'd have to 
- btrfs resize down one that's bigger than I need
- LVM shrink the LV
- LVM grow the other LV
- LVM resize up the other btrfs

and I think LVM resize and btrfs resize are not linked so I have to do
them separately and hope to type the right numbers each time, correct?
(or is that easier now?)
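For what it's worth, the dance can at least be scripted so the numbers only
get typed once. A dry-run sketch (it echoes the commands instead of running
them, and the VG, LV, and mount names are made up):

```shell
# Move space between two LVs, each holding a btrfs filesystem.
# Shrink order: filesystem first, then LV; grow order: LV first, then fs.
move_space() {
    vg=$1 from=$2 to=$3 gib=$4
    echo btrfs filesystem resize "-${gib}G" "/mnt/$from"
    echo lvresize -L "-${gib}G" "/dev/$vg/$from"
    echo lvresize -L "+${gib}G" "/dev/$vg/$to"
    echo btrfs filesystem resize max "/mnt/$to"
}
```

In practice you would shrink the filesystem a bit more than the LV and let
the final "resize max" reclaim the slack, so a unit mix-up can never leave
the LV smaller than the filesystem on it.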

I kind of liked the thin provisioning idea because it's hands off,
which is appealing. Any reason against it?

> You could (in theory) merge the LVM and software RAID5 layers, though that
> may make handling of the RAID5 layer a bit complicated if you choose to use
> thin provisioning (for some reason, LVM is unable to do on-line checks and
> rebuilds of RAID arrays that are acting as thin pool data or metadata).
 
Does LVM do built-in raid5 now? Is it as good/trustworthy as mdadm
raid5?
But yeah, if it's incompatible with thin provisioning, it's not that
useful.

> Alternatively, you could increase your array size, remove the software RAID
> layer, and switch to using BTRFS in raid10 mode so that you could eliminate
> one of the layers, though that would probably reduce the effectiveness of
> bcache (you might want to get a bigger cache device if you do this).

Sadly that won't work; I have more data than will fit on raid10.

Thanks for your suggestions though.
Still need to read up on whether I should do thin provisioning, or not.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
Hi Qu,

thanks for the detailled and honest answer.
A few comments inline.

On Mon, Jul 02, 2018 at 10:42:40PM +0800, Qu Wenruo wrote:
> For full, it depends (but for most real-world cases, it's still flawed).
> We have small and crafted images as test cases, which btrfs check can
> repair without problem at all.
> But such images are *SMALL*, and only have *ONE* type of corruption,
> which can't represent real world cases at all.
 
right, they're just unittest images, I understand.

> 1) Too large fs (especially too many snapshots)
>The use case (too many snapshots and shared extents, a lot of extents
>get shared over 1000 times) is in fact a super large challenge for
>lowmem mode check/repair.
>It needs O(n^2) or even O(n^3) to check each backref, which hugely
>slows the progress and makes it hard for us to locate the real bug.
 
So, the non lowmem version would work better, but it's a problem if it
doesn't fit in RAM.
I've always considered it a grave bug that btrfs check repair can use so
much kernel memory that it will crash the entire system. This should not
be possible.
While it won't help me here, can btrfs check be improved not to suck all
the kernel memory, and ideally even allow using swap space if the RAM is
not enough?

Is btrfs check regular mode still being maintained? I think it's still
better than lowmem, correct?

> 2) Corruption in extent tree and our objective is to mount RW
>Extent tree is almost useless if we just want to read data.
>But when we do any write, we need it, and if it goes wrong even a
>tiny bit, your fs could be damaged really badly.
> 
>For other corruption, like some fs tree corruption, we could do
>something to discard some corrupted files, but if it's extent tree,
>we either mount RO and grab anything we have, or hope the
>almost-never-working --init-extent-tree can work (that's mostly
>a miracle).
 
I understand that it's the weak point of btrfs, thanks for explaining.

> 1) Don't keep too many snapshots.
>Really, this is the core.
>For send/receive backup, IIRC it only needs the parent subvolume
>to exist; there is no need to keep the whole history of all those
>snapshots.

You are correct on history. The reason I keep history is because I may
want to recover a file from last week or 2 weeks ago after I finally
notice that it's gone. 
I have terabytes of space on the backup server, so it's easier to keep
history there than on the client which may not have enough space to keep
a month's worth of history.
As you know, back when we did tape backups, we also kept history of at
least several weeks (usually several months, but that's too much for
btrfs snapshots).

>Keeping the number of snapshots minimal greatly improves the
>possibility (for both manual patching and check repair) of a successful
>repair.
>Normally I would suggest 4 hourly snapshots, 7 daily snapshots, 12
>monthly snapshots.

I actually have fewer snapshots than this per filesystem, but I backup
more than 10 filesystems.
If I used as many snapshots as you recommend, that would already be 230
snapshots for 10 filesystems :)
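Whatever the counts end up being, the pruning itself is mechanical. A sketch
that keeps the newest N snapshots matching a prefix and echoes what it would
delete (the directory layout is hypothetical, and GNU head is assumed for
the negative line count):

```shell
prune_snapshots() {
    dir=$1 prefix=$2 keep=$3
    # List snapshots oldest-first and drop everything except the newest $keep.
    ls -1d "$dir/$prefix".* 2>/dev/null | sort | head -n -"$keep" |
    while read -r snap; do
        echo btrfs subvolume delete "$snap"   # echo only: dry run
    done
}
```

Run once per retention tier (hourly, daily, monthly) with its own keep count.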

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: how to best segment a big block device in resizeable btrfs filesystems?

2018-07-02 Thread Marc MERLIN
Hi Qu,

I'll split this part into a new thread:

> 2) Don't keep unrelated snapshots in one btrfs.
>I totally understand that maintaining different btrfs filesystems would
>hugely add maintenance pressure, but as explained, all snapshots share
>one fragile extent tree.

Yes, I understand that this is what I should do given what you
explained.
My main problem is knowing how to segment things so I don't end up with
filesystems that are full while others are almost empty :)

Am I supposed to put LVM thin volumes underneath so that I can share
the same single 10TB raid5?

If I do this, I would have
software raid 5 < dmcrypt < bcache < lvm < btrfs
That's a lot of layers, and that's also starting to make me nervous :)

Is there any other way that does not involve me creating smaller block
devices for multiple btrfs filesystems and hope that they are the right
size because I won't be able to change it later?
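The main alternative I can see is carving thin LVs out of one pool, so each
btrfs filesystem can be sized generously without pre-committing real space.
A dry-run sketch (echo only; the volume-group and pool names are invented):

```shell
# One thin pool on the 10TB device, one thin LV per backup filesystem.
# Each LV's virtual size may exceed what is actually free in the pool.
# (The pool itself would be created once, along the lines of
#  lvcreate -L 9T --thinpool pool0 vg0.)
make_thin_btrfs() {
    vg=$1 name=$2 virt=$3
    echo lvcreate -V "$virt" --thinpool "$vg/pool0" -n "$name"
    echo mkfs.btrfs -L "$name" "/dev/$vg/$name"
}
```

mkfs.btrfs then sees a device of the virtual size, and periodic fstrim keeps
the pool's real usage honest.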

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-02 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 02:22:20PM +0800, Su Yue wrote:
> > Ok, that's 29MB, so it doesn't fit on pastebin:
> > http://marc.merlins.org/tmp/dshelf2_inspect.txt
> > 
> Sorry Marc. After offline communication with Qu, both
> of us think the filesystem is hard to repair.
> The filesystem is too large to debug step by step.
> Every check-and-debug round trip is too expensive,
> and it has already cost several days.
> 
> Sadly, I am afraid that you have to recreate the filesystem
> and back up your data again. :(
> 
> Sorry again, and thanks for your reports and patience.

I appreciate your help. Honestly I only wanted to help you find why the
tools aren't working. Fixing filesystems by hand (and remotely via email
on top of that) is way too time-consuming, like you said.

Is the btrfs design flawed in a way that repair tools just cannot repair
on their own? 
I understand that data can be lost, but I don't understand how the tools
just either keep crashing for me, go in infinite loops, or otherwise
fail to give me back a stable filesystem, even if some data is missing
after that.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Mon, Jul 02, 2018 at 10:02:33AM +0800, Su Yue wrote:
> Could you try the following dumps? They shouldn't cost much time.
> 
> #btrfs inspect dump-tree -t 21872  | grep -C 50 "374857 EXTENT_DATA "
> 
> #btrfs inspect dump-tree -t 22911  | grep -C 50 "374857 EXTENT_DATA "

Ok, that's 29MB, so it doesn't fit on pastebin:
http://marc.merlins.org/tmp/dshelf2_inspect.txt

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: So, does btrfs check lowmem take days? weeks?

2018-07-01 Thread Marc MERLIN
On Thu, Jun 28, 2018 at 11:43:54PM -0700, Marc MERLIN wrote:
> On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > > 
> > > Not sure if I understand what you meant here.
> > > 
> > Sorry for my unclear words.
> > Simply speaking, I suggest you to stop current running check.
> > Then, clone above branch to compile binary then run
> > 'btrfs check --mode=lowmem $dev'.
>  
> I understand, I'll build and try it.
> 
> > > This filesystem is trash to me and will require over a week to rebuild
> > > manually if I can't repair it.
> > 
> > Understood your anxiety; a log of check without '--repair' will help
> > us figure out what's wrong with your filesystem.
> 
> Ok, I'll run your new code without repair and report back. It will
> likely take over a day though.

Well, it got stuck for over a day, and then I had to reboot :(

saruman:/var/local/src/btrfs-progs.sy# git remote -v
origin  https://github.com/Damenly/btrfs-progs.git (fetch)
origin  https://github.com/Damenly/btrfs-progs.git (push)
saruman:/var/local/src/btrfs-progs.sy# git branch
  master
* tmp1
saruman:/var/local/src/btrfs-progs.sy# git pull
Already up to date.
saruman:/var/local/src/btrfs-progs.sy# make
Making all in Documentation
make[1]: Nothing to be done for 'all'.

However, it still got stuck here:
gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2   
Checking filesystem on /dev/mapper/dshelf2  
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, owner: 374857, offset: 3407872) wanted: 2, have: 3
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, owner: 374857, offset: 114540544) wanted: 180, have: 181
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, owner: 374857, offset: 126754816) wanted: 67, have: 68
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, owner: 374857, offset: 131866624) wanted: 114, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, owner: 374857, offset: 148234240) wanted: 301, have: 302
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, owner: 374857, offset: 180371456) wanted: 160, have: 161
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, owner: 374857, offset: 192200704) wanted: 169, have: 170
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, owner: 374857, offset: 217653248) wanted: 347, have: 348
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, owner: 374857, offset: 235175936) wanted: 1, have: 556

What should I try next?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
On Sun, Jul 01, 2018 at 01:15:09PM -0600, Chris Murphy wrote:
> > How is it supposed to work when you have multiple devices for a btrfs
> > filesystem?
> >
> > gargamel:~# btrfs check --repair -p /dev/bcache2
> > enabling repair mode
> > ERROR: mount check: cannot open /dev/bcache2: No such device or address
> > ERROR: could not check mount status: No such device or address
> > gargamel:~# btrfs check --repair -p /dev/bcache3
> > enabling repair mode
> > ERROR: cannot open device '/dev/bcache3': Device or resource busy
> > ERROR: cannot open file system
> >
> > [205248.299528] BTRFS info (device bcache3): disk space caching is enabled
> > [205248.320335] BTRFS error (device bcache3): Remounting read-write after error is not allowed
> 
> If it's successfully unmounted, I don't understand the error messages
> that it can't be opened. Is umount hung? Sounds to me like btrfs check
> thinks it's still mounted.

I spent more time on this and apparently because the underlying device
had a hardware fault (fell off the bus), its dmcrypt device is still
there but not working.
In turn, I can't dmsetup rm it because it's in use by bcache which
didn't free it, but bcache won't let me free it because it got removed.
So, I'm stuck with a reboot in the end, oh well...

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


btrfs check of a raid0?

2018-07-01 Thread Marc MERLIN
Howdy,

I have a btrfs filesystem made out of 2 devices:
[   75.141414] BTRFS: device label btrfs_space devid 1 transid 429220 /dev/bcache3
[   75.164745] BTRFS: device label btrfs_space devid 2 transid 429220 /dev/bcache2

One of the 2 devices had a hardware error (not btrfs' fault):
[201504.939659] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 39, flush 1, corrupt 0, gen 0
[201504.995967] BTRFS warning (device bcache3): bcache3 checksum verify failed on 38976 wanted F3019EEA found E6A97DC4 level 0
[201505.032209] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 40, flush 1, corrupt 0, gen 0
[201505.062447] BTRFS error (device bcache3): parent transid verify failed on 38976 wanted 434763 found 434245
[201600.262142] BTRFS error (device bcache3): bdev /dev/bcache2 errs: wr 552, rd 41, flush 1, corrupt 0, gen 0

I unmounted it, and I'm trying to check the filesystem now.

How is it supposed to work when you have multiple devices for a btrfs
filesystem?

gargamel:~# btrfs check --repair -p /dev/bcache2 
enabling repair mode
ERROR: mount check: cannot open /dev/bcache2: No such device or address
ERROR: could not check mount status: No such device or address
gargamel:~# btrfs check --repair -p /dev/bcache3
enabling repair mode
ERROR: cannot open device '/dev/bcache3': Device or resource busy
ERROR: cannot open file system

[205248.299528] BTRFS info (device bcache3): disk space caching is enabled
[205248.320335] BTRFS error (device bcache3): Remounting read-write after error is not allowed

Yes, rebooting should likely get around the problem, but I'd rather not
reboot, I have long running stuff I would rather not stop.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: Incremental send/receive broken after snapshot restore

2018-06-30 Thread Marc MERLIN
Sorry that I missed the beginning of this discussion, but I think this is
what I documented here after hitting the same problem:
http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc

On Sun, Jul 01, 2018 at 01:03:37AM +0200, Hannes Schweizer wrote:
> On Sat, Jun 30, 2018 at 10:02 PM Andrei Borzenkov  wrote:
> >
> > 30.06.2018 21:49, Andrei Borzenkov wrote:
> > > 30.06.2018 20:49, Hannes Schweizer wrote:
> > ...
> > >>
> > >> I've tested a few restore methods beforehand, and simply creating a
> > >> writeable clone from the restored snapshot does not work for me, eg:
> > >> # create some source snapshots
> > >> btrfs sub create test_root
> > >> btrfs sub snap -r test_root test_snap1
> > >> btrfs sub snap -r test_root test_snap2
> > >>
> > >> # send a full and incremental backup to external disk
> > >> btrfs send test_snap2 | btrfs receive /run/media/schweizer/external
> > >> btrfs sub snap -r test_root test_snap3
> > >> btrfs send -c test_snap2 test_snap3 | btrfs receive
> > >> /run/media/schweizer/external
> > >>
> > >> # simulate disappearing source
> > >> btrfs sub del test_*
> > >>
> > >> # restore full snapshot from external disk
> > >> btrfs send /run/media/schweizer/external/test_snap3 | btrfs receive .
> > >>
> > >> # create writeable clone
> > >> btrfs sub snap test_snap3 test_root
> > >>
> > >> # try to continue with backup scheme from source to external
> > >> btrfs sub snap -r test_root test_snap4
> > >>
> > >> # this fails!!
> > >> btrfs send -c test_snap3 test_snap4 | btrfs receive
> > >> /run/media/schweizer/external
> > >> At subvol test_snap4
> > >> ERROR: parent determination failed for 2047
> > >> ERROR: empty stream is not considered valid
> > >>
> > >
> > > Yes, that's expected. Incremental stream always needs valid parent -
> > > this will be cloned on destination and incremental changes applied to
> > > it. "-c" option is just additional sugar on top of it which might reduce
> > > size of stream, but in this case (i.e. without "-p") it also attempts to
> > > guess parent subvolume for test_snap4 and this fails because test_snap3
> > > and test_snap4 do not have common parent so test_snap3 is rejected as
> > > valid parent snapshot. You can restart incremental-forever chain by
> > > using explicit "-p" instead:
> > >
> > > btrfs send -p test_snap3 test_snap4
> > >
> > > Subsequent snapshots (test_snap5 etc) will all have common parent with
> > > immediate predecessor again so "-c" will work.
> > >
> > > Note that technically "btrfs send" with a single "-c" option is entirely
> > > equivalent to "btrfs send -p". Using "-p" would have avoided this issue. :)
> > > Although this implicit check for common parent may be considered a good
> > > thing in this case.
> > >
> > > P.S. looking at the above, it probably needs to be in manual page for
> > > btrfs-send. It took me quite some time to actually understand the
> > > meaning of "-p" and "-c" and behavior if they are present.
> > >
> > ...
> > >>
> > >> Is there some way to reset the received_uuid of the following snapshot
> > >> on online?
> > >> ID 258 gen 13742 top level 5 parent_uuid -
> > >>received_uuid 6c683d90-44f2-ad48-bb84-e9f241800179 uuid
> > >> 46db1185-3c3e-194e-8d19-7456e532b2f3 path diablo
> > >>
> > >
> > > There is no "official" tool but this question came up quite often.
> > > Search this list, I believe recently one-liner using python-btrfs was
> > > posted. Note that also patch that removes received_uuid when "ro"
> > > propery is removed was suggested, hopefully it will be merged at some
> > > point. Still I personally consider ability to flip read-only property
> > > the very bad thing that should have never been exposed in the first place.
> > >
> >
> > Note that if you remove received_uuid (explicitly or - in the future -
> > implicitly) you will not be able to restart incremental send anymore.
> > Without received_uuid there will be no way to match source test_snap3
> > with destination test_snap3. So you *must* preserve it and start with
> > writable clone.
> >
> > received_uuid is a misnomer. I wish it were named "content_uuid" or
> > "snap_uuid", with these semantics:
> >
> > 1. When read-only snapshot of writable volume is created, content_uuid
> > is initialized
> >
> > 2. Read-only snapshot of read-only snapshot inherits content_uuid
> >
> > 3. destination of "btrfs send" inherits content_uuid
> >
> > 4. writable snapshot of read-only snapshot clears content_uuid
> >
> > 5. clearing read-only property clears content_uuid
> >
> > This would make it more straightforward to cascade and restart
> > replication by having single subvolume property to match against.
> 
> Indeed, the current terminology is a bit confusing, and the patch
> removing the received_uuid when manually switching ro to false should
> definitely be merged. As recommended, I'll simply create a writeable
> clone of the restored snapshot and use -p instead of -c when restoring
> again 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-30 Thread Marc MERLIN
On Sat, Jun 30, 2018 at 10:49:07PM +0800, Qu Wenruo wrote:
> But the last abort looks quite likely to be the culprit.
> 
> Would you try to dump the extent tree?
> # btrfs inspect dump-tree -t extent  | grep -A50 156909494272

Sure, there you go:

item 25 key (156909494272 EXTENT_ITEM 55320576) itemoff 14943 itemsize 24
	refs 19715 gen 31575 flags DATA
item 26 key (156909494272 EXTENT_DATA_REF 571620086735451015) itemoff 14915 itemsize 28
	extent data backref root 21641 objectid 374857 offset 235175936 count 1452
item 27 key (156909494272 EXTENT_DATA_REF 1765833482087969671) itemoff 14887 itemsize 28
	extent data backref root 23094 objectid 374857 offset 235175936 count 1442
item 28 key (156909494272 EXTENT_DATA_REF 1807626434455810951) itemoff 14859 itemsize 28
	extent data backref root 21503 objectid 374857 offset 235175936 count 1454
item 29 key (156909494272 EXTENT_DATA_REF 1879818091602916231) itemoff 14831 itemsize 28
	extent data backref root 21462 objectid 374857 offset 235175936 count 1454
item 30 key (156909494272 EXTENT_DATA_REF 3610854505775117191) itemoff 14803 itemsize 28
	extent data backref root 23134 objectid 374857 offset 235175936 count 1442
item 31 key (156909494272 EXTENT_DATA_REF 3754675454231458695) itemoff 14775 itemsize 28
	extent data backref root 23052 objectid 374857 offset 235175936 count 1442
item 32 key (156909494272 EXTENT_DATA_REF 5060494667839714183) itemoff 14747 itemsize 28
	extent data backref root 23174 objectid 374857 offset 235175936 count 1440
item 33 key (156909494272 EXTENT_DATA_REF 5476627808561673095) itemoff 14719 itemsize 28
	extent data backref root 22911 objectid 374857 offset 235175936 count 1
item 34 key (156909494272 EXTENT_DATA_REF 6378484416458011527) itemoff 14691 itemsize 28
	extent data backref root 23012 objectid 374857 offset 235175936 count 1442
item 35 key (156909494272 EXTENT_DATA_REF 7338474132555182983) itemoff 14663 itemsize 28
	extent data backref root 21872 objectid 374857 offset 235175936 count 1
item 36 key (156909494272 EXTENT_DATA_REF 7516565391717970823) itemoff 14635 itemsize 28
	extent data backref root 21826 objectid 374857 offset 235175936 count 1452
item 37 key (156909494272 SHARED_DATA_REF 14871537025024) itemoff 14631 itemsize 4
	shared data backref count 10
item 38 key (156909494272 SHARED_DATA_REF 14871617568768) itemoff 14627 itemsize 4
	shared data backref count 73
item 39 key (156909494272 SHARED_DATA_REF 14871619846144) itemoff 14623 itemsize 4
	shared data backref count 59
item 40 key (156909494272 SHARED_DATA_REF 14871623270400) itemoff 14619 itemsize 4
	shared data backref count 68
item 41 key (156909494272 SHARED_DATA_REF 14871623532544) itemoff 14615 itemsize 4
	shared data backref count 70
item 42 key (156909494272 SHARED_DATA_REF 14871626383360) itemoff 14611 itemsize 4
	shared data backref count 76
item 43 key (156909494272 SHARED_DATA_REF 14871635132416) itemoff 14607 itemsize 4
	shared data backref count 60
item 44 key (156909494272 SHARED_DATA_REF 14871649533952) itemoff 14603 itemsize 4
	shared data backref count 79
item 45 key (156909494272 SHARED_DATA_REF 14871862378496) itemoff 14599 itemsize 4
	shared data backref count 70
item 46 key (156909494272 SHARED_DATA_REF 14909667098624) itemoff 14595 itemsize 4
	shared data backref count 72
item 47 key (156909494272 SHARED_DATA_REF 14909669720064) itemoff 14591 itemsize 4
	shared data backref count 58
item 48 key (156909494272 SHARED_DATA_REF 14909734567936) itemoff 14587 itemsize 4
	shared data backref count 73
item 49 key (156909494272 SHARED_DATA_REF 14909920477184) itemoff 14583 itemsize 4
	shared data backref count 79
item 50 key (156909494272 SHARED_DATA_REF 14942279335936) itemoff 14579 itemsize 4
	shared data backref count 79
item 51 key (156909494272 SHARED_DATA_REF 14942304862208) itemoff 14575 itemsize 4
	shared data backref count 72
item 52 key (156909494272 SHARED_DATA_REF 14942348378112) itemoff 14571 itemsize 4
	shared data backref count 67
item 53 key (156909494272 SHARED_DATA_REF 14942366138368) itemoff 14567 itemsize 4
	shared data backref count 51
item 54 key (156909494272 SHARED_DATA_REF 14942384799744) itemoff 14563 itemsize 4
	shared data backref count 64
item 55 key (156909494272 SHARED_DATA_REF 14978234613760) 

Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
Well, there goes that. After about 18H:
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452 
backref.c:466: __add_missing_keys: Assertion `ref->root_id` failed, value 0 
btrfs(+0x3a232)[0x56091704f232] 
btrfs(+0x3ab46)[0x56091704fb46] 
btrfs(+0x3b9f5)[0x5609170509f5] 
btrfs(btrfs_find_all_roots+0x9)[0x560917050a45] 
btrfs(+0x572ff)[0x56091706c2ff] 
btrfs(+0x60b13)[0x560917075b13] 
btrfs(cmd_check+0x2634)[0x56091707d431] 
btrfs(main+0x88)[0x560917027260] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f93aa508561] 
btrfs(_start+0x2a)[0x560917026dfa] 
Aborted 

That's https://github.com/Damenly/btrfs-progs.git

Whoops, I didn't use the tmp1 branch, let me try again with that and
report back, although the problem above is still going to be there since
I think the only difference will be this, correct?
https://github.com/Damenly/btrfs-progs/commit/b5851513a12237b3e19a3e71f3ad00b966d25b3a

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:28:31AM -0700, Marc MERLIN wrote:
> So, I rebooted, and will now run Su's btrfs check without repair and
> report back.

As expected, it will likely still take days, here's the start:

gargamel:~# btrfs check --mode=lowmem  -p /dev/mapper/dshelf2  
Checking filesystem on /dev/mapper/dshelf2 
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d 
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 2, have: 4
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 180, have: 240
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 67, have: 115
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 114, have: 143
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 301, have: 431
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 355, have: 433
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 160, have: 240
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 169, have: 249
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 171, have: 251
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 347, have: 418
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 1, have: 1449
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 1, have: 1452

Mmmh, these look similar (but not identical) to the last run earlier in this 
thread:
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in 

Re: btrfs send/receive vs rsync

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 10:04:02AM +0200, Lionel Bouton wrote:
> Hi,
> 
> On 29/06/2018 09:22, Marc MERLIN wrote:
> > On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> >> On Thu, 28 Jun 2018 23:59:03 -0700
> >> Marc MERLIN  wrote:
> >>
> >>> I don't waste a week recreating the many btrfs send/receive relationships.
> >> Consider not using send/receive, and switching to regular rsync instead.
> >> Send/receive is very limiting and cumbersome, including because of what you
> >> described. And it doesn't gain you much over an incremental rsync. As for
> > Err, sorry but I cannot agree with you here, at all :)
> >
> > btrfs send/receive is pretty much the only reason I use btrfs. 
> > rsync takes hours on big filesystems scanning every single inode on both
> > sides and then seeing what changed, and only then sends the differences.
> > It's super inefficient.
> > btrfs send knows in seconds what needs to be sent, and works on it right
> > away.
> 
> I've not yet tried send/receive but I feel the pain of rsyncing millions
> of files (I had to use lsyncd to limit the problem to the time the
> origin servers reboot which is a relatively rare event) so this thread
piqued my attention. Looking at the whole thread I wonder if you could
> get a more manageable solution by splitting the filesystem.

So, let's be clear. I did backups with rsync for 10+ years. It was slow
and painful. On my laptop an hourly rsync between 2 drives slowed down
my machine to a crawl while everything was being stat'ed; it took
forever.
Now with btrfs send/receive, it just works, I don't even see it
happening in the background.

Here is a page I wrote about it in 2014:
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive

Here is a talk I gave in 2014 too, scroll to the bottom of the page, and
the bottom of the talk outline:
http://marc.merlins.org/perso/btrfs/2014-05.html#My-Btrfs-Talk-at-Linuxcon-JP-2014
and click on 'Btrfs send/receive'
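For reference, the incremental cycle described above can be sketched as a small script. This is a minimal sketch only: the pool mountpoint, subvolume name, and snapshot naming scheme are invented for illustration, not the actual paths from this thread, and the whole thing is guarded so it is a no-op on machines without that mount.

```shell
#!/bin/sh
# Sketch of one incremental btrfs send/receive backup cycle.
POOL=/mnt/btrfs_pool1          # source filesystem (assumption)
SUB=home                       # subvolume being backed up (assumption)
DST=/mnt/backup                # receiving filesystem (assumption)

snapname() {                   # dated read-only snapshot name
    printf '%s_ro.%s\n' "$SUB" "$1"
}

PREV=$(snapname 20180628)      # parent snapshot already present on both sides
CUR=$(snapname "$(date +%Y%m%d)")

# Guard so the sketch is harmless on machines without this mount.
if [ -d "$POOL/$SUB" ]; then
    # send requires a read-only snapshot
    btrfs subvolume snapshot -r "$POOL/$SUB" "$POOL/$CUR"
    # ship only the delta against the previous snapshot: no inode walk
    btrfs send -p "$POOL/$PREV" "$POOL/$CUR" | btrfs receive "$DST"
    # the next run then uses $CUR as its -p parent
fi
```

This is why it "knows in seconds what needs to be sent": the `-p` parent lets btrfs diff two frozen trees instead of stat'ing every inode the way rsync does.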

> If instead of using a single BTRFS filesystem you used LVM volumes
> (maybe with Thin provisioning and monitoring of the volume group free
> space) for each of your servers to backup with one BTRFS filesystem per
> volume you would have less snapshots per filesystem and isolate problems
> in case of corruption. If you eventually decide to start from scratch
> again this might help a lot in your case.

So, I already have problems due to too many block layers:
- raid 5 + ssd
- bcache
- dmcrypt
- btrfs

I get occasional deadlocks due to upper layers sending more data to the
lower layer (bcache) than it can process. I'm a bit wary of adding yet
another layer (LVM), but you're otherwise correct that keeping smaller
btrfs filesystems would help with performance and containing possible
damage.

Has anyone actually done this? :)
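For what Lionel is proposing, the setup would look roughly like this. Device name, volume group name, and sizes are all invented, and the `run` wrapper only prints each command instead of executing it, so the sketch is non-destructive.

```shell
#!/bin/sh
# Rough sketch of one thin LV + one btrfs filesystem per backed-up
# machine. Names and sizes are assumptions; run() only prints commands.
run() { echo "+ $*"; }

DEV=/dev/mapper/dshelf2_crypt            # hypothetical dmcrypt device
run pvcreate "$DEV"
run vgcreate backup "$DEV"
# one thin pool for the whole shelf (size is an assumption)
run lvcreate --type thin-pool -L 9T -n pool backup
# one over-provisioned thin volume and one btrfs fs per server, so
# corruption and slow balance/check stay contained per machine
for host in gargamel laptop; do
    run lvcreate -V 4T --thinpool pool -n "$host" backup
    run mkfs.btrfs -L "$host" "/dev/backup/$host"
done
```

The point of the thin pool is that each per-host filesystem can be over-provisioned while the VG free space is monitored in one place.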

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 03:20:42PM +0800, Qu Wenruo wrote:
> If certain btrfs specific operations are involved, it's definitely not OK:
> 1) Balance
> 2) Quota
> 3) Btrfs check

Ok, I understand. I'll try to balance almost never then. My problems did
indeed start because I ran balance and it got stuck 2 days with 0
progress.
That still seems like a bug though. I'm ok with slow, but stuck for 2
days with only 270 snapshots or so means either there is a bug, or the
algorithm is so expensive that 270 snapshots can make it take days
or weeks to finish.

> > It's a backup server, it only contains data from other machines.
> > If the filesystem cannot be recovered to a working state, I will need
> > over a week to restart the many btrfs send commands from many servers.
> > This is why anything other than --repair is useless to me, I don't need
> > the data back, it's still on the original machines, I need the
> > filesystem to work again so that I don't waste a week recreating the
> > many btrfs send/receive relationships.
> 
> Now totally understand why you need to repair the fs.

I also understand that my use case is atypical :)
But I guess this also means that using btrfs for a lot of send/receive
on a backup server is not going to work well unfortunately :-/

Now I'm wondering if I'm the only person even doing this.

> > Does the pastebin help and is 270 snapshots ok enough?
> 
> The super dump doesn't show anything wrong.
> 
> So the problem may be in the super large extent tree.
> 
> In this case, plain check result with Su's patch would help more, other
> than the not so interesting super dump.

First I tried to mount with skip balance after the partial repair, and
it hung a long time:
[445635.716318] BTRFS info (device dm-2): disk space caching is enabled
[445635.736229] BTRFS info (device dm-2): has skinny extents
[445636.101999] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[445825.053205] BTRFS info (device dm-2): enabling ssd optimizations
[446511.006588] BTRFS info (device dm-2): disk space caching is enabled
[446511.026737] BTRFS info (device dm-2): has skinny extents
[446511.325470] BTRFS info (device dm-2): bdev /dev/mapper/dshelf2 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[446699.593501] BTRFS info (device dm-2): enabling ssd optimizations
[446964.077045] INFO: task btrfs-transacti:9211 blocked for more than 120 
seconds.
[446964.099802]   Not tainted 4.17.2-amd64-preempt-sysrq-20180818 #3
[446964.120004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

So, I rebooted, and will now run Su's btrfs check without repair and
report back.

Thanks both for your help.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 12:09:54PM +0500, Roman Mamedov wrote:
> On Thu, 28 Jun 2018 23:59:03 -0700
> Marc MERLIN  wrote:
> 
> > I don't waste a week recreating the many btrfs send/receive relationships.
> 
> Consider not using send/receive, and switching to regular rsync instead.
> Send/receive is very limiting and cumbersome, including because of what you
> described. And it doesn't gain you much over an incremental rsync. As for

Err, sorry but I cannot agree with you here, at all :)

btrfs send/receive is pretty much the only reason I use btrfs. 
rsync takes hours on big filesystems scanning every single inode on both
sides and then seeing what changed, and only then sends the differences.
It's super inefficient.
btrfs send knows in seconds what needs to be sent, and works on it right
away.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:29:10PM +0800, Qu Wenruo wrote:
> > If --repair doesn't work, check is useless to me sadly.
> 
> Not exactly.
> Although it's time-consuming, I have manually patched several users' filesystems,
> which normally ends pretty well.
 
Ok I understand now.

> > Agreed, I doubt I have over or much over 100 snapshots though (but I
> > can't check right now).
> > Sadly I'm not allowed to mount even read only while check is running:
> > gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
> > mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

Ok, so I just checked now, 270 snapshots, but not because I'm crazy,
because I use btrfs send a lot :)

> This looks like super block corruption?
> 
> What about "btrfs inspect dump-super -fFa /dev/mapper/dshelf2"?

Sure, there you go: https://pastebin.com/uF1pHTsg

> And what about "skip_balance" mount option?
 
I have this in my fstab :)

> Another problem is, with so many snapshots, balance is also hugely
> slowed, thus I'm not 100% sure if it's really a hang.

I sent another thread about this last week, balance got hung after 2
days of doing nothing and just moving a single chunk.

Ok, I was able to remount the filesystem read only. I was wrong, I have
270 snapshots:
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup/'
74
gargamel:/mnt/mnt# btrfs subvolume list . | grep -c 'path backup-btrfssend/'
196

It's a backup server, I use btrfs send for many machines and for each btrfs
send, I keep history, maybe 10 or so backups. So it adds up in the end.

Is btrfs unable to deal with this well enough?
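One way to keep that history from piling up is to prune each send/receive relationship down to a fixed depth. A hedged sketch, where the snapshot directory and naming pattern are invented and only the list-trimming helper is exercised unless the directory exists:

```shell
#!/bin/sh
# Cap snapshot history per send/receive relationship.
KEEP=10   # how much history to keep per machine (assumption)

# Print all but the newest $1 entries of an oldest-first sorted list.
prune_candidates() {
    awk -v keep="$1" '{ l[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print l[i] }'
}

DIR=/mnt/btrfs_pool2/backup-btrfssend   # hypothetical snapshot directory
if [ -d "$DIR" ]; then
    ls -d "$DIR"/gargamel.* 2>/dev/null | sort | prune_candidates "$KEEP" |
    while read -r snap; do
        btrfs subvolume delete "$snap"
    done
fi
```

Deleting old snapshots does not shrink the extent tree instantly (cleanup runs in the background), but it keeps the backref counts from growing without bound.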

> For that usage, btrfs restore might fit your use case better,
> Unfortunately it needs extra disk space and isn't good at restoring
> subvolume/snapshots.
> (Although it's much faster than repairing the possible corrupted extent
> tree)

It's a backup server, it only contains data from other machines.
If the filesystem cannot be recovered to a working state, I will need
over a week to restart the many btrfs send commands from many servers.
This is why anything other than --repair is useless to me, I don't need
the data back, it's still on the original machines, I need the
filesystem to work again so that I don't waste a week recreating the
many btrfs send/receive relationships.

> > Is that possible at all?
> 
> At least for file recovery (fs tree repair), we have such behavior.
> 
> However, the problem you hit (and a lot of users hit) is all about
> extent tree repair, which doesn't even go to file recovery.
> 
> All the hassle is in the extent tree, and for the extent tree, it's just good
> or bad. Any corruption in extent tree may lead to later bugs.
> The only way to avoid extent tree problems is to mount the fs RO.
> 
> So, I'm afraid it is at least impossible for recent years.

Understood, thanks for answering.

Does the pastebin help and is 270 snapshots ok enough?

Thanks,
Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:32:44PM +0800, Su Yue wrote:
> > > https://github.com/Damenly/btrfs-progs/tree/tmp1
> > 
> > Not sure I understand what you meant here.
> > 
> Sorry for my unclear words.
> Simply speaking, I suggest you to stop current running check.
> Then, clone above branch to compile binary then run
> 'btrfs check --mode=lowmem $dev'.
 
I understand, I'll build and try it.

> > This filesystem is trash to me and will require over a week to rebuild
> > manually if I can't repair it.
> 
> Understood your anxiety, a log of check without '--repair' will help
> us to make clear what's wrong with your filesystem.

Ok, I'll run your new code without repair and report back. It will
likely take over a day though.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 02:02:19PM +0800, Su Yue wrote:
> I have figured out the bug: lowmem check can't deal with shared tree blocks
> in the reloc tree. The fix is simple, you can try the following repo:
> 
> https://github.com/Damenly/btrfs-progs/tree/tmp1

Not sure I understand what you meant here.

> Please run lowmem check "without =--repair" first to be sure whether
> your filesystem is fine.
 
The filesystem is not fine: it caused btrfs balance to hang. Whether
balance actually broke it further or merely exposed the breakage, I can't say.

Then mount hangs, even with recovery, unless I use ro.

This filesystem is trash to me and will require over a week to rebuild
manually if I can't repair it.
Running check without repair for likely several days just to know that
my filesystem is not clean (I already know this) isn't useful :)
Or am I missing something?

> Though the bug and phenomenon are clear enough, before sending my patch,
> I have to make a test image. I have spent a week studying btrfs balance
> but it seems a little hard for me.

thanks for having a look, either way.

Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-29 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:48:17PM +0800, Qu Wenruo wrote:
> Just normal btrfs check, and post the output.
> If normal check eats up all your memory, btrfs check --mode=lowmem.
 
Does check without --repair eat less RAM?

> --repair should be considered as the last method.

If --repair doesn't work, check is useless to me sadly. I know that for
FS analysis and bug reporting, you want to have the FS without changing
it to something maybe worse, but for my use, if it can't be mounted and
can't be fixed, then it gets deleted, which is even worse than check
doing the wrong thing.

> > The last two ERROR lines took over a day to get generated, so I'm not sure 
> > if it's still working, but just slowly.
> 
> OK, that explains something.
> 
> One extent is referred hundreds times, no wonder it will take a long time.
> 
> Just one tip here, there are really too many snapshots/reflinked files.
> It's highly recommended to keep the number of snapshots to a reasonable
> number (lower two digits).
> Although btrfs snapshot is super fast, it puts a lot of pressure on its
> extent tree, so there is no free lunch here.
 
Agreed, I doubt I have over or much over 100 snapshots though (but I
can't check right now).
Sadly I'm not allowed to mount even read only while check is running:
gargamel:~# mount -o ro /dev/mapper/dshelf2 /mnt/mnt2
mount: /dev/mapper/dshelf2 already mounted or /mnt/mnt2 busy

> > I see. Is there any reasonably easy way to check on this running process?
> 
> GDB attach would be good.
> Interrupt and check the inode number if it's checking fs tree.
> Check the extent bytenr number if it's checking extent tree.
> 
> But considering how many snapshots there are, it's really hard to determine.
> 
> In this case, the super large extent tree is causing a lot of problem,
> maybe it's a good idea to allow btrfs check to skip extent tree check?

I only see --init-extent-tree in the man page, which option did you have
in mind?
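Qu's GDB-attach suggestion can be wrapped in a small probe: grab one backtrace from the running check, then detach so the check keeps going. The process name, the messages, and the assumption that a backtrace reveals the current inode or bytenr are all illustrative.

```shell
#!/bin/sh
# One-shot progress probe for a running btrfs check (sketch).
report() {   # describe what was found before touching anything
    if [ -z "$1" ]; then
        echo "no running btrfs process found"
    else
        echo "attaching to pid $1"
    fi
}

pid=$(pgrep -ox btrfs || true)
report "$pid"
if [ -n "$pid" ]; then
    # -batch exits after the commands; detach leaves the check running
    gdb -batch -p "$pid" -ex 'bt' -ex 'detach'
fi
```

From the backtrace you can then read the inode number (fs tree check) or extent bytenr (extent tree check) in the frame arguments, as Qu describes.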

> > Then again, maybe it already fixed enough that I can mount my filesystem 
> > again.
> 
> This needs the initial btrfs check report and the kernel messages how it
> fails to mount.

mount command hangs, kernel does not show anything special outside of disk 
access hanging.

Jun 23 17:23:26 gargamel kernel: [  341.802696] BTRFS warning (device dm-2): 'recovery' is deprecated, use 'usebackuproot' instead
Jun 23 17:23:26 gargamel kernel: [  341.828743] BTRFS info (device dm-2): 
trying to use backup root at mount time
Jun 23 17:23:26 gargamel kernel: [  341.850180] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:23:26 gargamel kernel: [  341.869014] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:23:26 gargamel kernel: [  342.206289] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:26:26 gargamel kernel: [  521.571392] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 17:55:58 gargamel kernel: [ 2293.914867] perf: interrupt took too long 
(2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jun 23 17:56:22 gargamel kernel: [ 2317.718406] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 17:56:22 gargamel kernel: [ 2317.737277] BTRFS info (device dm-2): has 
skinny extents
Jun 23 17:56:22 gargamel kernel: [ 2318.069461] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 17:59:22 gargamel kernel: [ 2498.256167] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:05:23 gargamel kernel: [ 2859.107057] BTRFS info (device dm-2): disk 
space caching is enabled
Jun 23 18:05:23 gargamel kernel: [ 2859.125883] BTRFS info (device dm-2): has 
skinny extents
Jun 23 18:05:24 gargamel kernel: [ 2859.448018] BTRFS info (device dm-2): bdev 
/dev/mapper/dshelf2 errs: wr 0, rd 0 , flush 0, corrupt 2, gen 0
Jun 23 18:08:23 gargamel kernel: [ 3039.023305] BTRFS info (device dm-2): 
enabling ssd optimizations
Jun 23 18:13:41 gargamel kernel: [ 3356.626037] perf: interrupt took too long 
(3143 > 3133), lowering kernel.perf_event_max_sample_rate to 63500
Jun 23 18:17:23 gargamel kernel: [ 3578.937225] Process accounting resumed
Jun 23 18:33:47 gargamel kernel: [ 4563.356252] JFS: nTxBlock = 8192, nTxLock = 
65536
Jun 23 18:33:48 gargamel kernel: [ 4563.446715] ntfs: driver 2.1.32 [Flags: R/W 
MODULE].
Jun 23 18:42:20 gargamel kernel: [ 5075.995254] INFO: task sync:20253 blocked 
for more than 120 seconds.
Jun 23 18:42:20 gargamel kernel: [ 5076.015729]   Not tainted 
4.17.2-amd64-preempt-sysrq-20180817 #1
Jun 23 18:42:20 gargamel kernel: [ 5076.036141] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 23 18:42:20 gargamel kernel: [ 5076.060637] syncD0 20253  
15327 0x20020080
Jun 23 18:42:20 gargamel kernel: [ 5076.078032] Call Trace:
Jun 23 18:42:20 gargamel kernel: [ 5076.086366]  ? __schedule+0x53e/0x59b
Jun 23 18:42:20 gargamel kernel: [ 5076.098311]  schedule+0x7f/0x98

Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:35:06PM +0800, Su Yue wrote:
> > It's hard to estimate, especially when every cross check involves a lot
> > of disk IO.
> > 
> > But at least, we could add such indicator to show we're doing something.
> > Maybe we can account all roots in root tree first, before checking a
> tree, report i/num_roots. So users can see whether the check is doing
> something meaningful or is stuck in a silly dead loop.

Sounds reasonable.
Do you want to submit something to git master for btrfs-progs? I'll pull
it and just run my btrfs check again.

In the meantime, how sane does the output I just posted, look?

Thanks,
Marc


Re: So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
On Fri, Jun 29, 2018 at 01:07:20PM +0800, Qu Wenruo wrote:
> > lowmem repair seems to be going still, but it's been days and -p seems
> > to do absolutely nothing.
> 
> I'm a afraid you hit a bug in lowmem repair code.
> By all means, --repair shouldn't really be used unless you're pretty
> sure the problem is something btrfs check can handle.
> 
> That's also why --repair is still marked as dangerous.
> Especially when it's combined with experimental lowmem mode.

Understood, but btrfs got corrupted (by itself or not, I don't know)
I cannot mount the filesystem read/write
I cannot btrfs check --repair it since that code will kill my machine
What do I have left?

> > My filesystem is "only" 10TB or so, albeit with a lot of files.
> 
> Unless you have tons of snapshots and reflinked (deduped) files, it
> shouldn't take so long.

I may have a fair amount.
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2 
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]
ERROR: file extent[374857 235184128] root 21872 owner 21872 backref lost
Add one extent data backref [156909494272 55320576]
ERROR: file extent[374857 235184128] root 22911 owner 22911 backref lost
Add one extent data backref [156909494272 55320576]

The last two ERROR lines took over a day to get generated, so I'm not sure if 
it's still working, but just slowly.
For what it's worth non lowmem check used to take 12 to 24H on that filesystem 
back when it still worked.

> > 2 things that come to mind
> > 1) can lowmem have some progress working so that I know if I'm looking
> > at days, weeks, or even months before it will be done?
> 
> It's hard to estimate, especially when every cross check involves a lot
> of disk IO.
> But at least, we could add such indicator to show we're doing something.

Yes, anything to show that I should still wait is still good :)

> > 2) non lowmem 

So, does btrfs check lowmem take days? weeks?

2018-06-28 Thread Marc MERLIN
Regular btrfs check --repair has a nice progress option. It wasn't
perfect, but it showed something.

But then it also takes all your memory faster than the Linux kernel can
defend itself, and reliably kills my 32GB server before it can OOM-kill
or log anything.

lowmem repair seems to be going still, but it's been days and -p seems
to do absolutely nothing.

My filesystem is "only" 10TB or so, albeit with a lot of files.

2 things that come to mind
1) can lowmem have some progress working so that I know if I'm looking
at days, weeks, or even months before it will be done?

2) non-lowmem is obviously more efficient when it doesn't crash your
machine outright, but could lowmem be given an amount of memory to use
for caching, or maybe use some heuristics based on free RAM, so that
it's not so excruciatingly slow?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 01:07:10PM -0400, Austin S. Hemmelgarn wrote:
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> One tip here specifically, if you had to reboot during a balance and the FS
> hangs when it mounts, try mounting with `-o skip_balance`.  That should
> pause the balance instead of resuming it on mount, at which point you should
> also be able to cancel it without it hanging.

Very good tip, I have this in all my mountpoints :)

#LABEL=dshelf2 /mnt/btrfs_pool2 btrfs defaults,compress=lzo,skip_balance,noatime

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Mon, Jun 25, 2018 at 06:24:37PM +0200, Hans van Kranenburg wrote:
> >> output hasn't changed for over 36 hours, unless you've got an insanely slow
> >> storage array, that's extremely unusual (it should only be moving at most
> >> 3GB of data per chunk)).
> > 
> > I didn't hear from any developer, so I had to continue.
> > - btrfs scrub cancel did not work (hang)
> 
> Did you mean balance cancel? It waits until the current block group is
> finished.
 
Yes, I meant that, thanks for correcting me.  And you're correct that
because it was hung, cancel wasn't going to go anywhere.
At least my filesystem was still working at the time (as in, IO was
going on just fine).

> > - at reboot mounting the filesystem hung, even with 4.17, which is
> >   disappointing (it should not hang)
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> > 
> > Sigh, why is my FS corrupted again?
> 
> Again? Do you think balance is corrupting the filesystem? Or have there
> been previous btrfs check --repair operations which made smaller
> problems bigger in the past?

Honestly, I don't fully remember at this point. I keep notes, but not
detailed enough, and it's been a little while.
I know I've had to delete/recreate this filesystem twice already over
the last years, but I'm not fully certain I remember when this one was
last wiped.
Yes, I do run balance along with scrub once a month:

btrfs balance start -musage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
# After metadata, let's do data:
btrfs balance start -dusage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
btrfs balance start -dusage=20 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
echo btrfs scrub start -Bd $mountpoint
ionice -c 3 nice -10 btrfs scrub start -Bd $mountpoint

Hard to say if balance has damaged my filesystem over time, but it's
definitely possible.

> Am I right to interpret the messages below, and see that you have
> extents that are referenced hundreds of times?
 
I'm not certain, but it's a backup server with many blocks that are the
same, so it could be some COW stuff, even if I didn't run any dedupe
commands myself.

> Is there heavy snapshotting or deduping going on in this filesystem? If
> so, it's not surprising balance will get a hard time moving extents
> around, since it has to update all of the metadata for each extent again
> in hundreds of places.

There is some snapshotting, but maybe around 20 or so per subvolume, not 
hundreds.

> Did you investigate what balance was doing if it takes long? Is is using
> cpu all the time, or is it reading from disk slowly (random reads) or is
> it writing to disk all the time at full speed?

I couldn't see what it was doing, but it's running in the kernel, is it not?
(or can you just strace the user space command?)
Either way, it's too late for that now, and given that it didn't make
progress of a single block in 36H, I'm assuming it was well deadlocked.

Thanks for the reply.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN
On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:
> > In your situation, I would run "btrfs pause ", wait to hear from
> > a btrfs developer, and not use the volume whatsoever in the meantime.
> I would say this is probably good advice.  I don't really know what's going
> on here myself actually, though it looks like the balance got stuck (the
> output hasn't changed for over 36 hours, unless you've got an insanely slow
> storage array, that's extremely unusual (it should only be moving at most
> 3GB of data per chunk)).

I didn't hear from any developer, so I had to continue.
- btrfs scrub cancel did not work (hang)
- at reboot mounting the filesystem hung, even with 4.17, which is
  disappointing (it should not hang)
- mount -o recovery still hung
- mount -o ro did not hang though

Sigh, why is my FS corrupted again?
Anyway, back to 
btrfs check --repair
and it took all my 32GB of RAM on a system I can't add more RAM to, so
I'm hosed. I'll note in passing (and it's not ok at all) that check
--repair, after a 20 to 30 minute pause, takes all the kernel RAM more
quickly than the system can OOM or log anything, and just deadlocks the
machine. This is repeatable and totally not ok :(

I'm now left with btrfs-progs git master and lowmem mode, which finally
does a bit of repair.
So far:
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2  
enabling repair mode  
WARNING: low-memory mode repair support is only partial  
Checking filesystem on /dev/mapper/dshelf2  
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
Fixed 0 roots.  
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]

At the rate it's going, it'll probably take days though; it's already been 36H.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08

Re: btrfs balance did not progress after 12H

2018-06-19 Thread Marc MERLIN
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
> So, I ran this:
> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
> [1] 24450
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=60
> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
> 0 out of about 0 chunks balanced (0 considered), -nan% left
> Balance on '.' is running
> 0 out of about 73 chunks balanced (2 considered), 100% left
> Balance on '.' is running
> 
> After about 20mn, it changed to this:
> 1 out of about 73 chunks balanced (6724 considered),  99% left
> Balance on '.' is running
> 
> Now, 12H later, it's still there, only 1 out of 73.
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi show .
> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Total devices 1 FS bytes used 12.72TiB
> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi df .
> Data, single: total=13.57TiB, used=12.60TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=121.50GiB, used=116.53GiB
> GlobalReserve, single: total=512.00MiB, used=848.00KiB
> 
> kernel: 4.16.8
> 
> Is that expected? Should I be ready to wait days possibly for this
> balance to finish?
 
It's now been 2 days, and it's still stuck at 1%:
1 out of about 73 chunks balanced (6724 considered),  99% left

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


btrfs balance did not progress after 12H

2018-06-18 Thread Marc MERLIN
So, I ran this:
gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
[1] 24450
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=60
gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
0 out of about 0 chunks balanced (0 considered), -nan% left
Balance on '.' is running
0 out of about 73 chunks balanced (2 considered), 100% left
Balance on '.' is running

After about 20mn, it changed to this:
1 out of about 73 chunks balanced (6724 considered),  99% left
Balance on '.' is running

Now, 12H later, it's still there, only 1 out of 73.

gargamel:/mnt/btrfs_pool2# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.72TiB
devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=13.57TiB, used=12.60TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=121.50GiB, used=116.53GiB
GlobalReserve, single: total=512.00MiB, used=848.00KiB

kernel: 4.16.8

Is that expected? Should I be ready to wait days possibly for this
balance to finish?
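
For monitoring purposes, the `fi show`/`fi df` output above boils down to two
slack figures: unallocated device space, and allocated-but-unused data space.
A quick back-of-the-envelope in Python (values copied from the listing above;
plain arithmetic, nothing btrfs-specific):

```python
# Figures (TiB) copied from the "btrfs fi show" / "btrfs fi df" output above.
device_size, device_alloc = 14.55, 13.81   # devid 1: size / used (= allocated to chunks)
data_total, data_used = 13.57, 12.60       # Data chunks: total / used

unallocated = device_size - device_alloc   # raw space never handed out to any chunk
data_slack = data_total - data_used        # space inside data chunks holding no data yet

print(f"unallocated: {unallocated:.2f} TiB")   # ~0.74 TiB
print(f"data slack:  {data_slack:.2f} TiB")    # ~0.97 TiB
```

A `-dusage=60` balance only considers data chunks that are less than 60% full,
i.e. it tries to fold that roughly 1 TiB of slack back into unallocated space.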

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: 4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-15 Thread Marc MERLIN
On Tue, May 15, 2018 at 09:36:11AM +0100, Filipe Manana wrote:
> We got a fix for this recently:  https://patchwork.kernel.org/patch/10396523/

Thanks very much for the notice, sorry that I missed it.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


4.15.6 crash: BUG at fs/btrfs/ctree.c:1862

2018-05-14 Thread Marc MERLIN
static noinline struct extent_buffer *
read_node_slot(struct btrfs_fs_info *fs_info, struct extent_buffer *parent,
   int slot)
{
int level = btrfs_header_level(parent);
struct extent_buffer *eb;

if (slot < 0 || slot >= btrfs_header_nritems(parent))
return ERR_PTR(-ENOENT);

BUG_ON(level == 0);



BTRFS info (device dm-2): relocating block group 13404622290944 flags data
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): found 9959 extents
BTRFS info (device dm-2): relocating block group 13403548549120 flags data
[ cut here ]
kernel BUG at fs/btrfs/ctree.c:1862!
invalid opcode:  [#1] PREEMPT SMP PTI
CPU: 5 PID: 8103 Comm: btrfs Tainted: G U   
4.15.6-amd64-preempt-sysrq-20171018 #3
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 
04/27/2013
RIP: 0010:read_node_slot+0x3c/0x9e
RSP: 0018:becfaa0b7b58 EFLAGS: 00210246
RAX: 00a0 RBX: 000c RCX: 0003
RDX: 000c RSI: 9a60e9d9de78 RDI: 00052f6e
RBP: 9a60e9d9de78 R08: 0001 R09: becfaa0b7bf6
R10: 9a64988bd7e9 R11: 9a64988bd7c8 R12: e003d4bdb800
R13: 9a64a481 R14:  R15: 
FS:  7fba34c9c8c0() GS:9a64de34() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 5a8b9c9a CR3: 0001446c6004 CR4: 001606e0
Call Trace:
 tree_advance+0xb1/0x11e
 btrfs_compare_trees+0x1c2/0x4d6
 ? process_extent+0xdcf/0xdcf
 btrfs_ioctl_send+0x81e/0xc70
 ? __kmalloc_track_caller+0xfb/0x10f
 _btrfs_ioctl_send+0xbc/0xe6
 ? paravirt_sched_clock+0x5/0x8
 ? set_task_rq+0x2f/0x80
 ? task_rq_unlock+0x22/0x36
 btrfs_ioctl+0x162f/0x1dc8
 ? select_task_rq_fair+0xb65/0xb7a
 ? update_load_avg+0x16d/0x442
 ? list_add+0x15/0x2e
 ? cfs_rq_throttled.isra.30+0x9/0x18
 ? vfs_ioctl+0x1b/0x28
 vfs_ioctl+0x1b/0x28
 do_vfs_ioctl+0x4f4/0x53f
 ? __audit_syscall_entry+0xbf/0xe3
 SyS_ioctl+0x52/0x76
 do_syscall_64+0x72/0x81
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x7fba34d835e7
RSP: 002b:7ffc32cf4cb8 EFLAGS: 0202 ORIG_RAX: 0010
RAX: ffda RBX: 523f RCX: 7fba34d835e7
RDX: 7ffc32cf4d40 RSI: 40489426 RDI: 0004
RBP: 0004 R08:  R09: 7fba34c9b700
R10: 7fba34c9b9d0 R11: 0202 R12: 0003
R13: 563a30b87020 R14: 0001 R15: 0001
Code: f5 53 4c 8b a6 98 00 00 00 89 d3 4c 89 e7 e8 67 fd ff ff 85 db 78 63 4c 
89 e7 41 88 c6 e8 92 fb ff ff 39 d8 76 54 45 84 f6 75 02 <0f> 0b 89 de 48 89 ef 
e8 2e ff ff ff 89 de 49 89 c4 48 89 ef e8
RIP: read_node_slot+0x3c/0x9e RSP: becfaa0b7b58
---[ end trace a24e7de6b77b5cb1 ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x1900 from 0x8100 (relocation range: 
0x8000-0xbfff)

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-10 Thread Marc MERLIN
Thanks all for the help again.
I just wrote a blog post to explain the process to others should anyone
need this later.

http://marc.merlins.org/perso/btrfs/post_2018-03-09_Btrfs-Tips_-Rescuing-A-Btrfs-Send-Receive-Relationship.html

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:36:49PM +0300, Andrei Borzenkov wrote:
> Yes. Your source has Received UUID. In this case btrfs send will
> transmit received UUID instead of subvolume UUID as reference to base
> snapshot. You need to either clear received UUID on source or set
> received UUID on destination to received UUID of source (not to
> subvolume UUID of source).

gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 0e220a4f-6426-4745-8399-0da0084f8b23 31337 1234.5678 /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41

Current subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 0e220a4f-6426-4745-8399-0da0084f8b23
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520537034.890253770 (2018-03-08T19:23:54.890254)
  rtransid: 256119

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true

This worked fine, thank you so much.
I now have an incremental send going on; it will take a few dozen
minutes instead of days for 8TB+ :)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: How to change/fix 'Received UUID'

2018-03-08 Thread Marc MERLIN
On Thu, Mar 08, 2018 at 09:34:45AM +0300, Andrei Borzenkov wrote:
> 08.03.2018 09:06, Marc MERLIN пишет:
> > On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> >>> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> >>
> >> Well, I had never heard about it until now, thank you.
> >>
> >> I'll see if I can make it work when I get a bit of time.
> > 
> > Sorry, I missed the fact that there was no code to write at all.
> > gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
> > 2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > Current subvolume information:
> >   subvol_id: 94887
> >   received_uuid: 00000000-0000-0000-0000-000000000000
> >   stime: 0.0 (1970-01-01T00:00:00)
> >   stransid: 0  
> >   rtime: 0.0 (1970-01-01T00:00:00)
> >   rtransid: 0  
> > 
> > Setting received subvolume...
> > 
> > Resulting subvolume information:
> >   subvol_id: 94887
> >   received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
> >   stime: 1234.5678 (1970-01-01T00:20:34.567800)
> >   stransid: 31337
> >   rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
> >   rtransid: 255755
> > 
> > gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true
> > 
> > 
> > ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
> > Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. 
> > failed
> > At subvol Video_ro.20180307_22:03:03
> > At snapshot Video_ro.20180307_22:03:03
> > ERROR: cannot find parent subvolume
> > 
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
> > Video_ro.20180220_21:03:41
> 
> Not sure I understand how this subvolume is related. You send
> differences between Video_ro.20180205_21:05:15 and
> Video_ro.20180307_22:03:03, so you need to have (replica of)
> Video_ro.20180205_21:05:15 on destination. How exactly
> Video_ro.20180220_21:03:41 comes in picture here?
 
Sorry, I pasted the wrong thing.
ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180220_21:03:41 
Video_ro.20180308_07:50:06 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180308_07:50:06
At snapshot Video_ro.20180308_07:50:06
ERROR: cannot find parent subvolume

Same problem basically, just copied the wrong attempt, sorry about that.

Do I need to make sure of anything more than
DS1/Video_ro.20180220_21:03:41
Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828

be equal to
Name:   Video_ro.20180220_21:03:41
UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
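
Andrei's matching rule from earlier in the thread can be sketched as follows
(my paraphrase in Python, not actual btrfs-progs code; the UUID values are the
ones quoted in this thread):

```python
# Sketch (not btrfs-progs source) of how "btrfs receive" resolves the -p parent,
# per Andrei's explanation upthread: the sender transmits the parent's
# received_uuid when it has one, otherwise its own uuid, and the receiver
# then looks for a local subvolume whose received_uuid matches that value.
NULL_UUID = "00000000-0000-0000-0000-000000000000"

def effective_parent_uuid(subvol):
    # If the source parent was itself received at some point, its
    # received_uuid wins over its own uuid.
    if subvol["received_uuid"] != NULL_UUID:
        return subvol["received_uuid"]
    return subvol["uuid"]

# Source parent, from "btrfs subvolume show" on btrfs_pool1:
src = {"uuid": "2afc7a5e-107f-d54b-8929-197b80b70828",
       "received_uuid": "0e220a4f-6426-4745-8399-0da0084f8b23"}
# Destination snapshot after the first set_received_uuid.py attempt:
dst = {"uuid": "cb4f343c-5e79-7f49-adf0-7ce0b29f23b3",
       "received_uuid": "2afc7a5e-107f-d54b-8929-197b80b70828"}

wanted = effective_parent_uuid(src)      # 0e220a4f-..., not 2afc7a5e-...
print(dst["received_uuid"] == wanted)    # False -> "ERROR: cannot find parent subvolume"
```

Since the source snapshot was itself received at some point, send transmits its
Received UUID, so that is the value the destination's Received UUID has to
carry, not the source's subvolume UUID.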

Thanks,
Marc


> > Name:   Video_ro.20180220_21:03:41
> > UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
> > Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
> > Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23>   
> >   Creation time:  2018-02-20 21:03:42 -0800
> > Subvolume ID:   11228
> > Generation: 4174
> > Gen at creation:4150
> > Parent ID:  5
> > Top level ID:   5
> > Flags:  readonly
> > Snapshot(s):
> > Video_rw.20180220_21:03:41
> > Video
> > 
> > 
> > Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
> > destination?
> > 
> > Doesn't that look ok now? Is there something else I'm missing?
> > gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
> > /mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
> > DS1/Video_ro.20180220_21:03:41
> > Name:   Video_ro.20180220_21:03:41
> > UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
> > Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
> > Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
> > Creation time:  2018-02-20 21:13:36 -0800
> > Subvolume ID:   94887
> > Generation: 250689
> > Gen at creation:250689
> > Parent ID:  89160
> > Top level ID:   89160
> > Flags:  readonly
> > Snapshot(s):
> > 
> > Thanks,
> > Marc
> > 
> 
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-07 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 12:02:47PM -0800, Marc MERLIN wrote:
> > https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d
> 
> Well, I had never heard about it until now, thank you.
> 
> I'll see if I can make it work when I get a bit of time.

Sorry, I missed the fact that there was no code to write at all.
gargamel:/var/local/src/python-btrfs/examples# ./set_received_uuid.py 
2afc7a5e-107f-d54b-8929-197b80b70828 31337 1234.5678 
/mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
Current subvolume information:
  subvol_id: 94887
  received_uuid: 00000000-0000-0000-0000-000000000000
  stime: 0.0 (1970-01-01T00:00:00)
  stransid: 0  
  rtime: 0.0 (1970-01-01T00:00:00)
  rtransid: 0  

Setting received subvolume...

Resulting subvolume information:
  subvol_id: 94887
  received_uuid: 2afc7a5e-107f-d54b-8929-197b80b70828
  stime: 1234.5678 (1970-01-01T00:20:34.567800)
  stransid: 31337
  rtime: 1520488877.415709329 (2018-03-08T06:01:17.415709)
  rtransid: 255755

gargamel:/var/local/src/python-btrfs/examples# btrfs property set -ts 
/mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41 ro true
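
As an aside, the stime/rtime values in the output above are plain Unix epoch
seconds with a fractional part; the 1234.5678 dummy passed on the command line
decodes exactly as the tool displayed it:

```python
from datetime import datetime, timezone

# 1234.5678 is the dummy stime passed to set_received_uuid.py above;
# the tool renders it as 1970-01-01T00:20:34.567800.
stamp = datetime.fromtimestamp(1234.5678, tz=timezone.utc)
print(stamp.strftime("%Y-%m-%dT%H:%M:%S.%f"))  # -> 1970-01-01T00:20:34.567800
```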


ABORT: btrfs send -p /mnt/btrfs_pool1/Video_ro.20180205_21:05:15 
Video_ro.20180307_22:03:03 |  btrfs receive /mnt/btrfs_bigbackup/DS1//. failed
At subvol Video_ro.20180307_22:03:03
At snapshot Video_ro.20180307_22:03:03
ERROR: cannot find parent subvolume

gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
/mnt/btrfs_pool1/Video_ro.20180220_21\:03\:41/
Video_ro.20180220_21:03:41
Name:   Video_ro.20180220_21:03:41
UUID:   2afc7a5e-107f-d54b-8929-197b80b70828
Parent UUID:e5ec5c1e-6b49-084e-8820-5a8cfaa1b089
Received UUID:  0e220a4f-6426-4745-8399-0da0084f8b23
Creation time:  2018-02-20 21:03:42 -0800
Subvolume ID:   11228
Generation: 4174
Gen at creation:4150
Parent ID:  5
Top level ID:   5
Flags:  readonly
Snapshot(s):
Video_rw.20180220_21:03:41
Video


Wasn't I supposed to set 2afc7a5e-107f-d54b-8929-197b80b70828 onto the 
destination?

Doesn't that look ok now? Is there something else I'm missing?
gargamel:/mnt/btrfs_pool1# btrfs subvolume show 
/mnt/btrfs_bigbackup/DS1/Video_ro.20180220_21:03:41
DS1/Video_ro.20180220_21:03:41
Name:   Video_ro.20180220_21:03:41
UUID:   cb4f343c-5e79-7f49-adf0-7ce0b29f23b3
Parent UUID:0e220a4f-6426-4745-8399-0da0084f8b23
Received UUID:  2afc7a5e-107f-d54b-8929-197b80b70828
Creation time:  2018-02-20 21:13:36 -0800
Subvolume ID:   94887
Generation: 250689
Gen at creation:250689
Parent ID:  89160
Top level ID:   89160
Flags:  readonly
Snapshot(s):

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-06 Thread Marc MERLIN
On Tue, Mar 06, 2018 at 08:12:15PM +0100, Hans van Kranenburg wrote:
> On 05/03/2018 20:47, Marc MERLIN wrote:
> > On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote:
> >>> If I absolutely know that the data is the same on both sides, how do I
> >>> either
> >>> 1) force back in a 'Received UUID' value on the destination
> >>
> >> I suppose the most simple is to write small program that does it using
> >> BTRFS_IOC_SET_RECEIVED_SUBVOL.
> > 
> > Understood.
> > Given that I have not worked with the code at all, what is the best 
> > tool in btrfs-progs to add this to?
> > 
> > btrfstune?
> > btrfs property set?
> > other?
> > 
> > David, is this something you'd be willing to add support for?
> > (to be honest, it'll be quicker for someone who knows the code to add than
> > for me, but if no one has the time, I'll see if I can have a shot at it)
> 
> If you want something right now that works, so you can continue doing
> your backups, python-btrfs also has the ioctl, since v9, together with
> an example of using it:
> 
> https://github.com/knorrie/python-btrfs/commit/1ace623f95300ecf581b1182780fd6432a46b24d

Well, I had never heard about it until now, thank you.

I'll see if I can make it work when I get a bit of time.

Dear btrfs-progs folks, this would be great to add to the canonical
btrfs-progs too :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
On Mon, Mar 05, 2018 at 10:38:16PM +0300, Andrei Borzenkov wrote:
> > If I absolutely know that the data is the same on both sides, how do I
> > either
> > 1) force back in a 'Received UUID' value on the destination
> 
> I suppose the most simple is to write small program that does it using
> BTRFS_IOC_SET_RECEIVED_SUBVOL.

Understood.
Given that I have not worked with the code at all, what is the best 
tool in btrfs-progs to add this to?

btrfstune?
btrfs property set?
other?

David, is this something you'd be willing to add support for?
(to be honest, it'll be quicker for someone who knows the code to add than
for me, but if no one has the time, I'll see if I can have a shot at it)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


How to change/fix 'Received UUID'

2018-03-05 Thread Marc MERLIN
Howdy,

I did a bunch of copies and moving around subvolumes between disks and
at some point, I did a snapshot dir1/Win_ro.20180205_21:18:31 
dir2/Win_ro.20180205_21:18:31

As a result, I lost the ro flag, and apparently 'Received UUID' which is
now preventing me from restarting the btrfs send/receive.

I changed the snapshot back to 'ro' but that's not enough:

Source:
Name:   Win_ro.20180205_21:18:31
UUID:   23ccf2bd-f494-e348-b34e-1f28486b2540
Parent UUID:-
Received UUID:  3cc327e1-358f-284e-92e2-4e4fde92b16f
Creation time:  2018-02-15 20:14:42 -0800
Subvolume ID:   964
Generation: 4062
Gen at creation:459
Parent ID:  5
Top level ID:   5
Flags:  readonly

Dest:
Name:   Win_ro.20180205_21:18:31
UUID:   a1e8777c-c52b-af4e-9ce2-45ca4d4d2df8
Parent UUID:-
Received UUID:  -
Creation time:  2018-02-17 22:20:25 -0800
Subvolume ID:   94826
Generation: 250714
Gen at creation:250540
Parent ID:  89160
Top level ID:   89160
Flags:  readonly

If I absolutely know that the data is the same on both sides, how do I
either
1) force back in a 'Received UUID' value on the destination
2) force a btrfs receive to work despite the lack of matching 'Received
UUID' 

Yes, I could discard and start over, but my 2nd such subvolume is 8TB,
so I'd really rather not :)

Any ideas?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08


Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-18 Thread Marc MERLIN
On Sat, Nov 18, 2017 at 08:16:32AM +0800, Qu Wenruo wrote:
> > item 27 key (1919785864 DIR_ITEM 2591417872) itemoff 14637 itemsize 
> > 46
> > location key (1919805647 INODE_ITEM 0) type FILE
> > transid 2231988 data_len 0 name_len 16
> > name: 1955-capture.jpg
> 
> OK, this DIR_ITEM matches with INODE_REF.
> So btrfs-check should only need to insert DIR_INDEX for it.
> > 
> >> Although what we could try is to avoid BUG_ON(), but I'm afraid the
> >> problem is more severe than my expectation.
> >  
> > How does it look now?
> 
> At least we know what btrfs check should do.
> I could dig it a little deeper to see if we could fix it.
> (Or something strange happened again)

Thanks for having had a look; hopefully it helps improve btrfs check,
and thanks for getting the info turned into better code :)

In the meantime this was an easy FS to just wipe and start over with, so
I just did that.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-17 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 09:53:15PM -0800, Marc MERLIN wrote:
> > I suggest that you try lvmcache instead. It's much more flexible than 
> > bcache,
> > does pretty much the same job, and has much less of the "hacky" feel to it.
> 
> I can read up on it, it's going to be a big pain to convert from one to
> the other, but I can look at it for new filesystems.

I had a quick read. As expected, it's slower since it goes through all
the LVM overhead that I got rid of recently
https://github.com/stec-inc/EnhanceIO/wiki/PERFORMANCE-COMPARISON-AMONG-dm-cache,-bcache-and-EnhanceIO

Given the pain it would be for me to switch, I'm going to stick with
bcache and hope it improves.
But just to be safe, I'm going to stick to this:
echo writearound > /sys/block/bcache0/bcache/cache_mode

Probably my issues were having writes go through bcache, writeback on
one drive even.
I'll go back to the safest setting and hope for the best.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-17 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 04:12:07PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年11月17日 15:30, Marc MERLIN wrote:
> > Here's the whole output:
> > gargamel:~# btrfs-debug-tree -t 258 /dev/mapper/raid0d1 | grep 1919805647
> 
> Sorry, I missed "-C10" parameter for grep.
 
generation 2231977 transid 2237084 size 64 nbytes 0
block group 0 mode 40755 links 1 uid 33 gid 33 rdev 0
sequence 0 flags 0x1710(none)
atime 1510290002.516060162 (2017-11-09 21:00:02)
ctime 1510477350.88506455 (2017-11-12 01:02:30)
mtime 1510477350.88506455 (2017-11-12 01:02:30)
otime 1510290002.516060162 (2017-11-09 21:00:02)
item 26 key (1919785864 INODE_REF 1919785862) itemoff 14683 itemsize 12
index 2 namelen 2 name: 00
item 27 key (1919785864 DIR_ITEM 2591417872) itemoff 14637 itemsize 46
location key (1919805647 INODE_ITEM 0) type FILE
transid 2231988 data_len 0 name_len 16
name: 1955-capture.jpg
item 28 key (1919785864 DIR_ITEM 3406016191) itemoff 14591 itemsize 46
location key (1919805657 INODE_ITEM 0) type FILE
transid 2231988 data_len 0 name_len 16
name: 1956-capture.jpg
item 29 key (1919785864 DIR_INDEX 1957) itemoff 14575 itemsize 16
location key (7383370114097217536 UNKNOWN.211 
15651972432879681580) type DIR_ITEM.0
transid 72057594045427176 data_len 0 name_len 0
name: 
item 30 key (1919805647 INODE_ITEM 0) itemoff 14415 itemsize 160
generation 2231988 transid 2231989 size 81701 nbytes 81920
block group 0 mode 100644 links 1 uid 33 gid 33 rdev 0
sequence 8 flags 0x14(NOCOMPRESS)
atime 1510290392.703320623 (2017-11-09 21:06:32)
ctime 1510290392.715320477 (2017-11-09 21:06:32)
mtime 1510290392.715320477 (2017-11-09 21:06:32)
otime 1510290392.703320623 (2017-11-09 21:06:32)
item 31 key (1919805647 INODE_REF 1919785864) itemoff 14389 itemsize 26
index 1957 namelen 16 name: 1955-capture.jpg
item 32 key (1919805647 EXTENT_DATA 0) itemoff 14336 itemsize 53
generation 2231989 type 1 (regular)
extent data disk byte 2381649588224 nr 81920
extent data offset 0 nr 81920 ram 81920
extent compression 0 (none)
item 33 key (1919805657 INODE_ITEM 0) itemoff 14176 itemsize 160
generation 2231988 transid 2231989 size 81856 nbytes 81920
block group 0 mode 100644 links 1 uid 33 gid 33 rdev 0
sequence 8 flags 0x14(NOCOMPRESS)
atime 1510290392.919317997 (2017-11-09 21:06:32)
ctime 1510290392.931317852 (2017-11-09 21:06:32)


> Although what we could try is to avoid BUG_ON(), but I'm afraid the
> problem is more severe than my expectation.
 
How does it look now?

> Any idea how such corruption happened?

Sigh, I wish I knew.

It feels like every btrfs filesystem I've had between my 3 systems has
gotten inexplicably corrupted at least once.
This one is not even using bcache, just dmcrypt underneath.

It's my only one using btrfs raid (1):
gargamel:~# btrfs fi show /dev/mapper/raid0d1 
Label: 'btrfs_space'  uuid: 01334b81-c0db-4e80-92e4-cac4da867651
Total devices 2 FS bytes used 1.12TiB
devid1 size 836.13GiB used 722.03GiB path /dev/mapper/raid0d1
devid2 size 836.13GiB used 722.03GiB path /dev/mapper/raid0d2

Data, RAID0: total=1.38TiB, used=1.11TiB
System, RAID1: total=32.00MiB, used=128.00KiB
Metadata, RAID1: total=13.00GiB, used=8.54GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Now, I didn't get errors or warnings, or even scrub warnings on it; I just ran 
btrfs check to see what would happen.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
Here's the whole output:
gargamel:~# btrfs-debug-tree -t 258 /dev/mapper/raid0d1 | grep 1919805647
location key (1919805647 INODE_ITEM 0) type FILE
item 30 key (1919805647 INODE_ITEM 0) itemoff 14415 itemsize 160
item 31 key (1919805647 INODE_REF 1919785864) itemoff 14389 itemsize 26
item 32 key (1919805647 EXTENT_DATA 0) itemoff 14336 itemsize 53
parent transid verify failed on 1173964603392 wanted 2244945 found 2247404
parent transid verify failed on 1173964603392 wanted 2244945 found 2247404
parent transid verify failed on 1173964603392 wanted 2244945 found 2247404
parent transid verify failed on 1173964603392 wanted 2244945 found 2247404
Ignoring transid failure
parent transid verify failed on 1652248100864 wanted 2245186 found 2247494
parent transid verify failed on 1652248100864 wanted 2245186 found 2247494
parent transid verify failed on 1652248100864 wanted 2245186 found 2247494
parent transid verify failed on 1652248100864 wanted 2245186 found 2247494
Ignoring transid failure
parent transid verify failed on 1174605512704 wanted 2245171 found 2247435
parent transid verify failed on 1174605512704 wanted 2245171 found 2247435
parent transid verify failed on 1174605512704 wanted 2245171 found 2247435
parent transid verify failed on 1174605512704 wanted 2245171 found 2247435
Ignoring transid failure
WARNING: eb corrupted: item 130 eb level 0 next level 2, skipping the rest


On Thu, Nov 16, 2017 at 10:17:07PM -0800, Marc MERLIN wrote:
> On Fri, Nov 17, 2017 at 01:17:19PM +0800, Qu Wenruo wrote:
> > 
> > 
> > On 2017年11月17日 10:26, Marc MERLIN wrote:
> > > Howdy,
> > > 
> > > Up to date git pull from btrfs-progs:
> > > 
> > > gargamel:~# btrfs check --repair /dev/mapper/raid0d1
> > > enabling repair mode
> > > Checking filesystem on /dev/mapper/raid0d1
> > > UUID: 01334b81-c0db-4e80-92e4-cac4da867651
> > > checking extents
> > > corrupt extent record: key 203003699200 168 40960
> > > corrupt extent record: key 203003764736 168 172032
> > > ref mismatch on [203003699200 40960] extent item 0, found 1
> > > Data backref 203003699200 root 258 owner 1933897829 offset 0 num_refs 0 
> > > not found in extent tree
> > > Incorrect local backref count on 203003699200 root 258 owner 1933897829 
> > > offset 0 found 1 wanted 0 back 0x5596988c2130
> > > backpointer mismatch on [203003699200 40960]
> > > repair deleting extent record: key 203003699200 168 40960
> > > adding new data backref on 203003699200 root 258 owner 1933897829 offset 
> > > 0 found 1
> > > Repaired extent references for 203003699200
> > > Data backref 203003764736 root 258 owner 1932315368 offset 0 num_refs 0 
> > > not found in extent tree
> > > Incorrect local backref count on 203003764736 root 258 owner 1932315368 
> > > offset 0 found 1 wanted 0 back 0x5596dde358f0
> > > backpointer mismatch on [203003764736 172032]
> > > repair deleting extent record: key 203003764736 168 172032
> > > adding new data backref on 203003764736 root 258 owner 1932315368 offset 
> > > 0 found 1
> > > Repaired extent references for 203003764736
> > > Fixed 0 roots.
> > > checking free space cache
> > > cache and super generation don't match, space cache will be invalidated
> > > checking fs roots
> > > invalid location in dir item 0
> > > Deleting bad dir index [1919785864,96,1958] root 258
> > > Deleting bad dir index [1919785864,96,1957] root 258
> > > repairing missing dir index item for inode 1919805647
> > > cmds-check.c:2614: add_missing_dir_index: BUG_ON `ret` triggered, value 
> > > -17
> > 
> > -EEXIST. Btrfs check --repair is trying to re-insert some key which
> > exists already.
> > 
> > Would you please provide the output of "btrfs-debug-tree -t 258 | grep
> > 1919805647" to help the debugging?
>  
> Sure. It may run all night and I'm going to bed now, but the output I
> got so far, is:
> gargamel:~# btrfs-debug-tree -t 258 /dev/mapper/raid0d1 | grep 1919805647
> location key (1919805647 INODE_ITEM 0) type FILE
> item 30 key (1919805647 INODE_ITEM 0) itemoff 14415 itemsize 160
> item 31 key (1919805647 INODE_REF 1919785864) itemoff 14389 itemsize 
> 26
> item 32 key (1919805647 EXTENT_DATA 0) itemoff 14336 itemsize 53
> (...)
> 
> I'll post tomorrow if I get more overnight
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems 
>    what McDonalds is to gourmet 
>

Re: btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 01:17:19PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年11月17日 10:26, Marc MERLIN wrote:
> > Howdy,
> > 
> > Up to date git pull from btrfs-progs:
> > 
> > gargamel:~# btrfs check --repair /dev/mapper/raid0d1
> > enabling repair mode
> > Checking filesystem on /dev/mapper/raid0d1
> > UUID: 01334b81-c0db-4e80-92e4-cac4da867651
> > checking extents
> > corrupt extent record: key 203003699200 168 40960
> > corrupt extent record: key 203003764736 168 172032
> > ref mismatch on [203003699200 40960] extent item 0, found 1
> > Data backref 203003699200 root 258 owner 1933897829 offset 0 num_refs 0 not 
> > found in extent tree
> > Incorrect local backref count on 203003699200 root 258 owner 1933897829 
> > offset 0 found 1 wanted 0 back 0x5596988c2130
> > backpointer mismatch on [203003699200 40960]
> > repair deleting extent record: key 203003699200 168 40960
> > adding new data backref on 203003699200 root 258 owner 1933897829 offset 0 
> > found 1
> > Repaired extent references for 203003699200
> > Data backref 203003764736 root 258 owner 1932315368 offset 0 num_refs 0 not 
> > found in extent tree
> > Incorrect local backref count on 203003764736 root 258 owner 1932315368 
> > offset 0 found 1 wanted 0 back 0x5596dde358f0
> > backpointer mismatch on [203003764736 172032]
> > repair deleting extent record: key 203003764736 168 172032
> > adding new data backref on 203003764736 root 258 owner 1932315368 offset 0 
> > found 1
> > Repaired extent references for 203003764736
> > Fixed 0 roots.
> > checking free space cache
> > cache and super generation don't match, space cache will be invalidated
> > checking fs roots
> > invalid location in dir item 0
> > Deleting bad dir index [1919785864,96,1958] root 258
> > Deleting bad dir index [1919785864,96,1957] root 258
> > repairing missing dir index item for inode 1919805647
> > cmds-check.c:2614: add_missing_dir_index: BUG_ON `ret` triggered, value -17
> 
> -EEXIST. Btrfs check --repair is trying to re-insert some key which
> exists already.
> 
> Would you please provide the output of "btrfs-debug-tree -t 258 | grep
> 1919805647" to help the debugging?
 
Sure. It may run all night and I'm going to bed now, but the output I
got so far, is:
gargamel:~# btrfs-debug-tree -t 258 /dev/mapper/raid0d1 | grep 1919805647
location key (1919805647 INODE_ITEM 0) type FILE
item 30 key (1919805647 INODE_ITEM 0) itemoff 14415 itemsize 160
item 31 key (1919805647 INODE_REF 1919785864) itemoff 14389 itemsize 26
item 32 key (1919805647 EXTENT_DATA 0) itemoff 14336 itemsize 53
(...)

I'll post tomorrow if I get more overnight

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Fri, Nov 17, 2017 at 10:41:48AM +0500, Roman Mamedov wrote:
> On Thu, 16 Nov 2017 16:12:56 -0800
> Marc MERLIN <m...@merlins.org> wrote:
> 
> > On Thu, Nov 16, 2017 at 11:32:33PM +0100, Holger Hoffstätte wrote:
> > > Don't pop the champagne just yet, I just read that apparently 4.14 broke
> > > bcache for some people [1]. Not sure how much that affects you, but it 
> > > might
> > > well make things worse. Yeah, I know, wonderful.
> > 
> > Oh my, that's actually pretty terrible.
> > I've just reverted both my machines to 3.13, the last thing I need is more
> > btrfs corruption.
> 
> Why so far back though, the latest 4.4 and 4.9 are both good series and run
> without issues for me since a long time. Or perhaps you meant 4.13 :)
 
Typo indeed, I meant 4.13.

> I suggest that you try lvmcache instead. It's much more flexible than bcache,
> does pretty much the same job, and has much less of the "hacky" feel to it.

I can read up on it, it's going to be a big pain to convert from one to
the other, but I can look at it for new filesystems.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


btrfs check: add_missing_dir_index: BUG_ON `ret` triggered, value -17

2017-11-16 Thread Marc MERLIN
Howdy,

Up to date git pull from btrfs-progs:

gargamel:~# btrfs check --repair /dev/mapper/raid0d1
enabling repair mode
Checking filesystem on /dev/mapper/raid0d1
UUID: 01334b81-c0db-4e80-92e4-cac4da867651
checking extents
corrupt extent record: key 203003699200 168 40960
corrupt extent record: key 203003764736 168 172032
ref mismatch on [203003699200 40960] extent item 0, found 1
Data backref 203003699200 root 258 owner 1933897829 offset 0 num_refs 0 not 
found in extent tree
Incorrect local backref count on 203003699200 root 258 owner 1933897829 offset 
0 found 1 wanted 0 back 0x5596988c2130
backpointer mismatch on [203003699200 40960]
repair deleting extent record: key 203003699200 168 40960
adding new data backref on 203003699200 root 258 owner 1933897829 offset 0 
found 1
Repaired extent references for 203003699200
Data backref 203003764736 root 258 owner 1932315368 offset 0 num_refs 0 not 
found in extent tree
Incorrect local backref count on 203003764736 root 258 owner 1932315368 offset 
0 found 1 wanted 0 back 0x5596dde358f0
backpointer mismatch on [203003764736 172032]
repair deleting extent record: key 203003764736 168 172032
adding new data backref on 203003764736 root 258 owner 1932315368 offset 0 
found 1
Repaired extent references for 203003764736
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
invalid location in dir item 0
Deleting bad dir index [1919785864,96,1958] root 258
Deleting bad dir index [1919785864,96,1957] root 258
repairing missing dir index item for inode 1919805647
cmds-check.c:2614: add_missing_dir_index: BUG_ON `ret` triggered, value -17
btrfs(+0x52207)[0x55966a6fa207]
btrfs(+0x5225a)[0x55966a6fa25a]
btrfs(+0x5cef3)[0x55966a704ef3]
btrfs(cmd_check+0x2e10)[0x55966a70ea35]
btrfs(main+0x85)[0x55966a6b9dc3]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f794aadf2b1]
btrfs(_start+0x2a)[0x55966a6b993a]
Aborted

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 11:32:33PM +0100, Holger Hoffstätte wrote:
> Don't pop the champagne just yet, I just read that apparently 4.14 broke
> bcache for some people [1]. Not sure how much that affects you, but it might
> well make things worse. Yeah, I know, wonderful.

Oh my, that's actually pretty terrible.
I've just reverted both my machines to 3.13, the last thing I need is more
btrfs corruption.

I'm also starting to question if I should just drop bcache. It does help
access to a big and slow-ish array, but corruption and periodic btrfs
full rebuilds are not something I can afford timewise :-/

> > As for 4.14, the serial console code seems broken though, I can't get login 
> > or bash
> > to work anymore on them:
> > [ 2786.305004] INFO: task login:5636 blocked for more than 120 seconds.
> > [ 2786.324648]   Tainted: G U  W   
> > 4.14.0-amd64-stkreg-sysrq-20171018 #1
> > [ 2786.347692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> > this message.
> > [ 2786.371742] login   D0  5636  1 0xa0020006
> 
> I'm out. :/

Yeah, I didn't expect you to know, or this list even, just warning that 4.14
is not "that great" yet (although that was before the "bcache will corrupt
your stuff", which now makes it "terrible" :( ).

On the plus side, I'm back to 4.13.12 and it hasn't crashed yet, so maybe
4.14 fixed the issue I had (wishful thinking)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


Re: 4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
On Thu, Nov 16, 2017 at 06:27:44PM +0100, Holger Hoffstätte wrote:
> On 11/16/17 18:07, Marc MERLIN wrote:
> > Sorry, was missing the kernel number in the subject, just fixed that.
> > 
> > On Thu, Nov 16, 2017 at 09:04:45AM -0800, Marc MERLIN wrote:
> >> My server now reboots every 20mn or so, with this.
> >> Sadly another BUG_ON() and it won't even tell me which filesystem
> >> it's on
> >>
> >> static inline u32 btrfs_extent_inline_ref_size(int type)
> >> {
> >>if (type == BTRFS_TREE_BLOCK_REF_KEY ||
> >>type == BTRFS_SHARED_BLOCK_REF_KEY)
> >>return sizeof(struct btrfs_extent_inline_ref);
> >>if (type == BTRFS_SHARED_DATA_REF_KEY)
> >>return sizeof(struct btrfs_shared_data_ref) +
> >>   sizeof(struct btrfs_extent_inline_ref);
> >>if (type == BTRFS_EXTENT_DATA_REF_KEY)
> >>return sizeof(struct btrfs_extent_data_ref) +
> >>   offsetof(struct btrfs_extent_inline_ref, offset);
> >>BUG();
> >>return 0;
> >> }
> 
> This BUG() was recently removed and seems to be caused by some kind
> of persistent corruption, which is seen as invalid inline extent.
> See [1], [2] for details. Maybe you can backport them?
> Alternatively just give 4.14 a whirl, it's great.
> 
> -h
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=167ce953ca55bdee20fe56c3c0fa51002435f745
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4335958de2a43c6790c7f6aa0682aa7189983fa4

First thanks a lot for the quick reply, it was super timely considering
my server was rebooting every 20mn :)
I've now been running 4.14 for a couple of hours, and things seem ok
btrfs-wise.

So, just so that I understand:
1) I do have some kind of FS problem/corruption (minor? major?)

2) it started crashing 4.9.36 and then 4.13 today, every 20mn, probably due to 
some background
cleaner process that kept starting and hitting the problem spot

3) 4.14 does not crash anymore, but it doesn't even report any problem either. 
Does it mean
the error that crashed the old kernel is minor enough that the new kernel 
doesn't bother even
logging it?

4) I just ran scrub on the filesystem and it ran fine.

Sadly, while the BUG_ON was another one that failed to say which
mountpoint was affected, through painful trial and error, I think I
found out that it was affecting the root filesystem.
Doing a check or check --repair on that FS will be a major pain (need a rescue
media with the right version of dmcrypt, bcache, btrfs kernel, and btrfs progs)

I'm assuming that running btrfs check --force on a mounted filesystem
that is being used is not going to give useful results, unless I leave
the FS read-only. Correct?


As for 4.14, the serial console code seems broken though, I can't get login or 
bash
to work anymore on them:
[ 2786.305004] INFO: task login:5636 blocked for more than 120 seconds.
[ 2786.324648]   Tainted: G U  W   
4.14.0-amd64-stkreg-sysrq-20171018 #1
[ 2786.347692] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[ 2786.371742] login   D0  5636  1 0xa0020006
[ 2786.388826] Call Trace:
[ 2786.396756]  __schedule+0x4b3/0x5bd
[ 2786.408077]  schedule+0x89/0x9a
[ 2786.418070]  schedule_timeout+0x43/0x101
[ 2786.430728]  ? default_wake_function+0x12/0x14
[ 2786.444620]  ? woken_wake_function+0x11/0x13
[ 2786.457967]  ldsem_down_write+0xe0/0x1a8
[ 2786.470293]  ? ldsem_down_write+0xe0/0x1a8
[ 2786.483143]  ? __wake_up_common_lock+0xa6/0xcf
[ 2786.497039]  tty_ldisc_lock+0x16/0x30
[ 2786.508587]  ? tty_ldisc_lock+0x16/0x30
[ 2786.520655]  tty_ldisc_hangup+0xbb/0x170
[ 2786.533000]  __tty_hangup+0x15f/0x21d
[ 2786.544541]  tty_vhangup_session+0x13/0x15
[ 2786.557388]  disassociate_ctty+0x51/0x209
[ 2786.570004]  do_exit+0x43a/0x923
[ 2786.580262]  ? recalc_sigpending_tsk+0x42/0x49
[ 2786.594120]  do_group_exit+0x6c/0xa5
[ 2786.605419]  get_signal+0x46b/0x4b3
[ 2786.616464]  do_signal+0x37/0x5ed
[ 2786.626969]  ? list_add+0x34/0x34
[ 2786.637474]  ? C_SYSC_wait4+0x49/0x99
[ 2786.649099]  ? handle_mm_fault+0x10f/0x17f
[ 2786.661968]  prepare_exit_to_usermode+0x94/0xef
[ 2786.676115]  syscall_return_slowpath+0xb9/0xd9
[ 2786.690035]  do_fast_syscall_32+0xc3/0xfe
[ 2786.702897]  entry_SYSENTER_compat+0x4c/0x5b
[ 2786.716272] RIP: 0023:0xf7f45c29
[ 2786.726496] RSP: 002b:ffb5d0f0 EFLAGS: 0246 ORIG_RAX: 
0072
[ 2786.749827] RAX: fe00 RBX:  RCX: 
[ 2786.772104] RDX:  RSI:  RDI: 080504ec
[ 2786.794087] RBP: ffb5f638 R08:  R09: 
[ 2786.794088] R10:  R11: 00

4.13.12: kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
Sorry, was missing the kernel number in the subject, just fixed that.

On Thu, Nov 16, 2017 at 09:04:45AM -0800, Marc MERLIN wrote:
> My server now reboots every 20mn or so, with this.
> Sadly another BUG_ON() and it won't even tell me which filesystem
> it's on
> 
> static inline u32 btrfs_extent_inline_ref_size(int type)
> {
>   if (type == BTRFS_TREE_BLOCK_REF_KEY ||
>   type == BTRFS_SHARED_BLOCK_REF_KEY)
>   return sizeof(struct btrfs_extent_inline_ref);
>   if (type == BTRFS_SHARED_DATA_REF_KEY)
>   return sizeof(struct btrfs_shared_data_ref) +
>  sizeof(struct btrfs_extent_inline_ref);
>   if (type == BTRFS_EXTENT_DATA_REF_KEY)
>   return sizeof(struct btrfs_extent_data_ref) +
>  offsetof(struct btrfs_extent_inline_ref, offset);
>   BUG();
>   return 0;
> }
> 
> 
> 
> [ 1399.728735] [ cut here ]
> [ 1399.744149] kernel BUG at fs/btrfs/ctree.h:1802!
> [ 1399.759400] invalid opcode:  [#1] PREEMPT SMP
> [ 1399.774892] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat 
> ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog 
> binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 
> ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 
> nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 
> dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 
> nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat 
> nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 
> irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel 
> snd_cmipci snd_hda_codec snd_mpu401_uart eeepc_wmi snd_opl3_lib snd_hda_core 
> snd_rawmidi asus_wmi sparse_keymap snd_seq_device asix snd_hwdep
> [ 1399.997524]  rc_ati_x10 tpm_infineon snd_pcm rfkill snd_timer ati_remote 
> usbnet tpm_tis usbserial libphy i915 rc_core hwmon tpm_tis_core snd wmi_bmof 
> mei_me lpc_ich i2c_i801 soundcore battery pcspkr evdev input_leds tpm wmi 
> parport_pc parport e1000e ptp pps_core fuse raid456 multipath mmc_block 
> mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor 
> async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc 
> aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci mvsas 
> ehci_hcd xhci_hcd libsas r8169 scsi_transport_sas mii usbcore sata_sil24 
> thermal fan [last unloaded: ftdi_sio]
> [ 1400.174918] CPU: 0 PID: 80 Comm: kworker/u16:1 Tainted: G U  W   
> 4.13.12-amd64-stkreg-sysrq-20171018 #2
> [ 1400.206640] Hardware name: System manufacturer System Product Name/P8H67-M 
> PRO, BIOS 3904 04/27/2013
> [ 1400.235418] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
> [ 1400.255104] task: 8a94fa750180 task.stack: b632c3404000
> [ 1400.274346] RIP: 0010:btrfs_extent_inline_ref_size+0x29/0x39
> [ 1400.292749] RSP: 0018:b632c3407b28 EFLAGS: 00210297
> [ 1400.309756] RAX: 001d RBX: 8a94f59f4a80 RCX: 
> 8a9127287000
> [ 1400.332482] RDX: 2000 RSI: 2598 RDI: 
> 
> [ 1400.355188] RBP: b632c3407b28 R08: b632c3407ae8 R09: 
> b632c3407af0
> [ 1400.377871] R10:  R11: 2000 R12: 
> 
> [ 1400.400537] R13:  R14:  R15: 
> 2598
> [ 1400.423912] FS:  () GS:8a951e20() 
> knlGS:
> [ 1400.449416] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1400.467878] CR2: 0be9c3f8 CR3: 0004b5c09000 CR4: 
> 001406f0
> [ 1400.490491] Call Trace:
> [ 1400.499066]  lookup_inline_extent_backref+0x2ff/0x411
> [ 1400.515452]  ? ___cache_free+0x200/0x25c
> [ 1400.528390]  __btrfs_free_extent+0xeb/0xa5d
> [ 1400.542077]  ? ___cache_free+0x1e8/0x25c
> [ 1400.554950]  __btrfs_run_delayed_refs+0xb6c/0xd52
> [ 1400.570208]  ? _raw_spin_unlock_irqrestore+0x14/0x24
> [ 1400.586191]  ? try_to_wake_up+0x251/0x277
> [ 1400.599288]  btrfs_run_delayed_refs+0x77/0x1a1
> [ 1400.613653]  delayed_ref_async_start+0x5e/0x9b
> [ 1400.628580]  btrfs_scrubparity_helper+0x111/0x271
> [ 1400.644271]  ? pwq_activate_delayed_work+0x4d/0x5b
> [ 1400.659614]  btrfs_extent_refs_helper+0xe/0x10
> [ 1400.673972]  process_one_work+0x179/0x2a5
> [ 1400.686951]  worker_thread+0x1b1/0x262
> [ 1400.699155]  ? rescuer_thread+0x273/0x273
> [ 1400.712069]  kthread+0xfb/0x100
> [ 1400.722354]  ? init_completion+0x24/0x24
> [ 1400.734949]  ret_from_fork+0x25/0x30
> [ 1400.746497] Code: 5d c3 55 81 ff b0 00 00 00 48 89 e5 74 1f 81 ff b6

kernel BUG at fs/btrfs/ctree.h:1802!

2017-11-16 Thread Marc MERLIN
My server now reboots every 20mn or so, with this.
Sadly another BUG_ON() and it won't even tell me which filesystem
it's on

static inline u32 btrfs_extent_inline_ref_size(int type)
{
if (type == BTRFS_TREE_BLOCK_REF_KEY ||
type == BTRFS_SHARED_BLOCK_REF_KEY)
return sizeof(struct btrfs_extent_inline_ref);
if (type == BTRFS_SHARED_DATA_REF_KEY)
return sizeof(struct btrfs_shared_data_ref) +
   sizeof(struct btrfs_extent_inline_ref);
if (type == BTRFS_EXTENT_DATA_REF_KEY)
return sizeof(struct btrfs_extent_data_ref) +
   offsetof(struct btrfs_extent_inline_ref, offset);
BUG();
return 0;
}



[ 1399.728735] [ cut here ]
[ 1399.744149] kernel BUG at fs/btrfs/ctree.h:1802!
[ 1399.759400] invalid opcode:  [#1] PREEMPT SMP
[ 1399.774892] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat 
ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog 
binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 
ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 
nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 
dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat 
nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 
irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_cmipci 
snd_hda_codec snd_mpu401_uart eeepc_wmi snd_opl3_lib snd_hda_core snd_rawmidi 
asus_wmi sparse_keymap snd_seq_device asix snd_hwdep
[ 1399.997524]  rc_ati_x10 tpm_infineon snd_pcm rfkill snd_timer ati_remote 
usbnet tpm_tis usbserial libphy i915 rc_core hwmon tpm_tis_core snd wmi_bmof 
mei_me lpc_ich i2c_i801 soundcore battery pcspkr evdev input_leds tpm wmi 
parport_pc parport e1000e ptp pps_core fuse raid456 multipath mmc_block 
mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor 
async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc 
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci mvsas 
ehci_hcd xhci_hcd libsas r8169 scsi_transport_sas mii usbcore sata_sil24 
thermal fan [last unloaded: ftdi_sio]
[ 1400.174918] CPU: 0 PID: 80 Comm: kworker/u16:1 Tainted: G U  W   
4.13.12-amd64-stkreg-sysrq-20171018 #2
[ 1400.206640] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 3904 04/27/2013
[ 1400.235418] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[ 1400.255104] task: 8a94fa750180 task.stack: b632c3404000
[ 1400.274346] RIP: 0010:btrfs_extent_inline_ref_size+0x29/0x39
[ 1400.292749] RSP: 0018:b632c3407b28 EFLAGS: 00210297
[ 1400.309756] RAX: 001d RBX: 8a94f59f4a80 RCX: 8a9127287000
[ 1400.332482] RDX: 2000 RSI: 2598 RDI: 
[ 1400.355188] RBP: b632c3407b28 R08: b632c3407ae8 R09: b632c3407af0
[ 1400.377871] R10:  R11: 2000 R12: 
[ 1400.400537] R13:  R14:  R15: 2598
[ 1400.423912] FS:  () GS:8a951e20() 
knlGS:
[ 1400.449416] CS:  0010 DS:  ES:  CR0: 80050033
[ 1400.467878] CR2: 0be9c3f8 CR3: 0004b5c09000 CR4: 001406f0
[ 1400.490491] Call Trace:
[ 1400.499066]  lookup_inline_extent_backref+0x2ff/0x411
[ 1400.515452]  ? ___cache_free+0x200/0x25c
[ 1400.528390]  __btrfs_free_extent+0xeb/0xa5d
[ 1400.542077]  ? ___cache_free+0x1e8/0x25c
[ 1400.554950]  __btrfs_run_delayed_refs+0xb6c/0xd52
[ 1400.570208]  ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1400.586191]  ? try_to_wake_up+0x251/0x277
[ 1400.599288]  btrfs_run_delayed_refs+0x77/0x1a1
[ 1400.613653]  delayed_ref_async_start+0x5e/0x9b
[ 1400.628580]  btrfs_scrubparity_helper+0x111/0x271
[ 1400.644271]  ? pwq_activate_delayed_work+0x4d/0x5b
[ 1400.659614]  btrfs_extent_refs_helper+0xe/0x10
[ 1400.673972]  process_one_work+0x179/0x2a5
[ 1400.686951]  worker_thread+0x1b1/0x262
[ 1400.699155]  ? rescuer_thread+0x273/0x273
[ 1400.712069]  kthread+0xfb/0x100
[ 1400.722354]  ? init_completion+0x24/0x24
[ 1400.734949]  ret_from_fork+0x25/0x30
[ 1400.746497] Code: 5d c3 55 81 ff b0 00 00 00 48 89 e5 74 1f 81 ff b6 00 00 
00 74 17 81 ff b8 00 00 00 74 16 81 ff b2 00 00 00 b8 1d 00 00 00 74 0e <0f> 0b 
b8 09 00 00 00 eb 05 b8 0d 00 00 00 5d c3 55 48 89 f0 48
[ 1400.804753] RIP: btrfs_extent_inline_ref_size+0x29/0x39 RSP: b632c3407b28
[ 1400.827109] ---[ end trace 70850509bfd007d7 ]---
[ 1400.841833] Kernel panic - not syncing: Fatal exception
[ 1400.858356] Kernel Offset: 0x1700 from 0x8100 (relocation 
range: 0x8000-0xbfff)


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-27 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 05:22:14PM -0700, Marc MERLIN wrote:
> On Sun, Sep 10, 2017 at 01:16:26PM +, Josef Bacik wrote:
> > Great, if the free space cache is fucked again after the next go
> > around then I need to expand the verifier to watch entries being added
> > to the cache as well.  Thanks,
> 
> Well, I copied about 1TB of data, and nothing happened.
> So it seems clearing it and fsck may have fixed this fault I had been
> carrying for quite a while.
> If so, yeah!
> 
> I'm not sure if this needs a kernel fix to not get triggered and if
> btrfs check should also be improved to catch this, but hopefully you
> know what makes sense there.

Just to report back, it's now been another 2 weeks, and no problem.
Seems that forcing the cache clear was actually the fix. Not sure if
the kernel should have found/detected/auto fixed the problem or if btrfs
check should have.

Either way, thanks for your help.

Marc
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Output time elapsed for each major tree it checked

2017-09-11 Thread Marc MERLIN
On Mon, Sep 11, 2017 at 02:26:23PM +0900, Qu Wenruo wrote:
> Marc reported that "btrfs check --repair" runs much faster than "btrfs
> check", which is quite weird.
> 
> This patch will add time elapsed for each major tree it checked, for
> both original mode and lowmem mode, so we can have a clue what's going
> wrong.
 
Thanks.
Sadly, as you may have seen in the other thread, after I ran clear_cache
to fix another bug, now check --repair takes around 3H and is slower than
regular check, as it's expected to be.
But next time I need to run check, I'll try check vs check --repair
again and report the times if they are weird.

In the meantime you can check it in to btrfs-progs WIP and maybe someone else
will get useful time data before I can again.

Thanks,
Marc

> Reported-by: Marc MERLIN <m...@merlins.org>
> Signed-off-by: Qu Wenruo <quwenruo.bt...@gmx.com>
> ---
>  cmds-check.c | 21 +++--
>  utils.h  | 24 
>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/cmds-check.c b/cmds-check.c
> index 006edbde..fee806cd 100644
> --- a/cmds-check.c
> +++ b/cmds-check.c
> @@ -5318,13 +5318,16 @@ static int do_check_fs_roots(struct btrfs_fs_info 
> *fs_info,
> struct cache_tree *root_cache)
>  {
>   int ret;
> + struct timer timer;
>  
>   if (!ctx.progress_enabled)
>   fprintf(stderr, "checking fs roots\n");
> + start_timer();
>   if (check_mode == CHECK_MODE_LOWMEM)
>   ret = check_fs_roots_v2(fs_info);
>   else
>   ret = check_fs_roots(fs_info, root_cache);
> + printf("done in %d seconds\n", stop_timer());
>  
>   return ret;
>  }
> @@ -11584,14 +11587,16 @@ out:
>  static int do_check_chunks_and_extents(struct btrfs_fs_info *fs_info)
>  {
>   int ret;
> + struct timer timer;
>  
>   if (!ctx.progress_enabled)
>   fprintf(stderr, "checking extents\n");
> + start_timer();
>   if (check_mode == CHECK_MODE_LOWMEM)
>   ret = check_chunks_and_extents_v2(fs_info);
>   else
>   ret = check_chunks_and_extents(fs_info);
> -
> + printf("done in %d seconds\n", stop_timer());
>   return ret;
>  }
>  
> @@ -12772,6 +12777,7 @@ int cmd_check(int argc, char **argv)
>   int qgroups_repaired = 0;
>   unsigned ctree_flags = OPEN_CTREE_EXCLUSIVE;
>   int force = 0;
> + struct timer timer;
>  
>   while(1) {
>   int c;
> @@ -12953,8 +12959,11 @@ int cmd_check(int argc, char **argv)
>   if (repair)
>   ctree_flags |= OPEN_CTREE_PARTIAL;
>  
> + printf("opening btrfs filesystem\n");
> + start_timer();
>   info = open_ctree_fs_info(argv[optind], bytenr, tree_root_bytenr,
> chunk_root_bytenr, ctree_flags);
> + printf("done in %d seconds\n", stop_timer());
>   if (!info) {
>   error("cannot open file system");
>   ret = -EIO;
> @@ -13115,8 +13124,10 @@ int cmd_check(int argc, char **argv)
>   else
>   fprintf(stderr, "checking free space cache\n");
>   }
> + start_timer();
>   ret = check_space_cache(root);
>   err |= !!ret;
> + printf("done in %d seconds\n", stop_timer());
>   if (ret) {
>   if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE))
>   error("errors found in free space tree");
> @@ -13140,18 +13151,22 @@ int cmd_check(int argc, char **argv)
>   }
>  
>   fprintf(stderr, "checking csums\n");
> + start_timer();
>   ret = check_csums(root);
>   err |= !!ret;
> + printf("done in %d seconds\n", stop_timer());
>   if (ret) {
>   error("errors found in csum tree");
>   goto out;
>   }
>  
> - fprintf(stderr, "checking root refs\n");
>   /* For low memory mode, check_fs_roots_v2 handles root refs */
>   if (check_mode != CHECK_MODE_LOWMEM) {
> + fprintf(stderr, "checking root refs\n");
> + start_timer();
> ret = check_root_refs(root, &root_cache);
>   err |= !!ret;
> + printf("done in %d seconds\n", stop_timer());
>   if (ret) {
>   error("errors found in root refs");
>   goto out;
> @@ -13186,8 +13201,10 @@ int cmd_check(int argc, char **argv)
>  
>   if (info->quota_enabled) {
>   fprintf(stderr, &q

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 01:16:26PM +, Josef Bacik wrote:
> Great, if the free space cache is fucked again after the next go
> around then I need to expand the verifier to watch entries being added
> to the cache as well.  Thanks,

Well, I copied about 1TB of data, and nothing happened.
So it seems clearing it and fsck may have fixed this fault I had been
carrying for quite a while.
If so, yeah!

I'm not sure if this needs a kernel fix to not get triggered and if
btrfs check should also be improved to catch this, but hopefully you
know what makes sense there.

Thanks,
Marc


Re: netapp-alike snapshots?

2017-09-10 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 10:43:16PM +0300, Andrei Borzenkov wrote:
> 09.09.2017 16:44, Ulli Horlacher wrote:
> > 
> > Your tool does not create .snapshot subdirectories in EVERY directory like
> 
> Neither does NetApp. Those "directories" are magic handles that do not
> really exist.
 
Correct, thanks for saving me typing the same thing (I actually did work
at netapp many years back, so I'm familiar with how they work)

> > Netapp does.
> > Example:
> > 
> > framstag@fex:~: cd ~/Mail/.snapshot/
> > framstag@fex:~/Mail/.snapshot: l
> > lR-X - 2017-09-09 09:55 2017-09-09_.daily -> 
> > /local/home/.snapshot/2017-09-09_.daily/framstag/Mail
> 
> Apart from obvious problem with recursive directory traversal (NetApp
> .snapshot are not visible with normal directory list) those will also be
> captured in snapshots and cannot be removed. NetApp snapshots themselves
> do not expose .snapshot "directories".

Correct. Netapp knows this of course, which is why those .snapshot
directories are "magic" and hidden to ls(1), find(1) and others when
they do a readdir(3)

> > lR-X - 2017-09-09 14:00 2017-09-09_1400.hourly -> 
> > /local/home/.snapshot/2017-09-09_1400.hourly/framstag/Mail
> > lR-X - 2017-09-09 15:00 2017-09-09_1500.hourly -> 
> > /local/home/.snapshot/2017-09-09_1500.hourly/framstag/Mail
> > lR-X - 2017-09-09 15:18 2017-09-09_1518.single -> 
> > /local/home/.snapshot/2017-09-09_1518.single/framstag/Mail
> > lR-X - 2017-09-09 15:20 2017-09-09_1520.single -> 
> > /local/home/.snapshot/2017-09-09_1520.single/framstag/Mail
> > lR-X - 2017-09-09 15:22 2017-09-09_1522.single -> 
> > /local/home/.snapshot/2017-09-09_1522.single/framstag/Mail
> > 
> > My users (and I) need snapshots in this way.

You are used to them being there, I was too :)
While you could create lots of symlinks, I opted not to since it would
have littered the filesystem.
I can simply cd $(SNAPROOT)/volname_hourly/$(PWD)
and end up where I wanted to be.

I suppose you could make a snapcd shell function that does this for you.
The only issue is that volname_hourly comes before the rest of the path,
so you aren't given a list of all the snapshots available for a given
path, you have to cd into the given snapshot first, and then add the
path.
I agree it's not as nice as netapp, but honestly I don't think you can
do better with btrfs at this point.
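For what it's worth, such a snapcd could be a small shell function along these lines (just a sketch; SNAPROOT, the volname_hourly naming, and the snapshot tree layout are assumptions from my own setup, not anything btrfs itself provides):

```shell
# Hypothetical helper: assumes snapshots live under $SNAPROOT/<snapname>/
# mirroring the live tree, e.g. /mnt/btrfs_pool1/volname_hourly/home/user.
snapcd() {
    # Usage: snapcd <snapshot-name>, e.g. snapcd volname_hourly
    local snaproot="${SNAPROOT:-/mnt/btrfs_pool1}"
    # Strip the leading / from $PWD and graft it under the snapshot name.
    local target="$snaproot/$1/${PWD#/}"
    if [ -d "$target" ]; then
        cd "$target"
    else
        echo "snapcd: no such snapshot path: $target" >&2
        return 1
    fi
}
```

Since the snapshot name sits above the path in the tree, enumerating every snapshot that contains the current directory still means globbing "$snaproot"/*/"${PWD#/}" rather than listing a single .snapshot directory.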

Marc


Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 02:01:58PM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年09月10日 01:44, Marc MERLIN wrote:
> > So, should I assume that btrfs progs git has some issue since there is
> > no plausible way that a check --repair should be faster than a regular
> > check?
> 
> Yes, the assumption that repair should be no faster than RO check is
> correct.
> Especially for clean fs, repair should just behave the same as RO check.
> 
> And I'll first submit a patch (or patches) to output the consumed time for
> each tree, so we could have a clue what is going wrong.
> (Digging the code is just a little too boring for me)

Cool. Let me know when I should sync and re-try.
In the meantime, though, my check --repair went back to 170mn after
triggering an FS bug for Josef, so it seems back to normal.

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-10 Thread Marc MERLIN
On Sun, Sep 10, 2017 at 03:12:16AM +, Josef Bacik wrote:
> Ok mount -o clear_cache, umount and run fsck again just to make sure.  Then 
> if it comes out clean mount with ref_verify again and wait for it to blow up 
> again.  Thanks,
 
Ok, just did the 2nd fsck, came back clean after mount -o clear_cache

I'll re-trigger the exact same bug and repeat the whole cycle then.

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-09 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 10:56:14PM +, Josef Bacik wrote:
> Well that's odd, a block allocated on disk is in the free space cache.  Can I 
> see the full output of the fsck?  I want to make sure it's actually getting 
> to the part where it checks the free space cache.  If it does then I'll have 
> to think of how to catch this kind of bug, because you've got a weird one.  
> Thanks,
 
Well, btrfs check was clean before, that, but now I returned this:
gargamel:~# time btrfs check /dev/mapper/dshelf1  
Checking filesystem on /dev/mapper/dshelf1  
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d  
checking extents  
checking free space cache  
Wanted bytes 16384, found 196608 for off 13282417049600  
Wanted bytes 536870912, found 196608 for off 13282417049600  
cache appears valid but isn't 13282417049600  
There is no free space entry for 13849889603584-13849889652736  
There is no free space entry for 13849889603584-13850426474496  
cache appears valid but isn't 13849889603584  
Wanted bytes 5832704, found 81920 for off 13870290698240  
Wanted bytes 536870912, found 81920 for off 13870290698240  
cache appears valid but isn't 13870290698240  
block group 13928272756736 has wrong amount of free space  
failed to load free space cache for block group 13928272756736  
Duplicate entries in free space cache  
failed to load free space cache for block group 13962095624192  
block group 14003434684416 has wrong amount of free space  
failed to load free space cache for block group 14003434684416  
block group 14470042615808 has wrong amount of free space  
failed to load free space cache for block group 14470042615808  
block group 14610702794752 has wrong amount of free space  
failed to load free space cache for block group 14610702794752  
block group 14612313407488 has wrong amount of free space  
failed to load free space cache for block group 14612313407488  
block group 14624661438464 has wrong amount of free space  
failed to load free space cache for block group 14624661438464  
block group 14648820629504 has wrong amount of free space  
failed to load free space cache for block group 14648820629504  
Wanted offset 14657410793472, found 14657410760704  
Wanted offset 14657410793472, found 14657410760704  
cache appears valid but isn't 14657410564096  
block group 15886844952576 has wrong amount of free space  
failed to load free space cache for block group 15886844952576  
There is no free space entry for 15905635434496-15905636499456  
There is no free space entry for 15905635434496-15906172305408  
cache appears valid but isn't 15905635434496  
block group 16542901207040 has wrong amount of free space  
failed to load free space cache for block group 16542901207040  
block group 16581019041792 has wrong amount of free space  
failed to load free space cache for block group 16581019041792  
block group 16616989392896 has wrong amount of free space  
failed to load free space cache for block group 16616989392896  
block group 16676582064128 has wrong amount of free space  
failed to load free space cache for block group 16676582064128  
block group 16697520029696 has wrong amount of free space  
failed to load free space cache for block group 16697520029696  
block group 16848380755968 has wrong amount of free space  
failed to load free space cache for block group 16848380755968  
ERROR: errors found in free space cache  
found 11732749766656 bytes used, error(s) found  
total csum bytes: 11441478452  
total tree bytes: 13793296384  
total fs tree bytes: 727580672  
total extent tree bytes: 483426304  
btree space waste bytes: 1194373662  
file data blocks allocated: 12133646495744  
 referenced 12155707805696  
  
real    100m12.252s
user    0m33.771s
sys     1m11.220s

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-09 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 06:19:25PM +, Josef Bacik wrote:
> Alright I just reworked the build tree ref stuff and tested it to make sure 
> it wasn’t going to give false positives again.  Apparently I had only ever 
> used this with very basic existing fs’es and nothing super complicated, so it 
> was just broken for anything complex.  I’ve pushed it to my tree, you can 
> just pull and build and try again.  This time the stack traces will even 
> work!  Thanks,
 
Ok, so I found out that I just need to copy a bunch of data to the
filesystem to trigger the bug.

There you go:
[318400.507972] re-allocated a block that still has references to it!
[318400.527517] Dumping block entry [13282417065984 16384], num_refs 2, 
metadata 1, from disk 1
[318400.553751]   Ref root 2, parent 0, owner 0, offset 0, num_refs 1
[318400.573208]   Root entry 2, num_refs 1
[318400.585614]   Root entry 7, num_refs 0
[318400.598028]   Ref action 3, root 7, ref_root 7, parent 0, owner 1, offset 
0, num_refs 1
[318400.623774]btrfs_alloc_tree_block+0x33e/0x3e1
[318400.639083]__btrfs_cow_block+0xf3/0x420
[318400.652817]btrfs_cow_block+0xcf/0x145
[318400.666024]btrfs_search_slot+0x269/0x6de
[318400.680041]btrfs_del_csums+0xac/0x2f9
[318400.693245]__btrfs_free_extent+0x88b/0xa0b
[318400.707718]__btrfs_run_delayed_refs+0xb4e/0xd20
[318400.723491]btrfs_run_delayed_refs+0x77/0x1a1
[318400.738993]btrfs_write_dirty_block_groups+0xf5/0x2c1
[318400.755994]commit_cowonly_roots+0x1da/0x273
[318400.770673]btrfs_commit_transaction+0x3dd/0x761
[318400.786397]transaction_kthread+0xe2/0x178
[318400.800515]kthread+0xfb/0x100
[318400.811487]ret_from_fork+0x25/0x30
[318400.823748]0x
[318400.957574] ------------[ cut here ]------------
[318400.972498] WARNING: CPU: 2 PID: 3242 at fs/btrfs/extent-tree.c:3015 
btrfs_run_delayed_refs+0xa2/0x1a1
[318401.001382] Modules linked in: veth ip6table_filter ip6_tables ebtable_nat 
ebtables ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog 
binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 
ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 
nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 
dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat 
nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 
irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci snd_hda_intel 
snd_mpu401_uart snd_hda_codec snd_opl3_lib eeepc_wmi snd_hda_core tpm_infineon 
snd_rawmidi asix asus_wmi rc_ati_x10 tpm_tis
[318401.218357]  snd_seq_device sparse_keymap snd_hwdep tpm_tis_core ati_remote 
usbnet parport_pc snd_pcm rfkill pcspkr i915 hwmon tpm parport rc_core libphy 
mei_me snd_timer lpc_ich wmi_bmof battery usbserial evdev wmi input_leds 
i2c_i801 snd soundcore e1000e ptp pps_core fuse raid456 multipath mmc_block 
mmc_core lrw ablk_helper dm_crypt dm_mod async_raid6_recov async_pq async_xor 
async_memcpy async_tx crc32c_intel blowfish_x86_64 blowfish_common pcbc 
aesni_intel aes_x86_64 crypto_simd glue_helper cryptd xhci_pci ehci_pci 
xhci_hcd ehci_hcd mvsas libsas r8169 sata_sil24 usbcore mii scsi_transport_sas 
thermal fan [last unloaded: ftdi_sio]
[318401.392440] CPU: 2 PID: 3242 Comm: btrfs-transacti Tainted: G U 
 4.13.0-rc5-amd64-stkreg-sysrq-20170902d+ #6
[318401.426262] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 3904 04/27/2013
[318401.454894] task: 948ef791e200 task.stack: b18a091ec000
[318401.473918] RIP: 0010:btrfs_run_delayed_refs+0xa2/0x1a1
[318401.490849] RSP: 0018:b18a091efd08 EFLAGS: 00010296
[318401.507751] RAX: 0026 RBX: 9488208be618 RCX: 

[318401.530384] RDX: 948f1e295e01 RSI: 948f1e28dd58 RDI: 
948f1e28dd58
[318401.553548] RBP: b18a091efd50 R08: 0003dc12ea8bcc57 R09: 
948f1f50b868
[318401.576127] R10: 948b1f1cc460 R11: aef37285 R12: 
ffef
[318401.598717] R13:  R14: 948edb7efd48 R15: 
948cdbdeb000
[318401.621327] FS:  () GS:948f1e28() 
knlGS:
[318401.646737] CS:  0010 DS:  ES:  CR0: 80050033
[318401.665149] CR2: f7f05001 CR3: 00061f587000 CR4: 
001406e0
[318401.687684] Call Trace:
[318401.696148]  btrfs_write_dirty_block_groups+0xf5/0x2c1
[318401.712745]  ? btrfs_run_delayed_refs+0x127/0x1a1
[318401.727981]  commit_cowonly_roots+0x1da/0x273
[318401.742183]  btrfs_commit_transaction+0x3dd/0x761
[318401.757447]  transaction_kthread+0xe2/0x178
[318401.771158]  ? btrfs_cleanup_transaction+0x3c2/0x3c2
[318401.787169]  kthread+0xfb/0x100
[318401.797769]  ? init_completion+0x24/0x24
[318401.810718]  ret_from_fork+0x25/0x30
[318401.822588] Code: 85 c0 41 89 c4 79 60 48 8b 43 60 f0 0f ba a8 d8 

Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-09 Thread Marc MERLIN
So, should I assume that btrfs progs git has some issue since there is
no plausible way that a check --repair should be faster than a regular
check?

Thanks,
Marc

On Tue, Sep 05, 2017 at 07:45:25AM -0700, Marc MERLIN wrote:
> On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
> > > gargamel:~# btrfs fi df /mnt/btrfs_pool1
> > > Data, single: total.60TiB, used.54TiB
> > > System, DUP: total2.00MiB, used=1.19MiB
> > > Metadata, DUP: totalX.00GiB, used.69GiB
> > 
> > Wait for a minute.
> > 
> > Is that .69GiB means 706 MiB? Or my email client/GMX screwed up the format
> > (again)?
> > This output format must be changed, at least to 0.69 GiB, or 706 MiB.
>  
> Email client problem. I see control characters in what you quoted.
> 
> Let's try again
> gargamel:~# btrfs fi df /mnt/btrfs_pool1
> Data, single: total=10.66TiB, used=10.60TiB  => 10TB
> System, DUP: total=64.00MiB, used=1.20MiB=> 1.2MB
> Metadata, DUP: total=57.50GiB, used=12.76GiB => 13GB
> GlobalReserve, single: total=512.00MiB, used=0.00B  => 0
> 
> > You mean lowmem is actually FASTER than original mode?
> > That's very surprising.
>  
> Correct, unless I add --repair and then original mode is 2x faster than
> lowmem.
> 
> > Is there any special operation done for that btrfs?
> > Like offline dedupe or tons of reflinks?
> 
> In this case, no.
> Note that btrfs check used to take many hours overnight until I did a
> git pull of btrfs progs and built the latest from TOT.
> 
> > BTW, how many subvolumes do you have in the fs?
>  
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
> 91
> 
> If I remove snapshots for btrfs send and historical 'backups':
> gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | grep -Ev 
> '(hourly|daily|weekly|rw|ro)' | wc -l
> 5
> 
> > This looks like a bug. My first guess is related to number of
> > subvolumes/reflinks, but I'm not sure since I don't have many real-world
> > btrfs.
> > 
> > I'll take sometime to look into it.
> > 
> > Thanks for the very interesting report,
> 
> Thanks for having a look :)
> 
> Marc



Re: netapp-alike snapshots?

2017-09-09 Thread Marc MERLIN
On Sat, Sep 09, 2017 at 03:26:14PM +0200, Ulli Horlacher wrote:
> On Tue 2017-08-22 (15:22), Ulli Horlacher wrote:
> > With Netapp/waffle you have automatic hourly/daily/weekly snapshots.
> > You can find these snapshots in every local directory (readonly).
> 
> > I would like to have something similar with btrfs.
> > Is there (where?) such a tool?
> 
> I have found none, so I have implemented it by myself:
> 
> https://fex.rus.uni-stuttgart.de/snaprotate.html

Not sure how you looked :)
https://www.google.com/search?q=btrfs+netapp+snapshot
http://marc.merlins.org/perso/btrfs/post_2014-03-21_Btrfs-Tips_-How-To-Setup-Netapp-Style-Snapshots.html

Might not be exactly what you wanted, but been using it for 3 years.

Marc


Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-05 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 04:05:04PM +0800, Qu Wenruo wrote:
> > gargamel:~# btrfs fi df /mnt/btrfs_pool1
> > Data, single: total.60TiB, used.54TiB
> > System, DUP: total2.00MiB, used=1.19MiB
> > Metadata, DUP: totalX.00GiB, used.69GiB
> 
> Wait for a minute.
> 
> Is that .69GiB means 706 MiB? Or my email client/GMX screwed up the format
> (again)?
> This output format must be changed, at least to 0.69 GiB, or 706 MiB.
 
Email client problem. I see control characters in what you quoted.

Let's try again
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.66TiB, used=10.60TiB  => 10TB
System, DUP: total=64.00MiB, used=1.20MiB=> 1.2MB
Metadata, DUP: total=57.50GiB, used=12.76GiB => 13GB
GlobalReserve, single: total=512.00MiB, used=0.00B  => 0

> You mean lowmem is actually FASTER than original mode?
> That's very surprising.
 
Correct, unless I add --repair and then original mode is 2x faster than
lowmem.

> Is there any special operation done for that btrfs?
> Like offline dedupe or tons of reflinks?

In this case, no.
Note that btrfs check used to take many hours overnight until I did a
git pull of btrfs progs and built the latest from TOT.

> BTW, how many subvolumes do you have in the fs?
 
gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | wc -l
91

If I remove snapshots for btrfs send and historical 'backups':
gargamel:/mnt/btrfs_pool1# btrfs subvolume list . | grep -Ev 
'(hourly|daily|weekly|rw|ro)' | wc -l
5

> This looks like a bug. My first guess is related to number of
> subvolumes/reflinks, but I'm not sure since I don't have many real-world
> btrfs.
> 
> I'll take sometime to look into it.
> 
> Thanks for the very interesting report,

Thanks for having a look :)

Marc


Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
Ok, not quite hours, but check takes 88mn, check --repair takes 11mn

gargamel:/var/local/src/btrfs-progs# time btrfs check /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11674263330816 bytes used, no error found
total csum bytes: 11384482936
total tree bytes: 13704478720
total fs tree bytes: 724729856
total extent tree bytes: 482623488
btree space waste bytes: 1167009013
file data blocks allocated: 12041456693248
referenced 12063146434560

real    88m56.597s
user    2m13.985s
sys     2m7.880s

gargamel:/var/local/src/btrfs-progs# time btrfs check --repair
/dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11674263330816 bytes used, no error found
total csum bytes: 11384482936
total tree bytes: 13704478720
total fs tree bytes: 724729856
total extent tree bytes: 482623488
btree space waste bytes: 1167009013
file data blocks allocated: 12041456693248
referenced 12063146434560

real    11m10.499s
user    1m55.067s
sys     1m31.666s

And lowmem is 24mn:
gargamel:/var/local/src/btrfs-progs# time btrfs check --mode=lowmem
/dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11674263363584 bytes used, no error found
total csum bytes: 11384482936
total tree bytes: 13738770432
total fs tree bytes: 758988800
total extent tree bytes: 482656256
btree space waste bytes: 1171508121
file data blocks allocated: 12888981110784
referenced 12930453286912

real    24m20.493s
user    5m45.749s
sys     1m40.204s


Does it make any sense at all that check without --repair is so much
slower than check with --repair, or check in lowmem mode?
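For what it's worth, the comparison can be scripted; here is a sketch (DEV is a placeholder for your own device, the filesystem must be unmounted, and --repair is deliberately left out since it writes to the device):

```shell
#!/bin/sh
# Time the two read-only btrfs check variants back to back for an
# apples-to-apples comparison.  DEV is a placeholder device path.
DEV=${DEV:-/dev/mapper/dshelf1}

check_both() {
    for args in "" "--mode=lowmem"; do
        echo "=== btrfs check $args $DEV ==="
        start=$(date +%s)
        btrfs check $args "$DEV"
        echo "elapsed: $(( $(date +%s) - start ))s"
    done
}

# Guard so the script does nothing unless the device actually exists.
if [ -e "$DEV" ]; then
    check_both
fi
```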

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
On Tue, Sep 05, 2017 at 09:21:55AM +0800, Qu Wenruo wrote:
> 
> 
> On 2017年09月05日 09:05, Marc MERLIN wrote:
> >Ok, I don't want to sound like I'm complaining :) but I updated
> >btrfs-progs to top of tree in git, installed it, and ran it on an 8TiB
> >filesystem that used to take 12H or so to check.
> 
> How much space allocated for that 8T fs?
> If metadata is not that large, 10min is valid.
> 
> Here fi df output could help.

gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

> And, without --repair, how much time it takes to run?

Well, funny that you ask, it's now been running for hours, still waiting...
 
Just before, I ran lowmem, and it was pretty quick too (didn't time it,
but less than 1h):
gargamel:/var/local/src/btrfs-progs# btrfs check --mode=lowmem /dev/mapper/dshelf1
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11674263330816 bytes used, no error found
total csum bytes: 11384482936
total tree bytes: 13738737664
total fs tree bytes: 758988800
total extent tree bytes: 482623488
btree space waste bytes: 1171475737
file data blocks allocated: 12888981110784
 referenced 12930453286912

Now, this is good news: my filesystem is probably clean (previous versions
of lowmem, before my git update, found issues that were unclear but
apparently bugs in the lowmem code itself; this version finds nothing).

But I'm not sure why --repair would be fast while check without --repair would be slow?

Marc


btrfs check --repair now runs in minutes instead of hours? aborting

2017-09-04 Thread Marc MERLIN
Ok, I don't want to sound like I'm complaining :) but I updated
btrfs-progs to top of tree in git, installed it, and ran it on an 8TiB
filesystem that used to take 12H or so to check.

It finished in maybe 10min, just 10min! :)
gargamel:/var/local/src/btrfs-progs# btrfs check --repair /dev/mapper/dshelf1
enabling repair mode
Checking filesystem on /dev/mapper/dshelf1
UUID: 36f5079e-ca6c-4855-8639-ccb82695c18d
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 11674263347200 bytes used, no error found
total csum bytes: 11384482936
total tree bytes: 13704495104
total fs tree bytes: 724729856
total extent tree bytes: 482639872
btree space waste bytes: 1167025205
file data blocks allocated: 12041456693248
 referenced 12063146434560

This is great news, but can I trust that the program worked properly and
indeed that my filesystem is fully clean?
Or, at this point, if I'm not running --mode=lowmem, does the regular mode
really not check much, and only lowmem can do a proper check? (even
though lowmem can't fix problems once it finds them)

Thanks
Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 05:33:33PM +, Josef Bacik wrote:
> Alright pushed, sorry about that.
 
I'm reasonably sure I'm running the new code, but still got this:
[ 2104.336513] Dropping a ref for a root that doesn't have a ref on the block
[ 2104.358226] Dumping block entry [115253923840 155648], num_refs 1, metadata 
0, from disk 1
[ 2104.384037]   Ref root 0, parent 3414272884736, owner 262813, offset 0, 
num_refs 18446744073709551615
[ 2104.412766]   Ref root 418, parent 0, owner 262813, offset 0, num_refs 1
[ 2104.433888]   Root entry 418, num_refs 1
[ 2104.446648]   Root entry 69869, num_refs 0
[ 2104.459904]   Ref action 2, root 69869, ref_root 0, parent 3414272884736, 
owner 262813, offset 0, num_refs 18446744073709551615
[ 2104.496244]   No Stacktrace

Now, in the background I had a monthly md check of the underlying device
(mdadm raid5) running, and got some mismatch warnings (pasted below).
Obviously that's not good, and I'm assuming that md raid5 doesn't keep a
checksum on blocks, so it won't know which drive has the corrupted data.
Does that sound right?

Now, the good news is that btrfs on top does have checksums, so running a
scrub should hopefully find those corrupted blocks if they happen to be in
use by the filesystem (they may just be free). But as a reminder, this
whole thread started with my FS maybe not being in a good state, with both
check --repair and scrub returning clean. Maybe I'll use the opportunity
to re-run a check --repair, and a scrub after that, to see what state
things are in.

md6: mismatch sector in range 3581539536-3581539544
md6: mismatch sector in range 3581539544-3581539552
md6: mismatch sector in range 3581539552-3581539560
md6: mismatch sector in range 3581539560-3581539568  
md6: mismatch sector in range 3581543792-3581543800
md6: mismatch sector in range 3581543800-3581543808
md6: mismatch sector in range 3581543808-3581543816
md6: mismatch sector in range 3581543816-3581543824
md6: mismatch sector in range 3581544112-3581544120
md6: mismatch sector in range 3581544120-3581544128

As for your patch, no idea why it's not giving me a stacktrace, sorry :-/

Git log of my tree does show:
commit aa162d2908bd7452805ea812b7550232b0b6ed53
Author: Josef Bacik 
Date:   Sun Sep 3 13:32:17 2017 -0400

Btrfs: use be->metadata just in case

I suspect we're not getting the owner in some cases, so we want to just
use the known value.

Signed-off-by: Josef Bacik 

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 02:38:57PM +, Josef Bacik wrote:
> Oh yeah you need CONFIG_STACKTRACE turned on, otherwise this is going to be 
> difficult ;).  Thanks,
 
Right, except that I thought I did:

saruman:/usr/src/linux-btrfs/btrfs-next# grep STACKTRACE .config
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_RELIABLE_STACKTRACE=y
CONFIG_STACKTRACE=y
CONFIG_USER_STACKTRACE_SUPPORT=y

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-03 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 03:26:34AM +, Josef Bacik wrote:
> I was looking through the code for other ways to cut down memory usage when I 
> noticed we only catch improper re-allocations, not adding another ref for 
> metadata which is what I suspect your problem is.  I added another patch and 
> pushed it out, sorry for the churn.

Installed.

For now, I've seen this once, but otherwise no issues:
Dropping a ref for a root that doesn't have a ref on the block
Dumping block entry [26538725376 4096], num_refs 2, metadata 0, from disk 1
  Ref root 0, parent 29818880, owner 23608, offset 0, num_refs 
18446744073709551615
  Ref root 0, parent 202129408, owner 23608, offset 0, num_refs 1
  Ref root 418, parent 0, owner 23608, offset 0, num_refs 1
  Root entry 418, num_refs 1
  Root entry 69809, num_refs 0
  Ref action 1, root 418, ref_root 0, parent 202129408, owner 23608, offset 0, 
num_refs 1
  No stacktrace support
  Ref action 2, root 69809, ref_root 0, parent 29818880, owner 23608, offset 0, 
num_refs 18446744073709551615
  No stacktrace support


I'm assuming this was done by your patch?
Should I worry about 'No stacktrace support' ?

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Sun, Sep 03, 2017 at 12:30:07AM +, Josef Bacik wrote:
> My bad, I forgot I don't dynamically allocate the stack trace space so my 
> patch did nothing, I blame the children for distracting me.  I've dropped 
> allocating the action altogether for the on disk stuff, that should 
> dramatically reduce the memory usage.  You can just do a git pull since I 
> made a new commit.  You are mounting with -o ref_verify on only the one fs 
> right?  Give this a try and if it still doesn't work we can try a stripped 
> down version that doesn't build the initial tree and just hope that the 
> problem exists in allocating a new block and not modifying the refs for an 
> existing block.  Thanks,

Good news, this time it booted without crashing on OOM.

I'll now get to see how it runs, and hopefully it won't crash due to
other problems in 4.13.

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Sat, Sep 02, 2017 at 04:52:20PM +, Josef Bacik wrote:
> Oops, ok I've updated my tree so we don't save the stack trace of the initial 
> scan, which we don't need anyway.  That should save a decent amount of memory 
> in your case.  It was an in place update so you'll need to blow away your 
> local branch and pull the new one to get the new code.  Thanks,

Still did not work, unfortunately (on top of extra unrelated bugs in
4.13rc5, as I was afraid of).

mounting the partition still sucks all the memory

[  358.719722] bcache_writebac invoked oom-killer: 
gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[  358.753716] bcache_writebac cpuset=/ mems_allowed=0
[  358.769071] CPU: 3 PID: 2339 Comm: bcache_writebac Tainted: G U  
4.13.0-rc5-amd64-stkreg-sysrq-20170902+ #2
[  358.802040] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 3904 04/27/2013
[  358.830082] Call Trace:
[  358.838108]  dump_stack+0x61/0x7d
[  358.848728]  dump_header+0x97/0x239
[  358.859846]  ? _raw_spin_unlock_irqrestore+0x14/0x24
[  358.875398]  oom_kill_process+0x86/0x379
[  358.887838]  out_of_memory+0x3a6/0x3ef
[  358.899730]  __alloc_pages_slowpath+0x86e/0xa1f
[  358.913977]  ? native_sched_clock+0x1a/0x37
[  358.927197]  __alloc_pages_nodemask+0x134/0x1d4
[  358.941432]  alloc_pages_current+0x8d/0x96
[  358.954343]  bio_alloc_pages+0x29/0x6a
[  358.966194]  bch_writeback_thread+0x51c/0x6d4 [bcache]
[  358.982206]  ? write_dirty+0x90/0x90 [bcache]
[  358.995878]  kthread+0xfb/0x100
[  359.005899]  ? init_completion+0x24/0x24
[  359.018242]  ? do_fast_syscall_32+0xb7/0xfe
[  359.031360]  ret_from_fork+0x25/0x30
[  359.042723] Mem-Info:
[  359.050529] active_anon:0 inactive_anon:2 isolated_anon:0
[  359.050529]  active_file:306 inactive_file:163 isolated_file:0
[  359.050529]  unevictable:0 dirty:0 writeback:0 unstable:0
[  359.050529]  slab_reclaimable:3430 slab_unreclaimable:8034083
[  359.050529]  mapped:1 shmem:2 pagetables:80 bounce:0
[  359.050529]  free:51932 free_pcp:46 free_cma:3741
[  359.149971] Node 0 active_anon:0kB inactive_anon:8kB active_file:1128kB 
inactive_file:892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 
0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[  359.229593] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB 
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB
[  359.308570] lowmem_reserve[]: 0 3201 31832 31832 31832
[  359.324706] Node 0 DMA32 free:121124kB min:6788kB low:10064kB high:13340kB 
active_anon:0kB inactive_anon:0kB active_file:100kB inactive_file:0kB 
unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB 
mlocked:0kB kernel_stack:16kB pagetables:0kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:0kB
[  359.408498] lowmem_reserve[]: 0 0 28631 28631 28631
[  359.423773] Node 0 Normal free:70792kB min:60760kB low:90092kB high:119424kB 
active_anon:0kB inactive_anon:8kB active_file:780kB inactive_file:808kB 
unevictable:0kB writepending:0kB present:29874176kB managed:29337512kB 
mlocked:0kB kernel_stack:4608kB pagetables:320kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:14964kB
[  359.511284] lowmem_reserve[]: 0 0 0 0 0
[  359.523514] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 
1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[  359.564260] Node 0 DMA32: 3*4kB (UME) 3*8kB (ME) 4*16kB (UME) 6*32kB (UME) 
4*64kB (ME) 4*128kB (ME) 5*256kB (ME) 4*512kB (ME) 6*1024kB (UME) 4*2048kB 
(UME) 25*4096kB (M) = 121124kB
[  359.614116] Node 0 Normal: 559*4kB (UMEC) 272*8kB (ME) 163*16kB (UMEC) 
93*32kB (UMEC) 65*64kB (MEC) 71*128kB (UME) 37*256kB (ME) 18*512kB (UMC) 
8*1024kB (M) 4*2048kB (MC) 3*4096kB (C) = 70604kB
[  359.667377] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[  359.693519] 456 total pagecache pages
[  359.705331] 0 pages in swap cache
[  359.716151] Swap cache stats: add 1184, delete 1184, find 4/8
[  359.734213] Free swap  = 15610620kB
[  359.745499] Total swap = 15616764kB
[  359.756879] 8313052 pages RAM
[  359.766596] 0 pages HighMem/MovableOnly
[  359.778927] 150579 pages reserved
[  359.789686] 4096 pages cma reserved
[  359.801052] 0 pages hwpoisoned
[  359.811026] [ pid ]   uid  tgid total_vm  rss nr_ptes nr_pmds swapents 
oom_score_adj name
[  359.837419] [  967] 0   967  9360   6   2   32   
  0 init
[  359.863819] [  968] 0   968  9411   5   2   98   
  0 rc
[  359.889683] [ 1087] 0  1087  9421   5   2  212   
  -1000 udevd
[  359.916457] [ 1294] 0  1294  9171   5   2   60   
  -1000 

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-02 Thread Marc MERLIN
On Fri, Sep 01, 2017 at 11:01:30PM +, Josef Bacik wrote:
> You'll be fine, it's only happening on the one fs right?  That's 13gib of 
> metadata with checksums and all that shit, it'll probably look like 8 or 9gib 
> of ram worst case.  I'd mount with -o ref_verify and check the slab amount in 
> /proc/meminfo to get an idea of real usage.  Once the mount is finished 
> that'll be about as much metadata you will use, of course it'll grow as 
> metadata usage grows but it should be nominal.  Thanks,
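The slab check can be scripted; a sketch (SUnreclaim in /proc/meminfo is the unreclaimable-slab figure, reported in kB on current kernels):

```shell
#!/bin/sh
# Read SUnreclaim from /proc/meminfo to see how much memory the
# ref_verify tracking pins once the mount finishes.  Takes an optional
# file argument so it can be run against a saved copy; default is the
# live /proc/meminfo.
slab_mib() {
    awk '/^SUnreclaim:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "unreclaimable slab: $(slab_mib) MiB"
fi
```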

Looks like I don't have enough RAM :(

[   80.964838] BTRFS info (device dm-2): bdev /dev/mapper/dshelf1 errs: wr 0, 
rd 0, flush 0, corrupt 2, gen 0
[ 1382.968986] bcache_writebac invoked oom-killer: 
gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, 
oom_score_adj=0
[ 1383.003255] bcache_writebac cpuset=/ mems_allowed=0
[ 1383.018947] CPU: 6 PID: 2359 Comm: bcache_writebac Tainted: G U  
4.13.0-rc5-amd64-preempt-sysrq-20170406+ #1
[ 1383.052448] Hardware name: System manufacturer System Product Name/P8H67-M 
PRO, BIOS 3904 04/27/2013
[ 1383.080911] Call Trace:
[ 1383.089336]  dump_stack+0x61/0x7d
[ 1383.100132]  dump_header+0x97/0x239
[ 1383.111354]  ? _raw_spin_unlock_irqrestore+0x14/0x24
[ 1383.127322]  oom_kill_process+0x86/0x379
[ 1383.140208]  out_of_memory+0x3b8/0x416
[ 1383.152581]  __alloc_pages_slowpath+0x890/0xa55
[ 1383.166960]  ? _raw_spin_unlock_irq+0x11/0x21
[ 1383.180806]  __alloc_pages_nodemask+0x141/0x1f5
[ 1383.195144]  alloc_pages_current+0x8d/0x96
[ 1383.208310]  bio_alloc_pages+0x29/0x6a
[ 1383.220472]  bch_writeback_thread+0x53b/0x6ff [bcache]
[ 1383.236942]  ? write_dirty+0x90/0x90 [bcache]
[ 1383.250734]  kthread+0xfb/0x100
[ 1383.261230]  ? init_completion+0x24/0x24
[ 1383.273988]  ? do_fast_syscall_32+0xb7/0xfe
[ 1383.287265]  ret_from_fork+0x25/0x30
[ 1383.298733] Mem-Info:
[ 1383.306446] active_anon:1 inactive_anon:3 isolated_anon:0
[ 1383.306446]  active_file:190 inactive_file:180 isolated_file:0
[ 1383.306446]  unevictable:0 dirty:0 writeback:1 unstable:0
[ 1383.306446]  slab_reclaimable:3436 slab_unreclaimable:8033273
[ 1383.306446]  mapped:1 shmem:2 pagetables:74 bounce:0
[ 1383.306446]  free:53127 free_pcp:0 free_cma:3741
[ 1383.406332] Node 0 active_anon:0kB inactive_anon:16kB active_file:896kB 
inactive_file:824kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
mapped:4kB dirty:0kB writeback:0kB shmem:8kB shmem_thp: 0kB shmem_pmdmapped: 
0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 1383.486392] Node 0 DMA free:15880kB min:32kB low:44kB high:56kB 
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB writepending:0kB present:15964kB managed:15880kB mlocked:0kB 
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB 
free_cma:0kB
[ 1383.565818] lowmem_reserve[]: 0 3201 31832 31832 31832
[ 1383.581956] Node 0 DMA32 free:121256kB min:6788kB low:10064kB high:13340kB 
active_anon:0kB inactive_anon:0kB active_file:44kB inactive_file:52kB 
unevictable:0kB writepending:0kB present:3362068kB managed:3296500kB 
mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:0kB
[ 1383.665831] lowmem_reserve[]: 0 0 28631 28631 28631
[ 1383.681212] Node 0 Normal free:75372kB min:60760kB low:90092kB high:119424kB 
active_anon:0kB inactive_anon:16kB active_file:788kB inactive_file:836kB 
unevictable:0kB writepending:0kB present:29874176kB managed:29337252kB 
mlocked:0kB kernel_stack:8048kB pagetables:296kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:14964kB
[ 1383.769793] lowmem_reserve[]: 0 0 0 0 0
[ 1383.782429] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 
1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15880kB
[ 1383.823171] Node 0 DMA32: 4*4kB (UM) 21*8kB (UME) 9*16kB (UME) 5*32kB (ME) 
5*64kB (UME) 5*128kB (UME) 6*256kB (UME) 5*512kB (UME) 5*1024kB (ME) 4*2048kB 
(UME) 25*4096kB (M) = 121256kB
[ 1383.874564] Node 0 Normal: 773*4kB (UMEC) 494*8kB (ME) 373*16kB (UMEC) 
284*32kB (MEC) 177*64kB (UMEC) 108*128kB (UME) 36*256kB (UME) 9*512kB (UMEC) 
0*1024kB 1*2048kB (C) 3*4096kB (C) = 75412kB
[ 1383.927787] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 
hugepages_size=2048kB
[ 1383.954002] 467 total pagecache pages
[ 1383.965889] 3 pages in swap cache
[ 1383.976715] Swap cache stats: add 1253, delete 1250, find 21/36
[ 1383.995325] Free swap  = 15610620kB
[ 1384.006675] Total swap = 15616764kB
[ 1384.018005] 8313052 pages RAM
[ 1384.027730] 0 pages HighMem/MovableOnly
[ 1384.040076] 150644 pages reserved
[ 1384.050845] 4096 pages cma reserved
[ 1384.062127] 0 pages hwpoisoned
[ 1384.072133] [ pid ]   uid  tgid total_vm  rss nr_ptes nr_pmds swapents 
oom_score_adj name
[ 1384.098531] [  983] 0   983  9360   6   2   32   
  0 init
[ 1384.124971] [  984] 0   984  9411   5   2   98   
  0 rc
[ 1384.150843] [ 1103] 0  1103  920  

Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-09-01 Thread Marc MERLIN
On Thu, Aug 31, 2017 at 05:48:23PM +, Josef Bacik wrote:
> We are using 4.11 in production at fb with backports from recent (a month 
> ago?) stuff.  I’m relatively certain nothing bad will happen, and this branch 
> has the most recent fsync() corruption fix (which exists in your kernel so 
> it’s not new).  That said if you are uncomfortable I can rebase this patch 
> onto whatever base you want and push out a branch, it’s your choice.  Keep in 
> mind this is going to hold a lot of shit in memory, so I hope you have 
> enough, and I’d definitely remove the sleep’s from your script, there’s no 
> telling if this is a race condition or not and the overhead of the ref-verify 
> stuff may cause it to be less likely to happen.  Thanks,

Thanks for the warning. I have 32GB of RAM in the server, and I probably use
8. Most of the rest is so that I can do btrfs check --repair without the
machine dying :-/

I am concerned that I have a lot more metadata than I have memory:
gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.66TiB, used=10.60TiB
System, DUP: total=32.00MiB, used=1.20MiB
Metadata, DUP: total=58.00GiB, used=12.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
gargamel:~# btrfs fi df /mnt/btrfs_pool2
Data, single: total=5.07TiB, used=4.78TiB
System, DUP: total=8.00MiB, used=640.00KiB
Metadata, DUP: total=70.50GiB, used=66.58GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

That's 13GB + 67GB.
Is it going to fall over if I only have 32GB of RAM?

If I stop mounting /mnt/btrfs_pool2 for a while, will 32GB of RAM
cover the 13GB of metadata from /mnt/btrfs_pool1 ?
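A back-of-envelope version of that arithmetic (the "memory scales with *used* metadata, not total" scaling is my assumption, not Josef's exact formula):

```shell
#!/bin/sh
# Extract the used= metadata figure from a `btrfs fi df` Metadata line.
used_meta_gib() {
    sed -n 's/.*Metadata.*used=\([0-9.]*\)GiB.*/\1/p'
}

# The two Metadata lines pasted above, fed in directly:
pool1=$(used_meta_gib <<'EOF'
Metadata, DUP: total=58.00GiB, used=12.76GiB
EOF
)
pool2=$(used_meta_gib <<'EOF'
Metadata, DUP: total=70.50GiB, used=66.58GiB
EOF
)

awk -v a="$pool1" -v b="$pool2" -v ram=32 'BEGIN {
    total = a + b
    printf "metadata used: %.2f GiB vs %d GiB RAM -> %s\n",
           total, ram, (total <= ram) ? "may fit" : "too big"
}'
# prints: metadata used: 79.34 GiB vs 32 GiB RAM -> too big
```

With only pool1 mounted, 12.76GiB of used metadata against 32GB of RAM is the optimistic case in the question above.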

Thanks,
Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-31 Thread Marc MERLIN
On Thu, Aug 31, 2017 at 02:52:56PM +, Josef Bacik wrote:
> Hello,
> 
> Sorry I really thought I could accomplish this with BPF, but ref tracking is 
> just too complicated to work properly with BPF.  I forward ported my ref 
> verification patch to the latest kernel, you can find it in the btrfs-readdir 
> branch of my btrfs-next tree here
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git

Thanks.

Now, I have to ask: how safe is this kernel btrfs-wise? I'm ok if it
crashes, but much less so if it damages my filesystem.
I spent over a week recovering from the last corruption that happened when I
moved to 4.11 (and retreated back to 4.9).

From other reports you've seen, has 4.11/4.12 been stable enough for others,
and is 4.13-rc (which your branch is based on, correct?) safe enough in your
opinion?
(and yes, just asking for your opinion, I totally understand that you can't
predict all bugs, and you can't give me a 100% assurance)

I do have a backup, but it indeed takes days to recover, and over a week if
the kernel also damages the other FS on that system, which is smaller, but
has maybe 100x the amount of files.

For now, the problem in the subject line, happens rarely-ish (2-3 weeks?)
although if I remove sleeps in my snapshot creation and rotation, it may
start happening more often again.

Thanks,
Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-29 Thread Marc MERLIN
On Tue, Aug 29, 2017 at 06:22:38PM +, Josef Bacik wrote:
> How much metadata do you have on this fs?  I was going to hold everything in 
> bpf hash trees, but I’m worried we’ll hit collisions and then the tracing 
> will be useless.  If it’s too big I’ll have to dump everything to userspace 
> and let python take care of keeping everything in memory, so if you have a 
> lot of metadata hopefully you have lots of memory too ;).  Thanks,

gargamel:~# btrfs fi df /mnt/btrfs_pool1
Data, single: total=10.60TiB, used=10.54TiB
System, DUP: total=32.00MiB, used=1.19MiB
Metadata, DUP: total=58.00GiB, used=12.69GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-29 Thread Marc MERLIN
On Tue, Aug 29, 2017 at 02:30:19PM +, Josef Bacik wrote:
> Sorry Marc, I’ll wire up a bcc script to try and catch when this
> happens.  In order for it to work it’ll need to read the extent tree in
> before you mount the fs, is that something you’ll be able to swing or is
> this your root fs?  Also is it the only btrfs fs on the system?  Thanks,

HI Josef, thanks for your reply.

Thankfully it's not the root FS.
There are 3 btrfs filesystems on that system.

Marc


Re: BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists (since 3.4 / 2012)

2017-08-28 Thread Marc MERLIN
On Sat, Jul 15, 2017 at 04:12:45PM -0700, Marc MERLIN wrote:
> On Fri, Jul 14, 2017 at 06:22:16PM -0700, Marc MERLIN wrote:
> > Dear Chris and other developers,
> > 
> > Can you look at this bug which has been happening since 2012 on apparently 
> > all kernels between at least
> > 3.4 and 4.11.
> > I didn't look in detail at each thread (took long enough to even find them 
> > all and paste here), but they seem pretty
> > similar although the reasons how they got there may be different, or at 
> > least not as benign as a race condition
> > between snapshot creation and deletion for those who do hourly snapshot 
> > rotations like me.
> 
> I just finished 2 check repairs, one with each mode, they both come back
> clean.
> Yet my FS still remounts read only with the same
> BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object 
> already exists
> BTRFS info (device dm-2): forced readonly
> BTRFS warning (device dm-2): failed setting block group ro, ret=-30 

So this still happens pseudo-randomly, maybe every 2 weeks?

Last one is below.
It did not happen during a btrfs snapshot although I'm not entirely sure
what else was running at the time.

Any update on this problem?

------------[ cut here ]------------
WARNING: CPU: 6 PID: 3783 at fs/btrfs/extent-tree.c:2967 
btrfs_run_delayed_refs+0xbd/0x1be  
BTRFS: Transaction aborted (error -17)  
Modules linked in: asix veth ip6table_filter ip6_tables ebtable_nat ebtables 
ppdev lp xt_addrtype br_netfilter bridge stp llc tun autofs4 softdog 
binfmt_misc ftdi_sio nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc 
ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 
nf_log_common xt_LOG iptable_mangle iptable_filter lm85 hwmon_vid pl2303 
dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_nat 
nf_conntrack x_tables sg st snd_pcm_oss snd_mixer_oss bcache kvm_intel kvm 
irqbypass snd_hda_codec_realtek snd_hda_codec_generic snd_cmipci 
snd_mpu401_uart snd_hda_intel snd_opl3_lib snd_hda_codec snd_hda_core snd_hwdep 
eeepc_wmi snd_rawmidi snd_seq_device tpm_infineon tpm_tis  
 snd_pcm asus_wmi snd_timer tpm_tis_core rc_ati_x10 snd ati_remote 
sparse_keymap rfkill i2c_i801 usbserial hwmon usbnet libphy pcspkr wmi 
soundcore input_leds tpm rc_core parport_pc evdev i915 lpc_ich i2c_smbus 
parport battery mei_me e1000e ptp pps_core fuse raid456 multipath mmc_block 
mmc_core dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy 
async_tx crc32c_intel blowfish_x86_64 blowfish_common aesni_intel aes_x86_64 
lrw glue_helper ablk_helper cryptd sata_sil24 fjes mvsas xhci_pci libsas 
xhci_hcd ehci_pci ehci_hcd thermal usbcore fan r8169 mii scsi_transport_sas 
[last unloaded: asix]  
CPU: 2 PID: 3783 Comm: btrfs-transacti Tainted: G U  
4.9.36-amd64-preempt-sysrq-20170406 #1  
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 
04/27/2013  
 b7eb67affc98 ae39b00b b7eb67affce8   
 b7eb67affcd8 ae066769 0b9767affd58 974f736da960  
 9756319df000 ffef 975302da7a50   
Call Trace:  
 [] dump_stack+0x61/0x7d  
 [] __warn+0xc2/0xdd  
 [] warn_slowpath_fmt+0x5a/0x76  
 [] btrfs_run_delayed_refs+0xbd/0x1be  
 [] commit_cowonly_roots+0x10d/0x2b2  
 [] ? btrfs_qgroup_account_extents+0x131/0x181  
 [] ? btrfs_run_delayed_refs+0x1a6/0x1be  
 [] btrfs_commit_transaction+0x46b/0x8fb  
 [] transaction_kthread+0xf5/0x1a1  
 [] ? btrfs_cleanup_transaction+0x436/0x436  
 [] kthread+0xd1/0xd9  
 [] ? init_completion+0x24/0x24  
 [] ? do_fast_syscall_32+0xb7/0xfe  
 [] ret_from_fork+0x25/0x30  
---[ end trace 4c5fcb9daa07c11a ]---  
BTRFS: error (device dm-2) in btrfs_run_delayed_refs:2967: errno=-17 Object 
already exists  
BTRFS info (device dm-2): forced readonly  
BTRFS warning (device dm-2): Skipping commit of aborted transaction.  
BTRFS: error (device dm-2) in cleanup_transaction:1850: errno=-17 Object 
already exists  
BTRFS error (device dm-2): pending csums is 131072  

Marc


Re: 4.11.6 / more corruption / root 15455 has a root item with a more recent gen (33682) compared to the found root node (0)

2017-08-01 Thread Marc MERLIN
On Mon, Jul 31, 2017 at 03:00:53PM -0700, Justin Maggard wrote:
> Marc, do you have quotas enabled?  IIRC, you're a send/receive user.
> The combination of quotas and btrfs receive can corrupt your
> filesystem, as shown by the xfstest I sent to the list a little while
> ago.

Thanks for checking. I do not use quotas, given the problems I had with
them early on, over 2 years ago.
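For reference, a quick way to confirm quotas really are off (a sketch; it assumes `btrfs qgroup show` exits non-zero with "quotas not enabled" when no quota tree exists):

```shell
#!/bin/sh
# Report whether btrfs quotas are enabled on a given mount point, based
# on whether `btrfs qgroup show` succeeds there.
check_quotas() {
    if btrfs qgroup show "$1" >/dev/null 2>&1; then
        echo "quotas enabled on $1 -- the receive+quota bug could apply"
    else
        echo "quotas disabled on $1"
    fi
}

# Only run against the pool if it is actually mounted here.
if [ -d /mnt/btrfs_pool1 ]; then
    check_quotas /mnt/btrfs_pool1
fi
```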

Marc

