Re: btrfs double send

2015-10-25 Thread Ed Tomlinson

Filipe,

Its still not perfect.  Here I can do sequential sends a few times then I 
get something like this:


[root@grover snap]# sh -x brh
+ base=/snap/shot
++ date +%Y-%V-%u_%m-%d_%H:%M
+ stamp=2015-43-6_10-24_10:24
+ btrfs subv snapshot -r /snap/homevol /snap/shot.2015-43-6_10-24_10:24
Create a readonly snapshot of '/snap/homevol' in 
'/snap/shot.2015-43-6_10-24_10:24'

+ sync
+ /usr/bin/time -f 'send %E' nice -19 ionice -c3 btrfs send -v -p 
/snap/shot /snap/shot.2015-43-6_10-24_10:24

+ btrfs receive -v /backup/snap
At subvol /snap/shot.2015-43-6_10-24_10:24
At snapshot shot.2015-43-6_10-24_10:24
receiving snapshot shot.2015-43-6_10-24_10:24 
uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, ctransid=625267 
parent_uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, parent_ctransid=619893
ERROR: unlink home/ed/Maildir/.spam/dovecot.index.log.2 failed. No such 
file or directory

Command terminated by signal 13
send 1:49.64
+ sync
+ btrfs subv delete /snap/shot
Delete subvolume (no-commit): '/snap/shot'
+ sync
+ mv /snap/shot.2015-43-6_10-24_10:24 /snap/shot


so there is another bug hiding in the code, its usually something in my 
Maildir that shows in the log.


Please let me know if there is anything you would like me to try.  I am 
running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with  
btrfs fixes removed.  On top of this are a few patches from this list.


TIA
Ed Tomlinson

On Saturday, October 24, 2015 1:52:21 PM EDT, Filipe Manana wrote:

On Sat, Oct 24, 2015 at 6:36 PM,   wrote:

Hello,

I would like to do backups based on btrfs send/receive.

So I though I would do a transfer over portable HDD and then incremental
sends (using -p) over network.

Initial : ...


It's a bug, we got it recently reported and fixed.
The fix is in Chris' integration branch for the upcoming 4.4 merge window:

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.4=b96b1db039ebc584d03a9933b279e0d3e704c528

cheers


Best regards,

Lubos
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2 v2] Btrfs: fix regression when running delayed references

2015-10-25 Thread fdmanana
From: Filipe Manana 

In the kernel 4.2 merge window we had a refactoring/rework of the delayed
references implementation in order to fix certain problems with qgroups.
However that rework introduced one more regression that leads to the
following trace when running delayed references for metadata:

[35908.064664] kernel BUG at fs/btrfs/extent-tree.c:1832!
[35908.065201] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[35908.065201] Modules linked in: dm_flakey dm_mod btrfs crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc psmouse i2
[35908.065201] CPU: 14 PID: 15014 Comm: kworker/u32:9 Tainted: GW   
4.3.0-rc5-btrfs-next-17+ #1
[35908.065201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[35908.065201] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[35908.065201] task: 880114b7d780 ti: 88010c4c8000 task.ti: 
88010c4c8000
[35908.065201] RIP: 0010:[]  [] 
insert_inline_extent_backref+0x52/0xb1 [btrfs]
[35908.065201] RSP: 0018:88010c4cbb08  EFLAGS: 00010293
[35908.065201] RAX:  RBX: 88008a661000 RCX: 
[35908.065201] RDX: a04dd58f RSI: 0001 RDI: 
[35908.065201] RBP: 88010c4cbb40 R08: 1000 R09: 88010c4cb9f8
[35908.065201] R10:  R11: 002c R12: 
[35908.065201] R13: 88020a74c578 R14:  R15: 
[35908.065201] FS:  () GS:88023edc() 
knlGS:
[35908.065201] CS:  0010 DS:  ES:  CR0: 8005003b
[35908.065201] CR2: 015e8708 CR3: 000102185000 CR4: 06e0
[35908.065201] Stack:
[35908.065201]  88010c4cbb18 0f37 88020a74c578 
88015a408000
[35908.065201]  880154a44000  0005 
88010c4cbbd8
[35908.065201]  a0492b9a 0005  

[35908.065201] Call Trace:
[35908.065201]  [] __btrfs_inc_extent_ref+0x8b/0x208 [btrfs]
[35908.065201]  [] ? __btrfs_run_delayed_refs+0x4d4/0xd33 
[btrfs]
[35908.065201]  [] __btrfs_run_delayed_refs+0xafa/0xd33 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0x25/0x41f 
[btrfs]
[35908.065201]  [] ? join_transaction.isra.10+0xa8/0x41f 
[btrfs]
[35908.065201]  [] btrfs_run_delayed_refs+0x75/0x1dd [btrfs]
[35908.065201]  [] delayed_ref_async_start+0x3c/0x7b [btrfs]
[35908.065201]  [] normal_work_helper+0x14c/0x32a [btrfs]
[35908.065201]  [] btrfs_extent_refs_helper+0x12/0x14 [btrfs]
[35908.065201]  [] process_one_work+0x24a/0x4ac
[35908.065201]  [] worker_thread+0x206/0x2c2
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] ? rescuer_thread+0x2cb/0x2cb
[35908.065201]  [] kthread+0xef/0xf7
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201]  [] ret_from_fork+0x3f/0x70
[35908.065201]  [] ? kthread_parkme+0x24/0x24
[35908.065201] Code: 6a 01 41 56 41 54 ff 75 10 41 51 4d 89 c1 49 89 c8 48 8d 
4d d0 e8 f6 f1 ff ff 48 83 c4 28 85 c0 75 2c 49 81 fc ff 00 00 00 77 02 <0f> 0b 
4c 8b 45 30 8b 4d 28 45 31
[35908.065201] RIP  [] insert_inline_extent_backref+0x52/0xb1 
[btrfs]
[35908.065201]  RSP 
[35908.310885] ---[ end trace fe4299baf0666457 ]---

This happens because the new delayed references code no longer merges
delayed references that have different sequence values. The following
steps are an example sequence leading to this issue:

1) Transaction N starts, fs_info->tree_mod_seq has value 0;

2) Extent buffer (btree node) A is allocated, delayed reference Ref1 for
   bytenr A is created, with a value of 1 and a seq value of 0;

3) fs_info->tree_mod_seq is incremented to 1;

4) Extent buffer A is deleted through btrfs_del_items(), which calls
   btrfs_del_leaf(), which in turn calls btrfs_free_tree_block(). The
   later returns the metadata extent associated to extent buffer A to
   the free space cache (the range is not pinned), because the extent
   buffer was created in the current transaction (N) and writeback never
   happened for the extent buffer (flag BTRFS_HEADER_FLAG_WRITTEN not set
   in the extent buffer).
   This creates the delayed reference Ref2 for bytenr A, with a value
   of -1 and a seq value of 1;

5) Delayed reference Ref2 is not merged with Ref1 when we create it,
   because they have different sequence numbers (decided at
   add_delayed_ref_tail_merge());

6) fs_info->tree_mod_seq is incremented to 2;

7) Some task attempts to allocate a new extent buffer (done at
   extent-tree.c:find_free_extent()), but due to heavy fragmentation
   and running low on metadata space the clustered allocation fails
   and we fall back to unclustered allocation, which finds the
   extent at offset A, so a new extent buffer at offset A is allocated.
   This creates delayed reference Ref3 for bytenr A, 

[PATCH 2/2 v2] Btrfs: fix regression running delayed references when using qgroups

2015-10-25 Thread fdmanana
From: Filipe Manana 

In the kernel 4.2 merge window we had a big changes to the implementation
of delayed references and qgroups which made the no_quota field of delayed
references not used anymore. More specifically the no_quota field is not
used anymore as of:

  commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented qgroup 
mechanism.")

Leaving the no_quota field actually prevents delayed references from
getting merged, which in turn cause the following BUG_ON(), at
fs/btrfs/extent-tree.c, to be hit when qgroups are enabled:

  static int run_delayed_tree_ref(...)
  {
 (...)
 BUG_ON(node->ref_mod != 1);
 (...)
  }

This happens on a scenario like the following:

  1) Ref1 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.

  2) Ref2 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
 It's not merged with Ref1 because Ref1->no_quota != Ref2->no_quota.

  3) Ref3 bytenr X, action = BTRFS_ADD_DELAYED_REF, no_quota = 1, added.
 It's not merged with the reference at the tail of the list of refs
 for bytenr X because the reference at the tail, Ref2 is incompatible
 due to Ref2->no_quota != Ref3->no_quota.

  4) Ref4 bytenr X, action = BTRFS_DROP_DELAYED_REF, no_quota = 0, added.
 It's not merged with the reference at the tail of the list of refs
 for bytenr X because the reference at the tail, Ref3 is incompatible
 due to Ref3->no_quota != Ref4->no_quota.

  5) We run delayed references, trigger merging of delayed references,
 through __btrfs_run_delayed_refs() -> btrfs_merge_delayed_refs().

  6) Ref1 and Ref3 are merged as Ref1->no_quota = Ref3->no_quota and
 all other conditions are satisfied too. So Ref1 gets a ref_mod
 value of 2.

  7) Ref2 and Ref4 are merged as Ref2->no_quota = Ref4->no_quota and
 all other conditions are satisfied too. So Ref2 gets a ref_mod
 value of 2.

  8) Ref1 and Ref2 aren't merged, because they have different values
 for their no_quota field.

  9) Delayed reference Ref1 is picked for running (select_delayed_ref()
 always prefers references with an action == BTRFS_ADD_DELAYED_REF).
 So run_delayed_tree_ref() is called for Ref1 which triggers the
 BUG_ON because Ref1->red_mod != 1 (equals 2).

So fix this by removing the no_quota field, as it's not used anymore as
of commit 0ed4792af0e8 ("btrfs: qgroup: Switch to new extent-oriented
qgroup mechanism.").

This fixes the remainder of problems several people have been having when
running delayed references, mostly while a balance is running in parallel,
on a 4.2+ kernel.

Very special thanks to Stéphane Lesimple for helping debugging this issue
and testing this fix on his multi terabyte filesystem (which took more
than one day to balance alone, plus fsck, etc).

Reported-by: Stéphane Lesimple 
Tested-by: Stéphane Lesimple 
Cc: sta...@vger.kernel.org  # 4.2+
Signed-off-by: Filipe Manana 
---

V2: There was no v1. But since the first patch in the series became
a v2, made this one a v2 from the start.

 fs/btrfs/ctree.h   |  4 ++--
 fs/btrfs/delayed-ref.c | 28 +++
 fs/btrfs/delayed-ref.h |  7 ++
 fs/btrfs/extent-tree.c | 45 +---
 fs/btrfs/file.c| 10 
 fs/btrfs/inode.c   |  4 ++--
 fs/btrfs/ioctl.c   | 62 +-
 fs/btrfs/relocation.c  | 16 ++---
 fs/btrfs/tree-log.c|  2 +-
 9 files changed, 44 insertions(+), 134 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 49bc792..43eb981 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3398,7 +3398,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle 
*trans,
 int btrfs_free_extent(struct btrfs_trans_handle *trans,
  struct btrfs_root *root,
  u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
- u64 owner, u64 offset, int no_quota);
+ u64 owner, u64 offset);
 
 int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
   int delalloc);
@@ -3411,7 +3411,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle 
*trans,
 int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 struct btrfs_root *root,
 u64 bytenr, u64 num_bytes, u64 parent,
-u64 root_objectid, u64 owner, u64 offset, int 
no_quota);
+u64 root_objectid, u64 owner, u64 offset);
 
 int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
   struct btrfs_root *root);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 4832943..7832031 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -220,7 +220,7 @@ static bool merge_ref(struct 

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-25 Thread Johannes Henninger
On 25.10.2015 02:44, Qu Wenruo wrote:
>
>
> 在 2015年10月23日 23:05, Johannes Henninger 写道:
>> On 23.10.2015 01:47, Qu Wenruo wrote:
>>> 在 2015年10月23日 04:38, Johannes Henninger 写道:
 I'm having a weird problem with snapshots and exclusive quotas. After
 creating a snapshot of a subvolume and setting an exclusive quota of
 50MB for the snapshot, everything seems to work fine. I can write
 approximately 50MB before the quota kicks in.

 However, if I create a snapshot, set an exclusive quota and just wait
 for some time, I suddenly cannot even create an empty file because I'm
 getting a "quota exceeded" error. The time until the bug appears seems
 to vary. During the waiting time, I'm changing neither the snapshot
 nor
 the original subvolume. "qgroup show -e" reports an exclusive use of
 only a few kilobytes for the snapshot, which is nowhere near the
 limit.

 Steps to reproduce (/media/extern is a fresh and empty btrfs
 partition):

 Enable quota and create an empty subvolume:
   root@t420:/media/extern# btrfs quota enable .
   root@t420:/media/extern# btrfs subvolume create sub
   Create subvolume './sub'

 Snapshot the subvolume and set a limit:
   root@t420:/media/extern# btrfs subvolume snapshot sub snap
   Create a snapshot of 'sub' in './snap'
   root@t420:/media/extern# cd snap/
   root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .

 Sometimes it takes "longer" for the quota to kick in, so I'm
 touching a
 file every 5 minutes here:

   root@t420:/media/extern/snap# for file in {1..100}; do touch
 $file;
 sleep 5m; done
   touch: cannot touch ‘7’: Disk quota exceeded
   ^C
   root@t420:/media/extern/snap# btrfs qgroup show -e .
   qgroupid rfer excl max_excl
      
   0/5  16.00KiB 16.00KiB none
   0/25716.00KiB 16.00KiB none
   0/25816.00KiB 16.00KiB 50.00MiB

 Any idea why this happens?
>>> BTW, to make btrfs qgroup show work, it's better to call sync before
>>> qgroup show.
>>>
>>> It's a known bug that even after qgroup accounting rework, qgroup
>>> reserve still has bug and can cause reserved space to underflow,
>>> making such problem happen.
>>>
>>> For such case, btrfs qgroup show won't help as reserved space is not
>>> shown in the output.
>>>
>>> One workaround would be, umount the filesystem and mount again.
>>> Which will reset the underflow reserved space and work for sometime.
>>>
>>> If it's OK for you to recompile the kernel, you can try the following
>>> patchset:
>>> [PATCH v3 00/21] Rework btrfs qgroup reserved space framework
>>>
>>> Which should solve the problem.
>>>
>>> Thanks,
>>> Qu
>>>
>>
>> Thanks a lot for your reply!
>>
>> While remounting the filesystem fixes the issue temporary, it doesn't
>> take very long for the bug to happen again so it's not really a
>> workaround I can work with.
>>
>> I did recompile the kernel using your patches, but unfortunately the
>> problem still appears.
>>
>> Thanks,
>> Johannes
>>
> Interesting, just touching file will cause EQUOTA is quite a big problem.
>
> I'll try to reproduce it with my patchset and see what really caused
> the problem.
> The problem seems to do with snapshot qgroup hacking.
> But I'm not completely sure yet.
>
> BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
> 16K is the correct number with only 6 empty files, just in case.
>
> Thanks,
> Qu

I ran my example from the first mail again and managed to write 7 files
this time, "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file;
sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer excl max_rfer max_excl parent  child
     --  -
0/5  16.00KiB 16.00KiB none none --- --- 
0/25716.00KiB 16.00KiB none none --- --- 
0/25816.00KiB 16.00KiB none 50.00MiB --- --- 
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer excl max_rfer max_excl parent  child
     --  -
0/5  16.00KiB 16.00KiB none none --- --- 
0/25716.00KiB 16.00KiB none none --- --- 
0/25816.00KiB 16.00KiB none 50.00MiB --- --- 

By the way, I don't if its relevant but the problem is 

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-25 Thread Qu Wenruo



在 2015年10月25日 19:10, Johannes Henninger 写道:

On 25.10.2015 02:44, Qu Wenruo wrote:



在 2015年10月23日 23:05, Johannes Henninger 写道:

On 23.10.2015 01:47, Qu Wenruo wrote:

在 2015年10月23日 04:38, Johannes Henninger 写道:

I'm having a weird problem with snapshots and exclusive quotas. After
creating a snapshot of a subvolume and setting an exclusive quota of
50MB for the snapshot, everything seems to work fine. I can write
approximately 50MB before the quota kicks in.

However, if I create a snapshot, set an exclusive quota and just wait
for some time, I suddenly cannot even create an empty file because I'm
getting a "quota exceeded" error. The time until the bug appears seems
to vary. During the waiting time, I'm changing neither the snapshot
nor
the original subvolume. "qgroup show -e" reports an exclusive use of
only a few kilobytes for the snapshot, which is nowhere near the
limit.

Steps to reproduce (/media/extern is a fresh and empty btrfs
partition):

Enable quota and create an empty subvolume:
   root@t420:/media/extern# btrfs quota enable .
   root@t420:/media/extern# btrfs subvolume create sub
   Create subvolume './sub'

Snapshot the subvolume and set a limit:
   root@t420:/media/extern# btrfs subvolume snapshot sub snap
   Create a snapshot of 'sub' in './snap'
   root@t420:/media/extern# cd snap/
   root@t420:/media/extern/snap# btrfs qgroup limit -e 50M .

Sometimes it takes "longer" for the quota to kick in, so I'm
touching a
file every 5 minutes here:

   root@t420:/media/extern/snap# for file in {1..100}; do touch
$file;
sleep 5m; done
   touch: cannot touch ‘7’: Disk quota exceeded
   ^C
   root@t420:/media/extern/snap# btrfs qgroup show -e .
   qgroupid rfer excl max_excl
      
   0/5  16.00KiB 16.00KiB none
   0/25716.00KiB 16.00KiB none
   0/25816.00KiB 16.00KiB 50.00MiB

Any idea why this happens?

BTW, to make btrfs qgroup show work, it's better to call sync before
qgroup show.

It's a known bug that even after qgroup accounting rework, qgroup
reserve still has bug and can cause reserved space to underflow,
making such problem happen.

For such case, btrfs qgroup show won't help as reserved space is not
shown in the output.

One workaround would be, umount the filesystem and mount again.
Which will reset the underflow reserved space and work for sometime.

If it's OK for you to recompile the kernel, you can try the following
patchset:
[PATCH v3 00/21] Rework btrfs qgroup reserved space framework

Which should solve the problem.

Thanks,
Qu



Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporary, it doesn't
take very long for the bug to happen again so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes


Interesting, just touching file will cause EQUOTA is quite a big problem.

I'll try to reproduce it with my patchset and see what really caused
the problem.
The problem seems to do with snapshot qgroup hacking.
But I'm not completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu


I ran my example from the first mail again and managed to write 7 files
this time, "qgroup show" still shows 16kB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file;
sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer excl max_rfer max_excl parent  child
     --  -
0/5  16.00KiB 16.00KiB none none --- ---
0/25716.00KiB 16.00KiB none none --- ---
0/25816.00KiB 16.00KiB none 50.00MiB --- ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer excl max_rfer max_excl parent  child
     --  -
0/5  16.00KiB 16.00KiB none none --- ---
0/25716.00KiB 16.00KiB none none --- ---
0/25816.00KiB 16.00KiB none 50.00MiB --- ---


Thanks for the report.

At least, the rfer/excl number is correct.

So the problem is limited to qgroup reserved space.
I'll debug it tomorrow to see what's going wrong inside the reserved space.


By the way, I don't if its relevant but the problem is not limited to
exclusive quotas, but also happens when setting a "referenced" limit
(qgroup limit 

Re: btrfs double send

2015-10-25 Thread Filipe Manana
On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinson  wrote:
> Filipe,
>
> Its still not perfect.  Here I can do sequential sends a few times then I
> get something like this:
>
> [root@grover snap]# sh -x brh
> + base=/snap/shot
> ++ date +%Y-%V-%u_%m-%d_%H:%M
> + stamp=2015-43-6_10-24_10:24
> + btrfs subv snapshot -r /snap/homevol /snap/shot.2015-43-6_10-24_10:24
> Create a readonly snapshot of '/snap/homevol' in
> '/snap/shot.2015-43-6_10-24_10:24'
> + sync
> + /usr/bin/time -f 'send %E' nice -19 ionice -c3 btrfs send -v -p /snap/shot
> /snap/shot.2015-43-6_10-24_10:24
> + btrfs receive -v /backup/snap
> At subvol /snap/shot.2015-43-6_10-24_10:24
> At snapshot shot.2015-43-6_10-24_10:24
> receiving snapshot shot.2015-43-6_10-24_10:24
> uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, ctransid=625267
> parent_uuid=cb3ad856-ca0f-f744-b7c3-b33f2d5bc8d3, parent_ctransid=619893
> ERROR: unlink home/ed/Maildir/.spam/dovecot.index.log.2 failed. No such file
> or directory
> Command terminated by signal 13
> send 1:49.64
> + sync
> + btrfs subv delete /snap/shot
> Delete subvolume (no-commit): '/snap/shot'
> + sync
> + mv /snap/shot.2015-43-6_10-24_10:24 /snap/shot
>
>
> so there is another bug hiding in the code, its usually something in my
> Maildir that shows in the log.

Different and unrelated problem.
Yes, we know there are still some problems regarding send issuing
invalid/outdated paths to the send stream. Nothing to do with
incorrect uuids in the send stream.

>
> Please let me know if there is anything you would like me to try.  I am
> running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with
> btrfs fixes removed.  On top of this are a few patches from this list.
>
> TIA
> Ed Tomlinson
>
> On Saturday, October 24, 2015 1:52:21 PM EDT, Filipe Manana wrote:
>>
>> On Sat, Oct 24, 2015 at 6:36 PM,   wrote:
>>>
>>> Hello,
>>>
>>> I would like to do backups based on btrfs send/receive.
>>>
>>> So I though I would do a transfer over portable HDD and then incremental
>>> sends (using -p) over network.
>>>
>>> Initial : ...
>>
>>
>> It's a bug, we got it recently reported and fixed.
>> The fix is in Chris' integration branch for the upcoming 4.4 merge window:
>>
>>
>> http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?h=integration-4.4=b96b1db039ebc584d03a9933b279e0d3e704c528
>>
>> cheers
>>>
>>>
>>> Best regards,
>>>
>>> Lubos
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-25 Thread Qu Wenruo



Filipe Manana wrote on 2015/10/25 14:39 +:

On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo  wrote:

Add new function btrfs_add_delayed_qgroup_reserve() function to record
how much space is reserved for that extent.

As btrfs only accounts qgroup at run_delayed_refs() time, so newly
allocated extent should keep the reserved space until then.

So add needed function with related members to do it.

Signed-off-by: Qu Wenruo 
---
v2:
   None
v3:
   None
---
  fs/btrfs/delayed-ref.c | 29 +
  fs/btrfs/delayed-ref.h | 14 ++
  2 files changed, 43 insertions(+)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ac3e81d..bd9b63b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 INIT_LIST_HEAD(_ref->ref_list);
 head_ref->processing = 0;
 head_ref->total_ref_mod = count_mod;
+   head_ref->qgroup_reserved = 0;
+   head_ref->qgroup_ref_root = 0;

 /* Record qgroup extent info if provided */
 if (qrecord) {
@@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 return 0;
  }

+int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
+struct btrfs_trans_handle *trans,
+u64 ref_root, u64 bytenr, u64 num_bytes)
+{
+   struct btrfs_delayed_ref_root *delayed_refs;
+   struct btrfs_delayed_ref_head *ref_head;
+   int ret = 0;
+
+   if (!fs_info->quota_enabled || !is_fstree(ref_root))
+   return 0;
+
+   delayed_refs = >transaction->delayed_refs;
+
+   spin_lock(_refs->lock);
+   ref_head = find_ref_head(_refs->href_root, bytenr, 0);
+   if (!ref_head) {
+   ret = -ENOENT;
+   goto out;
+   }


Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:


Thanks for the test.

I also want a method to enable quota for all other btrfs/generic tests,
but have no good idea other than modifing testcase itself.

Any good ideas?


[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] [ cut here ]
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
   W   4.3.0-rc5-btrfs-next-17+ #1
[169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
04/01/2014
[169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[169125.382167]   88007ef2bc28 812566f4
88007ef2bc70
[169125.383643]  88007ef2bc60 8104d0a6 a03cac33
8801f5ca6db0
[169125.385197]  8802c6c7ee98 880122bc1000 fffe
88007ef2bcc8
[169125.386691] Call Trace:
[169125.387194]  [] dump_stack+0x4e/0x79
[169125.388205]  [] warn_slowpath_common+0x9f/0xb8
[169125.389386]  [] ?
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.390837]  [] warn_slowpath_fmt+0x48/0x50
[169125.391839]  [] ? unpin_extent_cache+0xbe/0xcc [btrfs]
[169125.392973]  []
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.395714]  [] ? _raw_spin_unlock_irqrestore+0x38/0x60
[169125.396888]  [] ? trace_hardirqs_off_caller+0x1f/0xb9
[169125.397986]  [] finish_ordered_fn+0x15/0x17 [btrfs]
[169125.399122]  [] normal_work_helper+0x14c/0x32a [btrfs]
[169125.400300]  [] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[169125.401450]  [] process_one_work+0x24a/0x4ac
[169125.402631]  [] worker_thread+0x206/0x2c2
[169125.403622]  [] ? rescuer_thread+0x2cb/0x2cb
[169125.404693]  [] kthread+0xef/0xf7
[169125.405727]  [] ? kthread_parkme+0x24/0x24
[169125.406808]  [] ret_from_fork+0x3f/0x70
[169125.407834]  [] ? kthread_parkme+0x24/0x24
[169125.408840] ---[ end trace 6ee4342a5722b119 ]---
[169125.409654] BTRFS: error (device sdc) in
btrfs_finish_ordered_io:2929: errno=-2 No such entry

So what you have here is racy:

btrfs_finish_ordered_io()
joins existing transaction (or starts a new one)
insert_reserved_file_extent()
   btrfs_alloc_reserved_file_extent() --> creates delayed ref

   *** delayed refs are run, someone called
btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref 

[PATCH v5] btrfs: qgroup: Don't copy extent buffer to do qgroup rescan

2015-10-25 Thread Qu Wenruo
Ancient qgroup code call memcpy() on a extent buffer and use it for leaf
iteration.

As extent buffer contains lock, pointers to pages, it's never sane to do
such copy.

The following bug may be caused by this insane operation:
[92098.841309] general protection fault:  [#1] SMP
[92098.841338] Modules linked in: ...
[92098.841814] CPU: 1 PID: 24655 Comm: kworker/u4:12 Not tainted
4.3.0-rc1 #1
[92098.841868] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper
[btrfs]
[92098.842261] Call Trace:
[92098.842277]  [] ? read_extent_buffer+0xb8/0x110
[btrfs]
[92098.842304]  [] ? btrfs_find_all_roots+0x60/0x70
[btrfs]
[92098.842329]  []
btrfs_qgroup_rescan_worker+0x28d/0x5a0 [btrfs]

Where btrfs_qgroup_rescan_worker+0x28d is btrfs_disk_key_to_cpu(),
called in reading key from the copied extent_buffer.

This patch will use btrfs_clone_extent_buffer() to a better copy of
extent buffer to deal such case.

Reported-by: Stephane Lesimple 
Suggested-by: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Follow the parameter change in previous patch.
v3:
  None
v4:
  Use btrfs_clone_extent_buffer() other than introducing new facilities
v5:
  Change slot = path->slots[0] postion.
---
 fs/btrfs/qgroup.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 158633c..31d1934 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2192,10 +2192,10 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle 
*trans)
  */
 static int
 qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
-  struct btrfs_trans_handle *trans,
-  struct extent_buffer *scratch_leaf)
+  struct btrfs_trans_handle *trans)
 {
struct btrfs_key found;
+   struct extent_buffer *scratch_leaf = NULL;
struct ulist *roots = NULL;
struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
u64 num_bytes;
@@ -2233,7 +2233,15 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct 
btrfs_path *path,
fs_info->qgroup_rescan_progress.objectid = found.objectid + 1;
 
btrfs_get_tree_mod_seq(fs_info, _mod_seq_elem);
-   memcpy(scratch_leaf, path->nodes[0], sizeof(*scratch_leaf));
+   scratch_leaf = btrfs_clone_extent_buffer(path->nodes[0]);
+   if (!scratch_leaf) {
+   ret = -ENOMEM;
+   mutex_unlock(_info->qgroup_rescan_lock);
+   goto out;
+   }
+   extent_buffer_get(scratch_leaf);
+   btrfs_tree_read_lock(scratch_leaf);
+   btrfs_set_lock_blocking_rw(scratch_leaf, BTRFS_READ_LOCK);
slot = path->slots[0];
btrfs_release_path(path);
mutex_unlock(_info->qgroup_rescan_lock);
@@ -2259,6 +2267,10 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct 
btrfs_path *path,
goto out;
}
 out:
+   if (scratch_leaf) {
+   btrfs_tree_read_unlock_blocking(scratch_leaf);
+   free_extent_buffer(scratch_leaf);
+   }
btrfs_put_tree_mod_seq(fs_info, _mod_seq_elem);
 
return ret;
@@ -2270,16 +2282,12 @@ static void btrfs_qgroup_rescan_worker(struct 
btrfs_work *work)
 qgroup_rescan_work);
struct btrfs_path *path;
struct btrfs_trans_handle *trans = NULL;
-   struct extent_buffer *scratch_leaf = NULL;
int err = -ENOMEM;
int ret = 0;
 
path = btrfs_alloc_path();
if (!path)
goto out;
-   scratch_leaf = kmalloc(sizeof(*scratch_leaf), GFP_NOFS);
-   if (!scratch_leaf)
-   goto out;
 
err = 0;
while (!err) {
@@ -2291,8 +2299,7 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work 
*work)
if (!fs_info->quota_enabled) {
err = -EINTR;
} else {
-   err = qgroup_rescan_leaf(fs_info, path, trans,
-scratch_leaf);
+   err = qgroup_rescan_leaf(fs_info, path, trans);
}
if (err > 0)
btrfs_commit_transaction(trans, fs_info->fs_root);
@@ -2301,7 +2308,6 @@ static void btrfs_qgroup_rescan_worker(struct btrfs_work 
*work)
}
 
 out:
-   kfree(scratch_leaf);
btrfs_free_path(path);
 
mutex_lock(_info->qgroup_rescan_lock);
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 0/4] VFS: In-kernel copy system call

2015-10-25 Thread Christoph Hellwig
On Sat, Oct 24, 2015 at 11:52:37AM -0500, Eric Biggers wrote:
> A few comments:
> 
> >   if (!(file_in->f_mode & FMODE_READ) ||
> >   !(file_out->f_mode & FMODE_WRITE) ||
> >   (file_out->f_flags & O_APPEND) ||
> >   !file_out->f_op)
> >   return -EBADF;
> 
> Isn't 'f_op' always non-NULL?

Yes, its is.

> If the destination file cannot be append-only, shouldn't this be documented?

Yes.

> > if (inode_in->i_sb != inode_out->i_sb ||
> > file_in->f_path.mnt != file_out->f_path.mnt)
> > return -EXDEV;
> 
> Doesn't the same mount already imply the same superblock?

It does.

> > /*
> >  * copy_file_range() differs from regular file read and write in that it
> >  * specifically allows return partial success.  When it does so is up to
> >  * the copy_file_range method.
> >  */
> 
> What does this mean?  I thought that read() and write() can also return
> partial success.

The syscalls are allow to return short from the standards perspective,
but if you actually do that for regualr fiels hell will break loose as
applications don't expect it.  That's why we can't actually ever do it.

> Should FMODE_PREAD or FMODE_PWRITE access be checked if the user specifies 
> their
> own 'off_in' or 'off_out', respectively?

Maybe.

> What is supposed to happen if the user passes provides a file descriptor to a
> non-regular file, such as a block device or char device?

If they implement the proper method I see no reason why we can't support
it.  For block device we only have one file_ops instance and mapping
that to the bio-level XCOPY abstraction that's been posted a couple of
times would seem sensible.  For character devices that's entirely up to
the driver.

> If the 'in' file has fewer than 'len' bytes remaining until EOF, what is the
> expected behavior?  It looks like the btrfs implementation has different
> behavior from the pagecache implementation.

Good question.  I'd say failure is the right way to handle a mismatching
length.

> It appears the btrfs implementation has alignment restrictions --- where is 
> this
> documented and how will users know what alignment to use?

For actual clones we're limited to the file system block size (NFS adds
an extra attribute for the clone block size), but for regaulr copies we
probably should fall back to the dumb implementation if we don't match
it.

> Are copies within the same file permitted and can the ranges overlap?  The man
> page doesn't say.

For clones we defintively want to support it, but for copies I'd be
tempted to say no.  Does anyone else have an opinion?

> It looks like the initial patch defines __NR_copy_file_range for the ARM
> architecture but doesn't actually hook that system call up for ARM; why is 
> that?

Looks like that should be dropped.  I really wish we had a way to just
wire up syscalls everywhere.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 5/4] copy_file_range.2: New page documenting copy_file_range()

2015-10-25 Thread Christoph Hellwig
On Sat, Oct 24, 2015 at 01:02:21PM +0100, P??draig Brady wrote:
> I'm a bit worried about the sparse expansion and default reflinking
> which might preclude cp(1) from using this call in most cases, but I will
> test and try to use it. coreutils has heuristics for determining if files
> are remote, which we might use to restrict to that use case.

Can you explain why reflinking and hole expansion are an issue if done
locally and not if done remotely?  I'd really like to make the call as
usable as possible for everyone, but we really need clear sem?ntics for
that.

Also note that Annas current series allows for hole filling - any decent
implementation should not do them, but that's really a quality of
implementation and not an interface issue.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Jeff Mahoney

On Oct 25, 2015, at 3:50 PM, Alexandru Moise <00moses.alexande...@gmail.com> 
wrote:

>>> This allows us to trim out half of btrfs_init_delayed_node() which
>>> is now reduntant.
>> 
>> It's redundant if kmem_cache_zalloc is used, but you haven't
>> documented that doing so is now required.  For all of these changes
>> you've posted, if they're to be accepted, I'd really prefer to set up
>> the slab with a constructor instead.  Then we don't need to worry
>> about such guarantees.  The object returned via kmem_cache_alloc will
>> always be properly initialized.
> 
> Well I wouldn't say it's *required* just makes this particular piece
> of code neater, since we memset-zero the node's inode_item _anyways_.

But the rest of the delayed node still needs to be initialized. That's 
happening implicitly with your patch via zalloc but if anyone ever adds a new 
allocation from that cache, they'll need to know that zalloc is required for 
proper initializatiom. Documenting that is one approach. Constructors are 
another.

> I like the constructor idea though, do you suggest I should invest in
> that idea?

Well, see below.

>> 
>> This makes assumptions about atomic_t and what atomic_set does that
>> aren't guaranteed to be true.  When accessors/mutators are part of the
>> API they should be used.
>> 
>> - -Jeff
> 
> You're right, taking out that atomic_set was really stupid. I'll
> resent the patch with a proper explanation in the commit message and
> put the atomic_set back.
> 
> Unless you feel that the change is rather pointless, I'll gladly back
> off :-).

I don't want to dissuade new contributors but there's plenty of work to be done 
before we start worrying about cleaning initializers. These aren't hot paths 
(but if they were, the implicit memset of the whole object might have negative 
effects.) The immediate impact of cleanup patches like these is code churn so 
that developers with outstanding patch sets need to rebase on top of new 
context. It's a normal part of the development cycle but it's more work for not 
a lot of benefit.

So, if your goal is to contribute to btrfs, I'd suggest digging into the code. 
Try out changes. Break it and figure out how you broke it. It's like learning 
how to navigate in a new city: by getting lost you learn more. :)

-Jeff--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Alexandru Moise
On Sun, Oct 25, 2015 at 01:33:45PM -0400, Jeff Mahoney wrote:
> 
> On Oct 25, 2015, at 3:50 PM, Alexandru Moise <00moses.alexande...@gmail.com> 
> wrote:
> 
> >>> This allows us to trim out half of btrfs_init_delayed_node() which
> >>> is now reduntant.
> >> 
> >> It's redundant if kmem_cache_zalloc is used, but you haven't
> >> documented that doing so is now required.  For all of these changes
> >> you've posted, if they're to be accepted, I'd really prefer to set up
> >> the slab with a constructor instead.  Then we don't need to worry
> >> about such guarantees.  The object returned via kmem_cache_alloc will
> >> always be properly initialized.
> > 
> > Well I wouldn't say it's *required* just makes this particular piece
> > of code neater, since we memset-zero the node's inode_item _anyways_.
> 
> But the rest of the delayed node still needs to be initialized. That's 
> happening implicitly with your patch via zalloc but if anyone ever adds a new 
> allocation from that cache, they'll need to know that zalloc is required for 
> proper initializatiom. Documenting that is one approach. Constructors are 
> another.
> 
> > I like the constructor idea though, do you suggest I should invest in
> > that idea?
> 
> Well, see below.
> 
> >> 
> >> This makes assumptions about atomic_t and what atomic_set does that
> >> aren't guaranteed to be true.  When accessors/mutators are part of the
> >> API they should be used.
> >> 
> >> - -Jeff
> > 
> > You're right, taking out that atomic_set was really stupid. I'll
> > resent the patch with a proper explanation in the commit message and
> > put the atomic_set back.
> > 
> > Unless you feel that the change is rather pointless, I'll gladly back
> > off :-).
> 
> I don't want to dissuade new contributors but there's plenty of work to be 
> done before we start worrying about cleaning initializers. These aren't hot 
> paths (but if they were, the implicit memset of the whole object might have 
> negative effects.) The immediate impact of cleanup patches like these is code 
> churn so that developers with outstanding patch sets need to rebase on top of 
> new context. It's a normal part of the development cycle but it's more work 
> for not a lot of benefit.
> 
> So, if your goal is to contribute to btrfs, I'd suggest digging into the 
> code. Try out changes. Break it and figure out how you broke it. It's like 
> learning how to navigate in a new city: by getting lost you learn more. :)
> 
> -Jeff

Yeah I see your point, I take back version 2 of this patch and will
follow your advice. Much appreciated!

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Alexandru Moise
> > This allows us to trim out half of btrfs_init_delayed_node() which
> > is now reduntant.
> 
> It's redundant if kmem_cache_zalloc is used, but you haven't
> documented that doing so is now required.  For all of these changes
> you've posted, if they're to be accepted, I'd really prefer to set up
> the slab with a constructor instead.  Then we don't need to worry
> about such guarantees.  The object returned via kmem_cache_alloc will
> always be properly initialized.

Well I wouldn't say it's *required* just makes this particular piece
of code neater, since we memset-zero the node's inode_item _anyways_.
I like the constructor idea though, do you suggest I should invest in
that idea?

> 
> This makes assumptions about atomic_t and what atomic_set does that
> aren't guaranteed to be true.  When accessors/mutators are part of the
> API they should be used.
> 
> - -Jeff

You're right, taking out that atomic_set was really stupid. I'll
resent the patch with a proper explanation in the commit message and
put the atomic_set back.

Unless you feel that the change is rather pointless, I'll gladly back
off :-).

Regards,
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Alexandru Moise
It's slightly cleaner to zero-out the delayed node upon allocation
than to do it by hand in btrfs_init_delayed_node() for a few members

Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
v2: Thanks Jeff Mahoney for pointing out the mistake of removing
the atomic_set in v1 of the patch. atomic_set() call restored.

 fs/btrfs/delayed-inode.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a2ae427..0a00c4f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -54,16 +54,11 @@ static inline void btrfs_init_delayed_node(
delayed_node->root = root;
delayed_node->inode_id = inode_id;
atomic_set(_node->refs, 0);
-   delayed_node->count = 0;
-   delayed_node->flags = 0;
delayed_node->ins_root = RB_ROOT;
delayed_node->del_root = RB_ROOT;
mutex_init(_node->mutex);
-   delayed_node->index_cnt = 0;
INIT_LIST_HEAD(_node->n_list);
INIT_LIST_HEAD(_node->p_list);
-   delayed_node->bytes_reserved = 0;
-   memset(_node->inode_item, 0, sizeof(delayed_node->inode_item));
 }
 
 static inline int btrfs_is_continuous_delayed_item(
@@ -132,7 +127,7 @@ again:
if (node)
return node;
 
-   node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS);
+   node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
if (!node)
return ERR_PTR(-ENOMEM);
btrfs_init_delayed_node(node, root, ino);
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs double send

2015-10-25 Thread Ed Tomlinson

Filipe,

I realize its another bug.  Two additional pieces of info that might help.
One, btrfs-progs was at 4.1.2 (I missed a tag= in my git pull).  Second
is that I have been able to recreate this issue three times over a period 
of
two to three days (tring again with 4.2.3).  My fs is probably a good 
testcase for any patches that appear.


Meanwhile I've been falling back to rsync which always works but is so much
slower.

TIA
Ed 


On Sunday, October 25, 2015 9:42:54 AM EDT, Filipe Manana wrote:

On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinson  wrote:

Filipe,

Its still not perfect.  Here I can do sequential sends a few times then I
get something like this:

[root@grover snap]# sh -x brh
+ base=/snap/shot ...


Different and unrelated problem.
Yes, we know there are still some problems regarding send issuing
invalid/outdated paths to the send stream. Nothing to do with
incorrect uuids in the send stream.


Please let me know if there is anything you would like me to try.  I am
running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches with
btrfs fixes removed.  On top of this are a few patches from this list.

TIA
Ed Tomlinson ...






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs double send

2015-10-25 Thread Filipe Manana
On Sun, Oct 25, 2015 at 2:10 PM, Ed Tomlinson  wrote:
> Filipe,
>
> I realize its another bug.  Two additional pieces of info that might help.
> One, btrfs-progs was at 4.1.2 (I missed a tag= in my git pull).

Btrfs-progs' version is irrelevant. The problem is the send side
generating incorrect paths in the send stream (as I mentioned
earlier), so it's exclusively a problem in the kernel.

> Second
> is that I have been able to recreate this issue three times over a period of
> two to three days (tring again with 4.2.3).  My fs is probably a good
> testcase for any patches that appear.

Yeah.
If you can find out what happened to the directory/file in question
between the parent and send snapshots (was it renamed/moved? was it
deleted and then a new one with the same name/location created?) that
helps. Otherwise looking at btrfs-debug-tree and see the trees for
each snapshot helps too, but you need to figure out the number of the
inode where it's failing (sometimes btrfs receive -vv is enough to
figure this out).

thanks

>
> Meanwhile I've been falling back to rsync which always works but is so much
> slower.
>
> TIA
> Ed
> On Sunday, October 25, 2015 9:42:54 AM EDT, Filipe Manana wrote:
>>
>> On Sun, Oct 25, 2015 at 1:38 PM, Ed Tomlinson  wrote:
>>>
>>> Filipe,
>>>
>>> Its still not perfect.  Here I can do sequential sends a few times then I
>>> get something like this:
>>>
>>> [root@grover snap]# sh -x brh
>>> + base=/snap/shot ...
>>
>>
>> Different and unrelated problem.
>> Yes, we know there are still some problems regarding send issuing
>> invalid/outdated paths to the send stream. Nothing to do with
>> incorrect uuids in the send stream.
>>
>>> Please let me know if there is anything you would like me to try.  I am
>>> running 4.2 with the 4.3 for-linus tree applied and the 4.2.x patches
>>> with
>>> btrfs fixes removed.  On top of this are a few patches from this list.
>>>
>>> TIA
>>> Ed Tomlinson ...
>>
>>
>>
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-25 Thread Filipe Manana
On Tue, Oct 13, 2015 at 3:20 AM, Qu Wenruo  wrote:
> Add new function btrfs_add_delayed_qgroup_reserve() function to record
> how much space is reserved for that extent.
>
> As btrfs only accounts qgroup at run_delayed_refs() time, so newly
> allocated extent should keep the reserved space until then.
>
> So add needed function with related members to do it.
>
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   None
> v3:
>   None
> ---
>  fs/btrfs/delayed-ref.c | 29 +
>  fs/btrfs/delayed-ref.h | 14 ++
>  2 files changed, 43 insertions(+)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index ac3e81d..bd9b63b 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -476,6 +476,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
> INIT_LIST_HEAD(_ref->ref_list);
> head_ref->processing = 0;
> head_ref->total_ref_mod = count_mod;
> +   head_ref->qgroup_reserved = 0;
> +   head_ref->qgroup_ref_root = 0;
>
> /* Record qgroup extent info if provided */
> if (qrecord) {
> @@ -746,6 +748,33 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
> *fs_info,
> return 0;
>  }
>
> +int btrfs_add_delayed_qgroup_reserve(struct btrfs_fs_info *fs_info,
> +struct btrfs_trans_handle *trans,
> +u64 ref_root, u64 bytenr, u64 num_bytes)
> +{
> +   struct btrfs_delayed_ref_root *delayed_refs;
> +   struct btrfs_delayed_ref_head *ref_head;
> +   int ret = 0;
> +
> +   if (!fs_info->quota_enabled || !is_fstree(ref_root))
> +   return 0;
> +
> +   delayed_refs = >transaction->delayed_refs;
> +
> +   spin_lock(_refs->lock);
> +   ref_head = find_ref_head(_refs->href_root, bytenr, 0);
> +   if (!ref_head) {
> +   ret = -ENOENT;
> +   goto out;
> +   }

Hi Qu,

So while running btrfs/063, with qgroups enabled (I modified the test
to enable qgroups), ran into this 2 times:

[169125.246506] BTRFS info (device sdc): disk space caching is enabled
[169125.363164] [ cut here ]
[169125.365236] WARNING: CPU: 10 PID: 2827 at fs/btrfs/inode.c:2929
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]()
[169125.367702] BTRFS: Transaction aborted (error -2)
[169125.368830] Modules linked in: btrfs dm_flakey dm_mod
crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs
lockd grace fscache sunrpc loop fuse parport_pc parport i2c_piix4
psmouse acpi_cpufreq microcode pcspkr processor evdev i2c_core
serio_raw button ext4 crc16 jbd2 mbcache sd_mod sg sr_mod cdrom
ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring
scsi_mod e1000 virtio [last unloaded: btrfs]
[169125.376755] CPU: 10 PID: 2827 Comm: kworker/u32:14 Tainted: G
  W   4.3.0-rc5-btrfs-next-17+ #1
[169125.378522] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org
04/01/2014
[169125.380916] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[169125.382167]   88007ef2bc28 812566f4
88007ef2bc70
[169125.383643]  88007ef2bc60 8104d0a6 a03cac33
8801f5ca6db0
[169125.385197]  8802c6c7ee98 880122bc1000 fffe
88007ef2bcc8
[169125.386691] Call Trace:
[169125.387194]  [] dump_stack+0x4e/0x79
[169125.388205]  [] warn_slowpath_common+0x9f/0xb8
[169125.389386]  [] ?
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.390837]  [] warn_slowpath_fmt+0x48/0x50
[169125.391839]  [] ? unpin_extent_cache+0xbe/0xcc [btrfs]
[169125.392973]  []
btrfs_finish_ordered_io+0x347/0x4eb [btrfs]
[169125.395714]  [] ? _raw_spin_unlock_irqrestore+0x38/0x60
[169125.396888]  [] ? trace_hardirqs_off_caller+0x1f/0xb9
[169125.397986]  [] finish_ordered_fn+0x15/0x17 [btrfs]
[169125.399122]  [] normal_work_helper+0x14c/0x32a [btrfs]
[169125.400300]  [] btrfs_endio_write_helper+0x12/0x14 [btrfs]
[169125.401450]  [] process_one_work+0x24a/0x4ac
[169125.402631]  [] worker_thread+0x206/0x2c2
[169125.403622]  [] ? rescuer_thread+0x2cb/0x2cb
[169125.404693]  [] kthread+0xef/0xf7
[169125.405727]  [] ? kthread_parkme+0x24/0x24
[169125.406808]  [] ret_from_fork+0x3f/0x70
[169125.407834]  [] ? kthread_parkme+0x24/0x24
[169125.408840] ---[ end trace 6ee4342a5722b119 ]---
[169125.409654] BTRFS: error (device sdc) in
btrfs_finish_ordered_io:2929: errno=-2 No such entry

So what you have here is racy:

btrfs_finish_ordered_io()
   joins existing transaction (or starts a new one)
   insert_reserved_file_extent()
  btrfs_alloc_reserved_file_extent() --> creates delayed ref

  *** delayed refs are run, someone called
btrfs_async_run_delayed_refs() from btrfs_end_transaction(), ref head
is removed **

  btrfs_add_delayed_qgroup_reserve() --> does not find delayed ref
head, returns -ENOENT and 

[PATCH] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Alexandru Moise
This allows us to trim out half of btrfs_init_delayed_node()
which is now reduntant.

Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
 fs/btrfs/delayed-inode.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a2ae427..7aa84be 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -53,17 +53,11 @@ static inline void btrfs_init_delayed_node(
 {
delayed_node->root = root;
delayed_node->inode_id = inode_id;
-   atomic_set(_node->refs, 0);
-   delayed_node->count = 0;
-   delayed_node->flags = 0;
delayed_node->ins_root = RB_ROOT;
delayed_node->del_root = RB_ROOT;
mutex_init(_node->mutex);
-   delayed_node->index_cnt = 0;
INIT_LIST_HEAD(_node->n_list);
INIT_LIST_HEAD(_node->p_list);
-   delayed_node->bytes_reserved = 0;
-   memset(_node->inode_item, 0, sizeof(delayed_node->inode_item));
 }
 
 static inline int btrfs_is_continuous_delayed_item(
@@ -132,7 +126,7 @@ again:
if (node)
return node;
 
-   node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS);
+   node = kmem_cache_zalloc(delayed_node_cache, GFP_NOFS);
if (!node)
return ERR_PTR(-ENOMEM);
btrfs_init_delayed_node(node, root, ino);
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: zero out delayed node upon allocation

2015-10-25 Thread Jeff Mahoney
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 10/25/15 1:48 PM, Alexandru Moise wrote:
> This allows us to trim out half of btrfs_init_delayed_node() which
> is now reduntant.

It's redundant if kmem_cache_zalloc is used, but you haven't
documented that doing so is now required.  For all of these changes
you've posted, if they're to be accepted, I'd really prefer to set up
the slab with a constructor instead.  Then we don't need to worry
about such guarantees.  The object returned via kmem_cache_alloc will
always be properly initialized.

> Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com> --- 
> fs/btrfs/delayed-inode.c | 8 +--- 1 file changed, 1
> insertion(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c 
> index a2ae427..7aa84be 100644 --- a/fs/btrfs/delayed-inode.c +++
> b/fs/btrfs/delayed-inode.c @@ -53,17 +53,11 @@ static inline void
> btrfs_init_delayed_node( { delayed_node->root = root; 
> delayed_node->inode_id = inode_id; -
> atomic_set(_node->refs, 0);

This makes assumptions about atomic_t and what atomic_set does that
aren't guaranteed to be true.  When accessors/mutators are part of the
API they should be used.

- -Jeff

> - delayed_node->count = 0; -  delayed_node->flags = 0; 
> delayed_node->ins_root = RB_ROOT; delayed_node->del_root =
> RB_ROOT; mutex_init(_node->mutex); -
> delayed_node->index_cnt = 0; 
> INIT_LIST_HEAD(_node->n_list); 
> INIT_LIST_HEAD(_node->p_list); -
> delayed_node->bytes_reserved = 0; -
> memset(_node->inode_item, 0,
> sizeof(delayed_node->inode_item)); }
> 
> static inline int btrfs_is_continuous_delayed_item( @@ -132,7
> +126,7 @@ again: if (node) return node;
> 
> - node = kmem_cache_alloc(delayed_node_cache, GFP_NOFS); +node =
> kmem_cache_zalloc(delayed_node_cache, GFP_NOFS); if (!node) return
> ERR_PTR(-ENOMEM); btrfs_init_delayed_node(node, root, ino);
> 


- -- 
Jeff Mahoney
SUSE Labs
-BEGIN PGP SIGNATURE-
Version: GnuPG/MacGPG2 v2

iQIcBAEBCAAGBQJWLP5RAAoJEB57S2MheeWyQMoQAMUNGQFl14cwbJWtSsNjwZJy
eRaSckKcRinc4zoFxATSn4R/UQLadFNg+tAtSnNgoJngpg/nn6Fc19vMS5QVpU1c
OhPgYUVkBTjLxhBaY6Iivinbx4+pjFaEP+e6LoqXefelB6Wq17e8DavQcDU5gH2e
3vzXLt8kKa1hlC5SbtjW59cLgHzrclR6h4qeO00R7vWPA4YyIzOKOnYznrgZ69wS
V97YBmcyucrl+EENYFe5BYBJmP5LojHjsxfPtF2zzZnvCpSrWPZ5Num/2yCofFir
7HNs86wLEBjsEUhkSjZ/7u3XATjVOXNHVUsknOOMB/B6zk9ngnK51mp7a2Ib41PM
nMbNOVHhauumY9sf++RWjN53HTGH8aAlp7B0JIRTceGrvKdT2YGjW8iWwIrtlCE0
bSoEXRNcMYFvR+0we/WKVi4sX0ysRfhDbCuQeWPEJ6R4DbzmtqjCmmHQnZyVpwex
1ZxT8yvFVJvQHEl2E+mHOneRgdVuj89j8AZJ99idIHKa9b60tHAlkq+zXoW0odIt
qV09bYoc7J87W5JvTgKVks13ovwKmM7Ij7zHFMBLVO/BGrd+zFFW1qq8r6GJBKG9
PaXt+HUHBCt4TpvAza5JZ0Yn7tYvALHizj+IZStTw2Kz1vQC5KEnjZIMa3LcKrsx
r2oQLSfGkcsf8EcIzDX1
=qINk
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: pass proper enum type to start_transaction()

2015-10-25 Thread Alexandru Moise
Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
 fs/btrfs/transaction.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index a5b0644..cb50f53 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -618,17 +618,20 @@ struct btrfs_trans_handle *btrfs_start_transaction_lflush(
 
 struct btrfs_trans_handle *btrfs_join_transaction(struct btrfs_root *root)
 {
-   return start_transaction(root, 0, TRANS_JOIN, 0);
+   return start_transaction(root, 0, TRANS_JOIN,
+BTRFS_RESERVE_NO_FLUSH);
 }
 
 struct btrfs_trans_handle *btrfs_join_transaction_nolock(struct btrfs_root 
*root)
 {
-   return start_transaction(root, 0, TRANS_JOIN_NOLOCK, 0);
+   return start_transaction(root, 0, TRANS_JOIN_NOLOCK,
+BTRFS_RESERVE_NO_FLUSH);
 }
 
 struct btrfs_trans_handle *btrfs_start_ioctl_transaction(struct btrfs_root 
*root)
 {
-   return start_transaction(root, 0, TRANS_USERSPACE, 0);
+   return start_transaction(root, 0, TRANS_USERSPACE,
+BTRFS_RESERVE_NO_FLUSH);
 }
 
 /*
@@ -646,7 +649,8 @@ struct btrfs_trans_handle 
*btrfs_start_ioctl_transaction(struct btrfs_root *root
  */
 struct btrfs_trans_handle *btrfs_attach_transaction(struct btrfs_root *root)
 {
-   return start_transaction(root, 0, TRANS_ATTACH, 0);
+   return start_transaction(root, 0, TRANS_ATTACH,
+BTRFS_RESERVE_NO_FLUSH);
 }
 
 /*
@@ -661,7 +665,8 @@ btrfs_attach_transaction_barrier(struct btrfs_root *root)
 {
struct btrfs_trans_handle *trans;
 
-   trans = start_transaction(root, 0, TRANS_ATTACH, 0);
+   trans = start_transaction(root, 0, TRANS_ATTACH,
+ BTRFS_RESERVE_NO_FLUSH);
if (IS_ERR(trans) && PTR_ERR(trans) == -ENOENT)
btrfs_wait_for_commit(root, 0);
 
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html