Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2019-02-08 Thread David Sterba
On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
> v1->v2:
> - addressed comments from reviewers.
> - fixed a bug in patch 6 that was introduced because of changes to upstream.
> 
> -- Original message --
> 
> The delayed refs rsv patches exposed a bunch of issues in our enospc
> infrastructure that needed to be addressed.  These aren't really one coherent
> group, but they are all around flushing and reservations.
> may_commit_transaction() needed to be updated a little bit, and we needed to 
> add
> a new state to force chunk allocation if things got dicey.  Also because we 
> can
> end up needed to reserve a whole bunch of extra space for outstanding delayed
> refs we needed to add the ability to only ENOSPC tickets that were too big to
> satisfy, instead of failing all of the tickets.  There's also a fix in here 
> for
> one of the corner cases where we didn't quite have enough space reserved for 
> the
> delayed refs we were generating during evict().  Thanks,

The patchset has been added to misc-next, I've updated the changelogs
and addressed the feedback from Nikolay.


Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread David Sterba
On Thu, Dec 13, 2018 at 01:28:52PM -0500, Josef Bacik wrote:
> I mean I hit this ASSERT() in testing, and this patch series has the fixed 
> patch
> 
> https://patchwork.kernel.org/patch/10709829/
> 
> this is what fixes the problem that is causing the ASSERT(), it appears your
> next branch doesn't have this updated patch, but the previous one which trips
> that ASSERT.  Thanks,

Ok found it, patch updated, thanks. No other related diff between the
topic branch vs branch with patches appleid from mail.


Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread Josef Bacik
On Thu, Dec 13, 2018 at 07:17:41PM +0100, David Sterba wrote:
> On Thu, Dec 13, 2018 at 09:45:55AM -0500, Josef Bacik wrote:
> > On Thu, Dec 13, 2018 at 03:11:11PM +0100, David Sterba wrote:
> > > On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
> > > > v1->v2:
> > > > - addressed comments from reviewers.
> > > > - fixed a bug in patch 6 that was introduced because of changes to 
> > > > upstream.
> > > > 
> > > > -- Original message --
> > > > 
> > > > The delayed refs rsv patches exposed a bunch of issues in our enospc
> > > > infrastructure that needed to be addressed.  These aren't really one 
> > > > coherent
> > > > group, but they are all around flushing and reservations.
> > > > may_commit_transaction() needed to be updated a little bit, and we 
> > > > needed to add
> > > > a new state to force chunk allocation if things got dicey.  Also 
> > > > because we can
> > > > end up needed to reserve a whole bunch of extra space for outstanding 
> > > > delayed
> > > > refs we needed to add the ability to only ENOSPC tickets that were too 
> > > > big to
> > > > satisfy, instead of failing all of the tickets.  There's also a fix in 
> > > > here for
> > > > one of the corner cases where we didn't quite have enough space 
> > > > reserved for the
> > > > delayed refs we were generating during evict().  Thanks,
> > > 
> > > One testbox reports an assertion failure on current for-next,
> > > generic/224. I'm reporting it under this patchset as it's my best guess.
> > > Same host running misc-next (with the delayed rsv patchset) was fine and
> > > the run with for-next (including this patchset) fails. The assertion is
> > > 
> > >  5225 static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
> > >  5226 struct btrfs_space_info 
> > > *space_info,
> > >  5227 u64 orig_bytes,
> > >  5228 enum btrfs_reserve_flush_enum 
> > > flush,
> > >  5229 bool system_chunk)
> > >  5230 {
> > >  5231 struct reserve_ticket ticket;
> > >  5232 u64 used;
> > >  5233 u64 reclaim_bytes = 0;
> > >  5234 int ret = 0;
> > >  5235
> > >  5236 ASSERT(orig_bytes);
> > >  
> > > 
> > 
> > Looking at your for-next branch on your github (I assume this is what you 
> > are
> > testing)
> > 
> > https://github.com/kdave/btrfs-devel/blob/for-next-20181212/fs/btrfs/extent-tree.c
> > 
> > at line 5860 there's supposed to be a 
> > 
> > if (num_bytes == 0)
> > return 0
> > 
> > that's what I changed in v2 of this patchset, as I hit this bug as well.  It
> 
> What does 'this' refer to? The patchset in this mail thread? If yes,
> then something's wrong, because in the patch
> 
> https://patchwork.kernel.org/patch/10709827/
> 
> there's a clear ASSERT(orig_bytes) in the context of one hunk:
> 
> @@ -5210,6 +5217,7 @@  static int __reserve_metadata_bytes(struct 
> btrfs_fs_info *fs_info,
>  {
>   struct reserve_ticket ticket;
>   u64 used;
> + u64 reclaim_bytes = 0;
>   int ret = 0;
>  
>   ASSERT(orig_bytes);
> ---
> 
> > looks like you still have v1 of this patchset applied.  Thanks,
> 
> I looked up the patch series on patchwork too to double check that I haven't
> missed it in my mailboxes but no.
> 
> The assert was introduced by "Btrfs: introduce ticketed enospc infrastructure"
> which is quite old. The v2 of that patch is
> 
> https://lore.kernel.org/linux-btrfs/1463506255-15918-1-git-send-email-jba...@fb.com/
> 
> and also has the assert and not if (orig_bytes). Confused.

I mean I hit this ASSERT() in testing, and this patch series has the fixed patch

https://patchwork.kernel.org/patch/10709829/

this is what fixes the problem that is causing the ASSERT(), it appears your
next branch doesn't have this updated patch, but the previous one which trips
that ASSERT.  Thanks,

Josef


Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread David Sterba
On Thu, Dec 13, 2018 at 09:45:55AM -0500, Josef Bacik wrote:
> On Thu, Dec 13, 2018 at 03:11:11PM +0100, David Sterba wrote:
> > On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
> > > v1->v2:
> > > - addressed comments from reviewers.
> > > - fixed a bug in patch 6 that was introduced because of changes to 
> > > upstream.
> > > 
> > > -- Original message --
> > > 
> > > The delayed refs rsv patches exposed a bunch of issues in our enospc
> > > infrastructure that needed to be addressed.  These aren't really one 
> > > coherent
> > > group, but they are all around flushing and reservations.
> > > may_commit_transaction() needed to be updated a little bit, and we needed 
> > > to add
> > > a new state to force chunk allocation if things got dicey.  Also because 
> > > we can
> > > end up needed to reserve a whole bunch of extra space for outstanding 
> > > delayed
> > > refs we needed to add the ability to only ENOSPC tickets that were too 
> > > big to
> > > satisfy, instead of failing all of the tickets.  There's also a fix in 
> > > here for
> > > one of the corner cases where we didn't quite have enough space reserved 
> > > for the
> > > delayed refs we were generating during evict().  Thanks,
> > 
> > One testbox reports an assertion failure on current for-next,
> > generic/224. I'm reporting it under this patchset as it's my best guess.
> > Same host running misc-next (with the delayed rsv patchset) was fine and
> > the run with for-next (including this patchset) fails. The assertion is
> > 
> >  5225 static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
> >  5226 struct btrfs_space_info 
> > *space_info,
> >  5227 u64 orig_bytes,
> >  5228 enum btrfs_reserve_flush_enum 
> > flush,
> >  5229 bool system_chunk)
> >  5230 {
> >  5231 struct reserve_ticket ticket;
> >  5232 u64 used;
> >  5233 u64 reclaim_bytes = 0;
> >  5234 int ret = 0;
> >  5235
> >  5236 ASSERT(orig_bytes);
> >  
> > 
> 
> Looking at your for-next branch on your github (I assume this is what you are
> testing)
> 
> https://github.com/kdave/btrfs-devel/blob/for-next-20181212/fs/btrfs/extent-tree.c
> 
> at line 5860 there's supposed to be a 
> 
> if (num_bytes == 0)
>   return 0
> 
> that's what I changed in v2 of this patchset, as I hit this bug as well.  It

What does 'this' refer to? The patchset in this mail thread? If yes,
then something's wrong, because in the patch

https://patchwork.kernel.org/patch/10709827/

there's a clear ASSERT(orig_bytes) in the context of one hunk:

@@ -5210,6 +5217,7 @@  static int __reserve_metadata_bytes(struct btrfs_fs_info 
*fs_info,
 {
struct reserve_ticket ticket;
u64 used;
+   u64 reclaim_bytes = 0;
int ret = 0;
 
ASSERT(orig_bytes);
---

> looks like you still have v1 of this patchset applied.  Thanks,

I looked up the patch series on patchwork too to double check that I haven't
missed it in my mailboxes but no.

The assert was introduced by "Btrfs: introduce ticketed enospc infrastructure"
which is quite old. The v2 of that patch is

https://lore.kernel.org/linux-btrfs/1463506255-15918-1-git-send-email-jba...@fb.com/

and also has the assert and not if (orig_bytes). Confused.


Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread Josef Bacik
On Thu, Dec 13, 2018 at 03:11:11PM +0100, David Sterba wrote:
> On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
> > v1->v2:
> > - addressed comments from reviewers.
> > - fixed a bug in patch 6 that was introduced because of changes to upstream.
> > 
> > -- Original message --
> > 
> > The delayed refs rsv patches exposed a bunch of issues in our enospc
> > infrastructure that needed to be addressed.  These aren't really one 
> > coherent
> > group, but they are all around flushing and reservations.
> > may_commit_transaction() needed to be updated a little bit, and we needed 
> > to add
> > a new state to force chunk allocation if things got dicey.  Also because we 
> > can
> > end up needed to reserve a whole bunch of extra space for outstanding 
> > delayed
> > refs we needed to add the ability to only ENOSPC tickets that were too big 
> > to
> > satisfy, instead of failing all of the tickets.  There's also a fix in here 
> > for
> > one of the corner cases where we didn't quite have enough space reserved 
> > for the
> > delayed refs we were generating during evict().  Thanks,
> 
> One testbox reports an assertion failure on current for-next,
> generic/224. I'm reporting it under this patchset as it's my best guess.
> Same host running misc-next (with the delayed rsv patchset) was fine and
> the run with for-next (including this patchset) fails. The assertion is
> 
>  5225 static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
>  5226 struct btrfs_space_info *space_info,
>  5227 u64 orig_bytes,
>  5228 enum btrfs_reserve_flush_enum flush,
>  5229 bool system_chunk)
>  5230 {
>  5231 struct reserve_ticket ticket;
>  5232 u64 used;
>  5233 u64 reclaim_bytes = 0;
>  5234 int ret = 0;
>  5235
>  5236 ASSERT(orig_bytes);
>  
> 

Looking at your for-next branch on your github (I assume this is what you are
testing)

https://github.com/kdave/btrfs-devel/blob/for-next-20181212/fs/btrfs/extent-tree.c

at line 5860 there's supposed to be a 

if (num_bytes == 0)
return 0

that's what I changed in v2 of this patchset, as I hit this bug as well.  It
looks like you still have v1 of this patchset applied.  Thanks,

Josef


Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread Nikolay Borisov



On 13.12.18 г. 16:11 ч., David Sterba wrote:
> On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
>> v1->v2:
>> - addressed comments from reviewers.
>> - fixed a bug in patch 6 that was introduced because of changes to upstream.
>>
>> -- Original message --
>>
>> The delayed refs rsv patches exposed a bunch of issues in our enospc
>> infrastructure that needed to be addressed.  These aren't really one coherent
>> group, but they are all around flushing and reservations.
>> may_commit_transaction() needed to be updated a little bit, and we needed to 
>> add
>> a new state to force chunk allocation if things got dicey.  Also because we 
>> can
>> end up needed to reserve a whole bunch of extra space for outstanding delayed
>> refs we needed to add the ability to only ENOSPC tickets that were too big to
>> satisfy, instead of failing all of the tickets.  There's also a fix in here 
>> for
>> one of the corner cases where we didn't quite have enough space reserved for 
>> the
>> delayed refs we were generating during evict().  Thanks,
> 
> One testbox reports an assertion failure on current for-next,
> generic/224. I'm reporting it under this patchset as it's my best guess.
> Same host running misc-next (with the delayed rsv patchset) was fine and
> the run with for-next (including this patchset) fails. The assertion is
> 
>  5225 static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
>  5226 struct btrfs_space_info *space_info,
>  5227 u64 orig_bytes,
>  5228 enum btrfs_reserve_flush_enum flush,
>  5229 bool system_chunk)
>  5230 {
>  5231 struct reserve_ticket ticket;
>  5232 u64 used;
>  5233 u64 reclaim_bytes = 0;
>  5234 int ret = 0;
>  5235
>  5236 ASSERT(orig_bytes);
>  
> 
> I can't decipher from the register dump what's the value because the assertion
> calls printk and RAX is most likely length of the string of the
> resulting string:

Well you don't need to. THe assert ensures we are reserving non-null
bytes. Looking at the trace of the assert it must be caused by calling
btrfs_inode_rsv_refill from btrfs_delalloc_reserve_metadata. Patch 6/8
actually modifies the code there to loop. But it has the necessary
checks to ensure num_bytes is non null. This is really odd indeed.

> 
> RAX = 0x46 = 70
> length("assertion failed: orig_bytes, file: fs/btrfs/extent-tree.c, line: 
> 5236") = 70
> 
> There's assertion failure and a KASAN report below.
> 
> [36231.239898] run fstests generic/224 at 2018-12-12 18:36:17
> [36232.066101] BTRFS: device fsid 0a848fb2-05d4-4f0c-9ad5-0414ef33b530 devid 
> 1 transid 5 /dev/sdc1
> [36232.101758] BTRFS info (device sdc1): disk space caching is enabled
> [36232.108133] BTRFS info (device sdc1): has skinny extents
> [36232.113581] BTRFS info (device sdc1): flagging fs with big metadata feature
> [36232.149156] BTRFS info (device sdc1): checking UUID tree
> [36574.735814] assertion failed: orig_bytes, file: fs/btrfs/extent-tree.c, 
> line: 5236
> [36574.815254] [ cut here ]
> [36574.819980] kernel BUG at fs/btrfs/ctree.h:3517!
> [36574.826147] invalid opcode:  [#1] PREEMPT SMP KASAN
> [36574.831480] CPU: 7 PID: 4015 Comm: dd Tainted: G  I   
> 4.20.0-rc6-1.gab9e909-default+ #179
> [36574.840857] Hardware name: HP ProLiant DL380 G6, BIOS P62 10/01/2009
> [36574.847489] RIP: 0010:assfail.constprop.79+0x18/0x1a [btrfs]
> [36574.872226] RSP: 0018:8883e88b77c0 EFLAGS: 00010286
> [36574.877546] RAX: 0046 RBX: 8882acea0080 RCX: 
> 9316c63e
> [36574.884773] RDX:  RSI: 0008 RDI: 
> 8884136e520c
> [36574.891998] RBP:  R08: ed10826dcd71 R09: 
> ed10826dcd71
> [36574.899220] R10: 0001 R11: ed10826dcd70 R12: 
> 88840bd0d240
> [36574.906444] R13: 88804d6e5700 R14: 88804d6e5700 R15: 
> 
> [36574.913670] FS:  7f120abed540() GS:8884136c() 
> knlGS:
> [36574.921919] CS:  0010 DS:  ES:  CR0: 80050033
> [36574.927755] CR2: 7fa563830648 CR3: 0003f17ba000 CR4: 
> 06e0
> [36574.934975] Call Trace:
> [36574.937641]  reserve_metadata_bytes+0xb22/0x10c0 [btrfs]
> [36574.943056]  ? _raw_spin_lock+0x81/0xd0
> [36574.947107]  ? btrfs_async_reclaim_metadata_space+0x7b0/0x7b0 [btrfs]
> [36574.953639]  ? _raw_spin_lock+0x81/0xd0
> [36574.957580]  ? _raw_read_lock_irq+0x40/0x40
> [36574.962009]  ? btrfs_calculate_inode_block_rsv_size+0xe2/0x110 [btrfs]
> [36574.968813]  ? __btrfs_qgroup_reserve_meta+0x3b/0x1d0 [btrfs]
> [36574.974764]  btrfs_delalloc_reserve_metadata+0x2a1/0x8c0 [btrfs]
> [36574.980981]  btrfs_buffered_write.isra.22+0x309/0x970 [btrfs]
> [36574.986930]  ? btrfs_dirty_pages+0x3c0/0x3c0 [btrfs]
> [36574.991991]  ? __vfs_getxattr+0x5e/0x80
> [36574.

Re: [PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-13 Thread David Sterba
On Mon, Dec 03, 2018 at 10:24:51AM -0500, Josef Bacik wrote:
> v1->v2:
> - addressed comments from reviewers.
> - fixed a bug in patch 6 that was introduced because of changes to upstream.
> 
> -- Original message --
> 
> The delayed refs rsv patches exposed a bunch of issues in our enospc
> infrastructure that needed to be addressed.  These aren't really one coherent
> group, but they are all around flushing and reservations.
> may_commit_transaction() needed to be updated a little bit, and we needed to 
> add
> a new state to force chunk allocation if things got dicey.  Also because we 
> can
> end up needed to reserve a whole bunch of extra space for outstanding delayed
> refs we needed to add the ability to only ENOSPC tickets that were too big to
> satisfy, instead of failing all of the tickets.  There's also a fix in here 
> for
> one of the corner cases where we didn't quite have enough space reserved for 
> the
> delayed refs we were generating during evict().  Thanks,

One testbox reports an assertion failure on current for-next,
generic/224. I'm reporting it under this patchset as it's my best guess.
Same host running misc-next (with the delayed rsv patchset) was fine and
the run with for-next (including this patchset) fails. The assertion is

 5225 static int __reserve_metadata_bytes(struct btrfs_fs_info *fs_info,
 5226 struct btrfs_space_info *space_info,
 5227 u64 orig_bytes,
 5228 enum btrfs_reserve_flush_enum flush,
 5229 bool system_chunk)
 5230 {
 5231 struct reserve_ticket ticket;
 5232 u64 used;
 5233 u64 reclaim_bytes = 0;
 5234 int ret = 0;
 5235
 5236 ASSERT(orig_bytes);
 

I can't decipher from the register dump what's the value because the assertion
calls printk and RAX is most likely length of the string of the
resulting string:

RAX = 0x46 = 70
length("assertion failed: orig_bytes, file: fs/btrfs/extent-tree.c, line: 
5236") = 70

There's assertion failure and a KASAN report below.

[36231.239898] run fstests generic/224 at 2018-12-12 18:36:17
[36232.066101] BTRFS: device fsid 0a848fb2-05d4-4f0c-9ad5-0414ef33b530 devid 1 
transid 5 /dev/sdc1
[36232.101758] BTRFS info (device sdc1): disk space caching is enabled
[36232.108133] BTRFS info (device sdc1): has skinny extents
[36232.113581] BTRFS info (device sdc1): flagging fs with big metadata feature
[36232.149156] BTRFS info (device sdc1): checking UUID tree
[36574.735814] assertion failed: orig_bytes, file: fs/btrfs/extent-tree.c, 
line: 5236
[36574.815254] [ cut here ]
[36574.819980] kernel BUG at fs/btrfs/ctree.h:3517!
[36574.826147] invalid opcode:  [#1] PREEMPT SMP KASAN
[36574.831480] CPU: 7 PID: 4015 Comm: dd Tainted: G  I   
4.20.0-rc6-1.gab9e909-default+ #179
[36574.840857] Hardware name: HP ProLiant DL380 G6, BIOS P62 10/01/2009
[36574.847489] RIP: 0010:assfail.constprop.79+0x18/0x1a [btrfs]
[36574.872226] RSP: 0018:8883e88b77c0 EFLAGS: 00010286
[36574.877546] RAX: 0046 RBX: 8882acea0080 RCX: 9316c63e
[36574.884773] RDX:  RSI: 0008 RDI: 8884136e520c
[36574.891998] RBP:  R08: ed10826dcd71 R09: ed10826dcd71
[36574.899220] R10: 0001 R11: ed10826dcd70 R12: 88840bd0d240
[36574.906444] R13: 88804d6e5700 R14: 88804d6e5700 R15: 
[36574.913670] FS:  7f120abed540() GS:8884136c() 
knlGS:
[36574.921919] CS:  0010 DS:  ES:  CR0: 80050033
[36574.927755] CR2: 7fa563830648 CR3: 0003f17ba000 CR4: 06e0
[36574.934975] Call Trace:
[36574.937641]  reserve_metadata_bytes+0xb22/0x10c0 [btrfs]
[36574.943056]  ? _raw_spin_lock+0x81/0xd0
[36574.947107]  ? btrfs_async_reclaim_metadata_space+0x7b0/0x7b0 [btrfs]
[36574.953639]  ? _raw_spin_lock+0x81/0xd0
[36574.957580]  ? _raw_read_lock_irq+0x40/0x40
[36574.962009]  ? btrfs_calculate_inode_block_rsv_size+0xe2/0x110 [btrfs]
[36574.968813]  ? __btrfs_qgroup_reserve_meta+0x3b/0x1d0 [btrfs]
[36574.974764]  btrfs_delalloc_reserve_metadata+0x2a1/0x8c0 [btrfs]
[36574.980981]  btrfs_buffered_write.isra.22+0x309/0x970 [btrfs]
[36574.986930]  ? btrfs_dirty_pages+0x3c0/0x3c0 [btrfs]
[36574.991991]  ? __vfs_getxattr+0x5e/0x80
[36574.995922]  ? cap_inode_need_killpriv+0x2a/0x40
[36575.000644]  ? file_remove_privs+0xa4/0x1c0
[36575.004921]  ? timespec64_trunc+0x5c/0x90
[36575.009031]  ? current_time+0xa9/0x100
[36575.012882]  ? timespec64_trunc+0x90/0x90
[36575.016986]  ? _raw_spin_lock+0x81/0xd0
[36575.020920]  ? _raw_read_lock_irq+0x40/0x40
[36575.025199]  ? clear_user+0x25/0x60
[36575.030508]  btrfs_file_write_iter+0x5a8/0xac0 [btrfs]
[36575.035898]  ? btrfs_sync_file+0x600/0x600 [btrfs]
[36575.040787]  ? mem_cgroup_charge_statistics+0x1f2/0x3b0
[36575.046110]  __vfs_write+0x236/0x370
[36575.

[PATCH 0/8][V2] Enospc cleanups and fixeS

2018-12-03 Thread Josef Bacik
v1->v2:
- addressed comments from reviewers.
- fixed a bug in patch 6 that was introduced because of changes to upstream.

-- Original message --

The delayed refs rsv patches exposed a bunch of issues in our enospc
infrastructure that needed to be addressed.  These aren't really one coherent
group, but they are all around flushing and reservations.
may_commit_transaction() needed to be updated a little bit, and we needed to add
a new state to force chunk allocation if things got dicey.  Also because we can
end up needed to reserve a whole bunch of extra space for outstanding delayed
refs we needed to add the ability to only ENOSPC tickets that were too big to
satisfy, instead of failing all of the tickets.  There's also a fix in here for
one of the corner cases where we didn't quite have enough space reserved for the
delayed refs we were generating during evict().  Thanks,

Josef