Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-27 Thread Theodore Y. Ts'o
On Fri, Sep 25, 2020 at 02:18:48PM -0700, Shakeel Butt wrote:
> 
> Yes, you are right. Let's first get this patch tested and after
> confirmation we can update the commit message.

Thanks Shakeel!  I've tested your patch, as well as reverting the
three commits that Linus had suggested, and both approaches seem to
address the problem for me as well.  I did see a small number of
failures immediately after the VM booted when testing the "revert the
three commits" approach, but this appears to be a different failure,
one I had also been seeing periodically during the bisect and which
was no doubt introducing noise into my testing:

[   28.545018] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[   28.545018] Modules linked in:
[   28.545018] irq event stamp: 4517759
[   28.545018] hardirqs last  enabled at (4517758): [] 
asm_common_interrupt+0x1e/0x40
[   28.545018] hardirqs last disabled at (4517759): [] 
sysvec_apic_timer_interrupt+0xb/0x90
[   28.545018] softirqs last  enabled at (10634): [] 
irq_enter_rcu+0x6d/0x70
[   28.545018] softirqs last disabled at (10635): [] 
asm_call_on_stack+0x12/0x20
[   28.545018] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
5.9.0-rc6-xfstests-7-g3f3cb48a7d90 #1916
[   28.545018] Hardware name: Google Google Compute Engine/Google Compute 
Engine, BIOS Google 01/01/2011
[   28.545018] RIP: 0010:__do_softirq+0xa3/0x435
[   28.545018] Code: 00 83 80 ac 07 00 00 01 48 89 44 24 08 c7 44 24 1c 0a 00 
00 00 65 66 c7 05 a8 ae 9e 55 00 00 e8 d3 92 3b ff fb b8 ff ff ff ff <48> c7 c3 
40 51 00 ab 41 0f bc c7 89 c6 83 c6 01 89 74 24 04 75 6a
[   28.545018] RSP: :b89f000e0f98 EFLAGS: 0202
[   28.545018] RAX:  RBX:  RCX: 298a
[   28.545018] RDX:  RSI:  RDI: aa80009d
[   28.545018] RBP: b89f000abda0 R08: 0001 R09: 
[   28.545018] R10: 0001 R11: 0046 R12: 0001
[   28.545018] R13:  R14:  R15: 0080
[   28.545018] FS:  () GS:998e5920() 
knlGS:
[   28.545018] CS:  0010 DS:  ES:  CR0: 80050033
[   28.545018] CR2:  CR3: 00023e012001 CR4: 001706e0
[   28.545018] DR0:  DR1:  DR2: 
[   28.545018] DR3:  DR6: fffe0ff0 DR7: 0400
[   28.545018] Call Trace:
[   28.545018]  
[   28.545018]  asm_call_on_stack+0x12/0x20
[   28.545018]  
[   28.545018]  do_softirq_own_stack+0x4e/0x60
[   28.545018]  irq_exit_rcu+0x9f/0xe0
[   28.545018]  sysvec_call_function_single+0x43/0x90
[   28.545018]  asm_sysvec_call_function_single+0x12/0x20
[   28.545018] RIP: 0010:acpi_idle_do_entry+0x54/0x70
[   28.545018] Code: ed c3 e9 cf fe ff ff 65 48 8b 04 25 00 6e 01 00 48 8b 00 
a8 08 75 ea e8 ba c0 5b ff e9 07 00 00 00 0f 00 2d f8 3d 4e 00 fb f4 <9c> 58 fa 
f6 c4 02 74 cf e9 5f c2 5b ff cc cc cc cc cc cc cc cc cc
[   28.545018] RSP: :b89f000abe88 EFLAGS: 0202
[   28.545018] RAX: 293b RBX: 998e5564 RCX: 1a12
[   28.545018] RDX:  RSI:  RDI: aa5fd2b6
[   28.545018] RBP: ab163760 R08: 0001 R09: 000e003c
[   28.545018] R10: 998e582e2340 R11: 0046 R12: 0001
[   28.545018] R13: 0001 R14: ab1637e0 R15: 
[   28.545018]  ? acpi_idle_do_entry+0x46/0x70
[   28.545018]  ? acpi_idle_do_entry+0x46/0x70
[   28.545018]  acpi_idle_enter+0x7d/0xb0
[   28.545018]  cpuidle_enter_state+0x84/0x2c0
[   28.545018]  cpuidle_enter+0x29/0x40
[   28.545018]  cpuidle_idle_call+0x111/0x180
[   28.545018]  do_idle+0x7b/0xd0
[   28.545018]  cpu_startup_entry+0x19/0x20
[   28.545018]  secondary_startup_64+0xb6/0xc0

I think this is the acpi_idle issue that others have reported, but I
thought that was fixed before -rc6 was released?  In any case, this
kernel is post -rc6, so apparently something else is going on here,
and it is probably unrelated to the regression which Shakeel's patch
is addressing.

- Ted


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-26 Thread Roman Gushchin
On Sat, Sep 26, 2020 at 09:43:25AM +0800, Ming Lei wrote:
> On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote:
> > On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt 
> > wrote:
> > >
> > [snip]
> > >
> > > I don't think you can ignore the flushing. The __free_once() in
> > > ___cache_free() assumes there is a space available.
> > >
> > > BTW do_drain() also have the same issue.
> > >
> > > Why not move slabs_destroy() after we update ac->avail and memmove()?
> > 
> > Ming, can you please try the following patch?
> > 
> > 
> > From: Shakeel Butt 
> > 
> > [PATCH] mm: slab: fix potential infinite recursion in ___cache_free
> > 
> > With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
> > kmem_caches for all allocations"), it becomes possible to call kfree()
> > from the slabs_destroy(). However if slabs_destroy() is being called for
> > the array_cache of the local CPU then this opens the potential scenario
> > of infinite recursion because kfree() called from slabs_destroy() can
> > call slabs_destroy() with the same array_cache of the local CPU. Since
> > the array_cache of the local CPU is not updated before calling
> > slabs_destroy(), it will try to free the same pages.
> > 
> > To fix the issue, simply update the cache before calling
> > slabs_destroy().
> > 
> > Signed-off-by: Shakeel Butt 
> > ---
> >  mm/slab.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 3160dff6fd76..f658e86ec8ce 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -1632,6 +1632,10 @@ static void slab_destroy(struct kmem_cache *cachep, 
> > struct page *page)
> > kmem_cache_free(cachep->freelist_cache, freelist);
> >  }
> >  
> > +/*
> > + * Update the size of the caches before calling slabs_destroy as it may
> > + * recursively call kfree.
> > + */
> >  static void slabs_destroy(struct kmem_cache *cachep, struct list_head 
> > *list)
> >  {
> > struct page *page, *n;
> > @@ -2153,8 +2157,8 @@ static void do_drain(void *arg)
> > spin_lock(&n->list_lock);
> > free_block(cachep, ac->entry, ac->avail, node, &list);
> > spin_unlock(&n->list_lock);
> > -   slabs_destroy(cachep, &list);
> > ac->avail = 0;
> > +   slabs_destroy(cachep, &list);
> >  }
> >  
> >  static void drain_cpu_caches(struct kmem_cache *cachep)
> > @@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache 
> > *cachep, struct array_cache *ac)
> > }
> >  #endif
> > spin_unlock(&n->list_lock);
> > -   slabs_destroy(cachep, &list);
> > ac->avail -= batchcount;
> > memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
> > +   slabs_destroy(cachep, &list);
> >  }
> 
> The issue can't be reproduced after applying this patch:
> 
> Tested-by: Ming Lei 

Perfect, thank you very much for the confirmation!

Shakeel, can you please resend the patch with the proper Fixes tag
and the updated commit log? Please feel free to add
Reviewed-by: Roman Gushchin  .

Thank you!

Roman


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Ming Lei
On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote:
> On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt 
> wrote:
> >
> [snip]
> >
> > I don't think you can ignore the flushing. The __free_once() in
> > ___cache_free() assumes there is a space available.
> >
> > BTW do_drain() also have the same issue.
> >
> > Why not move slabs_destroy() after we update ac->avail and memmove()?
> 
> Ming, can you please try the following patch?
> 
> 
> From: Shakeel Butt 
> 
> [PATCH] mm: slab: fix potential infinite recursion in ___cache_free
> 
> With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
> kmem_caches for all allocations"), it becomes possible to call kfree()
> from the slabs_destroy(). However if slabs_destroy() is being called for
> the array_cache of the local CPU then this opens the potential scenario
> of infinite recursion because kfree() called from slabs_destroy() can
> call slabs_destroy() with the same array_cache of the local CPU. Since
> the array_cache of the local CPU is not updated before calling
> slabs_destroy(), it will try to free the same pages.
> 
> To fix the issue, simply update the cache before calling
> slabs_destroy().
> 
> Signed-off-by: Shakeel Butt 
> ---
>  mm/slab.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 3160dff6fd76..f658e86ec8ce 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1632,6 +1632,10 @@ static void slab_destroy(struct kmem_cache *cachep, 
> struct page *page)
>   kmem_cache_free(cachep->freelist_cache, freelist);
>  }
>  
> +/*
> + * Update the size of the caches before calling slabs_destroy as it may
> + * recursively call kfree.
> + */
>  static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
>  {
>   struct page *page, *n;
> @@ -2153,8 +2157,8 @@ static void do_drain(void *arg)
>   spin_lock(&n->list_lock);
>   free_block(cachep, ac->entry, ac->avail, node, &list);
>   spin_unlock(&n->list_lock);
> - slabs_destroy(cachep, &list);
>   ac->avail = 0;
> + slabs_destroy(cachep, &list);
>  }
>  
>  static void drain_cpu_caches(struct kmem_cache *cachep)
> @@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache *cachep, 
> struct array_cache *ac)
>   }
>  #endif
>   spin_unlock(&n->list_lock);
> - slabs_destroy(cachep, &list);
>   ac->avail -= batchcount;
>   memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
> + slabs_destroy(cachep, &list);
>  }

The issue can't be reproduced after applying this patch:

Tested-by: Ming Lei 

Thanks,
Ming



Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 1:56 PM Roman Gushchin  wrote:
>
> On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote:
> > On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt 
> > wrote:
> > >
> > [snip]
> > >
> > > I don't think you can ignore the flushing. The __free_once() in
> > > ___cache_free() assumes there is a space available.
> > >
> > > BTW do_drain() also have the same issue.
> > >
> > > Why not move slabs_destroy() after we update ac->avail and memmove()?
> >
> > Ming, can you please try the following patch?
> >
> >
> > From: Shakeel Butt 
> >
> > [PATCH] mm: slab: fix potential infinite recursion in ___cache_free
> >
> > With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
> > kmem_caches for all allocations"), it becomes possible to call kfree()
> > from the slabs_destroy(). However if slabs_destroy() is being called for
> > the array_cache of the local CPU then this opens the potential scenario
> > of infinite recursion because kfree() called from slabs_destroy() can
> > call slabs_destroy() with the same array_cache of the local CPU. Since
> > the array_cache of the local CPU is not updated before calling
> > slabs_destroy(), it will try to free the same pages.
> >
> > To fix the issue, simply update the cache before calling
> > slabs_destroy().
> >
> > Signed-off-by: Shakeel Butt 
>
> I like the patch and I think it should fix the problem.
>
> However, the description above should probably be adjusted a bit.
> It seems that the problem is not necessarily caused by an infinite recursion;
> it can be even simpler.
>
> In cache_flusharray() we rely on the state of ac, which is described
> by ac->avail. In particular we rely on batchcount < ac->avail,
> as we shift the batchcount number of pointers with memmove.
> But if slabs_destroy() is called before that and leads to a change of the
> ac state, it can cause memory corruption.
>
> Also, unconditionally resetting ac->avail to 0 in do_drain() after calling
> slabs_destroy() seems to be wrong.
> It explains the double free BUGs we've seen in stack traces.
>

Yes, you are right. Let's first get this patch tested and after
confirmation we can update the commit message.


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Roman Gushchin
On Fri, Sep 25, 2020 at 12:19:02PM -0700, Shakeel Butt wrote:
> On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt 
> wrote:
> >
> [snip]
> >
> > I don't think you can ignore the flushing. The __free_once() in
> > ___cache_free() assumes there is a space available.
> >
> > BTW do_drain() also have the same issue.
> >
> > Why not move slabs_destroy() after we update ac->avail and memmove()?
> 
> Ming, can you please try the following patch?
> 
> 
> From: Shakeel Butt 
> 
> [PATCH] mm: slab: fix potential infinite recursion in ___cache_free
> 
> With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
> kmem_caches for all allocations"), it becomes possible to call kfree()
> from the slabs_destroy(). However if slabs_destroy() is being called for
> the array_cache of the local CPU then this opens the potential scenario
> of infinite recursion because kfree() called from slabs_destroy() can
> call slabs_destroy() with the same array_cache of the local CPU. Since
> the array_cache of the local CPU is not updated before calling
> slabs_destroy(), it will try to free the same pages.
> 
> To fix the issue, simply update the cache before calling
> slabs_destroy().
> 
> Signed-off-by: Shakeel Butt 

I like the patch and I think it should fix the problem.

However, the description above should probably be adjusted a bit.
It seems that the problem is not necessarily caused by an infinite recursion;
it can be even simpler.

In cache_flusharray() we rely on the state of ac, which is described
by ac->avail. In particular we rely on batchcount < ac->avail,
as we shift the batchcount number of pointers with memmove.
But if slabs_destroy() is called before that and leads to a change of the
ac state, it can cause memory corruption.

Also, unconditionally resetting ac->avail to 0 in do_drain() after calling
slabs_destroy() seems to be wrong.
It explains the double free BUGs we've seen in stack traces.
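
To make that concrete, here are simplified, annotated versions of the two
call sites before the fix (illustrative excerpts only, not the exact
mm/slab.c source):

	/* do_drain(), before the fix */
	free_block(cachep, ac->entry, ac->avail, node, &list);
	spin_unlock(&n->list_lock);
	slabs_destroy(cachep, &list);	/* may kfree() on this CPU; ___cache_free()
					 * still sees the old ac->avail and can
					 * free_block() the same entries again,
					 * i.e. a double free */
	ac->avail = 0;

	/* cache_flusharray() tail, before the fix */
	spin_unlock(&n->list_lock);
	slabs_destroy(cachep, &list);	/* a recursive kfree() can change
					 * ac->avail / ac->entry[] here */
	ac->avail -= batchcount;	/* now based on a stale view of ac */
	memmove(ac->entry, &(ac->entry[batchcount]),
		sizeof(void *) * ac->avail);	/* shifts the wrong pointers */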

Thanks!

> ---
>  mm/slab.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 3160dff6fd76..f658e86ec8ce 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1632,6 +1632,10 @@ static void slab_destroy(struct kmem_cache *cachep, 
> struct page *page)
>   kmem_cache_free(cachep->freelist_cache, freelist);
>  }
>  
> +/*
> + * Update the size of the caches before calling slabs_destroy as it may
> + * recursively call kfree.
> + */
>  static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
>  {
>   struct page *page, *n;
> @@ -2153,8 +2157,8 @@ static void do_drain(void *arg)
>   spin_lock(&n->list_lock);
>   free_block(cachep, ac->entry, ac->avail, node, &list);
>   spin_unlock(&n->list_lock);
> - slabs_destroy(cachep, &list);
>   ac->avail = 0;
> + slabs_destroy(cachep, &list);
>  }
>  
>  static void drain_cpu_caches(struct kmem_cache *cachep)
> @@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache *cachep, 
> struct array_cache *ac)
>   }
>  #endif
>   spin_unlock(&n->list_lock);
> - slabs_destroy(cachep, &list);
>   ac->avail -= batchcount;
>   memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
> + slabs_destroy(cachep, &list);
>  }
>  
>  /*
> -- 
> 2.28.0.681.g6f77f65b4e-goog
> 


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 10:58 AM Shakeel Butt 
wrote:
>
[snip]
>
> I don't think you can ignore the flushing. The __free_once() in
> ___cache_free() assumes there is a space available.
>
> BTW do_drain() also have the same issue.
>
> Why not move slabs_destroy() after we update ac->avail and memmove()?

Ming, can you please try the following patch?


From: Shakeel Butt 

[PATCH] mm: slab: fix potential infinite recursion in ___cache_free

With the commit 10befea91b61 ("mm: memcg/slab: use a single set of
kmem_caches for all allocations"), it becomes possible to call kfree()
from the slabs_destroy(). However if slabs_destroy() is being called for
the array_cache of the local CPU then this opens the potential scenario
of infinite recursion because kfree() called from slabs_destroy() can
call slabs_destroy() with the same array_cache of the local CPU. Since
the array_cache of the local CPU is not updated before calling
slabs_destroy(), it will try to free the same pages.

To fix the issue, simply update the cache before calling
slabs_destroy().

Signed-off-by: Shakeel Butt 
---
 mm/slab.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3160dff6fd76..f658e86ec8ce 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1632,6 +1632,10 @@ static void slab_destroy(struct kmem_cache *cachep, struct page *page)
kmem_cache_free(cachep->freelist_cache, freelist);
 }
 
+/*
+ * Update the size of the caches before calling slabs_destroy as it may
+ * recursively call kfree.
+ */
 static void slabs_destroy(struct kmem_cache *cachep, struct list_head *list)
 {
struct page *page, *n;
@@ -2153,8 +2157,8 @@ static void do_drain(void *arg)
spin_lock(&n->list_lock);
free_block(cachep, ac->entry, ac->avail, node, &list);
spin_unlock(&n->list_lock);
-   slabs_destroy(cachep, &list);
ac->avail = 0;
+   slabs_destroy(cachep, &list);
 }
 
 static void drain_cpu_caches(struct kmem_cache *cachep)
@@ -3402,9 +3406,9 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
}
 #endif
spin_unlock(&n->list_lock);
-   slabs_destroy(cachep, &list);
ac->avail -= batchcount;
memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
+   slabs_destroy(cachep, &list);
 }
 
 /*
-- 
2.28.0.681.g6f77f65b4e-goog



Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 10:48 AM Roman Gushchin  wrote:
>
> On Fri, Sep 25, 2020 at 10:35:03AM -0700, Shakeel Butt wrote:
> > On Fri, Sep 25, 2020 at 10:22 AM Shakeel Butt  wrote:
> > >
> > > On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
> > >  wrote:
> > > >
> > > > On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> > > > >
> > > > > git bisect shows the first bad commit:
> > > > >
> > > > > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: 
> > > > > use a single set of
> > > > > kmem_caches for all allocations
> > > > >
> > > > > And I have double checked that the above commit is really the first 
> > > > > bad
> > > > > commit for the list corruption issue of 'list_del corruption, 
> > > > > e1c241b00408->next
> > > > > is LIST_POISON1 (dead0100)',
> > > >
> > > > Thet commit doesn't revert cleanly, but I think that's purely because
> > > > we'd also need to revert
> > > >
> > > >   849504809f86 ("mm: memcg/slab: remove unused argument by 
> > > > charge_slab_page()")
> > > >   74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
> > > > (un)account_slab_page()")
> > > >
> > > > too.
> > > >
> > > > Can you verify that a
> > > >
> > > > git revert 74d555bed5d0 849504809f86 10befea91b61
> > > >
> > > > on top of current -git makes things work for you again?
> > > >
> > > > I'm going to do an rc8 this release simply because we have another VM
> > > > issue that I hope to get fixed - but there we know what the problem
> > > > and the fix _is_, it just needs some care.
> > > >
> > > > So if Roman (or somebody else) can see what's wrong and we can fix
> > > > this quickly, we don't need to go down the revert path, but ..
> > > >
> > >
> > > I think I have a theory. The issue is happening due to the potential
> > > infinite recursion:
> > >
> > > [ 5060.124412]  ___cache_free+0x488/0x6b0
> > > *Second recursion
> > > [ 5060.128666]  kfree+0xc9/0x1d0
> > > [ 5060.131947]  kmem_freepages+0xa0/0xf0
> > > [ 5060.135746]  slab_destroy+0x19/0x50
> > > [ 5060.139577]  slabs_destroy+0x6d/0x90
> > > [ 5060.143379]  ___cache_free+0x4a3/0x6b0
> > > *First recursion
> > > [ 5060.147896]  kfree+0xc9/0x1d0
> > > [ 5060.151082]  kmem_freepages+0xa0/0xf0
> > > [ 5060.155121]  slab_destroy+0x19/0x50
> > > [ 5060.159028]  slabs_destroy+0x6d/0x90
> > > [ 5060.162920]  ___cache_free+0x4a3/0x6b0
> > > [ 5060.167097]  kfree+0xc9/0x1d0
> > >
> > > ___cache_free() is calling cache_flusharray() to flush the local cpu
> > > array_cache if the cache has more elements than the limit (ac->avail
> > > >= ac->limit).
> > >
> > > cache_flusharray() is removing batchcount number of element from local
> > > cpu array_cache and pass it slabs_destroy (if the node shared cache is
> > > also full).
> > >
> > > Note that we have not updated local cpu array_cache size yet and
> > > called slabs_destroy() which can call kfree() through
> > > unaccount_slab_page().
> > >
> > > We are on the same CPU and this recursive kfree again check the
> > > (ac->avail >= ac->limit) and call cache_flusharray() again and recurse
> > > indefinitely.
>
> It's a cool theory! And it explains why we haven't seen it with SLUB.
>
> >
> > I can see two possible fixes. We can either do async kfree of
> > page_obj_cgroups(page) or we can update the local cpu array_cache's
> > size before slabs_destroy().
>
> I wonder if something like this can fix the problem?
> (completely untested).
>
> --
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 684ebe5b0c7a..c94b9ccfb803 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -186,6 +186,7 @@ struct array_cache {
> unsigned int limit;
> unsigned int batchcount;
> unsigned int touched;
> +   bool flushing;
> void *entry[];  /*
>  * Must have this definition in here for the proper
>  * alignment of array_cache. Also simplifies accessing
> @@ -526,6 +527,7 @@ static void init_arraycache(struct array_cache *ac, int 
> limit, int batch)
> ac->limit = limit;
> ac->batchcount = batch;
> ac->touched = 0;
> +   ac->flushing = false;
> }
>  }
>
> @@ -3368,6 +3370,11 @@ static void cache_flusharray(struct kmem_cache 
> *cachep, struct array_cache *ac)
> int node = numa_mem_id();
> LIST_HEAD(list);
>
> +   if (ac->flushing)
> +   return;
> +
> +   ac->flushing = true;
> +
> batchcount = ac->batchcount;
>
> check_irq_off();
> @@ -3404,6 +3411,7 @@ static void cache_flusharray(struct kmem_cache *cachep, 
> struct array_cache *ac)
>   spin_unlock(&n->list_lock);
> slabs_destroy(cachep, &list);
> ac->avail -= batchcount;
> +   ac->flushing = false;
> memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void 
> *)*ac->avail);
>  }
>

I don't think you can ignore the flushing. The __free_once() in
___cache_free() assumes there is space available.
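
As a rough illustration (a toy user-space model with made-up names, not
mm/slab.c), this is why an early return from the flush path is unsafe:
the caller stores the object afterwards and relies on the flush having
made room:

	#include <assert.h>

	#define LIMIT 4

	struct toy_cache {
		int avail;
		void *entry[LIMIT];
	};

	static void toy_flusharray(struct toy_cache *ac, int already_flushing)
	{
		if (already_flushing)
			return;		/* nothing is drained, avail stays at LIMIT */
		ac->avail -= 1;		/* normally batchcount entries are drained */
	}

	/* Mirrors the shape of ___cache_free(): flush if full, then store. */
	static void toy_cache_free(struct toy_cache *ac, void *objp, int already_flushing)
	{
		if (ac->avail >= LIMIT)
			toy_flusharray(ac, already_flushing);
		assert(ac->avail < LIMIT);	/* the store below relies on free space */
		ac->entry[ac->avail++] = objp;	/* would overflow entry[] otherwise */
	}

	int main(void)
	{
		struct toy_cache ac = { .avail = LIMIT };

		toy_cache_free(&ac, (void *)0, /* already_flushing = */ 1);	/* assertion fires */
		return 0;
	}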

BTW do_drain() 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Roman Gushchin
On Fri, Sep 25, 2020 at 10:35:03AM -0700, Shakeel Butt wrote:
> On Fri, Sep 25, 2020 at 10:22 AM Shakeel Butt  wrote:
> >
> > On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
> >  wrote:
> > >
> > > On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> > > >
> > > > git bisect shows the first bad commit:
> > > >
> > > > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use 
> > > > a single set of
> > > > kmem_caches for all allocations
> > > >
> > > > And I have double checked that the above commit is really the first bad
> > > > commit for the list corruption issue of 'list_del corruption, 
> > > > e1c241b00408->next
> > > > is LIST_POISON1 (dead0100)',
> > >
> > > Thet commit doesn't revert cleanly, but I think that's purely because
> > > we'd also need to revert
> > >
> > >   849504809f86 ("mm: memcg/slab: remove unused argument by 
> > > charge_slab_page()")
> > >   74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
> > > (un)account_slab_page()")
> > >
> > > too.
> > >
> > > Can you verify that a
> > >
> > > git revert 74d555bed5d0 849504809f86 10befea91b61
> > >
> > > on top of current -git makes things work for you again?
> > >
> > > I'm going to do an rc8 this release simply because we have another VM
> > > issue that I hope to get fixed - but there we know what the problem
> > > and the fix _is_, it just needs some care.
> > >
> > > So if Roman (or somebody else) can see what's wrong and we can fix
> > > this quickly, we don't need to go down the revert path, but ..
> > >
> >
> > I think I have a theory. The issue is happening due to the potential
> > infinite recursion:
> >
> > [ 5060.124412]  ___cache_free+0x488/0x6b0
> > *Second recursion
> > [ 5060.128666]  kfree+0xc9/0x1d0
> > [ 5060.131947]  kmem_freepages+0xa0/0xf0
> > [ 5060.135746]  slab_destroy+0x19/0x50
> > [ 5060.139577]  slabs_destroy+0x6d/0x90
> > [ 5060.143379]  ___cache_free+0x4a3/0x6b0
> > *First recursion
> > [ 5060.147896]  kfree+0xc9/0x1d0
> > [ 5060.151082]  kmem_freepages+0xa0/0xf0
> > [ 5060.155121]  slab_destroy+0x19/0x50
> > [ 5060.159028]  slabs_destroy+0x6d/0x90
> > [ 5060.162920]  ___cache_free+0x4a3/0x6b0
> > [ 5060.167097]  kfree+0xc9/0x1d0
> >
> > ___cache_free() is calling cache_flusharray() to flush the local cpu
> > array_cache if the cache has more elements than the limit (ac->avail
> > >= ac->limit).
> >
> > cache_flusharray() is removing batchcount number of element from local
> > cpu array_cache and pass it slabs_destroy (if the node shared cache is
> > also full).
> >
> > Note that we have not updated local cpu array_cache size yet and
> > called slabs_destroy() which can call kfree() through
> > unaccount_slab_page().
> >
> > We are on the same CPU and this recursive kfree again check the
> > (ac->avail >= ac->limit) and call cache_flusharray() again and recurse
> > indefinitely.

It's a cool theory! And it explains why we haven't seen it with SLUB.

> 
> I can see two possible fixes. We can either do async kfree of
> page_obj_cgroups(page) or we can update the local cpu array_cache's
> size before slabs_destroy().

I wonder if something like this can fix the problem?
(completely untested).

--

diff --git a/mm/slab.c b/mm/slab.c
index 684ebe5b0c7a..c94b9ccfb803 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -186,6 +186,7 @@ struct array_cache {
unsigned int limit;
unsigned int batchcount;
unsigned int touched;
+   bool flushing;
void *entry[];  /*
 * Must have this definition in here for the proper
 * alignment of array_cache. Also simplifies accessing
@@ -526,6 +527,7 @@ static void init_arraycache(struct array_cache *ac, int limit, int batch)
ac->limit = limit;
ac->batchcount = batch;
ac->touched = 0;
+   ac->flushing = false;
}
 }
 
@@ -3368,6 +3370,11 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
int node = numa_mem_id();
LIST_HEAD(list);
 
+   if (ac->flushing)
+   return;
+
+   ac->flushing = true;
+
batchcount = ac->batchcount;
 
check_irq_off();
@@ -3404,6 +3411,7 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
spin_unlock(&n->list_lock);
slabs_destroy(cachep, &list);
ac->avail -= batchcount;
+   ac->flushing = false;
memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
 }
 


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 10:22 AM Shakeel Butt  wrote:
>
> On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
>  wrote:
> >
> > On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> > >
> > > git bisect shows the first bad commit:
> > >
> > > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> > > single set of
> > > kmem_caches for all allocations
> > >
> > > And I have double checked that the above commit is really the first bad
> > > commit for the list corruption issue of 'list_del corruption, 
> > > e1c241b00408->next
> > > is LIST_POISON1 (dead0100)',
> >
> > Thet commit doesn't revert cleanly, but I think that's purely because
> > we'd also need to revert
> >
> >   849504809f86 ("mm: memcg/slab: remove unused argument by 
> > charge_slab_page()")
> >   74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
> > (un)account_slab_page()")
> >
> > too.
> >
> > Can you verify that a
> >
> > git revert 74d555bed5d0 849504809f86 10befea91b61
> >
> > on top of current -git makes things work for you again?
> >
> > I'm going to do an rc8 this release simply because we have another VM
> > issue that I hope to get fixed - but there we know what the problem
> > and the fix _is_, it just needs some care.
> >
> > So if Roman (or somebody else) can see what's wrong and we can fix
> > this quickly, we don't need to go down the revert path, but ..
> >
>
> I think I have a theory. The issue is happening due to the potential
> infinite recursion:
>
> [ 5060.124412]  ___cache_free+0x488/0x6b0
> *Second recursion
> [ 5060.128666]  kfree+0xc9/0x1d0
> [ 5060.131947]  kmem_freepages+0xa0/0xf0
> [ 5060.135746]  slab_destroy+0x19/0x50
> [ 5060.139577]  slabs_destroy+0x6d/0x90
> [ 5060.143379]  ___cache_free+0x4a3/0x6b0
> *First recursion
> [ 5060.147896]  kfree+0xc9/0x1d0
> [ 5060.151082]  kmem_freepages+0xa0/0xf0
> [ 5060.155121]  slab_destroy+0x19/0x50
> [ 5060.159028]  slabs_destroy+0x6d/0x90
> [ 5060.162920]  ___cache_free+0x4a3/0x6b0
> [ 5060.167097]  kfree+0xc9/0x1d0
>
> ___cache_free() is calling cache_flusharray() to flush the local cpu
> array_cache if the cache has more elements than the limit (ac->avail
> >= ac->limit).
>
> cache_flusharray() is removing batchcount number of element from local
> cpu array_cache and pass it slabs_destroy (if the node shared cache is
> also full).
>
> Note that we have not updated local cpu array_cache size yet and
> called slabs_destroy() which can call kfree() through
> unaccount_slab_page().
>
> We are on the same CPU and this recursive kfree again check the
> (ac->avail >= ac->limit) and call cache_flusharray() again and recurse
> indefinitely.

I can see two possible fixes. We can either do async kfree of
page_obj_cgroups(page) or we can update the local cpu array_cache's
size before slabs_destroy().
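
Roughly, the second option would just mean reordering the tail of
cache_flusharray() (and similarly do_drain()) so the ac state is updated
before slabs_destroy() runs, e.g. (sketch only; this is essentially the
reordering in the patch posted elsewhere in this thread):

	spin_unlock(&n->list_lock);
-	slabs_destroy(cachep, &list);
	ac->avail -= batchcount;
	memmove(ac->entry, &(ac->entry[batchcount]), sizeof(void *)*ac->avail);
+	slabs_destroy(cachep, &list);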

Shakeel


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Roman Gushchin
On Fri, Sep 25, 2020 at 09:47:43AM -0700, Shakeel Butt wrote:
> On Fri, Sep 25, 2020 at 9:32 AM Shakeel Butt  wrote:
> >
> > On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> > >
> > > On Fri, Sep 25, 2020 at 03:31:45PM +0800, Ming Lei wrote:
> > > > On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> > > > > On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > > > > > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > > > > > that one of the stack traces had virtio balloon in the trace, I
> > > > > > realized that when I switched the GCE VM type from e1-standard-2 to
> > > > > > n1-standard-2 (where e1 VM's are cheaper because they use
> > > > > > virtio-balloon to better manage host OS memory utilization), problem
> > > > > > has become, much, *much* rarer (and possibly has gone away, although
> > > > > > I'm going to want to run a lot more tests before I say that
> > > > > > conclusively) on my test setup.  At the very least, using an n1 VM
> > > > > > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > > > > > enough to unblock ext4 development.
> > > > >
> > > > >  and I spoke too soon.  A number of runs using -rc6 are now
> > > > > failing even with the n1-standard-2 VM, so virtio-ballon may not be an
> > > > > indicator.
> > > > >
> > > > > This is why debugging this is frustrating; it is very much a heisenbug
> > > > > --- although 5.8 seems to work completely reliably, as does commits
> > > > > before 37f4a24c2469.  Anything after that point will show random
> > > > > failures.  :-(
> > > >
> > > > It does not make sense to mention 37f4a24c2469, which is reverted in
> > > > 4e2f62e566b5. Later the patch in 37f4a24c2469 is fixed and re-commited
> > > > as 568f27006577.
> > > >
> > > > However, I can _not_ reproduce the issue by running the same test on
> > > > kernel built from 568f27006577 directly.
> > > >
> > > > Also you have confirmed that the issue can't be fixed after reverting
> > > > 568f27006577 against v5.9-rc4.
> > > >
> > > > Looks the real issue(slab list corruption) should be introduced between
> > > > 568f27006577 and v5.9-rc4.
> > >
> > > git bisect shows the first bad commit:
> > >
> > > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> > > single set of
> > > kmem_caches for all allocations
> > >
> > > And I have double checked that the above commit is really the first bad
> > > commit for the list corruption issue of 'list_del corruption, 
> > > e1c241b00408->next
> > > is LIST_POISON1 (dead0100)', see the detailed stack trace and
> > > kernel oops log in the following link:
> > >
> > > https://lore.kernel.org/lkml/20200916202026.gc38...@mit.edu/
> >
> > The failure signature is similar to
> > https://lore.kernel.org/lkml/20200901075321.GL4299@shao2-debian/
> >
> > >
> > > And the kernel config is the one(without KASAN) used by Theodore in GCE 
> > > VM, see
> > > the following link:
> > >
> > > https://lore.kernel.org/lkml/20200917143012.gf38...@mit.edu/
> > >
> > > The reproducer is xfstests generic/038. In my setting, test device is 
> > > virtio-scsi, and
> > > scratch device is virtio-blk.
> 
> Is it possible to check SLUB as well to confirm that the issue is only
> happening on SLAB?

Can you also, please, check whether passing cgroup.memory=nokmem as a boot
argument fixes the issue?

Thanks!


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 10:17 AM Linus Torvalds
 wrote:
>
> On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> >
> > git bisect shows the first bad commit:
> >
> > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> > single set of
> > kmem_caches for all allocations
> >
> > And I have double checked that the above commit is really the first bad
> > commit for the list corruption issue of 'list_del corruption, 
> > e1c241b00408->next
> > is LIST_POISON1 (dead0100)',
>
> Thet commit doesn't revert cleanly, but I think that's purely because
> we'd also need to revert
>
>   849504809f86 ("mm: memcg/slab: remove unused argument by 
> charge_slab_page()")
>   74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
> (un)account_slab_page()")
>
> too.
>
> Can you verify that a
>
> git revert 74d555bed5d0 849504809f86 10befea91b61
>
> on top of current -git makes things work for you again?
>
> I'm going to do an rc8 this release simply because we have another VM
> issue that I hope to get fixed - but there we know what the problem
> and the fix _is_, it just needs some care.
>
> So if Roman (or somebody else) can see what's wrong and we can fix
> this quickly, we don't need to go down the revert path, but ..
>

I think I have a theory. The issue is happening due to the potential
infinite recursion:

[ 5060.124412]  ___cache_free+0x488/0x6b0
*Second recursion
[ 5060.128666]  kfree+0xc9/0x1d0
[ 5060.131947]  kmem_freepages+0xa0/0xf0
[ 5060.135746]  slab_destroy+0x19/0x50
[ 5060.139577]  slabs_destroy+0x6d/0x90
[ 5060.143379]  ___cache_free+0x4a3/0x6b0
*First recursion
[ 5060.147896]  kfree+0xc9/0x1d0
[ 5060.151082]  kmem_freepages+0xa0/0xf0
[ 5060.155121]  slab_destroy+0x19/0x50
[ 5060.159028]  slabs_destroy+0x6d/0x90
[ 5060.162920]  ___cache_free+0x4a3/0x6b0
[ 5060.167097]  kfree+0xc9/0x1d0

___cache_free() calls cache_flusharray() to flush the local cpu
array_cache if the cache has more elements than the limit
(ac->avail >= ac->limit).

cache_flusharray() removes batchcount elements from the local cpu
array_cache and passes them to slabs_destroy() (if the node shared cache
is also full).

Note that we have not yet updated the local cpu array_cache size when we
call slabs_destroy(), which can call kfree() through
unaccount_slab_page().

We are on the same CPU, and this recursive kfree again checks
(ac->avail >= ac->limit), calls cache_flusharray() again, and recurses
indefinitely.
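
A toy user-space model of that ordering problem (illustrative only; the
names below are made up and this is not mm/slab.c code) shows how moving
the avail update before the destroy step breaks the cycle:

	#include <stdio.h>

	#define LIMIT      4
	#define BATCHCOUNT 2

	struct toy_cache {
		int avail;
		void *entry[LIMIT];
	};

	static struct toy_cache ac = { .avail = LIMIT };	/* cache starts full */
	static int pending_destroy_frees = 1;	/* models unaccount_slab_page() -> kfree() */

	static void toy_free(void *objp);

	static void toy_slabs_destroy(void)
	{
		/* Destroying a slab page frees some metadata back into the allocator. */
		while (pending_destroy_frees) {
			pending_destroy_frees--;
			toy_free(NULL);
		}
	}

	static void toy_flusharray(void)
	{
		/*
		 * With the pre-fix ordering (toy_slabs_destroy() called before the
		 * avail update), the nested toy_free() would still see
		 * avail >= LIMIT and re-enter toy_flusharray().
		 */
		ac.avail -= BATCHCOUNT;	/* fixed ordering: update state first */
		toy_slabs_destroy();	/* nested frees now see a consistent avail */
	}

	static void toy_free(void *objp)
	{
		if (ac.avail >= LIMIT)	/* same check ___cache_free() performs */
			toy_flusharray();
		ac.entry[ac.avail++] = objp;
	}

	int main(void)
	{
		toy_free(NULL);
		printf("avail = %d\n", ac.avail);	/* prints 4 with the fixed ordering */
		return 0;
	}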


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Linus Torvalds
On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
>
> git bisect shows the first bad commit:
>
> [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> single set of
> kmem_caches for all allocations
>
> And I have double checked that the above commit is really the first bad
> commit for the list corruption issue of 'list_del corruption, 
> e1c241b00408->next
> is LIST_POISON1 (dead0100)',

That commit doesn't revert cleanly, but I think that's purely because
we'd also need to revert

  849504809f86 ("mm: memcg/slab: remove unused argument by charge_slab_page()")
  74d555bed5d0 ("mm: slab: rename (un)charge_slab_page() to
(un)account_slab_page()")

too.

Can you verify that a

git revert 74d555bed5d0 849504809f86 10befea91b61

on top of current -git makes things work for you again?

I'm going to do an rc8 this release simply because we have another VM
issue that I hope to get fixed - but there we know what the problem
and the fix _is_, it just needs some care.

So if Roman (or somebody else) can see what's wrong and we can fix
this quickly, we don't need to go down the revert path, but ..

  Linus


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 9:32 AM Shakeel Butt  wrote:
>
> On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
> >
> > On Fri, Sep 25, 2020 at 03:31:45PM +0800, Ming Lei wrote:
> > > On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> > > > On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > > > > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > > > > that one of the stack traces had virtio balloon in the trace, I
> > > > > realized that when I switched the GCE VM type from e1-standard-2 to
> > > > > n1-standard-2 (where e1 VM's are cheaper because they use
> > > > > virtio-balloon to better manage host OS memory utilization), problem
> > > > > has become, much, *much* rarer (and possibly has gone away, although
> > > > > I'm going to want to run a lot more tests before I say that
> > > > > conclusively) on my test setup.  At the very least, using an n1 VM
> > > > > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > > > > enough to unblock ext4 development.
> > > >
> > > >  and I spoke too soon.  A number of runs using -rc6 are now
> > > > failing even with the n1-standard-2 VM, so virtio-ballon may not be an
> > > > indicator.
> > > >
> > > > This is why debugging this is frustrating; it is very much a heisenbug
> > > > --- although 5.8 seems to work completely reliably, as does commits
> > > > before 37f4a24c2469.  Anything after that point will show random
> > > > failures.  :-(
> > >
> > > It does not make sense to mention 37f4a24c2469, which is reverted in
> > > 4e2f62e566b5. Later the patch in 37f4a24c2469 is fixed and re-commited
> > > as 568f27006577.
> > >
> > > However, I can _not_ reproduce the issue by running the same test on
> > > kernel built from 568f27006577 directly.
> > >
> > > Also you have confirmed that the issue can't be fixed after reverting
> > > 568f27006577 against v5.9-rc4.
> > >
> > > Looks the real issue(slab list corruption) should be introduced between
> > > 568f27006577 and v5.9-rc4.
> >
> > git bisect shows the first bad commit:
> >
> > [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> > single set of
> > kmem_caches for all allocations
> >
> > And I have double checked that the above commit is really the first bad
> > commit for the list corruption issue of 'list_del corruption, 
> > e1c241b00408->next
> > is LIST_POISON1 (dead0100)', see the detailed stack trace and
> > kernel oops log in the following link:
> >
> > https://lore.kernel.org/lkml/20200916202026.gc38...@mit.edu/
>
> The failure signature is similar to
> https://lore.kernel.org/lkml/20200901075321.GL4299@shao2-debian/
>
> >
> > And the kernel config is the one(without KASAN) used by Theodore in GCE VM, 
> > see
> > the following link:
> >
> > https://lore.kernel.org/lkml/20200917143012.gf38...@mit.edu/
> >
> > The reproducer is xfstests generic/038. In my setting, test device is 
> > virtio-scsi, and
> > scratch device is virtio-blk.

Is it possible to check SLUB as well to confirm that the issue is only
happening on SLAB?


Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Shakeel Butt
On Fri, Sep 25, 2020 at 9:19 AM Ming Lei  wrote:
>
> On Fri, Sep 25, 2020 at 03:31:45PM +0800, Ming Lei wrote:
> > On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> > > On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > > > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > > > that one of the stack traces had virtio balloon in the trace, I
> > > > realized that when I switched the GCE VM type from e1-standard-2 to
> > > > n1-standard-2 (where e1 VM's are cheaper because they use
> > > > virtio-balloon to better manage host OS memory utilization), problem
> > > > has become, much, *much* rarer (and possibly has gone away, although
> > > > I'm going to want to run a lot more tests before I say that
> > > > conclusively) on my test setup.  At the very least, using an n1 VM
> > > > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > > > enough to unblock ext4 development.
> > >
> > >  and I spoke too soon.  A number of runs using -rc6 are now
> > > failing even with the n1-standard-2 VM, so virtio-ballon may not be an
> > > indicator.
> > >
> > > This is why debugging this is frustrating; it is very much a heisenbug
> > > --- although 5.8 seems to work completely reliably, as does commits
> > > before 37f4a24c2469.  Anything after that point will show random
> > > failures.  :-(
> >
> > It does not make sense to mention 37f4a24c2469, which is reverted in
> > 4e2f62e566b5. Later the patch in 37f4a24c2469 is fixed and re-commited
> > as 568f27006577.
> >
> > However, I can _not_ reproduce the issue by running the same test on
> > kernel built from 568f27006577 directly.
> >
> > Also you have confirmed that the issue can't be fixed after reverting
> > 568f27006577 against v5.9-rc4.
> >
> > Looks the real issue(slab list corruption) should be introduced between
> > 568f27006577 and v5.9-rc4.
>
> git bisect shows the first bad commit:
>
> [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> single set of
> kmem_caches for all allocations
>
> And I have double checked that the above commit is really the first bad
> commit for the list corruption issue of 'list_del corruption, 
> e1c241b00408->next
> is LIST_POISON1 (dead0100)', see the detailed stack trace and
> kernel oops log in the following link:
>
> https://lore.kernel.org/lkml/20200916202026.gc38...@mit.edu/

The failure signature is similar to
https://lore.kernel.org/lkml/20200901075321.GL4299@shao2-debian/

>
> And the kernel config is the one(without KASAN) used by Theodore in GCE VM, 
> see
> the following link:
>
> https://lore.kernel.org/lkml/20200917143012.gf38...@mit.edu/
>
> The reproducer is xfstests generic/038. In my setting, test device is 
> virtio-scsi, and
> scratch device is virtio-blk.
>
>
> [1] git bisect log:
>
> git bisect start
> # good: [568f2700657794b8258e72983ba508793a658942] blk-mq: centralise related 
> handling into blk_mq_get_driver_tag
> git bisect good 568f2700657794b8258e72983ba508793a658942
> # bad: [f4d51dffc6c01a9e94650d95ce0104964f8ae822] Linux 5.9-rc4
> git bisect bad f4d51dffc6c01a9e94650d95ce0104964f8ae822
> # good: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag 
> 'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
> git bisect good 8186749621ed6b8fc42644c399e8c755a2b6f630
> # good: [60e76bb8a4e4c5398ea9053535e1fd0c9d6bb06e] Merge tag 
> 'm68knommu-for-v5.9' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
> git bisect good 60e76bb8a4e4c5398ea9053535e1fd0c9d6bb06e
> # bad: [57b077939287835b9396a1c3b40d35609cf2fcb8] Merge tag 'for_linus' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
> git bisect bad 57b077939287835b9396a1c3b40d35609cf2fcb8
> # bad: [0f43283be7fec4a76cd4ed50dc37db30719bde05] Merge branch 'work.fdpic' 
> of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> git bisect bad 0f43283be7fec4a76cd4ed50dc37db30719bde05
> # good: [5631c5e0eb9035d92ceb20fcd9cdb7779a3f5cc7] Merge tag 
> 'xfs-5.9-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
> git bisect good 5631c5e0eb9035d92ceb20fcd9cdb7779a3f5cc7
> # good: [e3083c3f369700cd1eec3de93b8d8ec0918ff778] media: cafe-driver: use 
> generic power management
> git bisect good e3083c3f369700cd1eec3de93b8d8ec0918ff778
> # bad: [42742d9bde2a8e11ec932cb5821f720a40a7c2a9] mm: thp: replace HTTP links 
> with HTTPS ones
> git bisect bad 42742d9bde2a8e11ec932cb5821f720a40a7c2a9
> # bad: [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a 
> single set of kmem_caches for all allocations
> git bisect bad 10befea91b61c4e2c2d1df06a2e978d182fcf792
> # good: [cfbe1636c3585c1e032bfac512fb8be903fbc913] mm, kcsan: instrument 
> SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"
> git bisect good cfbe1636c3585c1e032bfac512fb8be903fbc913
> # good: [0f190a7ab78878f9e6c6930fc0f5f92c1250b57d] mm/page_io.c: use 
> blk_io_schedule() for avoiding task hung in sync io
> git 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Ming Lei
On Fri, Sep 25, 2020 at 03:31:45PM +0800, Ming Lei wrote:
> On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > > that one of the stack traces had virtio balloon in the trace, I
> > > realized that when I switched the GCE VM type from e1-standard-2 to
> > > n1-standard-2 (where e1 VM's are cheaper because they use
> > > virtio-balloon to better manage host OS memory utilization), problem
> > > has become, much, *much* rarer (and possibly has gone away, although
> > > I'm going to want to run a lot more tests before I say that
> > > conclusively) on my test setup.  At the very least, using an n1 VM
> > > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > > enough to unblock ext4 development.
> > 
> >  and I spoke too soon.  A number of runs using -rc6 are now
> > failing even with the n1-standard-2 VM, so virtio-ballon may not be an
> > indicator.
> > 
> > This is why debugging this is frustrating; it is very much a heisenbug
> > --- although 5.8 seems to work completely reliably, as does commits
> > before 37f4a24c2469.  Anything after that point will show random
> > failures.  :-(
> 
> It does not make sense to mention 37f4a24c2469, which is reverted in
> 4e2f62e566b5. Later the patch in 37f4a24c2469 is fixed and re-commited
> as 568f27006577.
> 
> However, I can _not_ reproduce the issue by running the same test on
> kernel built from 568f27006577 directly.
> 
> Also you have confirmed that the issue can't be fixed after reverting
> 568f27006577 against v5.9-rc4.
> 
> Looks the real issue(slab list corruption) should be introduced between
> 568f27006577 and v5.9-rc4.

git bisect shows the first bad commit:

[10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a single 
set of
kmem_caches for all allocations

And I have double checked that the above commit is really the first bad
commit for the list corruption issue of 'list_del corruption, 
e1c241b00408->next
is LIST_POISON1 (dead0100)', see the detailed stack trace and
kernel oops log in the following link:

https://lore.kernel.org/lkml/20200916202026.gc38...@mit.edu/

And the kernel config is the one (without KASAN) used by Theodore in his GCE
VM; see the following link:

https://lore.kernel.org/lkml/20200917143012.gf38...@mit.edu/

The reproducer is xfstests generic/038. In my setup, the test device is
virtio-scsi and the scratch device is virtio-blk.


[1] git bisect log:

git bisect start
# good: [568f2700657794b8258e72983ba508793a658942] blk-mq: centralise related 
handling into blk_mq_get_driver_tag
git bisect good 568f2700657794b8258e72983ba508793a658942
# bad: [f4d51dffc6c01a9e94650d95ce0104964f8ae822] Linux 5.9-rc4
git bisect bad f4d51dffc6c01a9e94650d95ce0104964f8ae822
# good: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag 
'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
git bisect good 8186749621ed6b8fc42644c399e8c755a2b6f630
# good: [60e76bb8a4e4c5398ea9053535e1fd0c9d6bb06e] Merge tag 
'm68knommu-for-v5.9' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
git bisect good 60e76bb8a4e4c5398ea9053535e1fd0c9d6bb06e
# bad: [57b077939287835b9396a1c3b40d35609cf2fcb8] Merge tag 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
git bisect bad 57b077939287835b9396a1c3b40d35609cf2fcb8
# bad: [0f43283be7fec4a76cd4ed50dc37db30719bde05] Merge branch 'work.fdpic' of 
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect bad 0f43283be7fec4a76cd4ed50dc37db30719bde05
# good: [5631c5e0eb9035d92ceb20fcd9cdb7779a3f5cc7] Merge tag 'xfs-5.9-merge-7' 
of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect good 5631c5e0eb9035d92ceb20fcd9cdb7779a3f5cc7
# good: [e3083c3f369700cd1eec3de93b8d8ec0918ff778] media: cafe-driver: use 
generic power management
git bisect good e3083c3f369700cd1eec3de93b8d8ec0918ff778
# bad: [42742d9bde2a8e11ec932cb5821f720a40a7c2a9] mm: thp: replace HTTP links 
with HTTPS ones
git bisect bad 42742d9bde2a8e11ec932cb5821f720a40a7c2a9
# bad: [10befea91b61c4e2c2d1df06a2e978d182fcf792] mm: memcg/slab: use a single 
set of kmem_caches for all allocations
git bisect bad 10befea91b61c4e2c2d1df06a2e978d182fcf792
# good: [cfbe1636c3585c1e032bfac512fb8be903fbc913] mm, kcsan: instrument 
SLAB/SLUB free with "ASSERT_EXCLUSIVE_ACCESS"
git bisect good cfbe1636c3585c1e032bfac512fb8be903fbc913
# good: [0f190a7ab78878f9e6c6930fc0f5f92c1250b57d] mm/page_io.c: use 
blk_io_schedule() for avoiding task hung in sync io
git bisect good 0f190a7ab78878f9e6c6930fc0f5f92c1250b57d
# good: [286e04b8ed7a04279ae277f0f024430246ea5eec] mm: memcg/slab: allocate 
obj_cgroups for non-root slab pages
git bisect good 286e04b8ed7a04279ae277f0f024430246ea5eec
# good: [9855609bde03e2472b99a95e869d29ee1e78a751] mm: memcg/slab: use a single 
set of kmem_caches for all accounted 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-25 Thread Ming Lei
On Thu, Sep 24, 2020 at 09:13:11PM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > that one of the stack traces had virtio balloon in the trace, I
> > realized that when I switched the GCE VM type from e1-standard-2 to
> > n1-standard-2 (where e1 VM's are cheaper because they use
> > virtio-balloon to better manage host OS memory utilization), problem
> > has become, much, *much* rarer (and possibly has gone away, although
> > I'm going to want to run a lot more tests before I say that
> > conclusively) on my test setup.  At the very least, using an n1 VM
> > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > enough to unblock ext4 development.
> 
>  and I spoke too soon.  A number of runs using -rc6 are now
> failing even with the n1-standard-2 VM, so virtio-ballon may not be an
> indicator.
> 
> This is why debugging this is frustrating; it is very much a heisenbug
> --- although 5.8 seems to work completely reliably, as does commits
> before 37f4a24c2469.  Anything after that point will show random
> failures.  :-(

It does not make sense to mention 37f4a24c2469, which was reverted in
4e2f62e566b5. The patch in 37f4a24c2469 was later fixed and re-committed
as 568f27006577.

However, I can _not_ reproduce the issue by running the same test on a
kernel built from 568f27006577 directly.

Also, you have confirmed that the issue is not fixed by reverting
568f27006577 on top of v5.9-rc4.

So it looks like the real issue (slab list corruption) was introduced
between 568f27006577 and v5.9-rc4.


thanks,
Ming



Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-24 Thread Ming Lei
On Fri, Sep 25, 2020 at 09:14:16AM +0800, Ming Lei wrote:
> On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> > > 
> > > The list corruption issue can be reproduced on kvm/qumu guest too when
> > > running xfstests(ext4) generic/038.
> > > 
> > > However, the issue may become not reproduced when adding or removing 
> > > memory
> > > debug options, such as adding KASAN.
> > 
> > Can you reliably reprodue this crash?  And if so, with what config and
> > what kernel version.
> 
> Yeah, it can be reproduced reliably by running xfstests(ext4)
> generic/038 over virtio-scsi(test device)/virtio-blk(scratch device).
> 
> The kernel is -rc4, and not test -rc5 yet.
> 
> It is exactly the config you provided, and I just enabled CDROM & ISOFS
> against your config for passing cloud-init data via cdrom to VM.
> 
> > One of the reasons why I had gone silent on this bug is that I've been
> > trying many, many configurations and configurations which reliably
> > reproduced on say, -rc4 would not reproduce on -rc5, and then I would
> > get a completely different set of results on -rc6.  So I've been
> > trying to run a lot of different experiments to try to understand what
> > might be going on, since it seems pretty clear this must be a very
> > timing-sensitive failure.
> > 
> > I also found that the re-occrance went down significantly if I enabled
> > KASAN, and while it didn't go away, I wasn't able to get a KASAN
> > failure to trigger, either.  Turning off CONFIG_PROVE_LOCKING and a
> > *lot* of other debugging configs made the problem vanish in -rc4, but
> > that trick didn't work with -rc5 or -rc6.
> 
> The issue can be reproduced reliably in my environment after
> disabling LOCKDEP only for disabling KMEMLEAK only.
> 
> But after disabling DEBUG_OBJECTS, it becomes hard to trigger.
> 
> > 
> > Each time I discovered one of these things, I was about to post to the
> > e-mail thread, only to have a confirmation test run on a different
> > kernel version make the problem go away.  In particular, your revert
> > helped with -rc4 and -rc6 IIRC, but it didn't help in -rc5.
> > 
> > HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> > that one of the stack traces had virtio balloon in the trace, I
> > realized that when I switched the GCE VM type from e1-standard-2 to
> > n1-standard-2 (where e1 VM's are cheaper because they use
> > virtio-balloon to better manage host OS memory utilization), problem
> > has become, much, *much* rarer (and possibly has gone away, although
> > I'm going to want to run a lot more tests before I say that
> > conclusively) on my test setup.  At the very least, using an n1 VM
> > (which doesn't have virtio-balloon enabled in the hypervisor) is
> > enough to unblock ext4 development.
> > 
> > Any chance your kvm/qemu configuration might have been using
> > virtio-ballon?  Because other ext4 developers who have been using
> > kvm-xftests have not had any problems
> 
> I don't setup virtio-ballon, see the attached script for setting the VM.
> 
> > 
> > > When I enable PAGE_POISONING, double free on kmalloc(192) is captured:
> > > 
> > > [ 1198.317139] slab: double free detected in cache 'kmalloc-192', objp 
> > > 89ada7584300^M
> > > [ 1198.326651] [ cut here ]^M
> > > [ 1198.327969] kernel BUG at mm/slab.c:2535!^M
> > > [ 1198.329129] invalid opcode:  [#1] SMP PTI^M
> > > [ 1198.333776] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> > > 5.9.0-rc4_quiesce_srcu-xfstests #102^M
> > > [ 1198.336085] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014^M
> > > [ 1198.339826] RIP: 0010:free_block.cold.92+0x13/0x15^M
> > > [ 1198.341472] Code: 8d 44 05 f0 eb d0 48 63 83 e0 00 00 00 48 8d 54 05 
> > > f8 e9 4b 81 ff ff 48 8b 73 58 48 89 ea 48 c7 c7 98 e7 4a 9c e8 20 c3 eb 
> > > ff <0f> 0b 48 8b 73 58 48 c7 c2 20 e8 4a 9c 48 c7 c7 70 32 22 9c e8 19^M
> > > [ 1198.347331] RSP: 0018:982e40710be8 EFLAGS: 00010046^M
> > > [ 1198.349091] RAX: 0048 RBX: 89adb6441400 RCX: 
> > > ^M
> > > [ 1198.351839] RDX:  RSI: 89adbaa97800 RDI: 
> > > 89adbaa97800^M
> > > [ 1198.354572] RBP: 89ada7584300 R08: 0417 R09: 
> > > 0057^M
> > > [ 1198.357150] R10: 0001 R11: 982e40710aa5 R12: 
> > > 89adbaaae598^M
> > > [ 1198.359067] R13: e7bc819d6108 R14: e7bc819d6100 R15: 
> > > 89adb6442280^M
> > > [ 1198.360975] FS:  () GS:89adbaa8() 
> > > knlGS:^M
> > > [ 1198.363202] CS:  0010 DS:  ES:  CR0: 80050033^M
> > > [ 1198.365986] CR2: 55f6a3811318 CR3: 00017adca005 CR4: 
> > > 00770ee0^M
> > > [ 1198.368679] DR0:  DR1:  DR2: 
> > > ^M
> > > [ 1198.371386] DR3:  DR6: 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-24 Thread Ming Lei
On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> > 
> > The list corruption issue can be reproduced on a kvm/qemu guest too when
> > running xfstests(ext4) generic/038.
> > 
> > However, the issue may stop reproducing when memory debug options are
> > added or removed, such as adding KASAN.
> 
> Can you reliably reproduce this crash?  And if so, with what config and
> what kernel version?

Yeah, it can be reproduced reliably by running xfstests(ext4)
generic/038 over virtio-scsi(test device)/virtio-blk(scratch device).

The kernel is -rc4; I have not tested -rc5 yet.

It is exactly the config you provided; I just enabled CDROM & ISOFS
on top of it for passing cloud-init data to the VM via cdrom.
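
For reference, a minimal sketch of such a run from inside the guest; the
device names and mount points here are assumptions for illustration, the
real ones come from my qemu setup:

cat > local.config <<EOF
export FSTYP=ext4
export TEST_DEV=/dev/sda       # virtio-scsi test device (assumed name)
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdb    # virtio-blk scratch device (assumed name)
export SCRATCH_MNT=/mnt/scratch
EOF
./check generic/038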

> One of the reasons why I had gone silent on this bug is that I've been
> trying many, many configurations and configurations which reliably
> reproduced on say, -rc4 would not reproduce on -rc5, and then I would
> get a completely different set of results on -rc6.  So I've been
> trying to run a lot of different experiments to try to understand what
> might be going on, since it seems pretty clear this must be a very
> timing-sensitive failure.
> 
> I also found that the recurrence went down significantly if I enabled
> KASAN, and while it didn't go away, I wasn't able to get a KASAN
> failure to trigger, either.  Turning off CONFIG_PROVE_LOCKING and a
> *lot* of other debugging configs made the problem vanish in -rc4, but
> that trick didn't work with -rc5 or -rc6.

The issue can be reproduced reliably in my environment after
disabling LOCKDEP only, or after disabling KMEMLEAK only.

But after disabling DEBUG_OBJECTS, it becomes hard to trigger.
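
A minimal sketch of toggling those options on top of the shared config, one
at a time (scripts/config is the helper in the kernel tree; run only the
relevant line before rebuilding):

# lockdep is effectively turned off by dropping PROVE_LOCKING and friends
./scripts/config --file .config -d PROVE_LOCKING -d DEBUG_LOCK_ALLOC -d LOCK_STAT
# or drop kmemleak only
./scripts/config --file .config -d DEBUG_KMEMLEAK
# with DEBUG_OBJECTS off instead, the issue becomes hard to trigger
./scripts/config --file .config -d DEBUG_OBJECTS
make olddefconfig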

> 
> Each time I discovered one of these things, I was about to post to the
> e-mail thread, only to have a confirmation test run on a different
> kernel version make the problem go away.  In particular, your revert
> helped with -rc4 and -rc6 IIRC, but it didn't help in -rc5.
> 
> HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> that one of the stack traces had virtio balloon in the trace, I
> realized that when I switched the GCE VM type from e2-standard-2 to
> n1-standard-2 (where e2 VMs are cheaper because they use
> virtio-balloon to better manage host OS memory utilization), the problem
> has become much, *much* rarer (and possibly has gone away, although
> I'm going to want to run a lot more tests before I say that
> conclusively) on my test setup.  At the very least, using an n1 VM
> (which doesn't have virtio-balloon enabled in the hypervisor) is
> enough to unblock ext4 development.
> 
> Any chance your kvm/qemu configuration might have been using
> virtio-balloon?  Because other ext4 developers who have been using
> kvm-xfstests have not had any problems.

I don't set up virtio-balloon; see the attached script for setting up the VM.

> 
> > When I enable PAGE_POISONING, double free on kmalloc(192) is captured:
> > 
> > [ 1198.317139] slab: double free detected in cache 'kmalloc-192', objp 
> > 89ada7584300^M
> > [ 1198.326651] [ cut here ]^M
> > [ 1198.327969] kernel BUG at mm/slab.c:2535!^M
> > [ 1198.329129] invalid opcode:  [#1] SMP PTI^M
> > [ 1198.333776] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> > 5.9.0-rc4_quiesce_srcu-xfstests #102^M
> > [ 1198.336085] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014^M
> > [ 1198.339826] RIP: 0010:free_block.cold.92+0x13/0x15^M
> > [ 1198.341472] Code: 8d 44 05 f0 eb d0 48 63 83 e0 00 00 00 48 8d 54 05 f8 
> > e9 4b 81 ff ff 48 8b 73 58 48 89 ea 48 c7 c7 98 e7 4a 9c e8 20 c3 eb ff 
> > <0f> 0b 48 8b 73 58 48 c7 c2 20 e8 4a 9c 48 c7 c7 70 32 22 9c e8 19^M
> > [ 1198.347331] RSP: 0018:982e40710be8 EFLAGS: 00010046^M
> > [ 1198.349091] RAX: 0048 RBX: 89adb6441400 RCX: 
> > ^M
> > [ 1198.351839] RDX:  RSI: 89adbaa97800 RDI: 
> > 89adbaa97800^M
> > [ 1198.354572] RBP: 89ada7584300 R08: 0417 R09: 
> > 0057^M
> > [ 1198.357150] R10: 0001 R11: 982e40710aa5 R12: 
> > 89adbaaae598^M
> > [ 1198.359067] R13: e7bc819d6108 R14: e7bc819d6100 R15: 
> > 89adb6442280^M
> > [ 1198.360975] FS:  () GS:89adbaa8() 
> > knlGS:^M
> > [ 1198.363202] CS:  0010 DS:  ES:  CR0: 80050033^M
> > [ 1198.365986] CR2: 55f6a3811318 CR3: 00017adca005 CR4: 
> > 00770ee0^M
> > [ 1198.368679] DR0:  DR1:  DR2: 
> > ^M
> > [ 1198.371386] DR3:  DR6: fffe0ff0 DR7: 
> > 0400^M
> > [ 1198.374203] PKRU: 5554^M
> > [ 1198.375174] Call Trace:^M
> > [ 1198.376165]  ^M
> > [ 1198.376908]  ___cache_free+0x56d/0x770^M
> > [ 1198.378355]  ? kmem_freepages+0xa0/0xf0^M
> > [ 1198.379814]  

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-24 Thread Theodore Y. Ts'o
On Thu, Sep 24, 2020 at 10:33:45AM -0400, Theodore Y. Ts'o wrote:
> HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
> that one of the stack traces had virtio balloon in the trace, I
> realized that when I switched the GCE VM type from e2-standard-2 to
> n1-standard-2 (where e2 VMs are cheaper because they use
> virtio-balloon to better manage host OS memory utilization), the problem
> has become much, *much* rarer (and possibly has gone away, although
> I'm going to want to run a lot more tests before I say that
> conclusively) on my test setup.  At the very least, using an n1 VM
> (which doesn't have virtio-balloon enabled in the hypervisor) is
> enough to unblock ext4 development.

... and I spoke too soon.  A number of runs using -rc6 are now
failing even with the n1-standard-2 VM, so virtio-balloon may not be an
indicator.

This is why debugging this is frustrating; it is very much a heisenbug
--- although 5.8 seems to work completely reliably, as do commits
before 37f4a24c2469.  Anything after that point will show random
failures.  :-(

- Ted





Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-24 Thread Theodore Y. Ts'o
On Thu, Sep 24, 2020 at 08:59:01AM +0800, Ming Lei wrote:
> 
> The list corruption issue can be reproduced on a kvm/qemu guest too when
> running xfstests(ext4) generic/038.
> 
> However, the issue may stop reproducing when memory debug options are
> added or removed, such as adding KASAN.

Can you reliably reproduce this crash?  And if so, with what config and
what kernel version?

One of the reasons why I had gone silent on this bug is that I've been
trying many, many configurations and configurations which reliably
reproduced on say, -rc4 would not reproduce on -rc5, and then I would
get a completely different set of results on -rc6.  So I've been
trying to run a lot of different experiments to try to understand what
might be going on, since it seems pretty clear this must be a very
timing-sensitive failure.

I also found that the recurrence went down significantly if I enabled
KASAN, and while it didn't go away, I wasn't able to get a KASAN
failure to trigger, either.  Turning off CONFIG_PROVE_LOCKING and a
*lot* of other debugging configs made the problem vanish in -rc4, but
that trick didn't work with -rc5 or -rc6.

Each time I discovered one of these things, I was about to post to the
e-mail thread, only to have a confirmation test run on a different
kernel version make the problem go away.  In particular, your revert
helped with -rc4 and -rc6 IIRC, but it didn't help in -rc5.

HOWEVER, thanks to a hint from a colleague at $WORK, and realizing
that one of the stack traces had virtio balloon in the trace, I
realized that when I switched the GCE VM type from e2-standard-2 to
n1-standard-2 (where e2 VMs are cheaper because they use
virtio-balloon to better manage host OS memory utilization), the problem
has become much, *much* rarer (and possibly has gone away, although
I'm going to want to run a lot more tests before I say that
conclusively) on my test setup.  At the very least, using an n1 VM
(which doesn't have virtio-balloon enabled in the hypervisor) is
enough to unblock ext4 development.

Any chance your kvm/qemu configuration might have been using
virtio-balloon?  Because other ext4 developers who have been using
kvm-xfstests have not had any problems.

> When I enable PAGE_POISONING, double free on kmalloc(192) is captured:
> 
> [ 1198.317139] slab: double free detected in cache 'kmalloc-192', objp 
> 89ada7584300^M
> [ 1198.326651] [ cut here ]^M
> [ 1198.327969] kernel BUG at mm/slab.c:2535!^M
> [ 1198.329129] invalid opcode:  [#1] SMP PTI^M
> [ 1198.333776] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
> 5.9.0-rc4_quiesce_srcu-xfstests #102^M
> [ 1198.336085] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014^M
> [ 1198.339826] RIP: 0010:free_block.cold.92+0x13/0x15^M
> [ 1198.341472] Code: 8d 44 05 f0 eb d0 48 63 83 e0 00 00 00 48 8d 54 05 f8 e9 
> 4b 81 ff ff 48 8b 73 58 48 89 ea 48 c7 c7 98 e7 4a 9c e8 20 c3 eb ff <0f> 0b 
> 48 8b 73 58 48 c7 c2 20 e8 4a 9c 48 c7 c7 70 32 22 9c e8 19^M
> [ 1198.347331] RSP: 0018:982e40710be8 EFLAGS: 00010046^M
> [ 1198.349091] RAX: 0048 RBX: 89adb6441400 RCX: 
> ^M
> [ 1198.351839] RDX:  RSI: 89adbaa97800 RDI: 
> 89adbaa97800^M
> [ 1198.354572] RBP: 89ada7584300 R08: 0417 R09: 
> 0057^M
> [ 1198.357150] R10: 0001 R11: 982e40710aa5 R12: 
> 89adbaaae598^M
> [ 1198.359067] R13: e7bc819d6108 R14: e7bc819d6100 R15: 
> 89adb6442280^M
> [ 1198.360975] FS:  () GS:89adbaa8() 
> knlGS:^M
> [ 1198.363202] CS:  0010 DS:  ES:  CR0: 80050033^M
> [ 1198.365986] CR2: 55f6a3811318 CR3: 00017adca005 CR4: 
> 00770ee0^M
> [ 1198.368679] DR0:  DR1:  DR2: 
> ^M
> [ 1198.371386] DR3:  DR6: fffe0ff0 DR7: 
> 0400^M
> [ 1198.374203] PKRU: 5554^M
> [ 1198.375174] Call Trace:^M
> [ 1198.376165]  ^M
> [ 1198.376908]  ___cache_free+0x56d/0x770^M
> [ 1198.378355]  ? kmem_freepages+0xa0/0xf0^M
> [ 1198.379814]  kfree+0x91/0x120^M
> [ 1198.382121]  kmem_freepages+0xa0/0xf0^M
> [ 1198.383474]  slab_destroy+0x9f/0x120^M
> [ 1198.384779]  slabs_destroy+0x6d/0x90^M
> [ 1198.386110]  ___cache_free+0x632/0x770^M
> [ 1198.387547]  ? kmem_freepages+0xa0/0xf0^M
> [ 1198.389016]  kfree+0x91/0x120^M
> [ 1198.390160]  kmem_freepages+0xa0/0xf0^M
> [ 1198.391551]  slab_destroy+0x9f/0x120^M
> [ 1198.392964]  slabs_destroy+0x6d/0x90^M
> [ 1198.394439]  ___cache_free+0x632/0x770^M
> [ 1198.395896]  kmem_cache_free.part.75+0x19/0x70^M
> [ 1198.397791]  rcu_core+0x1eb/0x6b0^M
> [ 1198.399829]  ? ktime_get+0x37/0xa0^M
> [ 1198.401343]  __do_softirq+0xdf/0x2c5^M
> [ 1198.403010]  asm_call_on_stack+0x12/0x20^M
> [ 1198.404847]  ^M
> [ 1198.405799]  do_softirq_own_stack+0x39/0x50^M
> [ 1198.407621]  irq_exit_rcu+0x97/0xa0^M
> [ 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-23 Thread Ming Lei
On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> > 
> > Obviously there is another, more serious issue, since 568f27006577 is
> > completely reverted in your test and you still see the list corruption
> > issue.
> > 
> > So I'd suggest finding the big issue first. Once it is fixed, maybe
> > everything becomes fine.
> > ...
> > It looks more like a memory corruption issue; is there any helpful log
> > dumped when running the kernel with KASAN?
> 
> Last night, I ran six VM's using -rc4 with and without KASAN; without
> Kasan, half of them hung.  With KASAN enabled, all of the test VM's
> ran to completion.
> 
> This strongly suggests whatever the problem is, it's timing related.
> I'll run a larger set of test runs to see if this pattern is confirmed
> today.
> 
> BTW, I have a kvm/qemu auto test which runs blktests/xfstests over
> virtio-blk/virtio-scsi/loop/nvme with xfs/ext4 every two days, and I have
> not seen such a failure recently, but the kernel config is based on
> rhel8's config.
> 
> Here are the configs I'm using, with and without KASAN.  (The KASAN-enabled
> config is sent as a diff to avoid running into LKML's e-mail size
> restrictions.)
>

The list corruption issue can be reproduced on a kvm/qemu guest too when
running xfstests(ext4) generic/038.

However, the issue may stop reproducing when memory debug options are
added or removed, such as adding KASAN.

When I enable PAGE_POISONING, double free on kmalloc(192) is captured:

[ 1198.317139] slab: double free detected in cache 'kmalloc-192', objp 
89ada7584300^M
[ 1198.326651] [ cut here ]^M
[ 1198.327969] kernel BUG at mm/slab.c:2535!^M
[ 1198.329129] invalid opcode:  [#1] SMP PTI^M
[ 1198.333776] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
5.9.0-rc4_quiesce_srcu-xfstests #102^M
[ 1198.336085] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014^M
[ 1198.339826] RIP: 0010:free_block.cold.92+0x13/0x15^M
[ 1198.341472] Code: 8d 44 05 f0 eb d0 48 63 83 e0 00 00 00 48 8d 54 05 f8 e9 
4b 81 ff ff 48 8b 73 58 48 89 ea 48 c7 c7 98 e7 4a 9c e8 20 c3 eb ff <0f> 0b 48 
8b 73 58 48 c7 c2 20 e8 4a 9c 48 c7 c7 70 32 22 9c e8 19^M
[ 1198.347331] RSP: 0018:982e40710be8 EFLAGS: 00010046^M
[ 1198.349091] RAX: 0048 RBX: 89adb6441400 RCX: 
^M
[ 1198.351839] RDX:  RSI: 89adbaa97800 RDI: 
89adbaa97800^M
[ 1198.354572] RBP: 89ada7584300 R08: 0417 R09: 
0057^M
[ 1198.357150] R10: 0001 R11: 982e40710aa5 R12: 
89adbaaae598^M
[ 1198.359067] R13: e7bc819d6108 R14: e7bc819d6100 R15: 
89adb6442280^M
[ 1198.360975] FS:  () GS:89adbaa8() 
knlGS:^M
[ 1198.363202] CS:  0010 DS:  ES:  CR0: 80050033^M
[ 1198.365986] CR2: 55f6a3811318 CR3: 00017adca005 CR4: 
00770ee0^M
[ 1198.368679] DR0:  DR1:  DR2: 
^M
[ 1198.371386] DR3:  DR6: fffe0ff0 DR7: 
0400^M
[ 1198.374203] PKRU: 5554^M
[ 1198.375174] Call Trace:^M
[ 1198.376165]  ^M
[ 1198.376908]  ___cache_free+0x56d/0x770^M
[ 1198.378355]  ? kmem_freepages+0xa0/0xf0^M
[ 1198.379814]  kfree+0x91/0x120^M
[ 1198.382121]  kmem_freepages+0xa0/0xf0^M
[ 1198.383474]  slab_destroy+0x9f/0x120^M
[ 1198.384779]  slabs_destroy+0x6d/0x90^M
[ 1198.386110]  ___cache_free+0x632/0x770^M
[ 1198.387547]  ? kmem_freepages+0xa0/0xf0^M
[ 1198.389016]  kfree+0x91/0x120^M
[ 1198.390160]  kmem_freepages+0xa0/0xf0^M
[ 1198.391551]  slab_destroy+0x9f/0x120^M
[ 1198.392964]  slabs_destroy+0x6d/0x90^M
[ 1198.394439]  ___cache_free+0x632/0x770^M
[ 1198.395896]  kmem_cache_free.part.75+0x19/0x70^M
[ 1198.397791]  rcu_core+0x1eb/0x6b0^M
[ 1198.399829]  ? ktime_get+0x37/0xa0^M
[ 1198.401343]  __do_softirq+0xdf/0x2c5^M
[ 1198.403010]  asm_call_on_stack+0x12/0x20^M
[ 1198.404847]  ^M
[ 1198.405799]  do_softirq_own_stack+0x39/0x50^M
[ 1198.407621]  irq_exit_rcu+0x97/0xa0^M
[ 1198.409127]  sysvec_apic_timer_interrupt+0x2c/0x80^M
[ 1198.410608]  asm_sysvec_apic_timer_interrupt+0x12/0x20^M
[ 1198.411883] RIP: 0010:default_idle+0x13/0x20^M
[ 1198.412994] Code: 89 44 24 20 48 83 c0 22 48 89 44 24 28 eb c7 e8 03 93 ff 
ff cc cc cc 0f 1f 44 00 00 e9 07 00 00 00 0f 00 2d 11 ec 55 00 fb f4  66 66 
2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 65 48 8b 04^M
[ 1198.418791] RSP: 0018:982e406abec8 EFLAGS: 0246^M
[ 1198.420529] RAX: 9bca6ea0 RBX: 0001 RCX: 
89aeafa69648^M
[ 1198.422884] RDX: 89adb9c0 RSI:  RDI: 
0116f5b68899^M
[ 1198.425283] RBP: 0001 R08: 012531639bb4 R09: 
^M
[ 1198.427804] R10: 89adb5e1f600 R11:  R12: 
^M
[ 1198.430532] R13:  R14:  R15: 
^M
[ 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-17 Thread Ming Lei
On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> > 
> > Obviously there is another, more serious issue, since 568f27006577 is
> > completely reverted in your test and you still see the list corruption
> > issue.
> > 
> > So I'd suggest finding the big issue first. Once it is fixed, maybe
> > everything becomes fine.
> > ...
> > It looks more like a memory corruption issue; is there any helpful log
> > dumped when running the kernel with KASAN?
> 
> Last night, I ran six VM's using -rc4 with and without KASAN; without
> Kasan, half of them hung.  With KASAN enabled, all of the test VM's
> ran to completion.

From your last email, when you run -rc4 with the revert of 568f27006577, you
can observe the list corruption easily.

So can you enable KASAN on -rc4 with the revert of 568f27006577 and see if
it makes a difference?

> 
> This strongly suggests whatever the problem is, it's timing related.
> I'll run a larger set of test runs to see if this pattern is confirmed
> today.

It looks like you have enabled lots of other debug options, such as lockdep,
which have a very heavy runtime cost. Maybe you can disable all non-KASAN
debug options (non-KASAN memory debug options, lockdep, ...), keep only
KASAN debugging, and see if you are lucky.
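
A minimal sketch of that kind of config change, using the kernel's
scripts/config helper (the exact list of options to drop is an assumption
based on the ones mentioned in this thread, not an exhaustive list):

./scripts/config --file .config \
    -e KASAN \
    -d PROVE_LOCKING -d DEBUG_LOCK_ALLOC -d LOCK_STAT \
    -d DEBUG_KMEMLEAK -d DEBUG_OBJECTS -d DEBUG_PAGEALLOC
make olddefconfig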


Thanks, 
Ming



Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-17 Thread Theodore Y. Ts'o
On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> 
> Obviously there is another, more serious issue, since 568f27006577 is
> completely reverted in your test and you still see the list corruption
> issue.
> 
> So I'd suggest finding the big issue first. Once it is fixed, maybe
> everything becomes fine.
> ...
> It looks more like a memory corruption issue; is there any helpful log
> dumped when running the kernel with KASAN?

Last night, I ran six VM's using -rc4 with and without KASAN; without
Kasan, half of them hung.  With KASAN enabled, all of the test VM's
ran to completion.

This strongly suggests whatever the problem is, it's timing related.
I'll run a larger set of test runs to see if this pattern is confirmed
today.

> BTW, I have a kvm/qemu auto test which runs blktests/xfstests over
> virtio-blk/virtio-scsi/loop/nvme with xfs/ext4 every two days, and I have
> not seen such a failure recently, but the kernel config is based on
> rhel8's config.

Here are the configs I'm using, with and without KASAN.  (The KASAN-enabled
config is sent as a diff to avoid running into LKML's e-mail size
restrictions.)

 - Ted
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 5.9.0-rc4 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Debian 10.2.0-7) 10.2.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=100200
CONFIG_LD_VERSION=23500
CONFIG_CLANG_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-xfstests"
CONFIG_LOCALVERSION_AUTO=y
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
# end of IRQ subsystem

CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_INIT=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK=y
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
# end of Timers subsystem

CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_HAVE_SCHED_AVG_IRQ=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# end of CPU/Task time and stats accounting

CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem

CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_IKHEADERS is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y

#
# Scheduler features
#
# CONFIG_UCLAMP_TASK is not set
# end of Scheduler features

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CC_HAS_INT128=y
CONFIG_ARCH_SUPPORTS_INT128=y
# CONFIG_NUMA_BALANCING is not set
CONFIG_CGROUPS=y
CONFIG_PAGE_COUNTER=y
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_WRITEBACK=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_RDMA=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CPUSETS=y

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-16 Thread Ming Lei
On Wed, Sep 16, 2020 at 04:20:26PM -0400, Theodore Y. Ts'o wrote:
> On Wed, Sep 16, 2020 at 07:09:41AM +0800, Ming Lei wrote:
> > > The problem is it's a bit tricky to revert 568f27006577, since there
> > > is a merge conflict in blk_kick_flush().  I attempted to do the bisect
> > > manually here, but it's clearly not right since the kernel is not
> > > booting after the revert:
> > > 
> > > https://github.com/tytso/ext4/commit/1e67516382a33da2c9d483b860ac4ec2ad390870
> > > 
> > > branch:
> > > 
> > > https://github.com/tytso/ext4/tree/manual-revert-of-568f27006577
> > > 
> > > Can you send me a patch which correctly reverts 568f27006577?  I can
> > > try it against -rc1 .. -rc4, whichever is most convenient.
> > 
> > Please test the following revert patch against -rc4.
> 
> Unfortunately the results of the revert are... weird.
> 
> With -rc4, *all* of the VM's are failing --- reliably.  With rc4 with
> the revert, *some* of the VM's are able to complete the tests, but
> over half are still locking up or failing with some kind of oops.  So
> that seems to imply that there is some kind of timing issue going on,
> or maybe there are multiple bugs in play?

Obviously there is another, more serious issue, since 568f27006577 is
completely reverted in your test and you still see the list corruption
issue.

So I'd suggest finding the big issue first. Once it is fixed, maybe
everything becomes fine.

...

> 
> v5.9-rc4 with the revert of 568f27006577: we're seeing a similar
>   number of VM failures, but the failure signature is different.
>   The most common failure is:
> 
>   [  390.023691] [ cut here ]
>   [  390.028614] list_del corruption, e1c241b00408->next is 
> LIST_POISON1 (dead0100)
>   [  390.037040] WARNING: CPU: 1 PID: 5948 at lib/list_debug.c:47 
> __list_del_entry_valid+0x4e/0x90
>   [  390.045684] CPU: 1 PID: 5948 Comm: umount Not tainted 
> 5.9.0-rc4-xfstests-1-g6fdef015feba #11
>   [  390.054581] Hardware name: Google Google Compute Engine/Google 
> Compute Engine, BIOS Google 01/01/2011
>   [  390.063943] RIP: 0010:__list_del_entry_valid+0x4e/0x90
>   [  390.069731] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 
> 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 10 13 12 9b e8 30 2f 8c ff 
> <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 48 13 12 9b e8 1c 2f 8c ff 0f 0b
>   [  390.088615] RSP: 0018:ae95c6ddba28 EFLAGS: 00010082
>   [  390.094079] RAX:  RBX: ce95bfc007d0 RCX: 
> 0027
>   [  390.101338] RDX: 0027 RSI: a0c9d93d7dc0 RDI: 
> a0c9d93d7dc8
>   [  390.108659] RBP: e1c241b00408 R08: 006ba6bff7dc R09: 
> 
>   [  390.115925] R10: a0c9d3f444c0 R11: 0046 R12: 
> a0c9d8041180
>   [  390.123186] R13: a0c86c010e00 R14: e1c241b00400 R15: 
> a0c9d8042240
>   [  390.130637] FS:  7fb227580080() GS:a0c9d920() 
> knlGS:
>   [  390.138860] CS:  0010 DS:  ES:  CR0: 80050033
>   [  390.144721] CR2: 7ff72d2dfe74 CR3: 0001cdbb8002 CR4: 
> 003706e0
>   [  390.152022] DR0:  DR1:  DR2: 
> 
>   [  390.159314] DR3:  DR6: fffe0ff0 DR7: 
> 0400
>   [  390.166569] Call Trace:
>   [  390.169148]  free_block+0xec/0x270
>   [  390.173100]  ___cache_free+0x488/0x6b0
>   [  390.177062]  kfree+0xc9/0x1d0
>   [  390.181202]  kmem_freepages+0xa0/0xf0
>   [  390.185009]  slab_destroy+0x19/0x50
>   [  390.188653]  slabs_destroy+0x6d/0x90
>   [  390.192339]  ___cache_free+0x4a3/0x6b0
>   [  390.196477]  kfree+0xc9/0x1d0
>   [  390.199651]  kmem_freepages+0xa0/0xf0
>   [  390.203459]  slab_destroy+0x19/0x50
>   [  390.207060]  slabs_destroy+0x6d/0x90
>   [  390.210784]  ___cache_free+0x4a3/0x6b0
>   [  390.214672]  ? lockdep_hardirqs_on_prepare+0xe7/0x180
>   [  390.219845]  kfree+0xc9/0x1d0
>   [  390.222928]  put_crypt_info+0xe3/0x100
>   [  390.226801]  fscrypt_put_encryption_info+0x15/0x30
>   [  390.231721]  ext4_clear_inode+0x80/0x90
>   [  390.235774]  ext4_evict_inode+0x6d/0x630
>   [  390.239960]  evict+0xd0/0x1a0
>   [  390.243049]  dispose_list+0x51/0x80
>   [  390.246659]  evict_inodes+0x15b/0x1b0
>   [  390.250526]  generic_shutdown_super+0x37/0x100
>   [  390.255094]  kill_block_super+0x21/0x50
>   [  390.259066]  deactivate_locked_super+0x2f/0x70
>   [  390.263638]  cleanup_mnt+0xb8/0x140
>   [  390.267248]  task_work_run+0x73/0xc0
>   [  390.270953]  exit_to_user_mode_prepare+0x197/0x1a0
>   [  390.277333]  syscall_exit_to_user_mode+0x3c/0x210
>   [  390.282171]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   [  390.287348] RIP: 0033:0x7fb2279a6507
>   [  390.291128] Code: 19 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-16 Thread Theodore Y. Ts'o
On Wed, Sep 16, 2020 at 07:09:41AM +0800, Ming Lei wrote:
> > The problem is it's a bit tricky to revert 568f27006577, since there
> > is a merge conflict in blk_kick_flush().  I attempted to do the bisect
> > manually here, but it's clearly not right since the kernel is not
> > booting after the revert:
> > 
> > https://github.com/tytso/ext4/commit/1e67516382a33da2c9d483b860ac4ec2ad390870
> > 
> > branch:
> > 
> > https://github.com/tytso/ext4/tree/manual-revert-of-568f27006577
> > 
> > Can you send me a patch which correctly reverts 568f27006577?  I can
> > try it against -rc1 .. -rc4, whichever is most convenient.
> 
> Please test the following revert patch against -rc4.

Unfortunately the results of the revert are... weird.

With -rc4, *all* of the VM's are failing --- reliably.  With rc4 with
the revert, *some* of the VM's are able to complete the tests, but
over half are still locking up or failing with some kind of oops.  So
that seems to imply that there is some kind of timing issue going on,
or maybe there are multiple bugs in play?

So let's review the bidding.   We're going to review four commits:

7bf137298cb7: (Parent of 568f27006577)  Completely clean, all VM's complete the 
tests

568f27006577: Fails reliably.  In 9 of the 11 VM's there is nothing on
the console; the I/O is just stopped.  If I've been able to
get to the VM before it gets killed from the timeout, ssh
works, but any attempt to do any I/O will hang, which presumably
explains why the tests are hanging.  In the other two VM's
there are hung task timeouts, with stack traces that look
like this...

v5.9-rc4: More than half of the VM's are failing --- but at least some are 
succeeding,
which is more than can be said for 568f27006577.  There is a
*variety* of different sorts of failures.  So the fact that
we're not seeing the silent hangs in -rc4 is... interesting


v5.9-rc4 with the revert of 568f27006577: we're seeing a similar
number of VM failures, but the failure signature is different.
The most common failure is...

(More details below, with the stack traces.)

I really don't know what to make of this.  It looks like there's
something going on in the block layer, based on the fact that
568f27006577 fails reliably, but its predecessor is completely clean.
But then things have changed significantly by the time we get to -rc4.
I'll do a more in-depth analysis of -rc1 to see if the failure
patterns are more similar to 568f27006577 than -rc4.  But hopefully
you can see something that I'm missing?

Thanks,

- Ted

---


7bf137298cb7: (Parent of 568f27006577)  Completely clean, all VM's complete the 
tests

---

568f27006577: Fails reliably.  In 9 of the 11 VM's there is nothing on
the console; the I/O is just stopped.  If I've been able to
get to the VM before it gets killed from the timeout, ssh
works, but any attempt to do any I/O will hang, which presumably
explains why the tests are hanging.  In the other two VM's
there are hung task timeouts, with stack traces that look
like this:

[14375.634282] INFO: task jbd2/sda1-8:116 blocked for more than 122 
seconds.
[14375.641679]  Not tainted 5.8.0-rc2-xfstests-30545-g568f27006577 
#6
[14375.648517] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[14375.656523] jbd2/sda1-8 D0   116  2 0x4000
[14375.656530] Call Trace:
[14375.656548]  __schedule+0x2cc/0x6e0
[14375.656695]  ? sched_clock_cpu+0xc/0xb0
[14375.656699]  schedule+0x55/0xd0
[14375.656702]  io_schedule+0x12/0x40
[14375.656708]  blk_mq_get_tag+0x11e/0x280
[14375.656715]  ? __wake_up_common_lock+0xc0/0xc0
[14375.656719]  __blk_mq_alloc_request+0xb6/0x100
[14375.656722]  blk_mq_submit_bio+0x13f/0x7d0
[14375.656727]  ? blk_queue_enter+0x15c/0x510
[14375.656731]  submit_bio_noacct+0x48d/0x500
[14375.656737]  ? kvm_sched_clock_read+0x14/0x30
[14375.656740]  ? submit_bio+0x42/0x150
[14375.656744]  submit_bio+0x42/0x150
[14375.656748]  ? guard_bio_eod+0x90/0x140
[14375.656754]  submit_bh_wbc+0x16d/0x190
[14375.656761]  jbd2_journal_commit_transaction+0x70d/0x1f23
[14375.656767]  ? kjournald2+0x128/0x3b0
[14375.656771]  kjournald2+0x128/0x3b0
[14375.656777]  ? trace_hardirqs_on+0x1c/0xf0
[14375.656781]  ? __wake_up_common_lock+0xc0/0xc0
[14375.656785]  ? __jbd2_debug+0x50/0x50
[14375.656788]  kthread+0x136/0x150
[14375.656792]  ? __kthread_queue_delayed_work+0x90/0x90
[14375.656796]  ret_from_fork+0x22/0x30

---

v5.9-rc4: More than half of the VM's are failing --- but at least some are 
succeeding,
which 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-15 Thread Ming Lei
On Tue, Sep 15, 2020 at 06:45:41PM -0400, Theodore Y. Ts'o wrote:
> On Tue, Sep 15, 2020 at 03:33:03PM +0800, Ming Lei wrote:
> > Hi Theodore,
> > 
> > On Tue, Sep 15, 2020 at 12:45:19AM -0400, Theodore Y. Ts'o wrote:
> > > On Thu, Sep 03, 2020 at 11:55:28PM -0400, Theodore Y. Ts'o wrote:
> > > > Worse, right now, -rc1 and -rc2 is causing random crashes in my
> > > > gce-xfstests framework.  Sometimes it happens before we've run even a
> > > > single xfstests; sometimes it happens after we have successfully
> > > > completed all of the tests, and we're doing a shutdown of the VM under
> > > > test.  Other times it happens in the middle of a test run.  Given that
> > > > I'm seeing this at -rc1, which is before my late ext4 pull request to
> > > > Linus, it's probably not an ext4 related bug.  But it also means that
> > > > I'm partially blind in terms of my kernel testing at the moment.  So I
> > > > can't even tell Linus that I've run lots of tests and I'm 100%
> > > > confident your one-line change is 100% safe.
> > > 
> > > I was finally able to bisect it down to the commit:
> > > 
> > > 37f4a24c2469: blk-mq: centralise related handling into 
> > > blk_mq_get_driver_tag
> > 
> > 37f4a24c2469 has been reverted in:
> > 
> > 4e2f62e566b5 Revert "blk-mq: put driver tag when this request is 
> > completed"
> > 
> > And later the patch is committed as the following after being fixed:
> > 
> > 568f27006577 blk-mq: centralise related handling into 
> > blk_mq_get_driver_tag
> > 
> > So can you reproduce the issue by running kernel of commit 568f27006577?
> 
> Yes.  And things work fine if I try 4e2f62e566b5.
> 
> > If yes, can the issue be fixed by reverting 568f27006577?
> 
> The problem is it's a bit tricky to revert 568f27006577, since there
> is a merge conflict in blk_kick_flush().  I attempted to do the bisect
> manually here, but it's clearly not right since the kernel is not
> booting after the revert:
> 
> https://github.com/tytso/ext4/commit/1e67516382a33da2c9d483b860ac4ec2ad390870
> 
> branch:
> 
> https://github.com/tytso/ext4/tree/manual-revert-of-568f27006577
> 
> Can you send me a patch which correctly reverts 568f27006577?  I can
> try it against -rc1 .. -rc4, whichever is most convenient.

Please test the following revert patch against -rc4.

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 53abb5c73d99..24c208d21793 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -219,6 +219,7 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
struct request *rq, *n;
unsigned long flags = 0;
struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
+   struct blk_mq_hw_ctx *hctx;
 
blk_account_io_flush(flush_rq);
 
@@ -234,11 +235,13 @@ static void flush_end_io(struct request *flush_rq, 
blk_status_t error)
if (fq->rq_status != BLK_STS_OK)
error = fq->rq_status;
 
+   hctx = flush_rq->mq_hctx;
if (!q->elevator) {
-   flush_rq->tag = BLK_MQ_NO_TAG;
+   blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+   flush_rq->tag = -1;
} else {
blk_mq_put_driver_tag(flush_rq);
-   flush_rq->internal_tag = BLK_MQ_NO_TAG;
+   flush_rq->internal_tag = -1;
}
 
running = &fq->flush_queue[fq->flush_running_idx];
@@ -309,16 +312,12 @@ static void blk_kick_flush(struct request_queue *q, 
struct blk_flush_queue *fq,
flush_rq->mq_hctx = first_rq->mq_hctx;
 
if (!q->elevator) {
+   fq->orig_rq = first_rq;
flush_rq->tag = first_rq->tag;
-
-   /*
-* We borrow data request's driver tag, so have to mark
-* this flush request as INFLIGHT for avoiding double
-* account of this driver tag
-*/
-   flush_rq->rq_flags |= RQF_MQ_INFLIGHT;
-   } else
+   blk_mq_tag_set_rq(flush_rq->mq_hctx, first_rq->tag, flush_rq);
+   } else {
flush_rq->internal_tag = first_rq->internal_tag;
+   }
 
flush_rq->cmd_flags = REQ_OP_FLUSH | REQ_PREFLUSH;
flush_rq->cmd_flags |= (flags & REQ_DRV) | (flags & REQ_FAILFAST_MASK);
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index b1acac518c4e..3945c7f5b944 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -101,6 +101,18 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx 
*hctx,
return atomic_read(&hctx->nr_active) < depth;
 }
 
+/*
+ * This helper should only be used for flush request to share tag
+ * with the request cloned from, and both the two requests can't be
+ * in flight at the same time. The caller has to make sure the tag
+ * can't be freed.
+ */
+static inline void blk_mq_tag_set_rq(struct blk_mq_hw_ctx *hctx,
+   unsigned int tag, struct request *rq)
+{
+   hctx->tags->rqs[tag] = rq;
+}
+
 static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-15 Thread Theodore Y. Ts'o
On Tue, Sep 15, 2020 at 03:33:03PM +0800, Ming Lei wrote:
> Hi Theodore,
> 
> On Tue, Sep 15, 2020 at 12:45:19AM -0400, Theodore Y. Ts'o wrote:
> > On Thu, Sep 03, 2020 at 11:55:28PM -0400, Theodore Y. Ts'o wrote:
> > > Worse, right now, -rc1 and -rc2 is causing random crashes in my
> > > gce-xfstests framework.  Sometimes it happens before we've run even a
> > > single xfstests; sometimes it happens after we have successfully
> > > completed all of the tests, and we're doing a shutdown of the VM under
> > > test.  Other times it happens in the middle of a test run.  Given that
> > > I'm seeing this at -rc1, which is before my late ext4 pull request to
> > > Linus, it's probably not an ext4 related bug.  But it also means that
> > > I'm partially blind in terms of my kernel testing at the moment.  So I
> > > can't even tell Linus that I've run lots of tests and I'm 100%
> > > confident your one-line change is 100% safe.
> > 
> > I was finally able to bisect it down to the commit:
> > 
> > 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag
> 
> 37f4a24c2469 has been reverted in:
> 
>   4e2f62e566b5 Revert "blk-mq: put driver tag when this request is 
> completed"
> 
> And later the patch is committed as the following after being fixed:
> 
>   568f27006577 blk-mq: centralise related handling into 
> blk_mq_get_driver_tag
> 
> So can you reproduce the issue by running kernel of commit 568f27006577?

Yes.  And things work fine if I try 4e2f62e566b5.

> If yes, can the issue be fixed by reverting 568f27006577?

The problem is it's a bit tricky to revert 568f27006577, since there
is a merge conflict in blk_kick_flush().  I attempted to do the bisect
manually here, but it's clearly not right since the kernel is not
booting after the revert:

https://github.com/tytso/ext4/commit/1e67516382a33da2c9d483b860ac4ec2ad390870

branch:

https://github.com/tytso/ext4/tree/manual-revert-of-568f27006577

Can you send me a patch which correctly reverts 568f27006577?  I can
try it against -rc1 .. -rc4, whichever is most convenient.
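
For reference, the manual attempt is roughly the following sketch; the
hand-resolution of the conflict in block/blk-flush.c is the part that is
probably wrong:

git checkout -b manual-revert-of-568f27006577 v5.9-rc4
git revert 568f27006577
# resolve the conflict in block/blk-flush.c (blk_kick_flush()) by hand, then
git add block/blk-flush.c
git revert --continue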

> Can you share the exact mount command line for setting up the environment,
> and the exact xfstest item?

It's a variety of mount command lines, since I'm using gce-xfstests[1][2]
with a variety of file system scenarios --- but the basic one, which
is ext4 using the default 4k block size, is failing (they all are failing).

[1] https://thunk.org/gce-xfstests
[2] 
https://github.com/tytso/xfstests-bld/blob/master/Documentation/gce-xfstests.md

It's also not one specific xfstest which is failing, but it does
tend to be tests which are loading up the storage stack with a lot of
small random read/writes, especially involving metadata blocks/writes.
(For example, tests which run fsstress.)
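
For the curious, fsstress is the xfstests helper that generates exactly that
kind of load; a rough sketch of invoking it directly, with made-up paths and
parameters:

# from an xfstests checkout, against an already-mounted ext4 file system
ltp/fsstress -d /mnt/test/fsstress -n 10000 -p 4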

Since this reliably triggers for me, and other people running
kvm-xfstests or running xfstests on their own test environments
aren't seeing it, I'm assuming it must be some kind of interesting
interaction between virtio-scsi and, perhaps, how Google Persistent
Disk is behaving (maybe timing related?  who knows?).  Darrick Wong
did say he saw something like it once using Oracle's Cloud
infrastructure, but as far as I know it hasn't reproduced since.  On
Google Compute Engine VM's, it reproduces *extremely* reliably.

I expect that if you were to set up gce-xfstests, get a free GCE
account with the initial $300 free credits, you could run
"gce-xfstests -c ext4/4k -g auto" and it would reproduce within an
hour or so.  (So under a dollar's worth of VM credits, so long as you
notice that it's hung and shut down the VM after gathering debugging
data.)
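
Concretely, once gce-xfstests is set up per [2], the reproduction is just the
one command quoted above:

# runs the full auto group against ext4 with the default 4k block size
gce-xfstests -c ext4/4k -g auto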

The instructions are at [2], and the image xfstests-202008311554 in
the xfstests-cloud project is a public copy of the VM test appliance I
was using.

% gcloud compute images describe --project xfstests-cloud xfstests-202008311554
archiveSizeBytes: '1720022528'
creationTimestamp: '2020-09-15T15:09:30.544-07:00'
description: Linux Kernel File System Test Appliance
diskSizeGb: '10'
family: xfstests
guestOsFeatures:
- type: VIRTIO_SCSI_MULTIQUEUE
- type: UEFI_COMPATIBLE
id: '1558420969906537845'
kind: compute#image
labelFingerprint: V-2Qgcxt2uw=
labels:
  blktests: g8a75bed
  e2fsprogs: v1_45_6
  fio: fio-3_22
  fsverity: v1_2
  ima-evm-utils: v1_3_1
  nvme-cli: v1_12
  quota: g13bb8c2
  util-linux: v2_36
  xfsprogs: v5_8_0-rc1
  xfstests: linux-v3_8-2838-geb439bf2
  xfstests-bld: gb5085ab
licenseCodes:
- '5543610867827062957'
licenses:
- 
https://www.googleapis.com/compute/v1/projects/debian-cloud/global/licenses/debian-10-buster
name: xfstests-202008311554
selfLink: 
https://www.googleapis.com/compute/v1/projects/xfstests-cloud/global/images/xfstests-202008311554
sourceDisk: 
https://www.googleapis.com/compute/v1/projects/xfstests-cloud/zones/us-east1-d/disks/temp-xfstests-202008311554
sourceDiskId: '5824762850044577124'
sourceType: RAW
status: READY
storageLocations:
- us

Cheers,

- Ted

P.S.  As 

Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-15 Thread Ming Lei
Hi Theodore,

On Tue, Sep 15, 2020 at 12:45:19AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 03, 2020 at 11:55:28PM -0400, Theodore Y. Ts'o wrote:
> > Worse, right now, -rc1 and -rc2 is causing random crashes in my
> > gce-xfstests framework.  Sometimes it happens before we've run even a
> > single xfstests; sometimes it happens after we have successfully
> > completed all of the tests, and we're doing a shutdown of the VM under
> > test.  Other times it happens in the middle of a test run.  Given that
> > I'm seeing this at -rc1, which is before my late ext4 pull request to
> > Linus, it's probably not an ext4 related bug.  But it also means that
> > I'm partially blind in terms of my kernel testing at the moment.  So I
> > can't even tell Linus that I've run lots of tests and I'm 100%
> > confident your one-line change is 100% safe.
> 
> I was finally able to bisect it down to the commit:
> 
> 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

37f4a24c2469 has been reverted in:

4e2f62e566b5 Revert "blk-mq: put driver tag when this request is 
completed"

And later the patch is committed as the following after being fixed:

568f27006577 blk-mq: centralise related handling into 
blk_mq_get_driver_tag

So can you reproduce the issue by running kernel of commit 568f27006577?
If yes, can the issue be fixed by reverting 568f27006577?

> 
> (See below for [1] Bisect log.)
> 
> The previous commit allows the tests to run to completion.  With
> commit 37f4a24c2469 and later all 11 test scenarios (4k blocks, 1k
> blocks, ext3 compat, ext4 w/ fscrypt, nojournal mode, data=journal,
> bigalloc, etc.) the VM will get stuck.

Can you share the exact mount command line for setting up the environment,
and the exact xfstest item?



Thanks,
Ming



REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

2020-09-14 Thread Theodore Y. Ts'o
On Thu, Sep 03, 2020 at 11:55:28PM -0400, Theodore Y. Ts'o wrote:
> Worse, right now, -rc1 and -rc2 is causing random crashes in my
> gce-xfstests framework.  Sometimes it happens before we've run even a
> single xfstests; sometimes it happens after we have successfully
> completed all of the tests, and we're doing a shutdown of the VM under
> test.  Other times it happens in the middle of a test run.  Given that
> I'm seeing this at -rc1, which is before my late ext4 pull request to
> Linus, it's probably not an ext4 related bug.  But it also means that
> I'm partially blind in terms of my kernel testing at the moment.  So I
> can't even tell Linus that I've run lots of tests and I'm 100%
> confident your one-line change is 100% safe.

I was finally able to bisect it down to the commit:

37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

(See below for [1] Bisect log.)

The previous commit allows the tests to run to completion.  With
commit 37f4a24c2469 and later all 11 test scenarios (4k blocks, 1k
blocks, ext3 compat, ext4 w/ fscrypt, nojournal mode, data=journal,
bigalloc, etc.) the VM will get stuck.

The symptom is that while running xfstests in a Google Compute Engine
(GCE) VM, the tests just hang.  There are a number of tests where this
is more likely, but it's not unique to a single test.

In most cases, there is nothing; the test just stops running until the
test framework times out after an hour (tests usually complete in
seconds or at most a few tens of minutes or so in the worst case) and
kills the VM.  In one case, I did get a report like this.  (See below
for [2] stack trace from 37f4a24c2469.)

I attempted to revert the commit in question against -rc1 and -rc4;
that result can be found at the branches manual-revert-of-blk-mq-patch
and manual-revert-of-blk-mq-patch-rc4 at https://github.com/tytso/ext4.
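
For anyone who wants to poke at them, the branches can be fetched in the
usual way, e.g.:

git remote add tytso-ext4 https://github.com/tytso/ext4.git
git fetch tytso-ext4 manual-revert-of-blk-mq-patch-rc4
git checkout -b test-manual-revert FETCH_HEAD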

I don't think I got the revert quite right; with the revert, most of
the test VM's successfully complete, but 2 out of the 11 fail, with a
different stack trace.  (See below for [3] stack trace from my
attempted manual revert of 37f4a24c2469).  But it does seem to confirm
that the test VM hangs are primarily caused by commit
37f4a24c2469.

Does this make any sense as to what might be going on?  I hope it does
for you, since I'm pretty confused about it myself.

Thanks,

   - Ted
   

[1] Bisect log

git bisect start
# bad: [9123e3a74ec7b934a4a099e98af6a61c2f80bbf5] Linux 5.9-rc1
git bisect bad 9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
# good: [bcf876870b95592b52519ed4aafcf9d95999bc9c] Linux 5.8
git bisect good bcf876870b95592b52519ed4aafcf9d95999bc9c
# bad: [8186749621ed6b8fc42644c399e8c755a2b6f630] Merge tag 
'drm-next-2020-08-06' of git://anongit.freedesktop.org/drm/drm
git bisect bad 8186749621ed6b8fc42644c399e8c755a2b6f630
# bad: [2324d50d051ec0f14a548e78554fb02513d6dcef] Merge tag 'docs-5.9' of 
git://git.lwn.net/linux
git bisect bad 2324d50d051ec0f14a548e78554fb02513d6dcef
# bad: [92c59e126b21fd212195358a0d296e787e444087] Merge tag 'arm-defconfig-5.9' 
of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 92c59e126b21fd212195358a0d296e787e444087
# bad: [cdc8fcb49905c0b67e355e027cb462ee168ffaa3] Merge tag 
'for-5.9/io_uring-20200802' of git://git.kernel.dk/linux-block
git bisect bad cdc8fcb49905c0b67e355e027cb462ee168ffaa3
# good: [ab5c60b79ab6cc50b39bbb21b2f9fb55af900b84] Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good ab5c60b79ab6cc50b39bbb21b2f9fb55af900b84
# bad: [d958e343bdc3de2643ce25225bed082dc222858d] block: blk-timeout: delete 
duplicated word
git bisect bad d958e343bdc3de2643ce25225bed082dc222858d
# bad: [53042f3cc411adc79811ba3cfbca5d7a42a7b806] ps3vram: stop using 
->queuedata
git bisect bad 53042f3cc411adc79811ba3cfbca5d7a42a7b806
# good: [621c1f42945e76015c3a585e7a9fe6e71665eba0] block: move struct 
block_device to blk_types.h
git bisect good 621c1f42945e76015c3a585e7a9fe6e71665eba0
# good: [36a3df5a4574d5ddf59804fcd0c4e9654c514d9a] blk-mq: put driver tag when 
this request is completed
git bisect good 36a3df5a4574d5ddf59804fcd0c4e9654c514d9a
# good: [570e9b73b0af2e5381ca5343759779b8c1ed20e3] blk-mq: move 
blk_mq_get_driver_tag into blk-mq.c
git bisect good 570e9b73b0af2e5381ca5343759779b8c1ed20e3
# bad: [b5fc1e8bedf8ad2c6381e0df6331ad5686aca425] blk-mq: remove pointless call 
of list_entry_rq() in hctx_show_busy_rq()
git bisect bad b5fc1e8bedf8ad2c6381e0df6331ad5686aca425
# bad: [37f4a24c2469a10a4c16c641671bd766e276cf9f] blk-mq: centralise related 
handling into blk_mq_get_driver_tag
git bisect bad 37f4a24c2469a10a4c16c641671bd766e276cf9f
# good: [723bf178f158abd1ce6069cb049581b3cb003aab] blk-mq: move 
blk_mq_put_driver_tag() into blk-mq.c
git bisect good 723bf178f158abd1ce6069cb049581b3cb003aab
# first bad commit: [37f4a24c2469a10a4c16c641671bd766e276cf9f] blk-mq: 
centralise related handling into