Re: possible deadlock in sk_clone_lock

2021-03-05 Thread Michal Hocko
On Tue 02-03-21 19:59:22, Shakeel Butt wrote:
> On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz  wrote:
> >
> > On 3/2/21 6:29 AM, Michal Hocko wrote:
> > > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> > >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
> > >>>
> > >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >  On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > > [...]
> > >>> Then how come this can ever be a problem? in_task() should exclude
> > >>> soft irq context unless I am mistaken.
> > >>>
> > >>
> > >> If I take the following example of syzbot's deadlock scenario then
> > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >> context but has disabled softirqs (see __tcp_close()).
> > >>
> > >>        CPU0                    CPU1
> > >>        ----                    ----
> > >>   lock(hugetlb_lock);
> > >>                                local_irq_disable();
> > >>                                lock(slock-AF_INET);
> > >>                                lock(hugetlb_lock);
> > >>   <Interrupt>
> > >>     lock(slock-AF_INET);
> > >>
> > > [...]
> > >>> Wouldn't something like this help? It is quite ugly but it would be
> > >>> simple enough and backportable while we come up with a more rigorous
> > >>> solution. What do you think?
> > >>>
> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> > >>> --- a/mm/hugetlb.c
> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> > >>>  void free_huge_page(struct page *page)
> > >>>  {
> > >>>  	/*
> > >>> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > >>> +	 * Defer freeing if in non-task context or when put_page is called
> > >>> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> > >>> +	 * avoid hugetlb_lock deadlock.
> > >>>  	 */
> > >>> -	if (!in_task()) {
> > >>> +	if (!in_task() || irqs_disabled()) {
> > >>
> > >> Does irqs_disabled() also check softirqs?
> > >
> > > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > > claims irq disabled to be the trigger. But now that you are mentioning
> > > it, it would be better to replace in_task() along the way. We have
> > > discussed that in another email thread and I was suggesting to use
> > > in_atomic(), which should also catch the bh disabled situation. The big
> > > IF is that this needs the preempt count to be enabled unconditionally.
> > > There are changes in the RCU tree heading in that direction.
> >
> > I have not been following developments in preemption and the RCU tree.
> > The comment for in_atomic() says:
> >
> > /*
> >  * Are we running in atomic context?  WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels.  Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
> >  */
> >
> > That does seem to be the case.  I verified in_atomic can detect softirq
> > context even in non-preemptible kernels.  But, as the comment says it
> > will not detect a held spinlock in non-preemptible kernels.  So, I think
> > in_atomic would be better than the current check for !in_task.  That
> > would handle this syzbot issue, but we could still have issues if the
> > hugetlb put_page path is called while someone is holding a spinlock with
> > all interrupts enabled.  Looks like there is no way to detect this
> > today in non-preemptible kernels.  in_atomic does detect spinlocks held
> > in preemptible kernels.
> >
> > I might suggest changing !in_task to in_atomic for now, and then work on
> > a more robust solution.  I'm afraid such a robust solution will
> > require considerable effort.  It would need to handle put_page being
> > called in any context: hardirq, softirq, spinlock held ...  The
> > put_page/free_huge_page path will need to offload (workqueue or
> > something else) any processing that can possibly sleep.
> >
> > Is it worth making the in_atomic change now, or should we just start
> > working on the more robust complete solution?
> 
> IMHO the change to in_atomic is beneficial because it will at least
> fix this specific issue. No reason to keep users of TCP TX zerocopy
> from hugetlb pages broken while we wait for a more comprehensive solution.

Another option would be to select PREEMPT_COUNT when hugetlb is enabled.
That would reduce dependency on a patch I was talking about in other
email. Not nice but here we are...
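For illustration, that would be a one-line Kconfig change along these
lines (a sketch only, against the HUGETLB_PAGE entry in fs/Kconfig;
whether selecting PREEMPT_COUNT from hugetlb is acceptable is exactly
the open question here):

--- a/fs/Kconfig
+++ b/fs/Kconfig
 config HUGETLB_PAGE
 	def_bool HUGETLBFS
+	select PREEMPT_COUNT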

-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-04 Thread Michal Hocko
On Wed 03-03-21 09:59:45, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 09:03:27AM +0100, Michal Hocko wrote:
[...]
> > Paul what is the current plan with in_atomic to be usable for !PREEMPT
> > configurations?
> 
> Ah, thank you for the reminder!  I have rebased that series on top of
> v5.12-rc1 on -rcu branch tglx-pc.2021.03.03a.
> 
> The current state is that Linus is not convinced that this change is
> worthwhile given that only RCU and printk() want it.  (BPF would use
> it if it were available, but is willing to live without it, at least in
> the short term.)
> 
> But maybe Linus will be convinced given your additional use case.
> Here is hoping!

Yes, hugetlb freeing path would benefit from this. You can reference
this lockdep report 
(http://lkml.kernel.org/r/f1c03b05bc43a...@google.com)
with an additional argument that making hugetlb_lock irq safe is a
larger undertaking and we will need something reasonably backportable
for older kernels as well.
-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-03 Thread Paul E. McKenney
On Wed, Mar 03, 2021 at 09:03:27AM +0100, Michal Hocko wrote:
> [Add Paul]
> 
> On Tue 02-03-21 13:19:34, Mike Kravetz wrote:
> > On 3/2/21 6:29 AM, Michal Hocko wrote:
> > > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> > >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
> > >>>
> > >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >  On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > > [...]
> > >>> Then how come this can ever be a problem? in_task() should exclude
> > >>> soft irq context unless I am mistaken.
> > >>>
> > >>
> > >> If I take the following example of syzbot's deadlock scenario then
> > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >> context but has disabled softirqs (see __tcp_close()).
> > >>
> > >>        CPU0                    CPU1
> > >>        ----                    ----
> > >>   lock(hugetlb_lock);
> > >>                                local_irq_disable();
> > >>                                lock(slock-AF_INET);
> > >>                                lock(hugetlb_lock);
> > >>   <Interrupt>
> > >>     lock(slock-AF_INET);
> > >>
> > > [...]
> > >>> Wouldn't something like this help? It is quite ugly but it would be
> > >>> simple enough and backportable while we come up with a more rigorous
> > >>> solution. What do you think?
> > >>>
> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> > >>> --- a/mm/hugetlb.c
> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> > >>>  void free_huge_page(struct page *page)
> > >>>  {
> > >>>  	/*
> > >>> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > >>> +	 * Defer freeing if in non-task context or when put_page is called
> > >>> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> > >>> +	 * avoid hugetlb_lock deadlock.
> > >>>  	 */
> > >>> -	if (!in_task()) {
> > >>> +	if (!in_task() || irqs_disabled()) {
> > >>
> > >> Does irqs_disabled() also check softirqs?
> > > 
> > > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > > claims irq disabled to be the trigger. But now that you are mentioning
> > > it, it would be better to replace in_task() along the way. We have
> > > discussed that in another email thread and I was suggesting to use
> > > in_atomic(), which should also catch the bh disabled situation. The big
> > > IF is that this needs the preempt count to be enabled unconditionally.
> > > There are changes in the RCU tree heading in that direction.
> > 
> > I have not been following developments in preemption and the RCU tree. 
> > The comment for in_atomic() says:
> > 
> > /*
> >  * Are we running in atomic context?  WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels.  Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
> >  */
> > 
> > That does seem to be the case.  I verified in_atomic can detect softirq
> > context even in non-preemptible kernels.  But, as the comment says it
> > will not detect a held spinlock in non-preemptible kernels.  So, I think
> > in_atomic would be better than the current check for !in_task.  That
> > would handle this syzbot issue, but we could still have issues if the
> > hugetlb put_page path is called while someone is holding a spinlock with
> > all interrupts enabled.  Looks like there is no way to detect this
> > today in non-preemptible kernels.  in_atomic does detect spinlocks held
> > in preemptible kernels.
> 
> Paul what is the current plan with in_atomic to be usable for !PREEMPT
> configurations?

Ah, thank you for the reminder!  I have rebased that series on top of
v5.12-rc1 on -rcu branch tglx-pc.2021.03.03a.

The current state is that Linus is not convinced that this change is
worthwhile given that only RCU and printk() want it.  (BPF would use
it if it were available, but is willing to live without it, at least in
the short term.)

But maybe Linus will be convinced given your additional use case.
Here is hoping!

Thanx, Paul


Re: possible deadlock in sk_clone_lock

2021-03-03 Thread Michal Hocko
[Add Paul]

On Tue 02-03-21 13:19:34, Mike Kravetz wrote:
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
>  On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude
> >>> soft irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>  	/*
> >>> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +	 * Defer freeing if in non-task context or when put_page is called
> >>> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> >>> +	 * avoid hugetlb_lock deadlock.
> >>>  	 */
> >>> -	if (!in_task()) {
> >>> +	if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> > 
> > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > claims irq disabled to be the trigger. But now that you are mentioning
> > it, it would be better to replace in_task() along the way. We have
> > discussed that in another email thread and I was suggesting to use
> > in_atomic(), which should also catch the bh disabled situation. The big
> > IF is that this needs the preempt count to be enabled unconditionally.
> > There are changes in the RCU tree heading in that direction.
> 
> I have not been following developments in preemption and the RCU tree. 
> The comment for in_atomic() says:
> 
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
> 
> That does seem to be the case.  I verified in_atomic can detect softirq
> context even in non-preemptible kernels.  But, as the comment says it
> will not detect a held spinlock in non-preemptible kernels.  So, I think
> in_atomic would be better than the current check for !in_task.  That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled.  Looks like there is no way to detect this
> today in non-preemptible kernels.  in_atomic does detect spinlocks held
> in preemptible kernels.

Paul what is the current plan with in_atomic to be usable for !PREEMPT
configurations?

-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-03 Thread Shakeel Butt
On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz  wrote:
>
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
>  On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude
> >>> soft irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>  	/*
> >>> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +	 * Defer freeing if in non-task context or when put_page is called
> >>> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> >>> +	 * avoid hugetlb_lock deadlock.
> >>>  	 */
> >>> -	if (!in_task()) {
> >>> +	if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> >
> > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > claims irq disabled to be the trigger. But now that you are mentioning
> > it, it would be better to replace in_task() along the way. We have
> > discussed that in another email thread and I was suggesting to use
> > in_atomic(), which should also catch the bh disabled situation. The big
> > IF is that this needs the preempt count to be enabled unconditionally.
> > There are changes in the RCU tree heading in that direction.
>
> I have not been following developments in preemption and the RCU tree.
> The comment for in_atomic() says:
>
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
>
> That does seem to be the case.  I verified in_atomic can detect softirq
> context even in non-preemptible kernels.  But, as the comment says it
> will not detect a held spinlock in non-preemptible kernels.  So, I think
> in_atomic would be better than the current check for !in_task.  That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled.  Looks like there is no way to detect this
> today in non-preemptible kernels.  in_atomic does detect spinlocks held
> in preemptible kernels.
>
> I might suggest changing !in_task to in_atomic for now, and then work on
> a more robust solution.  I'm afraid such a robust solution will
> require considerable effort.  It would need to handle put_page being
> called in any context: hardirq, softirq, spinlock held ...  The
> put_page/free_huge_page path will need to offload (workqueue or
> something else) any processing that can possibly sleep.
>
> Is it worth making the in_atomic change now, or should we just start
> working on the more robust complete solution?

IMHO the change to in_atomic is beneficial because it will at least
fix this specific issue. No reason to keep users of TCP TX zerocopy
from hugetlb pages broken while we wait for a more comprehensive solution.


Re: possible deadlock in sk_clone_lock

2021-03-02 Thread Mike Kravetz
On 3/2/21 6:29 AM, Michal Hocko wrote:
> On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
>> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
>>>
>>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
 On 3/1/21 9:23 AM, Michal Hocko wrote:
> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> [...]
>>> Then how come this can ever be a problem? in_task() should exclude soft
>>> irq context unless I am mistaken.
>>>
>>
>> If I take the following example of syzbot's deadlock scenario then
>> CPU1 is the one freeing the hugetlb pages. It is in the process
>> context but has disabled softirqs (see __tcp_close()).
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(hugetlb_lock);
>>                                local_irq_disable();
>>                                lock(slock-AF_INET);
>>                                lock(hugetlb_lock);
>>   <Interrupt>
>>     lock(slock-AF_INET);
>>
> [...]
>>> Wouldn't something like this help? It is quite ugly but it would be
>>> simple enough and backportable while we come up with a more rigorous
>>> solution. What do you think?
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 4bdb58ab14cb..c9a8b39f678d 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
>>>  void free_huge_page(struct page *page)
>>>  {
>>>  	/*
>>> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
>>> +	 * Defer freeing if in non-task context or when put_page is called
>>> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
>>> +	 * avoid hugetlb_lock deadlock.
>>>  	 */
>>> -	if (!in_task()) {
>>> +	if (!in_task() || irqs_disabled()) {
>>
>> Does irqs_disabled() also check softirqs?
> 
> Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> claims irq disabled to be the trigger. But now that you are mentioning
> it, it would be better to replace in_task() along the way. We have
> discussed that in another email thread and I was suggesting to use
> in_atomic(), which should also catch the bh disabled situation. The big
> IF is that this needs the preempt count to be enabled unconditionally.
> There are changes in the RCU tree heading in that direction.

I have not been following developments in preemption and the RCU tree. 
The comment for in_atomic() says:

/*
 * Are we running in atomic context?  WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels.  Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */

That does seem to be the case.  I verified in_atomic can detect softirq
context even in non-preemptible kernels.  But, as the comment says it
will not detect a held spinlock in non-preemptible kernels.  So, I think
in_atomic would be better than the current check for !in_task.  That
would handle this syzbot issue, but we could still have issues if the
hugetlb put_page path is called while someone is holding a spinlock with
all interrupts enabled.  Looks like there is no way to detect this
today in non-preemptible kernels.  in_atomic does detect spinlocks held
in preemptible kernels.

I might suggest changing !in_task to in_atomic for now, and then work on
a more robust solution.  I'm afraid such a robust solution will
require considerable effort.  It would need to handle put_page being
called in any context: hardirq, softirq, spinlock held ...  The
put_page/free_huge_page path will need to offload (workqueue or
something else) any processing that can possibly sleep.

Is it worth making the in_atomic change now, or should we just start
working on the more robust complete solution?
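For concreteness, the interim change being discussed is roughly the
following on top of Michal's patch above (a sketch, not a posted or
tested patch; as the comment quoted above warns, in_atomic() only sees
held spinlocks when CONFIG_PREEMPT_COUNT is enabled):

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
 void free_huge_page(struct page *page)
 {
 	/*
-	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+	 * Defer freeing in any atomic context: hardirq, softirq, bh-disabled
+	 * sections, and (only with CONFIG_PREEMPT_COUNT) under a spinlock.
 	 */
-	if (!in_task()) {
+	if (in_atomic()) {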
-- 
Mike Kravetz


Re: possible deadlock in sk_clone_lock

2021-03-02 Thread Michal Hocko
On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
> >
> > On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > > On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > > > [...]
> > > >>> Then how come this can ever be a problem? in_task() should exclude
> > > >>> soft irq context unless I am mistaken.
> > > >>>
> > > >>
> > > >> If I take the following example of syzbot's deadlock scenario then
> > > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > > >> context but has disabled softirqs (see __tcp_close()).
> > > >>
> > > >>        CPU0                    CPU1
> > > >>        ----                    ----
> > > >>   lock(hugetlb_lock);
> > > >>                                local_irq_disable();
> > > >>                                lock(slock-AF_INET);
> > > >>                                lock(hugetlb_lock);
> > > >>   <Interrupt>
> > > >>     lock(slock-AF_INET);
> > > >>
[...]
> > Wouldn't something like this help? It is quite ugly but it would be
> > simple enough and backportable while we come up with a more rigorous
> > solution. What do you think?
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4bdb58ab14cb..c9a8b39f678d 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >  void free_huge_page(struct page *page)
> >  {
> >  	/*
> > -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > +	 * Defer freeing if in non-task context or when put_page is called
> > +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> > +	 * avoid hugetlb_lock deadlock.
> >  	 */
> > -	if (!in_task()) {
> > +	if (!in_task() || irqs_disabled()) {
> 
> Does irqs_disabled() also check softirqs?

Nope it doesn't AFAICS. I was referring to the above lockdep splat which
claims irq disabled to be the trigger. But now that you are mentioning
it, it would be better to replace in_task() along the way. We have
discussed that in another email thread and I was suggesting to use
in_atomic(), which should also catch the bh disabled situation. The big
IF is that this needs the preempt count to be enabled unconditionally.
There are changes in the RCU tree heading in that direction.
-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-02 Thread Shakeel Butt
On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko  wrote:
>
> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > > [...]
> > >>> Then how come this can ever be a problem? in_task() should exclude soft
> > >>> irq context unless I am mistaken.
> > >>>
> > >>
> > >> If I take the following example of syzbot's deadlock scenario then
> > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >> context but has disabled softirqs (see __tcp_close()).
> > >>
> > >>        CPU0                    CPU1
> > >>        ----                    ----
> > >>   lock(hugetlb_lock);
> > >>                                local_irq_disable();
> > >>                                lock(slock-AF_INET);
> > >>                                lock(hugetlb_lock);
> > >>   <Interrupt>
> > >>     lock(slock-AF_INET);
> > >>
> > >> So, this deadlock scenario is very much possible.
> > >
> > > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > > side too much. We do not need to be freeing from there. All it takes is
> > > to get a dependency chain over a common lock held here. Thanks for
> > > bearing with me.
> > >
> > > Let's see whether we can make hugetlb_lock irq safe.
> >
> > I may be confused, but it seems like we have a general problem with
> > calling free_huge_page (as a result of put_page) with interrupts
> > disabled.
> >
> > Consider the current free_huge_page code.  Today, we drop the lock
> > when processing gigantic pages because we may need to block on a mutex
> > in cma code.  If our caller has disabled interrupts, then it doesn't
> > matter if the hugetlb lock is irq safe; when we drop it, interrupts will
> > still be disabled and we cannot block.  Right?  If correct, then making
> > hugetlb_lock irq safe would not help.
> >
> > Again, I may be missing something.
> >
> > Note that we also are considering doing more with the hugetlb lock
> > dropped in this path in the 'free vmemmap of hugetlb pages' series.
> >
> > Since we need to do some work that could block in this path, it seems
> > like we really need to use a workqueue.  It is too bad that there is not
> > an interface to identify all the cases where interrupts are disabled.
>
> Wouldn't something like this help? It is quite ugly but it would be
> simple enough and backportable while we come up with a more rigorous
> solution. What do you think?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4bdb58ab14cb..c9a8b39f678d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
>  void free_huge_page(struct page *page)
>  {
>  	/*
> -	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> +	 * Defer freeing if in non-task context or when put_page is called
> +	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
> +	 * avoid hugetlb_lock deadlock.
>  	 */
> -	if (!in_task()) {
> +	if (!in_task() || irqs_disabled()) {

Does irqs_disabled() also check softirqs?


Re: possible deadlock in sk_clone_lock

2021-03-02 Thread Michal Hocko
On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude soft
> >>> irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> >> So, this deadlock scenario is very much possible.
> > 
> > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > side too much. We do not need to be freeing from there. All it takes is
> > to get a dependency chain over a common lock held here. Thanks for
> > bearing with me.
> > 
> > Let's see whether we can make hugetlb_lock irq safe.
> 
> I may be confused, but it seems like we have a general problem with
> calling free_huge_page (as a result of put_page) with interrupts
> disabled.
> 
> Consider the current free_huge_page code.  Today, we drop the lock
> when processing gigantic pages because we may need to block on a mutex
> in cma code.  If our caller has disabled interrupts, then it doesn't
> matter if the hugetlb lock is irq safe; when we drop it, interrupts will
> still be disabled and we cannot block.  Right?  If correct, then making
> hugetlb_lock irq safe would not help.
> 
> Again, I may be missing something.
> 
> Note that we also are considering doing more with the hugetlb lock
> dropped in this path in the 'free vmemmap of hugetlb pages' series.
> 
> Since we need to do some work that could block in this path, it seems
> like we really need to use a workqueue.  It is too bad that there is not
> an interface to identify all the cases where interrupts are disabled.

Wouldn't something like this help? It is quite ugly but it would be
simple enough and backportable while we come up with a more rigorous 
solution. What do you think?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..c9a8b39f678d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
 void free_huge_page(struct page *page)
 {
 	/*
-	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+	 * Defer freeing if in non-task context or when put_page is called
+	 * with IRQ disabled (e.g. via TCP slock dependency chain) to
+	 * avoid hugetlb_lock deadlock.
 	 */
-	if (!in_task()) {
+	if (!in_task() || irqs_disabled()) {
 		/*
 		 * Only call schedule_work() if hpage_freelist is previously
 		 * empty. Otherwise, schedule_work() had been called but the
-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Mike Kravetz
On 3/1/21 9:23 AM, Michal Hocko wrote:
> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
> [...]
>>> Then how come this can ever be a problem? in_task() should exclude soft
>>> irq context unless I am mistaken.
>>>
>>
>> If I take the following example of syzbot's deadlock scenario then
>> CPU1 is the one freeing the hugetlb pages. It is in the process
>> context but has disabled softirqs (see __tcp_close()).
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(hugetlb_lock);
>>                                local_irq_disable();
>>                                lock(slock-AF_INET);
>>                                lock(hugetlb_lock);
>>   <Interrupt>
>>     lock(slock-AF_INET);
>>
>> So, this deadlock scenario is very much possible.
> 
> OK, I see the point now. I was focusing on the IRQ context and hugetlb
> side too much. We do not need to be freeing from there. All it takes is
> to get a dependency chain over a common lock held here. Thanks for
> bearing with me.
> 
> Let's see whether we can make hugetlb_lock irq safe.

I may be confused, but it seems like we have a general problem with
calling free_huge_page (as a result of put_page) with interrupts
disabled.

Consider the current free_huge_page code.  Today, we drop the lock
when processing gigantic pages because we may need to block on a mutex
in cma code.  If our caller has disabled interrupts, then it doesn't
matter if the hugetlb lock is irq safe; when we drop it, interrupts will
still be disabled and we cannot block.  Right?  If correct, then making
hugetlb_lock irq safe would not help.

Again, I may be missing something.

Note that we also are considering doing more with the hugetlb lock
dropped in this path in the 'free vmemmap of hugetlb pages' series.

Since we need to do some work that could block in this path, it seems
like we really need to use a workqueue.  It is too bad that there is not
an interface to identify all the cases where interrupts are disabled.
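To make the shape of that concrete: an always-defer variant would
essentially be the existing c77c0a8ac4c5 machinery with the context
check dropped. A sketch only (it reuses the real hpage_freelist and
free_hpage_work names; __free_huge_page stands in for the part that
takes hugetlb_lock and may block):

#include <linux/llist.h>
#include <linux/workqueue.h>

static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node;
	struct page *page;

	node = llist_del_all(&hpage_freelist);

	while (node) {
		/* page->mapping doubles as the llist_node, as in the current code */
		page = container_of((struct address_space **)node,
				    struct page, mapping);
		node = node->next;
		__free_huge_page(page);	/* takes hugetlb_lock, may block */
	}
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

void free_huge_page(struct page *page)
{
	/* Unconditionally punt to sleepable task context. */
	if (llist_add((struct llist_node *)&page->mapping, &hpage_freelist))
		schedule_work(&free_hpage_work);
}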
-- 
Mike Kravetz


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Michal Hocko
On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
[...]
> > Then how come this can ever be a problem? in_task() should exclude soft
> > irq context unless I am mistaken.
> >
> 
> If I take the following example of syzbot's deadlock scenario then
> CPU1 is the one freeing the hugetlb pages. It is in the process
> context but has disabled softirqs (see __tcp_close()).
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(hugetlb_lock);
>                                local_irq_disable();
>                                lock(slock-AF_INET);
>                                lock(hugetlb_lock);
>   <Interrupt>
>     lock(slock-AF_INET);
> 
> So, this deadlock scenario is very much possible.

OK, I see the point now. I was focusing on the IRQ context and hugetlb
side too much. We do not need to be freeing from there. All it takes is
to get a dependency chain over a common lock held here. Thanks for
bearing with me.

Let's see whether we can make hugetlb_lock irq safe.

-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Shakeel Butt
On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko  wrote:
>
> On Mon 01-03-21 07:10:11, Shakeel Butt wrote:
> > On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko  wrote:
> > >
> > > On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > > > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz  
> > > > wrote:
> > > > >
> > > > > Cc: Michal
> > > > >
> > > > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > > >  wrote:
> > > > > 
> > > > > >> other info that might help us debug this:
> > > > > >>
> > > > > >>  Possible interrupt unsafe locking scenario:
> > > > > >>
> > > > > >>        CPU0                    CPU1
> > > > > >>        ----                    ----
> > > > > >>   lock(hugetlb_lock);
> > > > > >>                                local_irq_disable();
> > > > > >>                                lock(slock-AF_INET);
> > > > > >>                                lock(hugetlb_lock);
> > > > > >>   <Interrupt>
> > > > > >>     lock(slock-AF_INET);
> > > > > >>
> > > > > >>  *** DEADLOCK ***
> > > > > >
> > > > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > > > is a reproducer as well.
> > > > > >
> > > > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by 
> > > > > > hugetlb. I
> > > > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > > > >
> > > > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > > > >
> > > > > Thanks Shakeel,
> > > > >
> > > > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in 
> > > > > non-task
> > > > > context") attempted to address this issue.  It uses a work queue to
> > > > > acquire hugetlb_lock if the caller is !in_task().
> > > > >
> > > > > In another recent thread, there was the suggestion to change the
> > > > > !in_task to in_atomic.
> > > > >
> > > > > I need to do some research on the subtle differences between in_task,
> > > > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > > > reported here.  But, that obviously is not the case.
> > > >
> > > > I think the freeing is happening in the process context in this report
> > > > but it is creating the lock chain from softirq-safe slock to
> > > > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > > > always defer the freeing of hugetlb pages to a work queue or (2) make
> > > > hugetlb_lock softirq-safe.
> > >
> > There is __do_softirq so this should be in the soft IRQ context, no?
> > > Is this really reproducible with kernels which have c77c0a8ac4c5
> > > applied?
> >
> > Yes this is softirq context and syzbot has reproduced this on
> > linux-next 20210224.
>
> Then how come this can ever be a problem? in_task() should exclude soft
> irq context unless I am mistaken.
>

If I take the following example of syzbot's deadlock scenario then
CPU1 is the one freeing the hugetlb pages. It is in the process
context but has disabled softirqs (see __tcp_close()).

       CPU0                    CPU1
       ----                    ----
  lock(hugetlb_lock);
                               local_irq_disable();
                               lock(slock-AF_INET);
                               lock(hugetlb_lock);
  <Interrupt>
    lock(slock-AF_INET);

So, this deadlock scenario is very much possible.
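The preempt_count() bit layout is what makes this possible (macros from
include/linux/preempt.h as of this thread; the function below is only an
illustration, not proposed code):

#include <linux/bottom_half.h>
#include <linux/preempt.h>

static void bh_disabled_like_tcp_close(void)
{
	local_bh_disable();	/* __tcp_close() does this around bh_lock_sock() */

	/*
	 * local_bh_disable() adds SOFTIRQ_DISABLE_OFFSET to preempt_count,
	 * which does not set the "serving softirq" bit that in_task() and
	 * in_serving_softirq() look at, so:
	 */
	WARN_ON(!in_task());		/* still counts as task context */
	WARN_ON(in_serving_softirq());	/* not serving a softirq */
	WARN_ON(!in_softirq());		/* but bh is disabled (SOFTIRQ_MASK set) */

	local_bh_enable();
}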


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Michal Hocko
On Mon 01-03-21 07:10:11, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko  wrote:
> >
> > On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz  
> > > wrote:
> > > >
> > > > Cc: Michal
> > > >
> > > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > >  wrote:
> > > > 
> > > > >> other info that might help us debug this:
> > > > >>
> > > > >>  Possible interrupt unsafe locking scenario:
> > > > >>
> > > > >>        CPU0                    CPU1
> > > > >>        ----                    ----
> > > > >>   lock(hugetlb_lock);
> > > > >>                                local_irq_disable();
> > > > >>                                lock(slock-AF_INET);
> > > > >>                                lock(hugetlb_lock);
> > > > >>   <Interrupt>
> > > > >>     lock(slock-AF_INET);
> > > > >>
> > > > >>  *** DEADLOCK ***
> > > > >
> > > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > > is a reproducer as well.
> > > > >
> > > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > > >
> > > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > > >
> > > > Thanks Shakeel,
> > > >
> > > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in 
> > > > non-task
> > > > context") attempted to address this issue.  It uses a work queue to
> > > > acquire hugetlb_lock if the caller is !in_task().
> > > >
> > > > In another recent thread, there was the suggestion to change the
> > > > !in_task to in_atomic.
> > > >
> > > > I need to do some research on the subtle differences between in_task,
> > > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > > reported here.  But, that obviously is not the case.
> > >
> > > I think the freeing is happening in the process context in this report
> > > but it is creating the lock chain from softirq-safe slock to
> > > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > > always defer the freeing of hugetlb pages to a work queue or (2) make
> > > hugetlb_lock softirq-safe.
> >
> > There is __do_softirq so this should be in the soft IRQ context, no?
> > Is this really reproducible with kernels which have c77c0a8ac4c5
> > applied?
> 
> Yes this is softirq context and syzbot has reproduced this on
> linux-next 20210224.

Then how come this can ever be a problem? in_task() should exclude soft
irq context unless I am mistaken.
 
> > Btw. making hugetlb lock irq safe has been already discussed and it
> > seems to be much harder than expected as some heavy operations are done
> > under the lock. This is really bad.
> 
> What about just softirq-safe i.e. spin_[un]lock_bh()? Will it still be that 
> bad?

This would be a similar problem to the irq variant. It would just result
in soft irq being delayed potentially.

> > Postponing the whole freeing
> > operation into a worker context is certainly possible but I would
> > consider it rather unfortunate. We would have to add some sync mechanism
> > to wait for hugetlb pages in flight to prevent from external
> > observability to the userspace. E.g. when shrinking the pool.
> 
> I think in practice recycling of hugetlb pages is a rare event, so we
> might get away without the sync mechanism. How about we start postponing
> the freeing without a sync mechanism and add one later if there are any
> user reports complaining?

I think this should be a last resort. Maybe we can come up with
something better. E.g. break down the hugetlb_lock and use something
different for expensive operations.
-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Shakeel Butt
On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko  wrote:
>
> On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz  
> > wrote:
> > >
> > > Cc: Michal
> > >
> > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > >  wrote:
> > > 
> > > >> other info that might help us debug this:
> > > >>
> > > >>  Possible interrupt unsafe locking scenario:
> > > >>
> > > >>        CPU0                    CPU1
> > > >>        ----                    ----
> > > >>   lock(hugetlb_lock);
> > > >>                                local_irq_disable();
> > > >>                                lock(slock-AF_INET);
> > > >>                                lock(hugetlb_lock);
> > > >>   <Interrupt>
> > > >>     lock(slock-AF_INET);
> > > >>
> > > >>  *** DEADLOCK ***
> > > >
> > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > is a reproducer as well.
> > > >
> > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > >
> > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > >
> > > Thanks Shakeel,
> > >
> > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in 
> > > non-task
> > > context") attempted to address this issue.  It uses a work queue to
> > > acquire hugetlb_lock if the caller is !in_task().
> > >
> > > In another recent thread, there was the suggestion to change the
> > > !in_task to in_atomic.
> > >
> > > I need to do some research on the subtle differences between in_task,
> > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > reported here.  But, that obviously is not the case.
> >
> > I think the freeing is happening in the process context in this report
> > but it is creating the lock chain from softirq-safe slock to
> > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > always defer the freeing of hugetlb pages to a work queue or (2) make
> > hugetlb_lock softirq-safe.
>
> There is __do_softirq so this should be in the soft IRQ context, no?
> Is this really reproducible with kernels which have c77c0a8ac4c5
> applied?

Yes this is softirq context and syzbot has reproduced this on
linux-next 20210224.

>
> Btw. making hugetlb lock irq safe has been already discussed and it
> seems to be much harder than expected as some heavy operations are done
> under the lock. This is really bad.

What about just softirq-safe i.e. spin_[un]lock_bh()? Will it still be that bad?

> Postponing the whole freeing
> operation into a worker context is certainly possible but I would
> consider it rather unfortunate. We would have to add some sync mechanism
> to wait for hugetlb pages in flight to prevent from external
> observability to the userspace. E.g. when shrinking the pool.

I think in practice recycling of hugetlb pages is a rare event, so we
might get away without the sync mechanism. How about we start postponing
the freeing without a sync mechanism and add one later if there are any
user reports complaining?

> --
> Michal Hocko
> SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-03-01 Thread Michal Hocko
On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz  wrote:
> >
> > Cc: Michal
> >
> > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > >  wrote:
> > 
> > >> other info that might help us debug this:
> > >>
> > >>  Possible interrupt unsafe locking scenario:
> > >>
> > >>        CPU0                    CPU1
> > >>        ----                    ----
> > >>   lock(hugetlb_lock);
> > >>                                local_irq_disable();
> > >>                                lock(slock-AF_INET);
> > >>                                lock(hugetlb_lock);
> > >>   <Interrupt>
> > >>     lock(slock-AF_INET);
> > >>
> > >>  *** DEADLOCK ***
> > >
> > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > is a reproducer as well.
> > >
> > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > wonder if we just need to make hugetlb_lock softirq-safe.
> > >
> > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> >
> > Thanks Shakeel,
> >
> > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > context") attempted to address this issue.  It uses a work queue to
> > acquire hugetlb_lock if the caller is !in_task().
> >
> > In another recent thread, there was the suggestion to change the
> > !in_task to in_atomic.
> >
> > I need to do some research on the subtle differences between in_task,
> > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > reported here.  But, that obviously is not the case.
> 
> I think the freeing is happening in the process context in this report
> but it is creating the lock chain from softirq-safe slock to
> irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> always defer the freeing of hugetlb pages to a work queue or (2) make
> hugetlb_lock softirq-safe.

There is __do_softirq so this should be in the soft IRQ context, no?
Is this really reproducible with kernels which have c77c0a8ac4c5
applied?

Btw. making hugetlb lock irq safe has been already discussed and it
seems to be much harder than expected as some heavy operations are done
under the lock. This is really bad. Postponing the whole freeing
operation into a worker context is certainly possible but I would
consider it rather unfortunate. We would have to add some sync mechanism
to wait for hugetlb pages in flight to prevent from external
observability to the userspace. E.g. when shrinking the pool.
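If the deferral stayed on the single existing work item, that sync
mechanism could be as small as flushing it before the pool is recounted
(a hypothetical helper, named here only for illustration):

/* Ensure deferred frees have landed before e.g. set_max_huge_pages(). */
static void hugetlb_flush_deferred_frees(void)
{
	flush_work(&free_hpage_work);
}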
-- 
Michal Hocko
SUSE Labs


Re: possible deadlock in sk_clone_lock

2021-02-26 Thread Shakeel Butt
On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz  wrote:
>
> Cc: Michal
>
> On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> >  wrote:
> 
> >> other info that might help us debug this:
> >>
> >>  Possible interrupt unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> >>  *** DEADLOCK ***
> >
> > This has been reproduced on 4.19 stable kernel as well [1] and there
> > is a reproducer as well.
> >
> > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > wonder if we just need to make hugetlb_lock softirq-safe.
> >
> > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
>
> Thanks Shakeel,
>
> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> context") attempted to address this issue.  It uses a work queue to
> acquire hugetlb_lock if the caller is !in_task().
>
> In another recent thread, there was the suggestion to change the
> !in_task to in_atomic.
>
> I need to do some research on the subtle differences between in_task,
> in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> reported here.  But, that obviously is not the case.

I think the freeing is happening in the process context in this report
but it is creating the lock chain from softirq-safe slock to
irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
always defer the freeing of hugetlb pages to a work queue or (2) make
hugetlb_lock softirq-safe.
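For reference, option (2) would mean converting every hugetlb_lock site
along these lines (a sketch; as Michal notes above, the heavy operations
done under the lock are what make this harder than it looks):

	/* softirq-safe variant: also disables bottom halves on this CPU */
	spin_lock_bh(&hugetlb_lock);
	/* ... pool accounting that must not race with softirq frees ... */
	spin_unlock_bh(&hugetlb_lock);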


Re: possible deadlock in sk_clone_lock

2021-02-26 Thread Mike Kravetz
Cc: Michal

On 2/26/21 2:44 PM, Shakeel Butt wrote:
> On Fri, Feb 26, 2021 at 2:09 PM syzbot
>  wrote:

>> other info that might help us debug this:
>>
>>  Possible interrupt unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(hugetlb_lock);
>>                                local_irq_disable();
>>                                lock(slock-AF_INET);
>>                                lock(hugetlb_lock);
>>   <Interrupt>
>>     lock(slock-AF_INET);
>>
>>  *** DEADLOCK ***
> 
> This has been reproduced on 4.19 stable kernel as well [1] and there
> is a reproducer as well.
> 
> It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> wonder if we just need to make hugetlb_lock softirq-safe.
> 
> [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93

Thanks Shakeel,

Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
context") attempted to address this issue.  It uses a work queue to
acquire hugetlb_lock if the caller is !in_task().

In another recent thread, there was the suggestion to change the
!in_task to in_atomic.

I need to do some research on the subtle differences between in_task,
in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
reported here.  But, that obviously is not the case.
-- 
Mike Kravetz


Re: possible deadlock in sk_clone_lock

2021-02-26 Thread Shakeel Butt
On Fri, Feb 26, 2021 at 2:09 PM syzbot
 wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    577c2835 Add linux-next specific files for 20210224
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=137cef82d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e9bb3d369b3bf49
> dashboard link: https://syzkaller.appspot.com/bug?extid=506c8a2a115201881d45
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+506c8a2a115201881...@syzkaller.appspotmail.com
>
> =====================================================
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> 5.11.0-next-20210224-syzkaller #0 Not tainted
> -----------------------------------------------------
> syz-executor.3/15411 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> 8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: spin_lock 
> include/linux/spinlock.h:354 [inline]
> 8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: __free_huge_page+0x4cd/0xc10 
> mm/hugetlb.c:1390
>
> and this task is already holding:
> 88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock 
> include/linux/spinlock.h:354 [inline]
> 88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 
> net/ipv4/tcp.c:2788
> which would create a new lock dependency:
>  (slock-AF_INET){+.-.}-{2:2} -> (hugetlb_lock){+.+.}-{2:2}
>
> but this new dependency connects a SOFTIRQ-irq-safe lock:
>  (slock-AF_INET){+.-.}-{2:2}
>
> ... which became SOFTIRQ-irq-safe at:
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
>   inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
>   tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
>   tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
>   tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
>   tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
>   ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
>   ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
>   NF_HOOK include/linux/netfilter.h:301 [inline]
>   NF_HOOK include/linux/netfilter.h:295 [inline]
>   ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
>   dst_input include/net/dst.h:458 [inline]
>   ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
>   ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
>   ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
>   ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
>   __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
>   __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
>   __netif_receive_skb_list net/core/dev.c:5508 [inline]
>   netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
>   gro_normal_list net/core/dev.c:5772 [inline]
>   gro_normal_list net/core/dev.c:5768 [inline]
>   napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
>   virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
>   virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
>   __napi_poll+0xaf/0x440 net/core/dev.c:6892
>   napi_poll net/core/dev.c:6959 [inline]
>   net_rx_action+0x801/0xb40 net/core/dev.c:7036
>   __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
>   invoke_softirq kernel/softirq.c:221 [inline]
>   __irq_exit_rcu kernel/softirq.c:422 [inline]
>   irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
>   common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
>   asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
>   tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
>   tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
>   tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
>   tomoyo_path_permission security/tomoyo/file.c:587 [inline]
>   tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
>   tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
>   tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
>   security_path_symlink+0xdf/0x150 security/security.c:1119
>   do_symlinkat+0x123/0x300 fs/namei.c:4201
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> to a SOFTIRQ-irq-unsafe lock:
>  (hugetlb_lock){+.+.}-{2:2}
>
> ... which became SOFTIRQ-irq-unsafe at:
> ...
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
>   

possible deadlock in sk_clone_lock

2021-02-26 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:    577c2835 Add linux-next specific files for 20210224
git tree:   linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=137cef82d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=e9bb3d369b3bf49
dashboard link: https://syzkaller.appspot.com/bug?extid=506c8a2a115201881d45

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+506c8a2a115201881...@syzkaller.appspotmail.com

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.11.0-next-20210224-syzkaller #0 Not tainted
-----------------------------------------------------
syz-executor.3/15411 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:354 [inline]
8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: __free_huge_page+0x4cd/0xc10 
mm/hugetlb.c:1390

and this task is already holding:
88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock 
include/linux/spinlock.h:354 [inline]
88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 
net/ipv4/tcp.c:2788
which would create a new lock dependency:
 (slock-AF_INET){+.-.}-{2:2} -> (hugetlb_lock){+.+.}-{2:2}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (slock-AF_INET){+.-.}-{2:2}

... which became SOFTIRQ-irq-safe at:
  lock_acquire kernel/locking/lockdep.c:5510 [inline]
  lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
  inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
  tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
  tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
  tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
  tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
  ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
  ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
  NF_HOOK include/linux/netfilter.h:301 [inline]
  NF_HOOK include/linux/netfilter.h:295 [inline]
  ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
  dst_input include/net/dst.h:458 [inline]
  ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
  ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
  ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
  ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
  __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
  __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
  __netif_receive_skb_list net/core/dev.c:5508 [inline]
  netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
  gro_normal_list net/core/dev.c:5772 [inline]
  gro_normal_list net/core/dev.c:5768 [inline]
  napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
  virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
  virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
  __napi_poll+0xaf/0x440 net/core/dev.c:6892
  napi_poll net/core/dev.c:6959 [inline]
  net_rx_action+0x801/0xb40 net/core/dev.c:7036
  __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
  invoke_softirq kernel/softirq.c:221 [inline]
  __irq_exit_rcu kernel/softirq.c:422 [inline]
  irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
  common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
  asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
  tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
  tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
  tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
  tomoyo_path_permission security/tomoyo/file.c:587 [inline]
  tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
  tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
  tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
  security_path_symlink+0xdf/0x150 security/security.c:1119
  do_symlinkat+0x123/0x300 fs/namei.c:4201
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xae

to a SOFTIRQ-irq-unsafe lock:
 (hugetlb_lock){+.+.}-{2:2}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire kernel/locking/lockdep.c:5510 [inline]
  lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
  proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
  call_write_iter include/linux/fs.h:1977 [inline]
  new_sync_write+0x426/0x650 fs/read_write.c:519
  vfs_write+0x796/0xa30 fs/read_write.c:606
  ksys_write+0x12d/0x250 fs/read_write.c:659