[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-17 Thread Joseph Salisbury
On 01/17/2018 05:55 PM, Thomas Gleixner wrote:
> On Wed, 17 Jan 2018, Joseph Salisbury wrote:
>> On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
>>
>> Testing of your patch shows that your patch resolves the bug.  Thanks
>> for the assistance!  Is this something you could submit to mainline?
> Already there :)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d47924417319e3b6a728c0b690f183e75bc2a702
>
> Tagged for stable.
>
> Thanks,
>
>   tglx

Thanks so much!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1733662

Title:
  System hang with Linux kernel due to mainline commit 24247aeeabe

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1733662/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-17 Thread tglx
On Wed, 17 Jan 2018, Joseph Salisbury wrote:
> On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
> 
> Testing of your patch shows that your patch resolves the bug.  Thanks
> for the assistance!  Is this something you could submit to mainline?

Already there :)

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d47924417319e3b6a728c0b690f183e75bc2a702

Tagged for stable.

Thanks,

tglx


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-17 Thread Joseph Salisbury
On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Yu, Fenghua wrote:
>>> From: Thomas Gleixner [mailto:t...@linutronix.de]
>> Is this a Haswell specific issue?
>>
>> I run the following test forever without issue on Broadwell and 4.15.0-rc6 
>> with rdt mounted:
>> for ((;;)) do
>> for ((i=1;i<88;i++)) do
>> echo 0 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo |wc
>> for ((i=1;i<88;i++)) do
>> echo 1 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo|wc
>> done
>>
>> I'm finding a Haswell to reproduce the issue.
> Come on. This is crystal clear from the KASAN trace. And the fix is simple 
> enough.
>
> You simply do not run into it because on your machine
>
> is_llc_occupancy_enabled() is false...
>
> Thanks,
>
>   tglx
>   
> 8<
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 88dcf8479013..99442370de40 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  		 */
>  		if (static_branch_unlikely(&rdt_mon_enable_key))
>  			rmdir_mondata_subdir_allrdtgrp(r, d->id);
> -		kfree(d->ctrl_val);
> -		kfree(d->rmid_busy_llc);
> -		kfree(d->mbm_total);
> -		kfree(d->mbm_local);
>  		list_del(&d->list);
>  		if (is_mbm_enabled())
>  			cancel_delayed_work(&d->mbm_over);
> @@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  			cancel_delayed_work(&d->cqm_limbo);
>  		}
>  
> +		kfree(d->ctrl_val);
> +		kfree(d->rmid_busy_llc);
> +		kfree(d->mbm_total);
> +		kfree(d->mbm_local);
>  		kfree(d);
>  		return;
>  	}

Hi Thomas,

Testing of your patch shows that your patch resolves the bug.  Thanks
for the assistance!  Is this something you could submit to mainline?

Thanks,


Joe


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-17 Thread Joseph Salisbury
On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
> On Tue, 16 Jan 2018, Yu, Fenghua wrote:
>>> From: Thomas Gleixner [mailto:t...@linutronix.de]
>> Is this a Haswell specific issue?
>>
>> I run the following test forever without issue on Broadwell and 4.15.0-rc6 
>> with rdt mounted:
>> for ((;;)) do
>> for ((i=1;i<88;i++)) do
>> echo 0 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo |wc
>> for ((i=1;i<88;i++)) do
>> echo 1 >/sys/devices/system/cpu/cpu$i/online
>> done
>> echo "online cpus:"
>> grep processor /proc/cpuinfo|wc
>> done
>>
>> I'm finding a Haswell to reproduce the issue.
> Come on. This is crystal clear from the KASAN trace. And the fix is simple 
> enough.
>
> You simply do not run into it because on your machine
>
> is_llc_occupancy_enabled() is false...
>
> Thanks,
>
>   tglx
>   
> 8<
>
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 88dcf8479013..99442370de40 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  		 */
>  		if (static_branch_unlikely(&rdt_mon_enable_key))
>  			rmdir_mondata_subdir_allrdtgrp(r, d->id);
> -		kfree(d->ctrl_val);
> -		kfree(d->rmid_busy_llc);
> -		kfree(d->mbm_total);
> -		kfree(d->mbm_local);
>  		list_del(&d->list);
>  		if (is_mbm_enabled())
>  			cancel_delayed_work(&d->mbm_over);
> @@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  			cancel_delayed_work(&d->cqm_limbo);
>  		}
>  
> +		kfree(d->ctrl_val);
> +		kfree(d->rmid_busy_llc);
> +		kfree(d->mbm_total);
> +		kfree(d->mbm_local);
>  		kfree(d);
>  		return;
>  	}

Thanks, Thomas.  I'll build some test kernels and have your patch tested
out.


Thanks,


Joe


[Bug 1733662] RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-16 Thread tglx
On Tue, 16 Jan 2018, Yu, Fenghua wrote:
> > From: Thomas Gleixner [mailto:t...@linutronix.de]
> Is this a Haswell specific issue?
> 
> I run the following test forever without issue on Broadwell and 4.15.0-rc6 
> with rdt mounted:
> for ((;;)) do
> for ((i=1;i<88;i++)) do
> echo 0 >/sys/devices/system/cpu/cpu$i/online
> done
> echo "online cpus:"
> grep processor /proc/cpuinfo |wc
> for ((i=1;i<88;i++)) do
> echo 1 >/sys/devices/system/cpu/cpu$i/online
> done
> echo "online cpus:"
> grep processor /proc/cpuinfo|wc
> done
> 
> I'm finding a Haswell to reproduce the issue.

Come on. This is crystal clear from the KASAN trace. And the fix is
simple enough.

You simply do not run into it because on your machine

is_llc_occupancy_enabled() is false...

Thanks,

tglx

8<  

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 88dcf8479013..99442370de40 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (static_branch_unlikely(&rdt_mon_enable_key))
 			rmdir_mondata_subdir_allrdtgrp(r, d->id);
-		kfree(d->ctrl_val);
-		kfree(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cancel_delayed_work(&d->cqm_limbo);
 		}
 
+		kfree(d->ctrl_val);
+		kfree(d->rmid_busy_llc);
+		kfree(d->mbm_total);
+		kfree(d->mbm_local);
 		kfree(d);
 		return;
 	}
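
Editorial illustration (not part of the patch): the reordering above follows a general teardown rule, namely to free a structure's buffers only after everything that may still read them has been stopped. The following is a minimal userspace sketch of that ordering, not kernel code. Every name in it (`struct domain`, `struct work_slot`, `queue_limbo_work`, `cancel_work`, `run_pending`, `domain_teardown`) is hypothetical and only loosely modelled on the fields visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-domain structure in the patch. */
struct domain {
	unsigned char *rmid_busy;	/* stand-in for d->rmid_busy_llc */
	size_t nr_rmids;
};

/* Hypothetical one-slot "work queue" that outlives any single domain. */
struct work_slot {
	struct domain *target;		/* non-NULL while work is queued */
};

struct domain *domain_alloc(size_t nr_rmids)
{
	struct domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->rmid_busy = calloc(nr_rmids, 1);
	if (!d->rmid_busy) {
		free(d);
		return NULL;
	}
	d->nr_rmids = nr_rmids;
	return d;
}

void queue_limbo_work(struct work_slot *w, struct domain *d)
{
	w->target = d;
}

/* Returns true if work was pending and has now been cancelled. */
bool cancel_work(struct work_slot *w)
{
	bool was_pending = w->target != NULL;

	w->target = NULL;
	return was_pending;
}

/*
 * Deferred handler: counts busy entries in the queued domain.
 * Returns -1 if nothing is queued, so a cancelled item is never touched.
 */
int run_pending(struct work_slot *w)
{
	struct domain *d = w->target;
	int busy = 0;

	if (!d)
		return -1;
	w->target = NULL;
	for (size_t i = 0; i < d->nr_rmids; i++)
		busy += d->rmid_busy[i] != 0;
	return busy;
}

/* Teardown in the safe order: stop readers first, free memory last. */
void domain_teardown(struct work_slot *w, struct domain *d)
{
	assert(d != NULL);
	cancel_work(w);		/* 1. nothing may read d after this point */
	free(d->rmid_busy);	/* 2. release the member buffers */
	free(d);		/* 3. release the container itself */
}
```

In this sketch, calling `run_pending()` after `domain_teardown()` returns -1 instead of dereferencing freed memory. Swapping steps 1 and 2 would reintroduce the kind of window that the KASAN report in this thread pointed at.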


[Bug 1733662] RE: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-16 Thread Fenghua Yu
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> > On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > > Vikas on vacation until end of the month. Fenghua will look into
> > > this issue.
> > >
> > > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner wrote:
> > >
> > >>
> > >> Vikas, Fenghua can you please look at that ASAP?
> > >>
> > >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> > >>
> > >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> > >>>
> > >>>> Hi Vikas,
> > >>>>
> > >>>> A kernel bug report was opened against Ubuntu [0].  After a
> > >>>> kernel bisect, it was found that reverting the following commit
> > >>>> resolved this bug:
> > >>>>
> > >>>> commit 24247aeeabe99eab13b798c2dec066dd6f07
> > >>>> Author: Vikas Shivappa
> > >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> > >>>>
> > >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> > >>>>
> > >>>>
> > >>>> The regression was introduced as of v4.14-rc1 and still exists
> > >>>> with current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> > >>>>
> > >>>> I was hoping to get your feedback, since you are the patch
> > >>>> author.  Do you think gathering any additional data will help
> > >>>> diagnose this issue, or would it be best to submit a revert request?
> > >>>
> > >>> That stinks like a use after free. Can you run with KASAN enabled?
> > >>>
> > >>> Thanks,
> > >>>
> > >>>    tglx
> >
> >
> > Here is some data with KASAN enabled:
> > https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
> >
> > Are there any specific logs you would like to see, or specific actions
> > executed?
> 
> No, the KASAN output is pretty clear where the issue is.
> 
> Thanks,
> 
>   tglx

Is this a Haswell specific issue?

I run the following test forever without issue on Broadwell and 4.15.0-rc6 with 
rdt mounted:
for ((;;)) do
    for ((i=1;i<88;i++)) do
        echo 0 >/sys/devices/system/cpu/cpu$i/online
    done
    echo "online cpus:"
    grep processor /proc/cpuinfo |wc
    for ((i=1;i<88;i++)) do
        echo 1 >/sys/devices/system/cpu/cpu$i/online
    done
    echo "online cpus:"
    grep processor /proc/cpuinfo|wc
done

I'm finding a Haswell to reproduce the issue.

Thanks.

-Fenghua


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-16 Thread tglx
On Tue, 16 Jan 2018, Joseph Salisbury wrote:
> On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> > Vikas on vacation until end of the month. Fenghua will look into this
> > issue.
> >
> > On Jan 16, 2018, at 5:09 AM, Thomas Gleixner wrote:
> >
> >>
> >> Vikas, Fenghua can you please look at that ASAP?
> >>
> >> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
> >>
> >>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> >>>
> >>>> Hi Vikas,
> >>>>
> >>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
> >>>> bisect, it was found that reverting the following commit resolved
> >>>> this bug:
> >>>>
> >>>> commit 24247aeeabe99eab13b798c2dec066dd6f07
> >>>> Author: Vikas Shivappa
> >>>> Date:   Tue Aug 15 18:00:43 2017 -0700
> >>>>
> >>>>     x86/intel_rdt/cqm: Improve limbo list processing
> >>>>
> >>>>
> >>>> The regression was introduced as of v4.14-rc1 and still exists with
> >>>> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> >>>>
> >>>> I was hoping to get your feedback, since you are the patch author.  Do
> >>>> you think gathering any additional data will help diagnose this issue,
> >>>> or would it be best to submit a revert request?
> >>>
> >>> That stinks like a use after free. Can you run with KASAN enabled?
> >>>
> >>> Thanks,
> >>>
> >>>    tglx
> 
> 
> Here is some data with KASAN enabled:
> https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
> 
> Are there any specific logs you would like to see, or specific actions
> executed?

No, the KASAN output is pretty clear where the issue is.

Thanks,

tglx


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-16 Thread Joseph Salisbury
On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
> Vikas on vacation until end of the month. Fenghua will look into this
> issue.
>
> On Jan 16, 2018, at 5:09 AM, Thomas Gleixner wrote:
>
>>
>> Vikas, Fenghua can you please look at that ASAP?
>>
>> On Sun, 14 Jan 2018, Thomas Gleixner wrote:
>>
>>> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
>>>
>>>> Hi Vikas,
>>>>
>>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
>>>> bisect, it was found that reverting the following commit resolved
>>>> this bug:
>>>>
>>>> commit 24247aeeabe99eab13b798c2dec066dd6f07
>>>> Author: Vikas Shivappa
>>>> Date:   Tue Aug 15 18:00:43 2017 -0700
>>>>
>>>>     x86/intel_rdt/cqm: Improve limbo list processing
>>>>
>>>>
>>>> The regression was introduced as of v4.14-rc1 and still exists with
>>>> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
>>>>
>>>> I was hoping to get your feedback, since you are the patch author.  Do
>>>> you think gathering any additional data will help diagnose this issue,
>>>> or would it be best to submit a revert request?
>>>
>>> That stinks like a use after free. Can you run with KASAN enabled?
>>>
>>> Thanks,
>>>
>>>    tglx


Here is some data with KASAN enabled:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51

Are there any specific logs you would like to see, or specific actions
executed?

Thanks,

Joe


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-16 Thread tglx
Vikas, Fenghua can you please look at that ASAP?

On Sun, 14 Jan 2018, Thomas Gleixner wrote:

> On Fri, 12 Jan 2018, Joseph Salisbury wrote:
> 
> > Hi Vikas,
> > 
> > A kernel bug report was opened against Ubuntu [0].  After a kernel
> > bisect, it was found that reverting the following commit resolved this bug:
> > 
> > commit 24247aeeabe99eab13b798c2dec066dd6f07
> > Author: Vikas Shivappa 
> > Date:   Tue Aug 15 18:00:43 2017 -0700
> > 
> >     x86/intel_rdt/cqm: Improve limbo list processing
> > 
> > 
> > The regression was introduced as of v4.14-rc1 and still exists with
> > current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> > 
> > I was hoping to get your feedback, since you are the patch author.  Do
> > you think gathering any additional data will help diagnose this issue,
> > or would it be best to submit a revert request?
> 
> That stinks like a use after free. Can you run with KASAN enabled?
> 
> Thanks,
> 
>   tglx


[Bug 1733662] Re: [REGRESSION][v4.14.y][v4.15] x86/intel_rdt/cqm: Improve limbo list processing

2018-01-14 Thread tglx
On Fri, 12 Jan 2018, Joseph Salisbury wrote:

> Hi Vikas,
> 
> A kernel bug report was opened against Ubuntu [0].  After a kernel
> bisect, it was found that reverting the following commit resolved this bug:
> 
> commit 24247aeeabe99eab13b798c2dec066dd6f07
> Author: Vikas Shivappa 
> Date:   Tue Aug 15 18:00:43 2017 -0700
> 
>     x86/intel_rdt/cqm: Improve limbo list processing
> 
> 
> The regression was introduced as of v4.14-rc1 and still exists with
> current mainline.  The trace with v4.15-rc7 is in comment #44[1].
> 
> I was hoping to get your feedback, since you are the patch author.  Do
> you think gathering any additional data will help diagnose this issue,
> or would it be best to submit a revert request?

That stinks like a use after free. Can you run with KASAN enabled?

Thanks,

tglx
