[Kernel-packages] [Bug 1690085]

2018-12-31 Thread cevelnet
Hi,

I have an ASUS A320M-K with the latest BIOS 4027 (AGESA 1.0.0.6), a Ryzen 7
2700, Patriot Viper RGB 16GB RAM (3200 CL16), a Samsung 830 128GB SSD, a
Corsair AX860 860W PSU (80+ Platinum) and an ASUS ROG Strix RX480 8GB. I'm
running Arch Linux with kernel 4.19.12 and GNOME 3.30 on Wayland. The BIOS
is set to defaults, plus XMP profile enabled and CSM disabled (i.e.
UEFI-only mode).

I have experienced 2 hangs with this system, both within the first hour
of uptime. However, I couldn't spot anything in the logs. There were no
watchdog entries, unlike those posted earlier in this bug report.

The "system lockup" shows up as the screen going dark with a blinking
cursor in the top left corner and no other symbols or information. I
can't interact with the system in any way: it ignores all input, I can't
switch consoles, and it doesn't react to a single press of the power
button. The only way to shut down is to hold the power button for 4s.

I entered the BIOS after the lockups and noticed *VERY* slow performance
within the BIOS. Keypresses took very long to register (5+ seconds),
until the BIOS eventually hung. After another power cycle, all was good
again. I wonder whether rebooting with the reset button could give us
more clues, i.e. whether the system keeps running "buggy" after the reset.

After setting "Typical Current Idle", I haven't had a single lockup,
even after 48h of uptime, mostly idling. The system runs perfectly well.
This fix is fine for desktops, but laptops will suffer from higher power
draw, if they're able to set this option at all.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  
  We are getting various kernel crashes on a fairly new configuration.
  We're using a Ryzen 1800X CPU with an X370 Gaming Pro Carbon motherboard
  (7A32V1) and the latest BIOS available (1.52).

  We are running Ubuntu 17.04 (amd64). We've tried different kernel versions:
  the native one and releases from http://kernel.ubuntu.com/~kernel-ppa/mainline/
  as well.
  Tested kernel versions:

  native 17.04 kernel
  4.10.15

  The issue is the same on each: the machine freezes at random.

  Here are the kern.log entries from when it happens:

  May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:41:56 dev2 kernel: [24366.187618] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=913449
  May 10 22:41:56 dev2 kernel: [24366.188977] (detected by 12, t=1860207 jiffies, g=10001, c=1, q=4656)
  May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:
  May 10 22:41:56 dev2 kernel: [24366.190345] swapper/0   R  running task    0 0  0 0x0008
  May 10 22:41:56 dev2 kernel: [24366.190348] Call Trace:
  May 10 22:41:56 dev2 kernel: [24366.190354]  ? native_safe_halt+0x6/0x10
  May 10 22:41:56 dev2 kernel: [24366.190355]  ? default_idle+0x20/0xd0
  May 10 22:41:56 dev2 kernel: [24366.190358]  ? arch_cpu_idle+0xf/0x20
  May 10 22:41:56 dev2 kernel: [24366.190360]  ? default_idle_call+0x23/0x30
  May 10 22:41:56 dev2 kernel: [24366.190362]  ? do_idle+0x16f/0x200
  May 10 22:41:56 dev2 kernel: [24366.190364]  ? cpu_startup_entry+0x71/0x80
  May 10 22:41:56 dev2 kernel: [24366.190366]  ? rest_init+0x77/0x80
  May 10 22:41:56 dev2 kernel: [24366.190368]  ? start_kernel+0x464/0x485
  May 10 22:41:56 dev2 kernel: [24366.190369]  ? early_idt_handler_array+0x120/0x120
  May 10 22:41:56 dev2 kernel: [24366.190371]  ? x86_64_start_reservations+0x24/0x26
  May 10 22:41:56 dev2 kernel: [24366.190372]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:41:56 dev2 kernel: [24366.190373]  ? start_cpu+0x14/0x14
  May 10 22:44:56 dev2 kernel: [24546.188093] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:44:56 dev2 kernel: [24546.189461] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=935027
  May 10 22:44:56 dev2 kernel: [24546.190823] (detected by 14, t=1905212 jiffies, g=10001, c=1, q=4740)
  May 10 22:44:56 dev2 kernel: [24546.192191] Task dump for CPU 0:
  May 10 22:44:56 dev2 kernel: [24546.192192] swapper/0   R  running task    0 0  0 0x0008
  May 10 22:44:56 dev2 kernel: [24546.192195] Call Trace:
  May 10 22:44:56 dev2 kernel: [24546.192199]  ? native_safe_halt+0x6/0x10
  May 10 22:44:56 dev2 kernel: [24546.192201]  ? default_idle+0x20/0xd0
  May 10 22:44:56 dev2 kernel: [24546.192203]  ? arch_cpu_idle+0xf/0x20
  May 10 22:44:56 dev2 kernel: [24546.192204]  ? default_idle_call+0x23/0x30
  May 10 22:44:56 dev2 kernel: [24546.192206]  ? do_idle+0x16f/0x200
  May 10 22:44:56 dev2 kernel: [24546.192208]  ? cpu_startup_entry+0x71/0x80
  May 10 22:44:56 dev2 kernel: [24546.192210]  ? rest_init+0x77/0x80
  May 10 22:44:56 dev2 kernel: [24546.192211]  ? start_kernel+0x464/0x485
  May 10 22:44:56 dev2 kernel: 

[Kernel-packages] [Bug 1690085]

2019-03-20 Thread cevelnet
I have revisited the errata. Errata 1033 "A Lock Operation May Cause the
System to Hang" and 1109 "MWAIT Instruction May Hang a Thread" are the
top contenders. According to pages 12 and 13 of that document, Pinnacle
Ridge processors are not affected. However, pages 16 and 17 suggest that
2nd Gen Ryzen processors are affected. This leads me to conclude that
either there is an error in the document, or Raven Ridge, aka improved
Zen 1 (desktop and mobile), is affected while Pinnacle Ridge is not.

During my latest tests on kernel 5.0.1, both on AGESA 1006 and after
upgrading to AGESA 0070, I wasn't able to reproduce any freezes / hangs
on a Ryzen 7 2700, ASUS A320M-K (BIOS defaults, no OC, no Typical
Current Idle setting), 16GB Patriot Viper RGB 3200CL16, ASUS RX480 8GB
and Corsair AX860 (Haswell C6/C7 support, 80 PLUS Platinum).

I installed a fresh, bare-minimum, default Arch Linux system without any
special configuration, let it sit idle for half a day and checked the
log for hangs. Then I enabled all power management features using
powertop and let the system sit overnight: still no hangs. I repeated
the same on AGESA 0070: still no hangs.
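
For anyone repeating this kind of idle test, checking the log afterwards
can be scripted. The following is only a minimal sketch: the marker
strings are the ones quoted in this bug report ("rcu_sched detected
stalls" and "NMI watchdog"), the function name find_hang_lines is made up
for illustration, and other kernels may word these messages differently.

```python
from typing import Iterable, List

# Substrings taken from this bug report; this list is an assumption and
# is not exhaustive. Other kernel versions may phrase these differently.
HANG_MARKERS = (
    "rcu_sched detected stalls",
    "NMI watchdog",
)


def find_hang_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain any of the known hang markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in HANG_MARKERS)
    ]


# Two lines from the kern.log excerpt in the bug description.
SAMPLE = [
    "May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls on CPUs/tasks:",
    "May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:",
]

for hit in find_hang_lines(SAMPLE):
    print(hit)
```

In practice the input would be the kernel log of the previous boot (for
example the output of "journalctl -k -b -1" saved to a file and opened
in Python) rather than the SAMPLE list. An empty result does not prove
the system is stable, since the hangs reported here were sometimes not
logged at all.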

I think it's too early to conclude anything, as the hangs / freezes used
to be very random. I will repeat my tests (on AGESA 0070) with a
slightly more bloated system by installing GNOME.


[Kernel-packages] [Bug 1690085]

2019-03-22 Thread cevelnet
(In reply to Lars Viklund from comment #572)
> (You people should be happy you're not running FreeBSD, there I can
> reasonably reliably hang Ryzens within hours by sending ZFS snapshots :D )

Looks like they've addressed Ryzen errata issues around August 2018:

https://github.com/freebsd/freebsd/blob/75ee4f08d3acd4bf70f24b3203fa440255873973/sys/amd64/amd64/initcpu.c#L133

Also, they've made machdep.idle=hlt and machdep.idle_mwait=0 default for
Ryzen processors.

I've additionally checked the registers mentioned in FreeBSD's source.
The latest BIOS 4406 (with AGESA 0070) for my ASUS A320M-K seems to apply
fixes for all affected errata.

Even though I can't reliably reproduce the stability issues anymore, I
think it's still good practice to have Boris' patch applied, since it
addresses a known erratum. I'll test the patch and check for
regressions.


[Kernel-packages] [Bug 1690085]

2019-03-12 Thread cevelnet
AGESA 0070 has been released for some mainboards (e.g. ASUS A320M-K,
ASUS X470 Pro and a couple of MSI boards). For anyone experiencing
crashes, I think 0070 is worth a try. The official changelog says
something along the lines of "added support for new processors", but
further changes under the hood aren't unlikely.

Regarding the segfault issue: it has already been confirmed by AMD to be
a hardware issue on early batches of Ryzen 1xxx CPUs and you can RMA
affected CPUs under warranty. I don't have the landing page at hand
right now, but I'd advise anyone with the segfault bug to replace their
CPUs; especially those experiencing crashes under Windows.

Like Trevor, I also think we are seeing inconsistencies because we're
most likely dealing with multiple bugs. From what I've seen so far,
there are at least four categories: Vega users have a distinct form of
crash caused by Vega (including Ryzen Mobile, Vega 56/64 dGPU,
2200G/2400G); users experiencing crashes under both Windows and Linux
have a different kind of instability; crashes caused by the segfault
issue are another category; and issues fixed by idle=halt /
"Typical Current Idle" are yet another.

From my own tests, I fall into the category fixed by "Typical Current
Idle". Later today, I will update to AGESA 0070, test against the latest
kernel with various settings and report back.
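
When comparing reports across these categories, it helps to know whether
a given system was booted with an idle= override. A minimal sketch,
assuming the standard /proc/cmdline file; the function names and the
example command line below are hypothetical:

```python
from typing import Optional


def idle_param(cmdline: str) -> Optional[str]:
    """Return the value of the last idle= kernel parameter, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("idle="):
            value = token[len("idle="):]
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the kernel command line of the running system."""
    with open(path, "r", encoding="ascii", errors="replace") as f:
        return f.read()


# Hypothetical command line, for illustration only.
example = "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw idle=halt"
print(idle_param(example))
```

On a live system, idle_param(read_cmdline()) gives the same information.
Note that this only covers the kernel parameter: the "Typical Current
Idle" firmware setting is not visible on the kernel command line.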


[Kernel-packages] [Bug 1690085]

2019-03-21 Thread cevelnet
@Borislav, has the fix for erratum 1033 "A Lock Operation May Cause the
System to Hang" been applied yet? The suggested workaround was
"Program MSRC001_1020[4] to 1b", but I couldn't find anything about it
in the master branch. According to the document, 1033 only affects B1.
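
Whether the firmware already programs that bit can be checked from user
space. The following is a rough sketch, not a tested tool: it assumes
the Linux msr driver is loaded (modprobe msr), that the script runs as
root, and that MSRC001_1020 corresponds to address 0xC0011020. The
function names are made up for illustration.

```python
import os
import struct

MSR_ADDR = 0xC0011020   # MSRC001_1020, as named in the erratum workaround
WORKAROUND_BIT = 4      # "Program MSRC001_1020[4] to 1b"


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit is 1 in value."""
    return bool((value >> bit) & 1)


def read_msr(cpu: int, addr: int) -> int:
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, addr)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def erratum_1033_workaround_applied(cpu: int = 0) -> bool:
    """Check bit 4 of MSRC001_1020 on one logical CPU."""
    return bit_is_set(read_msr(cpu, MSR_ADDR), WORKAROUND_BIT)
```

Calling erratum_1033_workaround_applied(n) for every logical CPU n shows
whether the bit is set consistently. A set bit only says that something
(BIOS or kernel) programmed it, not which of the two did.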

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

Status in Linux:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Hi,

  
  We are getting various kernel crashes on a fairly new configuration.
  We're using a Ryzen 1800X CPU with an X370 Gaming Pro Carbon motherboard
  (7A32V1) and the latest BIOS available (1.52).

  We are running Ubuntu 17.04 (amd64). We've tried different kernel versions:
  the native one and releases from http://kernel.ubuntu.com/~kernel-ppa/mainline/
  as well.
  Tested kernel versions:

  native 17.04 kernel
  4.10.15

  The issue is the same on each: the machine freezes at random.

  Here are the kern.log entries from when it happens:

  May 10 22:41:56 dev2 kernel: [24366.186246] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:41:56 dev2 kernel: [24366.187618] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=913449
  May 10 22:41:56 dev2 kernel: [24366.188977] (detected by 12, t=1860207 jiffies, g=10001, c=1, q=4656)
  May 10 22:41:56 dev2 kernel: [24366.190344] Task dump for CPU 0:
  May 10 22:41:56 dev2 kernel: [24366.190345] swapper/0   R  running task    0 0  0 0x0008
  May 10 22:41:56 dev2 kernel: [24366.190348] Call Trace:
  May 10 22:41:56 dev2 kernel: [24366.190354]  ? native_safe_halt+0x6/0x10
  May 10 22:41:56 dev2 kernel: [24366.190355]  ? default_idle+0x20/0xd0
  May 10 22:41:56 dev2 kernel: [24366.190358]  ? arch_cpu_idle+0xf/0x20
  May 10 22:41:56 dev2 kernel: [24366.190360]  ? default_idle_call+0x23/0x30
  May 10 22:41:56 dev2 kernel: [24366.190362]  ? do_idle+0x16f/0x200
  May 10 22:41:56 dev2 kernel: [24366.190364]  ? cpu_startup_entry+0x71/0x80
  May 10 22:41:56 dev2 kernel: [24366.190366]  ? rest_init+0x77/0x80
  May 10 22:41:56 dev2 kernel: [24366.190368]  ? start_kernel+0x464/0x485
  May 10 22:41:56 dev2 kernel: [24366.190369]  ? early_idt_handler_array+0x120/0x120
  May 10 22:41:56 dev2 kernel: [24366.190371]  ? x86_64_start_reservations+0x24/0x26
  May 10 22:41:56 dev2 kernel: [24366.190372]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:41:56 dev2 kernel: [24366.190373]  ? start_cpu+0x14/0x14
  May 10 22:44:56 dev2 kernel: [24546.188093] INFO: rcu_sched detected stalls on CPUs/tasks:
  May 10 22:44:56 dev2 kernel: [24546.189461] 0-...: (1 GPs behind) idle=49b/1/0 softirq=28561/28563 fqs=935027
  May 10 22:44:56 dev2 kernel: [24546.190823] (detected by 14, t=1905212 jiffies, g=10001, c=1, q=4740)
  May 10 22:44:56 dev2 kernel: [24546.192191] Task dump for CPU 0:
  May 10 22:44:56 dev2 kernel: [24546.192192] swapper/0   R  running task    0 0  0 0x0008
  May 10 22:44:56 dev2 kernel: [24546.192195] Call Trace:
  May 10 22:44:56 dev2 kernel: [24546.192199]  ? native_safe_halt+0x6/0x10
  May 10 22:44:56 dev2 kernel: [24546.192201]  ? default_idle+0x20/0xd0
  May 10 22:44:56 dev2 kernel: [24546.192203]  ? arch_cpu_idle+0xf/0x20
  May 10 22:44:56 dev2 kernel: [24546.192204]  ? default_idle_call+0x23/0x30
  May 10 22:44:56 dev2 kernel: [24546.192206]  ? do_idle+0x16f/0x200
  May 10 22:44:56 dev2 kernel: [24546.192208]  ? cpu_startup_entry+0x71/0x80
  May 10 22:44:56 dev2 kernel: [24546.192210]  ? rest_init+0x77/0x80
  May 10 22:44:56 dev2 kernel: [24546.192211]  ? start_kernel+0x464/0x485
  May 10 22:44:56 dev2 kernel: [24546.192213]  ? early_idt_handler_array+0x120/0x120
  May 10 22:44:56 dev2 kernel: [24546.192214]  ? x86_64_start_reservations+0x24/0x26
  May 10 22:44:56 dev2 kernel: [24546.192215]  ? x86_64_start_kernel+0x14d/0x170
  May 10 22:44:56 dev2 kernel: [24546.192217]  ? start_cpu+0x14/0x14

  Depending on the kernel version, we also get NMI watchdog errors about a
  stuck CPU (the CPU core ID mentioned is random).
  The crash happens randomly, but generally after a few hours (3-4h).

  We installed kernel 4.11.0-041100-generic #201705041534 this morning and
  are now waiting for a crash...
  For now, the machine is not being "used"; at least, it is not under CPU stress...

  
  Thanks
  --- 
  ApportVersion: 2.20.4-0ubuntu4
  Architecture: amd64
  DistroRelease: Ubuntu 17.04
  InstallationDate: Installed on 2017-05-09 (1 days ago)
  InstallationMedia: Ubuntu-Server 17.04 "Zesty Zapus" - Release amd64 (20170412)
  Package: linux (not installed)
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=fr_FR.UTF-8
   SHELL=/bin/bash
  Tags:  zesty
  Uname: Linux 4.11.0-041100-generic x86_64
  UnreportableReason: The running kernel is not an Ubuntu kernel
  UpgradeStatus: No upgrade log present
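
The two "detected by" lines in the kern.log excerpt carry enough
information to estimate how long CPU 0 had already been stalled when
they were logged. The sketch below assumes that t= is the age of the RCU
grace period in jiffies, and it derives the tick rate from the two
samples instead of assuming a CONFIG_HZ value. The regular expression
and names are illustrative only.

```python
import re
from typing import Tuple

DETECT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\(detected by (\d+), t=(\d+) jiffies"
)


def parse_detect_line(line: str) -> Tuple[float, int, int]:
    """Return (kernel timestamp in s, detecting CPU, grace-period age in jiffies)."""
    m = DETECT_RE.search(line)
    if m is None:
        raise ValueError("not an rcu stall 'detected by' line")
    return float(m.group(1)), int(m.group(2)), int(m.group(3))


# The two lines from the bug description, unwrapped.
FIRST = ("May 10 22:41:56 dev2 kernel: [24366.188977] "
         "(detected by 12, t=1860207 jiffies, g=10001, c=1, q=4656)")
SECOND = ("May 10 22:44:56 dev2 kernel: [24546.190823] "
          "(detected by 14, t=1905212 jiffies, g=10001, c=1, q=4740)")

ts1, _, t1 = parse_detect_line(FIRST)
ts2, _, t2 = parse_detect_line(SECOND)

hz = (t2 - t1) / (ts2 - ts1)      # jiffies per second, estimated from the log
stall_age_s = t1 / hz             # seconds stalled at the first report
stall_start_s = ts1 - stall_age_s  # uptime (s) at which the stall began

print(f"estimated HZ:      {hz:.1f}")
print(f"stalled for:       {stall_age_s:.0f} s at first report")
print(f"stall began at:    {stall_start_s:.0f} s uptime")
```

The estimated tick rate comes out at roughly 250 jiffies per second,
which would put the start of the stall about two hours before the first
entry shown. That suggests the excerpt is not the first stall report
from that boot, although this is only an inference from the numbers.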