[Bug 1690085]

2019-03-22 Thread cevelnet
(In reply to Lars Viklund from comment #572)
> (You people should be happy you're not running FreeBSD, there I can
> reasonably reliably hang Ryzens within hours by sending ZFS snapshots :D )

Looks like they've addressed Ryzen errata issues around August 2018:

https://github.com/freebsd/freebsd/blob/75ee4f08d3acd4bf70f24b3203fa440255873973/sys/amd64/amd64/initcpu.c#L133

Also, they've made machdep.idle=hlt and machdep.idle_mwait=0 default for
Ryzen processors.

I've additionally checked the registers mentioned in FreeBSD's source.
Latest BIOS 4406 (w/ AGESA 0070) for my ASUS A320M-K seems to apply
fixes for all affected errata.

Even though I can't reliably reproduce the stability issues anymore, I
think it's still good practice to have Boris' patch applied, since it
addresses a known erratum. I'll test the patch and check for
regressions.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1690085

Title:
  Ryzen 1800X freeze - rcu_sched detected stalls on CPUs/tasks

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1690085/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1690085]

2019-03-21 Thread cevelnet
@Borislav, has the fix for erratum 1033 "A Lock Operation May Cause the
System to Hang" been applied so far? The suggested workaround is
"Program MSRC001_1020[4] to 1b", but I couldn't find anything about it
in the master branch. According to the document, 1033 only affects B1.
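
For anyone who wants to check whether their firmware already sets that
bit, here is a minimal sketch. It assumes a Linux system with the msr
kernel module loaded (modprobe msr) and root privileges, and it reads
the register through /dev/cpu/N/msr. The function names are made up for
illustration; only the MSR number and bit position come from the
workaround text quoted above.

```python
import os
import struct

MSR_C001_1020 = 0xC0011020  # MSR named in the erratum 1033 workaround
WORKAROUND_BIT = 4          # "Program MSRC001_1020[4] to 1b"


def bit_is_set(value, bit):
    """Return True if the given bit of a 64-bit MSR value is 1."""
    return (value >> bit) & 1 == 1


def read_msr(cpu, msr):
    """Read one 64-bit MSR via the Linux msr driver (needs root)."""
    fd = os.open("/dev/cpu/%d/msr" % cpu, os.O_RDONLY)
    try:
        # The msr device uses the MSR number as the file offset.
        raw = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def workaround_applied(cpu=0):
    """Hypothetical helper: True if bit 4 of MSRC001_1020 is set on `cpu`."""
    return bit_is_set(read_msr(cpu, MSR_C001_1020), WORKAROUND_BIT)
```

Calling workaround_applied(0) as root reports the state for CPU 0 only.
MSRs are per core, so a thorough check would loop over every CPU
directory under /dev/cpu.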


[Bug 1690085]

2019-03-20 Thread cevelnet
I have revisited the errata. Errata 1033 "A Lock Operation May Cause the
System to Hang" and 1109 "MWAIT Instruction May Hang a Thread" are the
top contenders. According to pages 12 and 13 of that document, Pinnacle
Ridge processors are not affected. However, pages 16 and 17 suggest 2nd
Gen Ryzen is affected. This makes me conclude: either there is an error
in the document, or Raven Ridge aka improved Zen 1 (Desktop and Mobile)
is affected, while Pinnacle Ridge is not.

During my latest tests on kernel 5.0.1, on AGESA 1006 and after
upgrading to AGESA 0070, I wasn't able to reproduce any freezes / hangs
on a Ryzen 7 2700, ASUS A320M-K (BIOS defaults, no OC, no Typical
Current Idle setting), 16GB Patriot Viper RGB 3200CL16, ASUS RX480 8GB
and Corsair AX860 (Haswell C6/C7 support, 80 PLUS Platinum).

I installed a fresh, bare-minimum, default Arch Linux system w/o any
special configuration, let it sit idle for half a day and checked the
log for hangs. Then I enabled all power management features using
powertop and let the system sit overnight; still no hangs. I repeated
the same for AGESA 0070, still no hangs.
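
In case it helps others repeat the log check, here is a rough sketch of
a filter for kernel log text. The first pattern is taken from this bug's
title. The "soft lockup" pattern is my assumption about what a watchdog
entry looks like, and the function name is made up.

```python
import re

# Substrings assumed to indicate a stall or lockup in kernel log text.
# The first comes from the bug title; the second is an assumption.
STALL_PATTERNS = [
    re.compile(r"detected stalls on CPUs/tasks"),
    re.compile(r"soft lockup"),
]


def find_stall_lines(log_text):
    """Return the log lines that match any of the stall patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in STALL_PATTERNS)
    ]
```

It can be fed the output of journalctl -k -b -1 (kernel messages from
the previous boot). An empty result does not prove there was no hang: a
hard freeze may leave nothing in the log at all.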

I think it's too early to conclude anything, as the hangs / freezes used
to be very random. I will repeat my tests (on AGESA 0070) with a
slightly more bloated system by installing GNOME.


[Bug 1690085]

2019-03-12 Thread cevelnet
AGESA 0070 has been released for some mainboards (e.g. ASUS A320M-K,
ASUS X470 Pro and a couple of MSI boards). For anyone experiencing
crashes, I think 0070 is worth a try. The official changelog says
something along the lines of "added support for new processors", but
more changes under the hood aren't unlikely.

Regarding the segfault issue: it has already been confirmed by AMD to be
a hardware issue on early batches of Ryzen 1xxx CPUs and you can RMA
affected CPUs under warranty. I don't have the landing page at hand
right now, but I'd advise anyone with the segfault bug to replace their
CPUs; especially those experiencing crashes under Windows.

Like Trevor, I also think we have inconsistencies because we're most
likely dealing with multiple bugs. From what I've seen so far, Vega
users have a different form of crash caused by Vega (including Ryzen
Mobile, Vega 56/64 dGPU, 2200G/2400G), users experiencing crashes under
Windows and Linux have a different form of instability issue, crashes
caused by the segfault bug are another category, and issues fixed by
idle=halt / "Typical Current Idle" are yet another category.

From my own tests, I fall into the category fixed by "Typical Current
Idle". Later today, I will update to AGESA 0070, test against the latest
kernel with various settings and report back.


[Bug 1690085]

2018-12-31 Thread cevelnet
Hi,

I have an ASUS A320M-K with latest BIOS 4027 (AGESA 1.0.0.6), Ryzen 7
2700, Patriot Viper RGB 16GB RAM 3200CL16, Samsung 830 SSD 128GB,
Corsair AX860 860W PSU (80+ Platinum) and ASUS ROG Strix RX480 8GB.
Running Arch Linux w/ Kernel 4.19.12 and GNOME 3.30 on Wayland. BIOS
configured to Defaults + XMP Profile activated + CSM disabled (-> UEFI-
only mode).

I have experienced 2 hangs with this system, both within 1 hour of
uptime. However, I couldn't spot anything in the logs. There were no
watchdog entries, unlike those previously posted in this bug report.

The "system lockup" shows up as the screen going dark with a blinking
cursor in the top left corner. No other symbols / information. I can't
interact with my system in any way - it ignores all inputs; can't switch
console, doesn't react to a single power button press, nothing. The only
way to shut down is holding the power button for 4s.

I entered the BIOS after the lockups and noticed *VERY* slow performance
within BIOS. Keypresses took very long to register (like 5+ seconds)
until the BIOS eventually hung. After another power cycle, all was good
again. I wonder if rebooting using the reset button could give us more
clues, in case the system continues to run "buggy" after the reset.

After setting "Typical Current Idle", I haven't had a single lockup
since, even after 48h of uptime, mostly idling. The system runs
perfectly well. This fix is fine for desktops, but laptops suffer from
higher power draw, if they're able to set this at all.
