Public bug reported:
== Comment: #0 - Ranjal G. Shenoy
On a Boston DD2.0 system where deep stop states such as stop4 are enabled, the
17.10 kernel Ubuntu-4.12.0-12.13 fails to boot.
It requires the following upstream fixes to be backported:
1) commit 5f221c3ca13d ("powerpc/powernv/idle: Correctly initialize
core_idle_state_ptr")
2) commit ec4867355244 ("powerpc/powernv/idle: Decouple Timebase restore &
Per-core SPRs restore")
3) commit cb0be7ec0307 ("powerpc/powernv/idle: Restore LPCR on wakeup from
deep-stop")
4) commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.")
5) commit 22c6663dc69a ("powerpc/powernv/idle: Use Requested Level for
restoring state on P9 DD1")
6) commit f9122ee4f558 ("cpuidle-powernv: Allow Deep stop states that don't
stop time")
7) commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states
when stop-api fails")
8) commit e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for
stop4 cpuidle")
9) commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only
on Hotplug")
10) https://patchwork.ozlabs.org/patch/808233/ ("powerpc/powernv: Clear
LPCR[PECE1] via stop-api only for deep state offline")
Of these, 1-7 are in Linux kernel 4.13, 8 and 9 are in the -next branch of
powerpc/linux.git, and 10 is posted upstream and fixes 9.
These patches have been backported on top of Ubuntu-4.12.0-12.13 and
tested on Boston where they are working as expected.
== Comment: #1 - Ranjal G. Shenoy
The lower 8 bits of core_idle_state_ptr track the number of non-idle
threads in the core. This is supposed to be initialized to the bit-map
corresponding to threads_per_core. However, currently it is
initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
POWER8 which has 8 threads per core, but not for POWER9 which has 4
threads per core.
As a result, on POWER9, core_idle_state_ptr gets initialized to
0xFF. Even when all the threads of the core are idle, the
thread-tracking bits remain non-zero. Consequently, the
idle entry/exit code fails to save/restore per-core hypervisor state,
since it assumes that some threads in the core are still
active.
Fix this by correctly initializing the lower bits of the
core_idle_state_ptr on the basis of threads_per_core.
Cherry-picked from commit 5f221c3ca13d ("powerpc/powernv/idle:
Correctly initialize core_idle_state_ptr")
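The mask computation described above can be sketched as follows. This is
a minimal illustration, not the kernel code: the helper names
core_thread_mask(), mark_thread_idle() and core_fully_idle() are
hypothetical, and the state word is modelled as a plain integer.

```c
#include <stdint.h>

/* Hypothetical helper: low-order mask with one bit per hardware thread.
 * Valid for 1..31 threads; POWER8 uses 8, POWER9 uses 4. */
uint32_t core_thread_mask(unsigned int threads_per_core)
{
    return (UINT32_C(1) << threads_per_core) - 1;
}

/* Clear one thread's bit on idle entry and return the updated word. */
uint32_t mark_thread_idle(uint32_t core_idle_state, unsigned int thread)
{
    return core_idle_state & ~(UINT32_C(1) << thread);
}

/* The core counts as fully idle only when no thread bit is left set
 * in the lower 8 bits. */
int core_fully_idle(uint32_t core_idle_state)
{
    return (core_idle_state & 0xFFu) == 0;
}
```

With an initial value of 0xFF and only 4 threads, bits 4-7 are never
cleared, so core_fully_idle() never returns true. Starting from
core_thread_mask(4) avoids that.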
== Comment: #2 - Ranjal G. Shenoy
On POWER8, in case of
- nap: both timebase and hypervisor state are retained.
- fast-sleep: timebase is lost, but the hypervisor state is retained.
- winkle: both timebase and hypervisor state are lost.
Hence, the current code for handling exit from an idle state assumes
that if the timebase value is retained, then so is the hypervisor
state. Thus, the current code doesn't restore per-core hypervisor
state in such cases.
But that is no longer the case on POWER9 where we do have stop states
in which timebase value is retained, but the hypervisor state is
lost. So we have to ensure that the per-core hypervisor state gets
restored in such cases.
Fix this by ensuring that even in the case when timebase is retained,
we explicitly check if we are waking up from a deep stop that loses
per-core hypervisor state (indicated by cr4 being eq or gt), and if
this is the case, we restore the per-core hypervisor state.
Cherry-picked from commit ec4867355244 ("powerpc/powernv/idle:
Decouple Timebase restore & Per-core SPRs restore")
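The change in decision logic can be sketched in C as follows. The real
code is assembly that tests cr4; the struct and function names here are
hypothetical and only model the two conditions as booleans.

```c
#include <stdbool.h>

/* Hypothetical model of what a thread lost while it was idle. */
struct wake_info {
    bool timebase_lost;   /* timebase must be resynced */
    bool hyp_state_lost;  /* per-core hypervisor state must be restored */
};

/* Old behaviour: per-core state is only considered once the timebase
 * is known to be lost. */
bool restore_per_core_old(const struct wake_info *w)
{
    return w->timebase_lost && w->hyp_state_lost;
}

/* Fixed behaviour: the per-core restore depends only on whether the
 * hypervisor state was lost, independent of the timebase. */
bool restore_per_core_fixed(const struct wake_info *w)
{
    return w->hyp_state_lost;
}
```

The two functions agree for nap, fast-sleep and winkle on POWER8. They
differ only for a POWER9 stop state that retains the timebase but loses
hypervisor state.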
== Comment: #3 - Ranjal G. Shenoy
On wakeup from a deep stop state which is supposed to lose the
hypervisor state, we don't restore the LPCR to the old value but set
it to a "sane" value via cur_cpu_spec->cpu_restore().
The problem is that the "sane" value doesn't include UPRT and the HR
bits which are required to run correctly in Radix mode.
Fix this on POWER9 onwards by restoring the LPCR to the value it had
before executing the stop instruction.
Cherry-picked from commit cb0be7ec0307 ("powerpc/powernv/idle: Restore
LPCR on wakeup from deep-stop")
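A minimal sketch of the save/restore idea, assuming a per-thread save
slot. The struct and function names are hypothetical, and the LPCR_UPRT
and LPCR_HR bit values are stated here as assumptions that should be
checked against the kernel's reg.h.

```c
#include <stdint.h>

/* Assumed bit values for the sketch; verify against the kernel headers. */
#define LPCR_UPRT UINT64_C(0x0000000000400000)
#define LPCR_HR   UINT64_C(0x0000000000100000)

/* Hypothetical per-thread context holding the live and saved LPCR. */
struct thread_ctx {
    uint64_t lpcr;
    uint64_t saved_lpcr;
};

/* Record the LPCR just before executing stop. */
void save_lpcr_before_stop(struct thread_ctx *t)
{
    t->saved_lpcr = t->lpcr;
}

/* Old behaviour: overwrite the LPCR with a generic "sane" value. */
void wake_with_sane_value(struct thread_ctx *t, uint64_t sane_lpcr)
{
    t->lpcr = sane_lpcr;
}

/* Fixed behaviour: put back exactly what was there before stop. */
void wake_with_saved_value(struct thread_ctx *t)
{
    t->lpcr = t->saved_lpcr;
}
```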
== Comment: #4 - Ranjal G. Shenoy
Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
time of a booted kernel, once they have been initialized.
The contents of these SPRs are lost when the CPUs enter deep stop
states. So instead of saving and restoring SPRs from the kernel, use the
stop-api provided by the firmware, by which the firmware can restore
the contents of these SPRs to their initialized values after wakeup
from a deep stop state.
Apart from these, program the PSSCR value to that of the deepest stop
state via the stop-api. This indicates to the underlying firmware
which stop state to use for threads that have been woken up by a
special-wakeup.
And while we are programming SPRs via the stop-api, ensure that the HID1,
HID4 and HID5 registers, which are only available on POWER8, are not
requested to be restored by the firmware on POWER9.
Cherry-picked from commit 1e1601b38e6e ("powerpc/powernv/idle: Restore
SPRs for deep idle states via stop API.")
== Comment: #5 - Ranjal G. Shenoy
On Power9 DD1, due to a hardware bug, the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:
1) All the threads of the core are in deep stop.
2) One of the threads is woken up. The PLS for this thread will
correctly reflect that it is waking up from deep stop.
3) The thread that has woken up now executes a shallow stop.
4) When some other thread in the core is woken, its PLS will reflect
the shallow stop state.
Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.
Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.
Cherry-picked from commit 22c6663dc69a ("powerpc/powernv/idle: Use
Requested Level for restoring state on P9 DD1")
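The workaround can be sketched as a field selection on the PSSCR value.
This is an illustration only: the function names and the
first_deep_level parameter are hypothetical, and the field layout (PLS
in the top four bits, RL in the low four bits) is an assumption to be
checked against the ISA and the kernel headers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed PSSCR field layout for the sketch. */
#define PSSCR_PLS_SHIFT 60
#define PSSCR_PLS_MASK  (UINT64_C(0xF) << PSSCR_PLS_SHIFT)
#define PSSCR_RL_MASK   UINT64_C(0xF)

/* Pick the level used to decide what to restore: RL on DD1, where PLS
 * may under-report, and PLS elsewhere. */
unsigned int wakeup_level(uint64_t psscr, bool is_dd1)
{
    if (is_dd1)
        return (unsigned int)(psscr & PSSCR_RL_MASK);
    return (unsigned int)((psscr & PSSCR_PLS_MASK) >> PSSCR_PLS_SHIFT);
}

/* Hypervisor resources need restoring when the level is at or beyond
 * the first deep stop level. */
bool must_restore_hyp_state(uint64_t psscr, bool is_dd1,
                            unsigned int first_deep_level)
{
    return wakeup_level(psscr, is_dd1) >= first_deep_level;
}
```

For a thread that requested level 4 but whose PLS reads 2 because a
sibling is in a shallow stop, the DD1 path still triggers the restore.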
== Comment: #6 - Ranjal G. Shenoy
The current code in the cpuidle-powernv initialization only allows deep
stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
(indicated by OPAL_PM_TIMEBASE_STOP). This assumption dates back to
POWER8, where deep states always lost the timebase. However, on
POWER9, we do have stop states that are deep (they lose hypervisor
state) but retain the timebase.
Fix the initialization code in the cpuidle-powernv driver to allow
such deep states.
Further, there is a bug in cpuidle-powernv driver with
CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
even if a platform idle state which loses time base was not added to
the cpuidle table.
Fix this by ensuring that the nr_idle_states variable gets incremented
only when the platform idle state was added to the cpuidle table.
Cherry-picked from commit f9122ee4f558 ("cpuidle-powernv: Allow Deep
stop states that don't stop time")
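Both fixes can be sketched as a single table-building loop. This is a
simplified model, not the driver code: build_idle_table() is a
hypothetical function, and the numeric flag values are assumptions that
should be checked against the OPAL headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed flag values for the sketch; verify against opal-api.h. */
#define OPAL_PM_TIMEBASE_STOP  0x00000002u
#define OPAL_PM_STOP_INST_FAST 0x00100000u
#define OPAL_PM_STOP_INST_DEEP 0x00200000u

/* Fill table[] with the indices of the platform states that get added
 * and return how many were added. */
size_t build_idle_table(const unsigned int *flags, size_t nr_platform_states,
                        bool tick_oneshot, size_t *table, size_t table_len)
{
    size_t nr_idle_states = 0;

    for (size_t i = 0; i < nr_platform_states && nr_idle_states < table_len; i++) {
        bool is_stop = (flags[i] &
                        (OPAL_PM_STOP_INST_FAST | OPAL_PM_STOP_INST_DEEP)) != 0;
        bool stops_timebase = (flags[i] & OPAL_PM_TIMEBASE_STOP) != 0;

        if (!is_stop)
            continue;
        /* A state that stops the timebase needs a oneshot tick. A deep
         * state that keeps the timebase is accepted either way. */
        if (stops_timebase && !tick_oneshot)
            continue;

        table[nr_idle_states] = i;
        /* Count only states that were actually added. */
        nr_idle_states++;
    }
    return nr_idle_states;
}
```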
== Comment: #7 - Ranjal G. Shenoy
Currently, we use the opal call opal_slw_set_reg() to inform the
Sleep-Winkle Engine (SLW) to restore the contents of some of the
Hypervisor state on wakeup from deep idle states that lose full
hypervisor context (characterized by the flag
OPAL_PM_LOSE_FULL_CONTEXT).
However, the current code has a bug in that if opal_slw_set_reg()
fails, we don't disable the use of these deep states (winkle on
POWER8, stop4 onwards on POWER9).
This patch fixes this bug by ensuring that if programming the
sleep-winkle engine to restore the hypervisor states in
pnv_save_sprs_for_deep_states() fails, then we exclude such states by
clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from
supported_cpuidle_states. As a result, POWER8 will be prevented from
using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs into
the default stop state when available.
Further, the initialization of the cpuidle-powernv driver now includes
only those states whose flags are present in
supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT
states when they have been disabled due to stop-api failure.
Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle
states via stop API.")
Cherry-picked from commit 785a12afdb4a ("powerpc/powernv/idle: Disable
LOSE_FULL_CONTEXT states when stop-api fails")
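The flag handling can be sketched as two small helpers. The function
names are hypothetical, the supported-states word is passed by value
rather than kept in a global, and the numeric value of
OPAL_PM_LOSE_FULL_CONTEXT is an assumption for the sketch.

```c
/* Assumed flag value for the sketch; verify against opal-api.h. */
#define OPAL_PM_LOSE_FULL_CONTEXT 0x00004000u

/* Drop the LOSE_FULL_CONTEXT capability when programming the stop-api
 * failed (non-zero return code). */
unsigned int update_supported_states(unsigned int supported, int stop_api_rc)
{
    if (stop_api_rc != 0)
        supported &= ~OPAL_PM_LOSE_FULL_CONTEXT;
    return supported;
}

/* A platform state is usable only if every one of its flags is still
 * present in the supported set. */
int state_is_usable(unsigned int state_flags, unsigned int supported)
{
    return (state_flags & supported) == state_flags;
}
```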
== Comment: #8 - Ranjal G. Shenoy
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.
Until now, the deep idle states which lose hypervisor resources (e.g.
winkle) were only exposed via CPU-Hotplug. Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, the rest of the SPRs are reinitialized to their
boot-time values.
When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is the same as it was before going idle.
In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.
Cherry-picked from commit e1c1cfed5432 ("powerpc/powernv: Save/Restore
additional SPRs for stop4 cpuidle")
== Comment: #9 - Ranjal G. Shenoy
Currently we use the stop-api provided by the firmware to program the
SLW engine to restore the values of hypervisor resources that get lost
on deeper idle states (such as winkle). Since the deep states were
only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
to have the PECE1 bit cleared, since Hotplugged CPUs shouldn't be
spuriously woken up by the decrementer.
On POWER9, some of the deep platform idle states such as stop4 can be
used in cpuidle as well. In this case, we want the CPU in stop4 to be
woken up by the decrementer when some timer on the CPU expires.
In this patch, we program the stop-api for LPCR with PECE1
bit cleared only when we are offlining the CPU and set it
back once the CPU is online.
Cherry-picked from commit 24be85a23d1f ("powerpc/powernv: Clear PECE1
in LPCR via stop-api only on Hotplug")
== Comment: #10 - Ranjal G. Shenoy
commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
an offlined CPU which is in a deep stop state.
In the case where the stop-api support is found to be lacking, the
commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
states when stop-api fails") disables deep states that lose hypervisor
context. Thus in this case, the offlined CPU will be put to some
shallow idle state.
However, we currently unconditionally clear the PECE1 in LPCR via
stop-api during CPU-Hotplug even when deep states are disabled due to
stop-api failure.
Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
*only* when the offlined CPU will be put to a deep state that loses
hypervisor context.
Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
stop-api only on Hotplug")
upstream reference: https://patchwork.ozlabs.org/patch/808233/
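The condition added by this fix can be sketched as follows. The function
offline_lpcr_to_program() is hypothetical and only models the decision,
not the actual stop-api call. The LPCR_PECE1 and
OPAL_PM_LOSE_FULL_CONTEXT values are assumptions to be checked against
the kernel headers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed values for the sketch; verify against the kernel headers. */
#define LPCR_PECE1                UINT64_C(0x0000000000002000)
#define OPAL_PM_LOSE_FULL_CONTEXT 0x00004000u

/* Decide what, if anything, to program via the stop-api when a CPU goes
 * offline. Returns true and sets *out only when the offlined CPU will
 * enter a deep state that loses hypervisor context. */
bool offline_lpcr_to_program(uint64_t lpcr, unsigned int supported_states,
                             uint64_t *out)
{
    if (!(supported_states & OPAL_PM_LOSE_FULL_CONTEXT))
        return false;  /* deep states disabled: leave the stop-api alone */

    *out = lpcr & ~LPCR_PECE1;  /* no decrementer wakeup while offline */
    return true;
}
```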
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New
** Tags: architecture-ppc64le bugnameltc-158257 severity-high
targetmilestone-inin1710
** Tags added: architecture-ppc64le bugnameltc-158257 severity-high
targetmilestone-inin1710
--
https://bugs.launchpad.net/bugs/1715064
Title:
17.10 fails to boot on POWER9 DD2.0 with Deep stop states