http://defect.opensolaris.org/bz/show_bug.cgi?id=6605
Summary: PAD needs to be hardened against strange _PSDs
Classification: Development
Product: power-mgmt
Version: unspecified
Platform: Other
OS/Version: Solaris
Status: ACCEPTED
Severity: major
Priority: P2
Component: PAD
AssignedTo: eric.saxe at sun.com
ReportedBy: eric.saxe at sun.com
CC: tesla-dev at opensolaris.org
Testing on an X4450 system has turned up P-state domains that are inconsistent
with the CPUID-enumerated topology. Here is what CPUID enumerates (numbers are
logical CPU ids):
Sockets
0, 4, 5, 6
1, 7, 8, 9
2, 10, 11, 12
3, 13, 14, 15
Caches
0, 4
1, 7
2, 10
5, 6
3, 13
8, 9
11, 12
14, 15
...and through ACPI:
P-State Domains
0, 4, 8, 12
1, 5, 9, 13
2, 6, 10, 14
3, 7, 11, 15
So the P-state domains span sockets and only partially overlap them (neither
contains the other), which doesn't seem correct. Regardless, the PAD kernel
doesn't handle this gracefully: it cannot reconcile where the P-state domains
should wind up in the hierarchy. On a debug kernel, this results in an
assertion failure:
panic[cpu6]/thread=ffffff001ef82c60: assertion failed:
GROUP_SIZE(parent->cmt_children) <= 1, file: ../../common/disp/cmt.c, line: 344
ffffff0522f7ce60 genunix:assfail+7e ()
ffffff0522f7cec0 unix:cmt_hier_promote+296 ()
ffffff0522f7cf80 unix:pg_cmt_cpu_init+228 ()
ffffff0522f7cfb0 unix:pg_cpu_init+70 ()
ffffff0522f7cfe0 unix:mp_startup+1c7 ()
ffffff0522f7cff0 unix:real_mode_start+135 ()
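To make the inconsistency concrete, here is a small user-space sketch (not the
PAD kernel code) that checks whether two groupings of CPUs can coexist in one
hierarchy. It assumes that a hierarchy is only possible when every pair of
groups is either disjoint or nested. The names CPU(), sets_nest() and
groupings_nest() are made up for illustration; the masks are copied from the
listings above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per logical CPU id (hypothetical helper, 16 CPUs on this box). */
#define	CPU(n)	((uint32_t)1 << (n))

/* Groupings from this report, expressed as CPU bitmasks. */
const uint32_t sockets[4] = {
	CPU(0) | CPU(4) | CPU(5) | CPU(6),
	CPU(1) | CPU(7) | CPU(8) | CPU(9),
	CPU(2) | CPU(10) | CPU(11) | CPU(12),
	CPU(3) | CPU(13) | CPU(14) | CPU(15)
};
const uint32_t caches[8] = {
	CPU(0) | CPU(4), CPU(1) | CPU(7), CPU(2) | CPU(10), CPU(5) | CPU(6),
	CPU(3) | CPU(13), CPU(8) | CPU(9), CPU(11) | CPU(12), CPU(14) | CPU(15)
};
const uint32_t pstate_domains[4] = {
	CPU(0) | CPU(4) | CPU(8) | CPU(12),
	CPU(1) | CPU(5) | CPU(9) | CPU(13),
	CPU(2) | CPU(6) | CPU(10) | CPU(14),
	CPU(3) | CPU(7) | CPU(11) | CPU(15)
};

/* Two CPU sets fit in a tree if they are disjoint or one contains the other. */
int
sets_nest(uint32_t a, uint32_t b)
{
	uint32_t both = a & b;

	return (both == 0 || both == a || both == b);
}

/* Returns 1 if every group in x nests with every group in y, else 0. */
int
groupings_nest(const uint32_t *x, size_t nx, const uint32_t *y, size_t ny)
{
	assert(x != NULL && y != NULL);
	for (size_t i = 0; i < nx; i++) {
		for (size_t j = 0; j < ny; j++) {
			if (!sets_nest(x[i], y[j]))
				return (0);
		}
	}
	return (1);
}
```

With these masks, groupings_nest(caches, 8, sockets, 4) returns 1, while
groupings_nest(pstate_domains, 4, sockets, 4) returns 0: domain {0, 4, 8, 12}
shares CPUs 0 and 4 with socket {0, 4, 5, 6}, but neither set contains the
other.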
The PAD kernel needs to be able to deal with (or at least recover from)
situations where the platform presents strange power domains. It should detect
this, emit a message that event-based power management isn't possible, and
fall back to a sane state of operation.
Unfortunately, this isn't a trivial fix, because we cannot detect that there's
a problem until after several CPUs have come into being, and we have a
partially created PG hierarchy. So the solution involves detecting the illegal
groupings, pruning them from the hierarchy, and then blacklisting that HW
sharing relationship so that we no longer try to optimize for it as future
CPUs are configured into the system.
I have the code written for this, and I'm testing and validating it.
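For readers who want the shape of the detect/prune/blacklist approach, the
following is a simplified user-space sketch. It is not the fix referred to
above and does not use the real PG data structures. Everything in it (struct
hier, enum hw_rel, hier_cpu_init(), x4450_bring_up(), the bitmask model) is
hypothetical. It also assumes that the CPUID-derived relationships are trusted
and only the ACPI-derived P-state relationship is a candidate for pruning.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define	NCPU		16
#define	MAX_GROUPS	NCPU

/* Hypothetical sharing relationships; names are illustrative only. */
enum hw_rel { REL_CACHE, REL_SOCKET, REL_PSTATE, REL_COUNT };

struct hier {
	uint32_t groups[REL_COUNT][MAX_GROUPS];	/* CPU bitmask per group */
	uint32_t blacklist;			/* bit r set: r is ignored */
};

/* Per-CPU group ids, derived from the X4450 listings in this report. */
const int x4450_socket[NCPU] = { 0, 1, 2, 3, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3 };
const int x4450_cache[NCPU] = { 0, 1, 2, 4, 0, 3, 3, 1, 5, 5, 2, 6, 6, 4, 7, 7 };

/* Two CPU sets fit in a tree if they are disjoint or one contains the other. */
static int
sets_nest(uint32_t a, uint32_t b)
{
	uint32_t both = a & b;

	return (both == 0 || both == a || both == b);
}

/* Does relationship r overlap any other active relationship illegally? */
static int
rel_conflicts(const struct hier *h, int r)
{
	for (int o = 0; o < REL_COUNT; o++) {
		if (o == r || (h->blacklist & (1u << o)))
			continue;
		for (int i = 0; i < MAX_GROUPS; i++)
			for (int j = 0; j < MAX_GROUPS; j++)
				if (!sets_nest(h->groups[r][i], h->groups[o][j]))
					return (1);
	}
	return (0);
}

/* Add one CPU; id[r] is the group it belongs to for relationship r. */
void
hier_cpu_init(struct hier *h, int cpu, const int id[REL_COUNT])
{
	assert(cpu >= 0 && cpu < NCPU);
	for (int r = 0; r < REL_COUNT; r++) {
		if (h->blacklist & (1u << r))
			continue;	/* blacklisted: never group again */
		assert(id[r] >= 0 && id[r] < MAX_GROUPS);
		h->groups[r][id[r]] |= (uint32_t)1 << cpu;
	}
	/* Assumption: only the platform-supplied P-state grouping is distrusted. */
	if (!(h->blacklist & (1u << REL_PSTATE)) &&
	    rel_conflicts(h, REL_PSTATE)) {
		/* Prune what was built so far, then blacklist the relationship. */
		memset(h->groups[REL_PSTATE], 0, sizeof (h->groups[REL_PSTATE]));
		h->blacklist |= 1u << REL_PSTATE;
		/* User-space stand-in for a kernel console message. */
		fprintf(stderr, "cpu%d: inconsistent P-state domains; "
		    "event based power management disabled\n", cpu);
	}
}

/* Bring up CPUs first..last using the X4450 tables (P-state domain = id % 4). */
void
x4450_bring_up(struct hier *h, int first, int last)
{
	for (int cpu = first; cpu <= last; cpu++) {
		int id[REL_COUNT];

		id[REL_CACHE] = x4450_cache[cpu];
		id[REL_SOCKET] = x4450_socket[cpu];
		id[REL_PSTATE] = cpu % 4;
		hier_cpu_init(h, cpu, id);
	}
}
```

Starting from a zeroed struct hier, x4450_bring_up(&h, 0, 15) shows the timing
problem described above: in this model the groupings stay consistent until
several CPUs are already in the hierarchy, and only then is the P-state
relationship pruned and blacklisted. The cache and socket groups keep being
built for the remaining CPUs. The point at which this sketch notices the
conflict need not match where the real kernel trips its assertion.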