http://defect.opensolaris.org/bz/show_bug.cgi?id=6605

           Summary: PAD needs to be hardened against strange _PSDs
    Classification: Development
           Product: power-mgmt
           Version: unspecified
          Platform: Other
        OS/Version: Solaris
            Status: ACCEPTED
          Severity: major
          Priority: P2
         Component: PAD
        AssignedTo: eric.saxe at sun.com
        ReportedBy: eric.saxe at sun.com
                CC: tesla-dev at opensolaris.org


Testing has uncovered an issue on an X4450 system: the P-state domains don't
look consistent with the CPUID-enumerated topology. Here's what we see
enumerated through CPUID (numbers are logical CPU ids):

Sockets
0, 4, 5, 6
1, 7, 8, 9
2, 10, 11, 12
3, 13, 14, 15

Caches
0, 4
1, 7
2, 10
5, 6
3, 13
8, 9
11, 12
14, 15

...and through ACPI:

P-State Domains
0, 4, 8, 12
1, 5, 9, 13
2, 6, 10, 14
3, 7, 11, 15

So the P-state domains span (and partially overlap) sockets, which doesn't
seem correct: the domain 0, 4, 8, 12, for example, takes CPUs from three
different sockets without containing any of them. Regardless, the PAD kernel
doesn't handle this gracefully. It flounders trying to work out where the
P-state domains belong in the hierarchy. On a debug kernel, this results in an
assertion failure:

       panic[cpu6]/thread=ffffff001ef82c60: assertion failed:
       GROUP_SIZE(parent->cmt_children) <= 1, file: ../../common/disp/cmt.c, line: 344

       ffffff0522f7ce60 genunix:assfail+7e ()
       ffffff0522f7cec0 unix:cmt_hier_promote+296 ()
       ffffff0522f7cf80 unix:pg_cmt_cpu_init+228 ()
       ffffff0522f7cfb0 unix:pg_cpu_init+70 ()
       ffffff0522f7cfe0 unix:mp_startup+1c7 ()
       ffffff0522f7cff0 unix:real_mode_start+135 () 

The PAD kernel needs to be able to deal with (or at least recover from)
situations where the platform presents strange power domains. It should detect
them, emit a message saying that event-based power management isn't possible,
and fall back to a sane mode of operation.

Unfortunately, this isn't a trivial fix, because we cannot detect the problem
until several CPUs have already come online and the PG hierarchy has been
partially built. So the solution involves detecting the illegal groupings,
pruning them from the hierarchy, and then blacklisting optimizations for that
hardware sharing relationship as further CPUs are configured into the system.

I have the code written for this, and I'm testing and validating it.
