http://defect.opensolaris.org/bz/show_bug.cgi?id=6537

           Summary: cmt_hier_promote() assertion failure seen under CPUPM
                    test suite run
    Classification: Development
           Product: power-mgmt
           Version: unspecified
          Platform: Other
        OS/Version: Solaris
            Status: ACCEPTED
          Severity: major
          Priority: P2
         Component: PAD
        AssignedTo: eric.saxe at sun.com
        ReportedBy: eric.saxe at sun.com
                CC: tesla-dev at opensolaris.org


The following assertion failure was tripped while running the CPUPM test suite
on an x4150-based system:

assertion failed: child->cmt_parent == pg, file: ../../common/disp/cmt.c, line: 321
> $c
vpanic()
assfail+0x7e(fffffffffb9590d0, fffffffffb959138, 141)
cmt_hier_promote+0x25c(ffffff02eb4e4a38)
cmt_pad_enable+0x90(7)
cpupm_set_policy+0x40(0)
pm_ioctl+0x3542(8300000001, 33, 0, 100003, ffffff049422c3a8, ffffff00110a1dd4)
cdev_ioctl+0x45(8300000001, 33, 0, 100003, ffffff049422c3a8, ffffff00110a1dd4)
spec_ioctl+0x83(ffffff0d68664400, 33, 0, 100003, ffffff049422c3a8, ffffff00110a1dd4)
fop_ioctl+0x7b(ffffff0d68664400, 33, 0, 100003, ffffff049422c3a8, ffffff00110a1dd4)
ioctl+0x18e(3, 33, 0)
_sys_sysenter_post_swapgs+0x23c()

While enabling event-based CPUPM, we were promoting a power domain above its
parent in the hierarchy. While doing so, we tripped an assertion that the
children of the PG being promoted have that PG as their parent. This implies
that an inconsistency had been introduced into the hierarchy at some earlier
point. It cannot be a race, as cpu_lock is held.

cmt_hier_promote() should update the children's parent reference when their
parent is being promoted. However, it finds the children by looking through the
PG's cmt_children group (which is also the children's siblings group). That
group is the set of PGs across which the dispatcher applies its load
balancing/coalescence policy at the children's level.

When all the CPUs in a given group go offline, that group is removed from the
siblings group, so the dispatcher won't consider it when implementing policy.
However, if we take a trip through cmt_hier_promote() while the group is
removed, we won't find that PG, so its parent reference is not updated when
the parent is promoted. Later, when the CPUs in that group come back online,
the parent reference will be incorrect, opening the door to tripping the
"child->cmt_parent == pg" assertion.
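
The sequence above can be sketched with a toy model. This is not the code in
cmt.c: the toy_* names, the fixed-size kids array, and the assumption that the
parent being swapped is the root with a single child are all made up for
illustration. It only shows how a PG that has dropped out of the children
group keeps a stale parent pointer across a promotion.

```c
#include <stddef.h>

#define TOY_MAX_KIDS 4

/* Hypothetical, heavily simplified stand-in for a CMT processor group. */
typedef struct toy_pg {
    struct toy_pg *parent;              /* models cmt_parent */
    struct toy_pg *kids[TOY_MAX_KIDS];  /* models the cmt_children group */
    int nkids;
} toy_pg_t;

/* Attach child under parent and list it in the parent's children group. */
void toy_link(toy_pg_t *parent, toy_pg_t *child)
{
    child->parent = parent;
    parent->kids[parent->nkids++] = child;
}

/* All CPUs in pg went offline: drop it from the group, keep its parent. */
void toy_offline(toy_pg_t *pg)
{
    toy_pg_t *par = pg->parent;
    for (int i = 0; i < par->nkids; i++) {
        if (par->kids[i] == pg) {
            par->kids[i] = par->kids[--par->nkids];
            return;
        }
    }
}

/*
 * Promote pg above its parent, finding children only through the group.
 * Assumes (for brevity) that pg is its parent's only child.
 */
void toy_promote(toy_pg_t *pg)
{
    toy_pg_t *parent = pg->parent;

    parent->nkids = 0;
    for (int i = 0; i < pg->nkids; i++) {
        pg->kids[i]->parent = parent;   /* only listed children are fixed */
        parent->kids[parent->nkids++] = pg->kids[i];
    }
    pg->kids[0] = parent;
    pg->nkids = 1;
    pg->parent = parent->parent;
    parent->parent = pg;
}

/* Returns 1 if the offlined child is left pointing at the wrong parent. */
int toy_stale_parent_demo(void)
{
    toy_pg_t q = {0}, p = {0}, a = {0}, b = {0};

    toy_link(&q, &p);       /* q -> p -> {a, b} */
    toy_link(&p, &a);
    toy_link(&p, &b);
    toy_offline(&b);        /* b leaves the group, b.parent is still &p */
    toy_promote(&p);        /* p -> q -> {a}; b is never visited */

    return (a.parent == &q && b.parent == &p);
}
```

In this model b should now name q as its parent, like its sibling a, but it
still names p. Bringing b back online with that pointer intact is the kind of
inconsistency the assertion is there to catch.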

The fix involves not depending on the cmt_children group to find the set of
child PGs needing to have their parent reference updated. This is done by
walking the CPUs in the PG being promoted and, for each CPU, examining the PGs
in the CPU's lineage, looking for ones that need to have their parent reference
updated. cmt_hier_promote() already walks the CPUs in the PG, so it's simply
another case to add.
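
A minimal sketch of that approach follows. It is again a toy model and not the
actual change: the lin_* names and array-based lineage are hypothetical, and
it assumes that a PG's CPU list still contains its offline CPUs. The point is
only that children are located through each CPU's lineage, so a PG missing
from the children group is still found.

```c
#include <stddef.h>

#define LIN_MAX_DEPTH 4
#define LIN_MAX_CPUS  4

struct lin_pg;

/* Hypothetical CPU: just the list of PGs it belongs to (its lineage). */
typedef struct lin_cpu {
    struct lin_pg *lineage[LIN_MAX_DEPTH];
    int depth;
} lin_cpu_t;

/* Hypothetical PG: parent pointer plus member CPUs, online or not. */
typedef struct lin_pg {
    struct lin_pg *parent;              /* models cmt_parent */
    lin_cpu_t *cpus[LIN_MAX_CPUS];
    int ncpus;
} lin_pg_t;

/* Promote pg above its parent, re-parenting children via CPU lineages. */
void lin_promote(lin_pg_t *pg)
{
    lin_pg_t *parent = pg->parent;

    /* Must run before parent->parent is changed to pg below. */
    for (int i = 0; i < pg->ncpus; i++) {
        lin_cpu_t *cp = pg->cpus[i];
        for (int j = 0; j < cp->depth; j++) {
            if (cp->lineage[j]->parent == pg)
                cp->lineage[j]->parent = parent;
        }
    }
    pg->parent = parent->parent;
    parent->parent = pg;
}

/* Returns 1 if both children, including the offline one, are re-parented. */
int lin_fix_demo(void)
{
    lin_pg_t q = {0}, p = {0}, a = {0}, b = {0};
    lin_cpu_t cpu0 = { { &a, &p, &q }, 3 };
    lin_cpu_t cpu1 = { { &b, &p, &q }, 3 };  /* b's only CPU, offline */

    p.parent = &q;                      /* q -> p -> {a, b} */
    a.parent = &p;
    b.parent = &p;
    p.cpus[0] = &cpu0;
    p.cpus[1] = &cpu1;
    p.ncpus = 2;

    lin_promote(&p);                    /* expected: p -> q -> {a, b} */

    return (a.parent == &q && b.parent == &q &&
        q.parent == &p && p.parent == NULL);
}
```

No children group appears in this model at all, which is the design point:
whether b is currently listed among its siblings has no bearing on whether its
parent reference gets updated.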
