http://defect.opensolaris.org/bz/show_bug.cgi?id=4015





--- Comment #10 from Eric Saxe <eric.saxe at sun.com>  2008-10-24 10:20:17 ---


(In reply to comment #9)
> I'll have to request to re-open this bug.
> The siblings size of pipeline group is still wrong.

Hi Aubrey,

The siblings group size is correct. You're probably assuming that the size of
the siblings group should equal the number of instances of that hardware
sharing relationship system wide, in which case the size of the "pipeline"
siblings set would be 4 if there were 4 shared pipelines in the system, and so
on. Instead, the siblings set (and its size) should capture the group of PGs
sharing the same parent in the hierarchy (just like a family tree).
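
To make the distinction concrete, here is a minimal Python sketch. The `Node`
class and the `count_kind` helper are made up for illustration and are not the
kernel's PG structures; the sketch only assumes that a siblings set is the set
of groups sharing one parent, counting the group itself.

```python
class Node:
    """Hypothetical stand-in for a PG; not the kernel data structure."""

    def __init__(self, kind, parent=None):
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def siblings_size(self):
        # Siblings are the groups sharing this node's parent, itself included.
        return len(self.parent.children) if self.parent else 1


def count_kind(root, kind):
    # System-wide number of instances of one sharing relationship.
    return (root.kind == kind) + sum(count_kind(c, kind) for c in root.children)


# One cache shared by four idle groups, each with a single pipeline below it.
cache = Node("cache")
idles = [Node("idle", cache) for _ in range(4)]
pipes = [Node("pipeline", idle) for idle in idles]

print(count_kind(cache, "pipeline"))  # 4 pipelines system wide
print(pipes[0].siblings_size())       # 1: each pipeline is alone under its parent
print(idles[0].siblings_size())       # 4: the idle groups share the cache parent
```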

Let's look at an example hierarchy, and walk through how cmt_balance was
designed to work. On the test machine, if the hierarchy looks like:

      POW_ACTIVE            POW_ACTIVE    
          |                     |
        CHIP                   ...
          |
    ----CACHE-----
   /    |    |    \
  IDLE IDLE IDLE IDLE
   |    |     |     |
 IPIPE IPIPE IPIPE IPIPE
  / \   / \   / \   / \
 0   1 2   3 4   5 6   7

The size of the siblings set for the CHIP, CACHE, and IPIPE groupings should be
1, and the size of the IDLE siblings set should be 4. CPU 7's PG lineage runs
along the right side of the hierarchy on the left. When the dispatcher applies
the CMT policy for CPU 7, it starts at the top of CPU 7's lineage. The coalesce
policy is defined for the POW_ACTIVE level, so it looks at the PG at the top of
the right hierarchy (the only other sibling) to see if it should balance. If
yes, it selects a CPU from that POW_ACTIVE group on the right and returns. If
no, it goes to the next level down in CPU 7's lineage, the CHIP PG, to balance
against the CHIP's other siblings (there aren't any, so there's nothing to
do).

Note that the dispatcher doesn't then go and consider the other CHIPs elsewhere
in the system (cousins), since doing so would undermine the policy decisions
made at higher levels (in this case, the decision already made to stay running
in the POW_ACTIVE domain on the left).

So moving on, there's nothing to do at the CHIP level (it's implicitly
balanced). Same for the CACHE level. Then on to the IDLE group. There are 3
other siblings, and the COALESCE policy is defined for this level, so we select
a balancing partner from this set and see if we should select another idle
group (to drive up utilization on that group) or stay where we are. The process
continues until we either get to the bottom of the hierarchy or find a level
where we should balance.
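
The walk described above can be sketched roughly as follows. This is only an
illustration, not the actual cmt_balance code: the `PG` class, the `ROOT` node
joining the two POW_ACTIVE groups, the `should_balance` callback, and the
choice of the first other sibling as balancing partner are all assumptions,
and the CPU leaves are left out.

```python
class PG:
    """Hypothetical processor-group node; not the kernel structure."""

    def __init__(self, level, label, parent=None):
        self.level = level
        self.label = label
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def lineage(pg):
    """Return the PGs from the top of the hierarchy down to `pg`."""
    chain = []
    while pg is not None:
        chain.append(pg)
        pg = pg.parent
    return chain[::-1]


def cmt_balance_sketch(leaf, should_balance):
    """Walk the lineage top-down; return the chosen sibling PG, or None."""
    for here in lineage(leaf):
        if here.parent is None:
            continue  # the (hypothetical) root has no siblings
        others = [pg for pg in here.parent.children if pg is not here]
        if not others:
            continue  # no other siblings: implicitly balanced at this level
        partner = others[0]  # assumption: partner selection is not specified
        if should_balance(here, partner):
            return partner  # the caller would pick a CPU from this group
    return None  # reached the bottom of the hierarchy: stay put


# The example hierarchy; the right POW_ACTIVE subtree is elided, as above.
root = PG("ROOT", "root")
left = PG("POW_ACTIVE", "left", root)
right = PG("POW_ACTIVE", "right", root)
chip = PG("CHIP", "chip", left)
cache = PG("CACHE", "cache", chip)
idles = [PG("IDLE", f"idle{i}", cache) for i in range(4)]
pipes = [PG("IPIPE", f"ipipe{i}", idle) for i, idle in enumerate(idles)]
cpu7_pg = pipes[3]  # bottom PG of CPU 7's lineage

visited = []


def decline_power_accept_idle(here, partner):
    # Made-up policy outcome: stay in the left POW_ACTIVE domain, move at IDLE.
    visited.append(here.level)
    return here.level == "IDLE"


choice = cmt_balance_sketch(cpu7_pg, decline_power_accept_idle)
print(visited)       # ['POW_ACTIVE', 'IDLE']
print(choice.label)  # idle0
```

With this made-up policy, only the POW_ACTIVE and IDLE levels are consulted
(CHIP and CACHE have no other siblings), and the walk never looks at cousins
outside CPU 7's lineage.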

Does this make sense?
