http://defect.opensolaris.org/bz/show_bug.cgi?id=6219
Summary: Implement code review feedback for Sasha Kolbasov
Classification: Development
Product: power-mgmt
Version: unspecified
Platform: Other
OS/Version: Solaris
Status: NEW
Severity: minor
Priority: P3
Component: PAD
AssignedTo: eric.saxe at sun.com
ReportedBy: eric.saxe at sun.com
CC: tesla-dev at opensolaris.org
pad code review
===============
Author: Alexander Kolbasov <akolb at eng.sun.com>
Date: 2009/01/23
URL: http://cr.opensolaris.org/~esaxe/pad-cr/
1 usr/src/cmd/mdb/common/modules/genunix/pg.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1 Future RFEs
Are there extra things that could be done in MDB to simplify debugging for
PAD (future RFEs)?
2 usr/src/uts/common/sys/pg.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2.1
====
 59 typedef struct pg_cb_ops {
 60     void    (*thread_swtch)(struct pg *, struct cpu *, kthread_t *,
 61         kthread_t *);
 62     void    (*thread_remain)(struct pg *, struct cpu *,
 63         kthread_t *);
 64 } pg_cb_ops_t;
 65
 66 /*
 67  * Processor group structure
 68  */
 69 typedef struct pg {
 70     pgid_t          pg_id;          /* seq id */
 71     pg_relation_t   pg_relation;    /* grouping relationship */
 72     struct pg_class *pg_class;      /* pg class */
 73     struct group    pg_cpus;        /* group of CPUs */
 74     pg_cb_ops_t     pg_cb;          /* pg events ops vector */
 75 } pg_t;
Why are pg_cb_ops part of pg and not pg_cmt? Are they relevant to all kinds
of PGs?
2.2
====
144 #define PG_NUM_CPUS(pgrp) \
145 (GROUP_SIZE(&((pg_t *)pgrp)->pg_cpus))
Doing the (pg_t *) cast in a macro is dangerous: it is easy to pass some
unrelated pointer to it and get away with it. Can the cast be done by the
caller, where it is easier to verify that the pointer is the right one?
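To illustrate the concern, here is a minimal standalone sketch with mocked-up
types (not the real pg.h definitions): the cast hidden inside the macro
accepts any pointer, while a function-based accessor lets the compiler catch
the mistake.

#include <stdio.h>

struct group { unsigned grp_size; };
struct pg { struct group pg_cpus; };
struct cpu { int cpu_id; };

#define GROUP_SIZE(g)       ((g)->grp_size)

/* Cast buried in the macro: any pointer type is silently accepted. */
#define PG_NUM_CPUS(pgrp)   (GROUP_SIZE(&((struct pg *)(pgrp))->pg_cpus))

/* Alternative: the compiler checks the argument type for the caller. */
static unsigned
pg_num_cpus(struct pg *pgrp)
{
    return (GROUP_SIZE(&pgrp->pg_cpus));
}

int
main(void)
{
    struct pg pg = { { 4 } };
    struct cpu cp = { 0 };

    (void) printf("%u\n", PG_NUM_CPUS(&pg));    /* OK: prints 4 */
    (void) printf("%u\n", PG_NUM_CPUS(&cp));    /* compiles, reads junk */
    (void) printf("%u\n", pg_num_cpus(&pg));    /* OK: prints 4 */
    /* pg_num_cpus(&cp) draws an incompatible-pointer-type diagnostic. */
    return (0);
}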
2.3
====
182 * PG Event callbacks
183 */
184 void pg_ev_thread_swtch(cpu_t *, kthread_t *, kthread_t *);
185 void pg_ev_thread_remain(cpu_t *, kthread_t *);
What is the relationship between these callbacks and the pg_cb callbacks with
similar names?
3 usr/src/uts/common/sys/cpuvar.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- No comments
4 usr/src/uts/common/sys/cyclic.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- There are no actual changes in this file
5 usr/src/uts/common/sys/pghw.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5.1
====
60 * Returns true if the hardware is a type of power management domain
This comment is rather mysterious, but the code below is clear.
62 #define PGHW_PM_DOMAIN(hw) \
63 (hw == PGHW_POW_ACTIVE || hw == PGHW_POW_IDLE)
with the exception that it is a boolean check, so maybe it should be
PGHW_IS_PM_DOMAIN(hw)?
5.2
====
89 kstat_t *pghw_kstat; /* physical kstats exported */
Do you think the kstats belong to the hardware PG layer or to the CMT PG
layer (or maybe both)?
6 usr/src/uts/common/os/pg.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6.1
====
 154 #define PG_POLICY_NAME(pg) \
 155     ((pg)->pg_class->pgc_ops->policy_name ? \
 156     (pg)->pg_class->pgc_ops->policy_name(pg) : 0) \
This should be NULL, instead of 0.
6.2
====
Why is the events callback vector bzeroed instead of being initialized to
null ops or the default? (A sketch of the null-ops idea follows the quoted
code.)
398 /*
399 * Initialize the events ops vector
400 */
401 bcopy(&pg_cb_ops_default, &pg->pg_cb, sizeof (struct pg_cb_ops));
402
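For reference, a sketch of what the "null ops" alternative could look like
(the stub names are invented for illustration; this is not code from the
webrev): a default vector of no-op callbacks lets pg_ev_thread_swtch() and
pg_ev_thread_remain() invoke the ops unconditionally, with no NULL checks and
no bzero'd state.

static void
pg_cb_nop_thread_swtch(struct pg *pg, struct cpu *cp, kthread_t *told,
    kthread_t *tnew)
{
}

static void
pg_cb_nop_thread_remain(struct pg *pg, struct cpu *cp, kthread_t *t)
{
}

/* Hypothetical default: every op is a harmless no-op. */
static const pg_cb_ops_t pg_cb_ops_default = {
    pg_cb_nop_thread_swtch,
    pg_cb_nop_thread_remain,
};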
7 usr/src/uts/common/os/pghw.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7.1
====
Who now calls pghw_type_string?
419 kstat_named_setstr(&pgsp->pg_policy, pg_policy_name((pg_t *)pg));
7.2
====
I am wondering whether it would be more useful to put the kstats in the CMT
layer rather than the hw layer. That would allow them to show information
from the CMT, hw, and group levels at once.
8 usr/src/uts/common/disp/cmt_policy.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8.1
====
The following comment is confusing in its description of top level balancing
on NUMA systems.
 54  *
 55  * On NUMA systems, the CMT load balancing algorithm balances across the
 56  * CMT PGs within their respective lgroups. On UMA based system, there
 57  * exists a top level group of PGs to balance across. On NUMA systems multiple
 58  * top level groups are instantiated, where the top level balancing begins by
 59  * balancng across the CMT PGs within their respective (per lgroup) top level
 60  * groups.
8.2
====
What could be a more generic solution for the XXX comment below? It assumes
too many specifics about the way utilization is computed.
 76     /*
 77      * XXX Note this assumes that curthread's utilization is "1"
 78      */
 79     if (self && bitset_in_set(&here->cmt_cpus_actv_set, CPU->cpu_seqid))
 80         here_util--;    /* Ignore curthread's effect */
 81
8.3
====
Would it simplify things to have just a single bit indicating whether the
policy is coalescence or load balancing? (See the sketch after the quoted
assertion.)
82 /*
83 * Load balancing and coalescence are conflicting policies
84 */
85 ASSERT((policy & (CMT_BALANCE|CMT_COALESCE)) !=
86 (CMT_BALANCE|CMT_COALESCE));
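A sketch of the single-bit idea (the flag name is invented for illustration):
one bit selects the placement policy, so the conflicting combination cannot
be expressed and the assertion becomes unnecessary.

#define CMT_POLICY_COALESCE    0x1    /* set: coalesce, clear: load balance */

if (policy & CMT_POLICY_COALESCE) {
    /* pack utilization onto the fewest (already powered-up) PGs */
} else {
    /* spread utilization across the sibling PGs */
}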
8.4
====
What do you do with CMT_AFFINITY? There seems to be no policy implementation
for it!
8.5
====
For the balancing policy, what should be done if
    (here_util < there_util) &&
    (CMT_CAPACITY(there) is much greater than CMT_CAPACITY(here))?
8.6 cmt_balance()
=================
197 /*
198 * Select an idle CPU from the target PG
199 */
Does it mean that after we have walked through the PG tree and found a PG to
move to, we will only move there if an idle CPU is available, but will stay
with the current cp even if it is not idle?
8.7 cmt_should_migrate()
========================
Should there be memory barriers around the accesses of the
pg->cmt_utilization values, or is this not very important anyway?
9 usr/src/uts/common/disp/cmt.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9.1
====
In the code below you combine plain reads with atomics. Is there any need to
have memory barriers as well?
1147 do {
1148 u = cmt->cmt_utilization;
1149 } while (atomic_cas_32(&cmt->cmt_utilization, u,
1150 u - 1) != u);
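For comparison, a standalone sketch of the same decrement loop written with
C11 atomics, where the ordering of the initial read and the CAS is spelled
out explicitly. This only illustrates the question; it is not a proposed
change to the kernel code.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t cmt_utilization;

static void
cmt_utilization_dec(void)
{
    uint32_t u = atomic_load_explicit(&cmt_utilization,
        memory_order_relaxed);

    /* Retry until no other CPU has changed the value under us. */
    while (!atomic_compare_exchange_weak_explicit(&cmt_utilization,
        &u, u - 1, memory_order_acq_rel, memory_order_relaxed))
        ;
}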
9.2
====
The comments around CPU_DISP_CMT_TRANS talk about time slice expiration, but
the code doesn't seem to have anything to do with time slice expiration. It
would be good to have a better comment explaining the notion of transients
and how exactly it is defined.
10 usr/src/uts/common/os/cpu.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10.1
=====
Need to update cp->cpu_info_kstat->ks_data_size by the size of
ci_curr_cstate.
10.2
=====
pg_cmt_hier_rank()
From the code it seems that there is no need to get the platform rank at all
if one of hw{1,2} is a power domain and the policy is CPUPM_POLICY_DISABLED.
11 usr/src/uts/common/sys/cpu_pm.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11.1
=====
It would be good to have a comment explaining what CPUPM_POLICY_ELASTIC is.
The same goes for a short explanation of active and idle power domains.
11.2
=====
72 typedef struct cpupm_state {
73 uint32_t cps_speed;
74 cpupm_handle_t cps_handle;
75 } cpupm_state_t;
The only use of cps_speed that I found was in
i86pc/os/cpupm/cpupm_mach.c:
uint_t
cpupm_plat_state_enumerate()
{
    int *speeds;
    ...
    /*
     * If the caller passes NULL for states, just return the
     * number of states.
     */
    if (states != NULL) {
        for (i = 0; i < nspeeds; i++) {
            states[i].cps_speed = speeds[i];
            states[i].cps_handle = (cpupm_handle_t)i;
        }
    }
}
So here the speed is treated as an int, not a uint32_t, and there are
actually no other consumers of it. What is it used for?
11.3
=====
Why is the utilization argument of cpupm_utilization_change() a uint32_t? The
only values used are 0 and 1. Why the choice of a fixed-size type?
12 usr/src/uts/common/os/cpu_pm.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12.1 cpupm_set_policy()
=======================
The comment says:
 55     /*
 56      * Pausing CPUs causes a high priority thread to be scheduled
 57      * on all other CPUs (besides the current one). This locks out
 58      * other CPUs from making CPUPM state transitions, and in elastic
 59      * mode has the side effect of bringing all CPUs active and to full
 60      * speed.
 61      */
Why do you need to do the same when setting the policy to disabled? Can
something bad happen if the policy changes without stopping the other CPUs?
Also, taking cpu_lock is not required if the policy is not changing.
12.2
=====
 78     default:
 79         cmn_err(CE_WARN, "Attempt to set unknown CPUPM policy %d\n",
 80             new_policy);
Can you ASSERT that this is not the case? I guess the pm_ioctl() handler
should verify that it is setting a valid policy.
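A sketch of the suggested hardening, assuming the ioctl path validates the
policy first: keep the warning for non-DEBUG kernels, but make the unexpected
value fatal under DEBUG.

default:
    ASSERT(0);    /* callers should have validated the policy already */
    cmn_err(CE_WARN, "Attempt to set unknown CPUPM policy %d\n",
        new_policy);
    break;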
12.3 XXX comments
=================
XXX comment left: * XXX CPUPM platform interface for this?
Same for lines 196 and 281.
12.4 cpupm_utilization_change()
===============================
It seems that there are no policy changes if the state is not LOW_POWER or
MAX_POWER. Granted, you can only have these two for now, but it still seems
strange, since it is not clear whether the code should remain correct if more
states are added or whether it should be changed as well.
12.5
=====
 287     if (dom != NULL) {
 288         dom->cpd_named_states[CPUPM_STATE_MAX_PERF] =
 289             &dom->cpd_states[max_perf_level];
 290     }
Should there be a check that max_perf_level is within the nstates?
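A sketch of the check being asked about; cpd_nstates is a guessed field name
for the count of enumerated states, not necessarily what the webrev uses.

if (dom != NULL) {
    ASSERT(max_perf_level < dom->cpd_nstates);    /* assumed field name */
    dom->cpd_named_states[CPUPM_STATE_MAX_PERF] =
        &dom->cpd_states[max_perf_level];
}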
13 usr/src/uts/sun4{u,v}/os/cmp.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13.1 pg_plat_hw_rank()
======================
234 int i, rank1, rank2;
rank1 and rank2 are not initialized to anything. In the code below, if either
hw1 or hw2 is not part of the { ipipe, chip, cache } list, the comparison
will use junk values.
The same goes for the sun4v version.
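One possible shape of the fix, sketched with an assumed hw_hier[] hierarchy
array (the real function may be structured differently): initialize the ranks
and assert that both hardware types were actually found.

int i, rank1 = -1, rank2 = -1;

for (i = 0; i < (int)(sizeof (hw_hier) / sizeof (hw_hier[0])); i++) {
    if (hw_hier[i] == hw1)
        rank1 = i;
    if (hw_hier[i] == hw2)
        rank2 = i;
}

/* Catch hw types that are not part of { ipipe, chip, cache }. */
ASSERT(rank1 != -1 && rank2 != -1);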