http://defect.opensolaris.org/bz/show_bug.cgi?id=6219
Summary: Implement code review feedback for Sasha Kolbasov
Classification: Development
Product: power-mgmt
Version: unspecified
Platform: Other
OS/Version: Solaris
Status: NEW
Severity: minor
Priority: P3
Component: PAD
AssignedTo: eric.saxe at sun.com
ReportedBy: eric.saxe at sun.com
CC: tesla-dev at opensolaris.org
pad code review
===============
Author: Alexander Kolbasov <akolb at eng.sun.com>
Date: 2009/01/23
URL: http://cr.opensolaris.org/~esaxe/pad-cr/
1 usr/src/cmd/mdb/common/modules/genunix/pg.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1.1 Future RFEs
Are there extra things that could be done in MDB to simplify debugging for
PAD (future RFEs)?
2 usr/src/uts/common/sys/pg.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2.1
====
 59 typedef struct pg_cb_ops {
 60     void    (*thread_swtch)(struct pg *, struct cpu *, kthread_t *,
 61         kthread_t *);
 62     void    (*thread_remain)(struct pg *, struct cpu *,
 63         kthread_t *);
 64 } pg_cb_ops_t;
 65
 66 /*
 67  * Processor group structure
 68  */
 69 typedef struct pg {
 70     pgid_t          pg_id;          /* seq id */
 71     pg_relation_t   pg_relation;    /* grouping relationship */
 72     struct pg_class *pg_class;      /* pg class */
 73     struct group    pg_cpus;        /* group of CPUs */
 74     pg_cb_ops_t     pg_cb;          /* pg events ops vector */
 75 } pg_t;
Why are pg_cb_ops part of pg and not pg_cmt? Are they relevant to all kinds
of PGs?
2.2
====
144 #define PG_NUM_CPUS(pgrp) \
145 (GROUP_SIZE(&((pg_t *)pgrp)->pg_cpus))
Doing the (pg_t *) cast in a macro is dangerous: it is easy to pass some
unrelated pointer to it and get away with it. Can the cast be done by the
caller, where it is easier to verify that the pointer is the right one?
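To illustrate the concern, here is a minimal standalone sketch with mocked-up
types (not the real pg.h definitions): the cast hidden inside the macro
accepts any pointer, while a function-based accessor lets the compiler catch
the mistake.

#include <stdio.h>

struct group { unsigned grp_size; };
struct pg { struct group pg_cpus; };
struct cpu { int cpu_id; };

#define GROUP_SIZE(g)       ((g)->grp_size)

/* Cast buried in the macro: any pointer type is silently accepted. */
#define PG_NUM_CPUS(pgrp)   (GROUP_SIZE(&((struct pg *)(pgrp))->pg_cpus))

/* Alternative: the compiler checks the argument type for the caller. */
static unsigned
pg_num_cpus(struct pg *pgrp)
{
    return (GROUP_SIZE(&pgrp->pg_cpus));
}

int
main(void)
{
    struct pg pg = { { 4 } };
    struct cpu cp = { 0 };

    (void) printf("%u\n", PG_NUM_CPUS(&pg));    /* OK: prints 4 */
    (void) printf("%u\n", PG_NUM_CPUS(&cp));    /* compiles, reads junk */
    (void) printf("%u\n", pg_num_cpus(&pg));    /* OK: prints 4 */
    /* pg_num_cpus(&cp) draws an incompatible-pointer-type diagnostic. */
    return (0);
}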
2.3
====
182 * PG Event callbacks
183 */
184 void pg_ev_thread_swtch(cpu_t *, kthread_t *, kthread_t *);
185 void pg_ev_thread_remain(cpu_t *, kthread_t *);
What is the relationship between these callbacks and the pg_cb callbacks with
similar names?
3 usr/src/uts/common/sys/cpuvar.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- No comments
4 usr/src/uts/common/sys/cyclic.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- There are no actual changes in this file
5 usr/src/uts/common/sys/pghw.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5.1
====
60 * Returns true if the hardware is a type of power management domain
This comment is rather mysterious, but the code below is clear.
62 #define PGHW_PM_DOMAIN(hw) \
63 (hw == PGHW_POW_ACTIVE || hw == PGHW_POW_IDLE)
with the exception that it is a boolean check, so maybe it should be
PGHW_IS_PM_DOMAIN(hw)?
5.2
====
89 kstat_t *pghw_kstat; /* physical kstats exported */
Do you think the kstats belong to the hardware PG layer or to the CMT PG
layer (or maybe both)?
6 usr/src/uts/common/os/pg.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
6.1
====
 154 #define PG_POLICY_NAME(pg) \
 155     ((pg)->pg_class->pgc_ops->policy_name ? \
 156     (pg)->pg_class->pgc_ops->policy_name(pg) : 0) \
This should be NULL, instead of 0.
6.2
====
Why is the events callback vector bzeroed instead of being initialized to
null ops or the default? (A sketch of the null-ops idea follows the quoted
code.)
398 /*
399 * Initialize the events ops vector
400 */
401 bcopy(&pg_cb_ops_default, &pg->pg_cb, sizeof (struct pg_cb_ops));
402
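For reference, a sketch of what the "null ops" alternative could look like
(the stub names are invented for illustration; this is not code from the
webrev): a default vector of no-op callbacks lets pg_ev_thread_swtch() and
pg_ev_thread_remain() invoke the ops unconditionally, with no NULL checks and
no bzero'd state.

static void
pg_cb_nop_thread_swtch(struct pg *pg, struct cpu *cp, kthread_t *told,
    kthread_t *tnew)
{
}

static void
pg_cb_nop_thread_remain(struct pg *pg, struct cpu *cp, kthread_t *t)
{
}

/* Hypothetical default: every op is a harmless no-op. */
static const pg_cb_ops_t pg_cb_ops_default = {
    pg_cb_nop_thread_swtch,
    pg_cb_nop_thread_remain,
};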
7 usr/src/uts/common/os/pghw.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7.1
====
Who now calls pghw_type_string?
419 kstat_named_setstr(&pgsp->pg_policy, pg_policy_name((pg_t *)pg));
7.2
====
I am wondering whether it would be more useful to put the kstats in the CMT
layer rather than the hw layer. That would allow them to show information
from the CMT, hw, and group levels at once.
8 usr/src/uts/common/disp/cmt_policy.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8.1
====
The following comment is confusing in its description of top level balancing
on NUMA systems.
 54  *
 55  * On NUMA systems, the CMT load balancing algorithm balances across the
 56  * CMT PGs within their respective lgroups. On UMA based system, there
 57  * exists a top level group of PGs to balance across. On NUMA systems multiple
 58  * top level groups are instantiated, where the top level balancing begins by
 59  * balancng across the CMT PGs within their respective (per lgroup) top level
 60  * groups.
8.2
====
What could be a more generic solution for the XXX comment below? It assumes
too many specifics about the way utilization is computed.
 76     /*
 77      * XXX Note this assumes that curthread's utilization is "1"
 78      */
 79     if (self && bitset_in_set(&here->cmt_cpus_actv_set, CPU->cpu_seqid))
 80         here_util--;    /* Ignore curthread's effect */
 81
8.3
====
Would it simplify things to have just a single bit indicating whether the
policy is coalescence or load balancing? (See the sketch after the quoted
assertion.)
82 /*
83 * Load balancing and coalescence are conflicting policies
84 */
85 ASSERT((policy & (CMT_BALANCE|CMT_COALESCE)) !=
86 (CMT_BALANCE|CMT_COALESCE));
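A sketch of the single-bit idea (the flag name is invented for illustration):
one bit selects the placement policy, so the conflicting combination cannot
be expressed and the assertion becomes unnecessary.

#define CMT_POLICY_COALESCE    0x1    /* set: coalesce, clear: load balance */

if (policy & CMT_POLICY_COALESCE) {
    /* pack utilization onto the fewest (already powered-up) PGs */
} else {
    /* spread utilization across the sibling PGs */
}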
8.4
====
What do you do with CMT_AFFINITY? There seems to be no policy implementation
for it!
8.5
====
For the balancing policy, what should be done if
    (here_util < there_util) &&
    (CMT_CAPACITY(there) is much greater than CMT_CAPACITY(here))?
8.6 cmt_balance()
=================
197 /*
198 * Select an idle CPU from the target PG
199 */
Does it mean that after we have walked through the PG tree and found a PG to
move to, we will only move there if an idle CPU is available, but will stay
with the current cp even if it is not idle?
8.7 cmt_should_migrate()
========================
Should there be memory barriers around the accesses of the
pg->cmt_utilization values, or is this not very important anyway?
9 usr/src/uts/common/disp/cmt.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9.1
====
In the code below you combine plain reads with atomics. Is there any need to
have memory barriers as well?
1147 do {
1148 u = cmt->cmt_utilization;
1149 } while (atomic_cas_32(&cmt->cmt_utilization, u,
1150 u - 1) != u);
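For comparison, a standalone sketch of the same decrement loop written with
C11 atomics, where the ordering of the initial read and the CAS is spelled
out explicitly. This only illustrates the question; it is not a proposed
change to the kernel code.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t cmt_utilization;

static void
cmt_utilization_dec(void)
{
    uint32_t u = atomic_load_explicit(&cmt_utilization,
        memory_order_relaxed);

    /* Retry until no other CPU has changed the value under us. */
    while (!atomic_compare_exchange_weak_explicit(&cmt_utilization,
        &u, u - 1, memory_order_acq_rel, memory_order_relaxed))
        ;
}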
9.2
====
The comments around CPU_DISP_CMT_TRANS talk about time slice expiration, but
the code doesn't seem to have anything to do with time slice expiration. It
would be good to have a better comment explaining the notion of transients
and how exactly it is defined.
10 usr/src/uts/common/os/cpu.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
10.1
=====
Need to update cp->cpu_info_kstat->ks_data_size by the size of
ci_curr_cstate.
10.2
=====
pg_cmt_hier_rank()
From the code it seems that there is no need to get the platform rank at all
if one of hw{1,2} is a power domain and the policy is CPUPM_POLICY_DISABLED.
11 usr/src/uts/common/sys/cpu_pm.h
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11.1
=====
It would be good to have a comment explaining what CPUPM_POLICY_ELASTIC is.
The same goes for a short explanation of active and idle power domains.
11.2
=====
72 typedef struct cpupm_state {
73 uint32_t cps_speed;
74 cpupm_handle_t cps_handle;
75 } cpupm_state_t;
The only use of cps_speed that I found was in
i86pc/os/cpupm/cpupm_mach.c:
uint_t
cpupm_plat_state_enumerate()
{
    int *speeds;
    ...
    /*
     * If the caller passes NULL for states, just return the
     * number of states.
     */
    if (states != NULL) {
        for (i = 0; i < nspeeds; i++) {
            states[i].cps_speed = speeds[i];
            states[i].cps_handle = (cpupm_handle_t)i;
        }
    }
}
So here the speed is treated as an int, not a uint32_t, and there are
actually no other consumers of it. What is it used for?
11.3
=====
Why is the utilization argument of cpupm_utilization_change() a uint32_t? The
only values used are 0 and 1. Why the choice of a fixed-size type?
12 usr/src/uts/common/os/cpu_pm.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
12.1 cpupm_set_policy()
=======================
The comment says:
 55     /*
 56      * Pausing CPUs causes a high priority thread to be scheduled
 57      * on all other CPUs (besides the current one). This locks out
 58      * other CPUs from making CPUPM state transitions, and in elastic
 59      * mode has the side effect of bringing all CPUs active and to full
 60      * speed.
 61      */
Why do you need to do the same when setting the policy to disabled? Can
something bad happen if the policy changes without stopping the other CPUs?
Also, taking cpu_lock is not required if the policy is not changing.
12.2
=====
 78     default:
 79         cmn_err(CE_WARN, "Attempt to set unknown CPUPM policy %d\n",
 80             new_policy);
Can you ASSERT that this is not the case? I guess the pm_ioctl() handler
should verify that it is setting a valid policy.
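A sketch of the suggested hardening, assuming the ioctl path validates the
policy first: keep the warning for non-DEBUG kernels, but make the unexpected
value fatal under DEBUG.

default:
    ASSERT(0);    /* callers should have validated the policy already */
    cmn_err(CE_WARN, "Attempt to set unknown CPUPM policy %d\n",
        new_policy);
    break;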
12.3 XXX comments
=================
XXX comment left: * XXX CPUPM platform interface for this?
Same for lines 196 and 281.
12.4 cpupm_utilization_change()
===============================
It seems that there are no policy changes if the state is not LOW_POWER or
MAX_POWER. Granted, you can only have these two for now, but it still seems
strange, since it is not clear whether the code should remain correct if more
states are added or whether it should be changed as well.
12.5
=====
 287     if (dom != NULL) {
 288         dom->cpd_named_states[CPUPM_STATE_MAX_PERF] =
 289             &dom->cpd_states[max_perf_level];
 290     }
Should there be a check that max_perf_level is within the nstates?
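A sketch of the check being asked about; cpd_nstates is a guessed field name
for the count of enumerated states, not necessarily what the webrev uses.

if (dom != NULL) {
    ASSERT(max_perf_level < dom->cpd_nstates);    /* assumed field name */
    dom->cpd_named_states[CPUPM_STATE_MAX_PERF] =
        &dom->cpd_states[max_perf_level];
}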
13 usr/src/uts/sun4{u,v}/os/cmp.c
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13.1 pg_plat_hw_rank()
======================
234 int i, rank1, rank2;
rank1 and rank2 are not initialized to anything. In the code below, if either
hw1 or hw2 is not part of the { ipipe, chip, cache } list, the comparison
will use junk values.
The same goes for the sun4v version.
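One possible shape of the fix, sketched with an assumed hw_hier[] hierarchy
array (the real function may be structured differently): initialize the ranks
and assert that both hardware types were actually found.

int i, rank1 = -1, rank2 = -1;

for (i = 0; i < (int)(sizeof (hw_hier) / sizeof (hw_hier[0])); i++) {
    if (hw_hier[i] == hw1)
        rank1 = i;
    if (hw_hier[i] == hw2)
        rank2 = i;
}

/* Catch hw types that are not part of { ipipe, chip, cache }. */
ASSERT(rank1 != -1 && rank2 != -1);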