The attached is a draft of the case which introduces the current engineering program for Power Management in Solaris.

Note that it is intentionally high-level and does not itself present any specific interfaces. It attempts to provide an overview of what the program has in mind, with a partial outline of some of the initially intended projects. Specific component design and interfaces will be the purview of each of the succeeding one-pagers.

Comments are invited.
-db

--
; David J. Brown Ph.D. (cantab.)
; Principal Engineer
; Solaris Engineering
; Oracle
; --
; Postal Address:                   Telephone: (650) 786-5558
;  4150 Network Circle, UMPK17-307  FAX:       (650) 786-5734
;  Santa Clara, CA 95054            e-mail:    [email protected]


Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
Copyright 2007 Sun Microsystems

1. Introduction
   1.1. Project/Component Working Name:
        Power Management 2.0 Umbrella Case

   1.2. Name of Document Author/Supplier:
        David J. Brown

   1.3. Date of This Document:
        06/15/2010
        
        1.3.1. Date this project was conceived:
                June 2009 (This umbrella case is derived and extended
                from earlier work to support suspend/resume and CPU
                power management on Sun's x64 hardware platforms).

   1.4. Name of Major Document Customer(s)/Consumer(s):
        1.4.1. The PAC or CPT you expect to review your project:
                Systems PAC
        1.4.2. The ARC(s) you expect to review your project:
                PSARC
        1.4.3. The Director/VP who is "Sponsoring" this project:
                [email protected], [email protected]
        1.4.4. The name of your business unit:
                x64 Platform Software

   1.5. Email Aliases:
        1.5.1. Responsible Manager:     [email protected]
        1.5.2. Responsible Engineer:    [email protected]
        1.5.3. Marketing Manager:       [email protected]
        1.5.4. Interest List:           [email protected]

2. Project Summary
   2.1. Project Description:
        The existing power management in Solaris dates from over 17 years
        ago (April 1993), when the original effort to implement checkpoint
        resume (CPR) for the "Voyager" product took place.
        Recently, there has been a great deal of vigor in the industry related
        to energy efficiency, and hence the appearance of many new power
        management facilities - ranging from the individual hardware
        components to the contemporary hardware platforms.

        Over the past four years specific work has been done to support
        contemporary features on the Intel-architecture platforms (both
        Sun's AMD- and Intel-based systems).  The principal focus of these
        projects has been to implement Suspend-to-RAM (ACPI S3), and to
        support the CPU power-management features (P-states, C-states,
        and T-states) of current AMD and Intel processors.

        A range of modern facilities for power management are emerging.
        The system's earlier conceptions for power management need to be 
        revised to support these properly, and to pursue the end-objective of 
        energy-efficient computing.

        This case introduces the Program, and a number of the initial projects
        that constitute the next generation of power management facilities
        within Solaris.


   2.2. Risks and Assumptions:

        The program's broad goal is to provide a practical solution to the
        energy-efficient computing problem.  This requires the ability to
        construct a comprehensive system power model (for each platform upon
        which it runs), and will also ultimately require improved knowledge
        of the dynamic resource use of both individual applications and
        workloads.  We expect to gain better informational interfaces from
        both hardware component vendors (e.g. Intel) and the hardware
        platforms teams to address the first point.  Improved knowledge of
        applications and workloads is a great opportunity now that Sun has
        been integrated with Oracle.  This relies on the participation of
        the various software product groups in question.

3. Business Summary

   3.1. Problem Area:
        Support for contemporary component- and platform-level power management
        facilities provides the basis for the more energy-efficient operation
        of the computing hardware Sun/Oracle sells.

        A number of rudimentary facilities are coming to be in place in the
        hardware components and platforms, and the remainder of the systems
        stack above must be improved to exploit these.  Particular attention
        is needed at the levels of the firmware, the system virtualization layer
        (hypervisor) and the operating system (whether virtualized or running
        natively).

        The focus of this program is on facilities implemented within the
        Solaris operating system.  The initial attention will be to the
        system design and implementation required when the OS is run on bare
        iron (i.e. the non-virtualized case).  These techniques will then be
        considered within the context where Solaris is a virtualized guest
        and extended as appropriate.  The specific virtualized settings of
        interest are Solaris under both the Sun4v and OVM hypervisors (i.e.
        these two para-virtualized cases).

        The Program's primary focus will be on Server power management (with
        particular attention to the company's volume servers to begin with).


   3.2. Market/Requester:
        All customers are now attentive to the energy consequences of
        the systems they operate.  This is of particular concern in 
        data centers, where the energy costs to operate equipment
        can now be expected to meet or exceed the capital cost of
        the equipment's acquisition.

        All federal government customers (as well, possibly, as state
        and local government ones, and others) will be required to purchase 
        equipment that meets the EPA's Energy-star guidelines.  The EPA already 
        have a specification for consumer equipment, and one for mobile and
        workstation class computer equipment.  They are presently developing
        the Energy-star specifications for servers, storage, and data
        centers.


   3.3. Business Justification:
        Energy management is now a principal concern for all computer
        equipment purchasers - pointedly so for those in the
        enterprise/commercial and government sectors.

   3.4. Competitive Analysis:
        This feature is needed for Solaris to be competitive with operating
        systems from other vendors, and perhaps even to provide advantage
        over them.

   3.5. Opportunity Window/Exposure:
        Exposure is immediate.  All major hardware component vendors are driving
        and delivering these features, and we must support and exploit them to 
        keep pace.  Red Hat, Microsoft Windows (both client and server), SuSE
        and others are all following these features and working to support them.

        Power management work in Solaris is publicly visible via development
        projects in OpenSolaris.

   3.6. How will you know when you are done?:
        This Power Management Program can be considered done when the system
        has a power management facility that meets the following criteria:

        - It can be enabled or disabled: When enabled, the system strives to
        be "energy efficient" by making dynamic changes to the hardware
        platform's provisioning and/or performance levels in order to
        minimize the amount of energy required to perform any workload run
        on the system.

        - When enabled, the system administrator can express bounds on the
        degradation in performance and/or responsiveness to dynamic changes
        in load, and the system will stay within those bounds.

        - In addition (whether dynamic power management is enabled or not),
        the system will be able to restrict itself to operate within a
        specified portion of the hardware platform's full capacity when the
        environment it is operating in requires that.  This may occur when
        either power or reserved energy is limited; when the system is
        running as a virtualized guest; or when otherwise specified by the
        system's administrator.

        Power Management will be enabled by default.  It is a goal that no
        administrator would wish to disable it, except in a small number of
        special or unusual circumstances:
        a. Certain mission-critical or pseudo real-time deployments where
        static provisioning for the worst case is required.
        b. Pathological workloads whose dynamic behaviors grossly violate the
        assumptions of the energy-efficiency algorithms used by the PM system.

4. Technical Description:

        Power management refers to the system's dynamic adjustment of a
        platform's hardware resources.  This may be achieved by adjusting the
        performance levels of particular resources, and/or what is presently
        provisioned (available), in order to achieve the best possible energy
        efficiency while running computational tasks.
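
        As a rough illustration of what "energy efficiency while running
        computational tasks" means in arithmetic terms: energy is average
        power multiplied by time, so a lower performance level only saves
        energy if the longer run time does not outweigh the lower power
        draw.  The sketch below compares two ways of completing the same
        task within a fixed window.  All wattages and durations are
        hypothetical figures chosen for illustration, not measurements of
        any platform.

```python
def energy_joules(avg_power_watts: float, duration_seconds: float) -> float:
    """Energy consumed when drawing avg_power_watts for duration_seconds."""
    return avg_power_watts * duration_seconds


# Hypothetical figures: one task, observed over a fixed 100 s window.
WINDOW_S = 100.0
IDLE_W = 50.0  # assumed platform draw once the task's resources are idled

# Option A: full performance level; task takes 40 s, then the platform idles.
fast = energy_joules(200.0, 40.0) + energy_joules(IDLE_W, WINDOW_S - 40.0)

# Option B: reduced performance level; task takes 80 s, then the platform idles.
slow = energy_joules(120.0, 80.0) + energy_joules(IDLE_W, WINDOW_S - 80.0)

# The more energy-efficient choice is whichever total is smaller.
best = "reduced" if slow < fast else "full"
print(f"full: {fast:.0f} J, reduced: {slow:.0f} J, best: {best}")
```

        With these assumed numbers the reduced level uses less total energy;
        a higher idle power or a steeper slowdown would reverse the result,
        which is why the decision has to be made dynamically.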

        At the highest level, the simple purpose of power management on
        Oracle's platforms is to maximize energy-efficiency at all times.
        That is, the objective is to minimize the total energy required to
        complete any computational task, and/or to operate any service.  The
        following basic operating conceptions are illustrative, and may be
        helpful to a more detailed understanding.

        - "Performance-only"

        The system does not perform any dynamic power management.  It
        operates as systems software has traditionally, using a statically
        defined set of resources whose performance levels and available
        capacity are not adjusted according to the workload's requirements
        while the system runs.

        - Energy-efficient at maximum performance (elsewhere called "Adaptive
        Performance")

        The system does perform dynamic power management, but must still
        achieve the maximum possible performance for sustained workloads.
        Energy is saved where possible by dynamically adjusting the
        platform's provisioning and resource performance levels, but only
        for those hardware assets that do not improve the performance of
        what is running.  The simplest way to think of this is that the
        system will eliminate any gratuitous over-provisioning - all
        resources which do not affect performance of the current workload
        are appropriately adjusted.

        The key desideratum for this choice is that there should be no
        practical performance difference between workloads run in this way
        and the same workloads run with no power management at all (i.e.
        when the system's power management function is disabled).  In
        practice this may mean that we designate a certain small "principled
        amount" of allowed performance regression as assessed under certain
        standard benchmarks.

        - Energy-efficient with tolerated performance regression (elsewhere
        called "Elastic")

        The general case of energy-efficiency is that in which the constraint
        of maximum possible performance for the workload can be relaxed.  The
        system's objective is still to minimize the total energy to perform
        any computational task run on the platform.

        While the best achievable performance is not required, some bounds
        are established in order to limit the degradation of performance
        and/or responsiveness to changing load.  The system may adjust
        provisioning dynamically so long as it can still increase
        provisioning to full capacity, within a specified response time,
        when transient load requires it.

        - Power-constrained or Energy-constrained operation (elsewhere called
        "Power-saving")

        This operating constraint applies when there is a practical limit on
        power (rate of energy delivery) or a limited [reserve] supply of
        energy available.  System capacity must be reduced to remain within
        these limits.

        In these cases, the system is prepared to degrade the quality of one
        or more services, and/or to reduce resourcing in ways that may reduce
        their service level.  In addition, the system may decide that certain
        tasks or services are not to be run at all, in favor of others that
        are deemed to be more critical.
        
        The system might choose to reduce capacity and use that more limited
        resource in such a way that all tasks are affected equally (in their
        performance or responsiveness to changing demand).
        In the ideal, different tasks or services running on the system might
        be degraded non-uniformly, with the objective of keeping critical
        services running at required throughput, to stay within available
        power limits or so that an energy reserve can be made to last for an
        appropriate duration.

        One usage case for this operating condition is a limited energy
        reserve, such as for mobile systems whilst operating on battery
        power, or for tethered systems that find themselves running on a
        UPS, battery-backup, or generator backup power source due to a power
        outage.  Another usage case is when instantaneous *power*
        availability is limited - such as in a power utility brownout, or
        any other power distribution situation that might cause this.  In
        each of these cases, load must be shed and/or service quality
        reduced to stay within these limits.
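
        The non-uniform degradation described for power-constrained
        operation can be sketched as a simple selection problem: given a
        power budget, keep the most critical services and shed the rest.
        The following is a minimal sketch under stated assumptions, not a
        proposed design.  The Service record, its criticality ranking, the
        per-service power figures and the select_services() helper are all
        hypothetical, and the greedy strategy shown is only one possible
        policy (it does not guarantee the best possible use of the budget).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    power_watts: float  # assumed steady-state draw attributed to the service
    criticality: int    # hypothetical ranking: higher means keep it running


def select_services(services, power_budget_watts):
    """Greedily keep the most critical services that fit within the budget.

    Returns (kept_services, total_power_used).  Services that do not fit
    are shed; less critical services are considered only after more
    critical ones.
    """
    kept, used = [], 0.0
    for svc in sorted(services, key=lambda s: s.criticality, reverse=True):
        if used + svc.power_watts <= power_budget_watts:
            kept.append(svc)
            used += svc.power_watts
    return kept, used


# Hypothetical workload on a system that has dropped to a 250 W budget.
services = [
    Service("database", 150.0, 3),
    Service("web", 80.0, 2),
    Service("batch-reports", 120.0, 1),
]
kept, used = select_services(services, power_budget_watts=250.0)
print([s.name for s in kept], used)
```

        A uniform policy would instead scale every service back by the same
        proportion; the criticality-ordered form above corresponds to the
        "ideal" case in which critical services keep their required
        throughput.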

    4.1. Details

        This umbrella case provides context and scope for the Program.
        Specific design is to be provided in the projects under it.
        A number of projects are expected.  The following provides a partial
        outline:
        
        1. The system's power management facility will be implemented as a
        Solaris service, with new high-level administrative controls expressed
        in SMF (Solaris's service management facility).  These controls will
        be abstracted from the hardware platform and instruction-set
        architecture (ISA).  The primary method of administration is SMF.

        2. Aspects of the system's earlier power management facility which are
        obsolete or inadequate to address the above-described dynamic power
        management solution will be removed.  This includes a number of
        low-level implementation-specific controls (hardware platform and/or
        ISA-specific) which were exposed earlier.

        3. The usability of the service will be improved. Both a command-line
        and programmatic interface to the PM service and its facilities will
        be provided.  Appropriate exposure of the administrative controls to
        the service is another consideration. An appropriate means to access
        aspects of the service's SMF description will be provided.

        4. An improved framework for resource-centric power management will be
        provided.  The initial work done with the power-aware dispatcher (as
        that relates to the CPU resources) will be used as the example to
        extend similar dynamic power management capabilities to other
        hardware devices on the platform.

        5. A new device driver interface for power-relevant operation will be
        introduced, so that devices can describe their PM-relevant
        capabilities to the system.  For example, a device can describe the
        various power and performance states it offers, its ability to
        perform software actions such as suspend/resume, and the interfaces
        required to operate those controls.

        6. Development practices for power management and energy-aware modules
        (with particular attention to device drivers) within the Solaris OS
        will be defined and codified.

        7. Observability and debuggability of the system's power-aware and
        energy-efficient facilities and their operation will be improved.
        DTrace probes seem one likely avenue.

        8. The system's Suspend/Resume capability will be expanded to
        encompass a broader range of system-level Power states.
        The suspend/resume facility will be improved to encompass and unify a
        more complete range of system suspend types (from power-on suspend,
        best-available-suspend, suspend-to-RAM, suspend-to-disk [non-volatile
        storage], hybrid-suspend, to soft-off).
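
        To make item 5 above more concrete, the following sketch shows the
        kind of information a device might describe to the system: a set of
        power/performance states and whether suspend/resume is supported.
        This is purely illustrative.  The actual driver interface is to be
        specified by the relevant project and would be a kernel (C)
        interface; every name, field and figure below (PowerState,
        DeviceCapabilities, cheapest_state, "example-nic") is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PowerState:
    name: str
    power_watts: float    # assumed typical draw in this state
    relative_perf: float  # 1.0 = full performance, 0.0 = not operating


@dataclass(frozen=True)
class DeviceCapabilities:
    device: str
    states: List[PowerState]
    supports_suspend_resume: bool


def cheapest_state(caps: DeviceCapabilities,
                   min_perf: float) -> Optional[PowerState]:
    """Return the lowest-power state that still delivers min_perf,
    or None if the device offers no such state."""
    candidates = [s for s in caps.states if s.relative_perf >= min_perf]
    return min(candidates, key=lambda s: s.power_watts, default=None)


# A hypothetical network device with three states.
nic = DeviceCapabilities(
    device="example-nic",
    states=[
        PowerState("full", 10.0, 1.0),
        PowerState("reduced", 6.0, 0.5),
        PowerState("off", 0.5, 0.0),
    ],
    supports_suspend_resume=True,
)

# Half the device's performance is enough for the current load.
print(cheapest_state(nic, 0.5).name)
```

        With a description of this kind available for each device, a
        resource-centric framework (item 4) could choose states per device
        without device-specific knowledge in the framework itself.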


    4.2. Bug/RFE Number(s):
        N/A
    
    4.3. In Scope:
        The framework and system implementation needed to offer a
        power-management service that operates according to the
        aforementioned approach to energy-efficiency.
        This encompasses power management facilities both for active
        (while-running) and inactive (non-running) operation of the platform
        or its individual components.

    4.4. Out of Scope:

        Near-term, we will give much less attention to non-server systems
        (desktops and laptops).
        
    4.5. Interfaces:
        Interfaces will be specified on a per-project basis.

    4.6. Doc Impact:
        Manual pages, developer docs and administration guides will be
        impacted by the individual projects on this roadmap.

    4.7. Admin/Config Impact:
        The default installation will enable the system to perform dynamic
        power management, and its default configuration shall be to perform
        energy-efficient operation as described by the "adaptive performance"
        conception above.
        The administrator may configure the system to perform energy-efficient
        operation under more relaxed performance constraints.
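
        The default-plus-override behavior described here can be sketched as
        follows.  The four mode names come from the operating conceptions in
        section 4; the string values, the OperatingMode type and the
        effective_mode() helper are hypothetical and do not represent SMF
        property names or values, which remain to be specified by the
        individual projects.

```python
from enum import Enum
from typing import Optional


class OperatingMode(Enum):
    # Values are illustrative placeholders, not actual SMF property values.
    PERFORMANCE_ONLY = "performance-only"
    ADAPTIVE_PERFORMANCE = "adaptive-performance"
    ELASTIC = "elastic"
    POWER_SAVING = "power-saving"


# The default installation performs "adaptive performance" operation.
DEFAULT_MODE = OperatingMode.ADAPTIVE_PERFORMANCE


def effective_mode(configured: Optional[str] = None) -> OperatingMode:
    """Return the administrator's configured mode, or the default if the
    administrator has not configured one.  Raises ValueError for an
    unrecognized mode string."""
    if configured is None:
        return DEFAULT_MODE
    return OperatingMode(configured)


print(effective_mode())           # default installation
print(effective_mode("elastic"))  # administrator relaxes the constraint
```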

    4.8. HA Impact:
        More rapid availability (activation) of non-provisioned (idle/suspended)
        resources is one expected outcome of this program.
 
    4.9. I18N/L10N Impact:
        Limited to the addition of tens of messages as exposed by the new SMF
        power service.

    4.10. Packaging & Delivery:
        Part of the core OS facilities delivered in the Solaris OS/Net
        consolidation.

    4.11. Security Impact:
        This project introduces system-level facilities that require an
        appropriate level of authorization to configure and/or enact, but
        this is not in any way extraordinary.

        Configuration and enactment of power-management actions will be
        auditable.

    4.12. Dependencies:
        The capacity and utilization abstractions presently underlying the
        implementation of the Power-aware dispatcher, and the
        energy-efficiency heuristic it presently uses, are things this
        Program expects to sustain.

5. Reference Documents:

    5.1. Design/Specification documents
        To be provided by each individual project under this umbrella. 

    5.2. Related documents
        Brown, David J. and Charles Reams, "Toward Energy-efficient Computing,"
        Communications of the ACM, Vol. 53, No. 3, pp. 50-58, March 2010,
        http://cacm.acm.org/magazines/2010/3/76284-toward-energy-efficient-computing/fulltext

        Saxe, Eric, "Power-efficient Software," Communications of the ACM,
        Vol. 53, No. 2, pp. 44-48, Feb 2010,
        http://cacm.acm.org/magazines/2010/2/69355-power-efficient-software/fulltext

        Power management community on OpenSolaris:
        http://hub.opensolaris.org/bin/view/Community+Group+pm/

        Recent PM-related ARC cases

        PSARC 2009/396 Tickless Kernel Architecture / lbolt decoupling
        PSARC 2009/289 FBDIMM Idle Power Enhancement (FIPE) driver
        PSARC 2009/283 Default enabling of CPU power management in S10U8 for x86 systems
        PSARC 2009/112 sys-suspend(1)
        PSARC 2009/101 Turbo mode observability
        PSARC 2009/086 PowerTOP --cpu option
        PSARC 2008/777 cpupm keyword mode extensions
        PSARC 2008/742 SDcard Framework Suspend & Resume
        PSARC 2008/376 PowerTOP for OpenSolaris
        PSARC 2008/291 Power Management Core-disable for n2/vf CPUs
        PSARC 2008/091 Libtopo enumeration of fans and power supplies via IPMI
        PSARC 2008/021 HAL Power Management Support
        PSARC 2007/679 CPUFreq HAL
        PSARC 2006/273 Rage XL Framebuffer Driver
        PSARC 2006/132 Wake On LAN
        PSARC 2005/469 X86 Energy Star compliance


6. Resources and Schedule:
   6.1. Projected Availability:
        CY2010Q3 Suspend-to-disk
        CY2010Q4 Power service (initial PM 2.0) SMF facility
        CY2010Q4 libpower - C language bindings (programmatic interface)
        CY2010Q4 Suspend/Resume for initial reference set of x64 volume 
                server products (e.g. x4170, x4270, x4275, Lynx+)
        CY2011Q4 Energy-star compliance for certain volume servers

   6.2. Cost of Effort:
        To be defined by the follow-on projects.

   6.3. Cost of Capital Resources:
        Existing lab clients and servers will be used.

   6.4. Product Approval Committee requested information:
        6.4.1. Consolidation or Component Name:
                ON
        6.4.3. Type of CPT Review and Approval expected:
                Standard
        6.4.4. Project Boundary Conditions:
                To be defined by the follow-on projects.
        6.4.5. Is this a necessary project for OEM agreements:
                No
        6.4.6. Notes:
                // See dependencies section above.
        6.4.7. Target RTI Date/Release:
                To be defined by the follow-on projects.
        6.4.8. Target Code Design Review Date:
                To be defined by the follow-on projects.
        6.4.9. Update approval addition:
                N/A

   6.5. ARC review type:
        Standard

   6.6. ARC Exposure:
        open
       6.6.1. Rationale:
                N/A

7. Prototype Availability:
   7.1. Prototype Availability:
        To be defined by the follow-on projects.

   7.2. Prototype Cost:
        To be defined by the follow-on projects.
