I am sponsoring the following fast-track on behalf of Rafael Vanoni,
with a time-out of 09/09/2009.  The project desires
minor/major binding, plus micro/patch binding for one
interface, as specified.

-------------------------------------

Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
Copyright 2007 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
        Tickless Kernel Architecture / lbolt decoupling

    1.2. Name of Document Author/Supplier:
        Rafael Vanoni Polanczyk (rafael.vanoni at sun.com)

    1.3. Date of This Document:
     08/04/09

     1.3.1. Date this project was conceived:
         07/01/09

    1.4. Name of Major Document Customer(s)/Consumer(s):
     1.4.1. The PAC or CPT you expect to review your project:
         Solaris PAC
     1.4.2. The ARC(s) you expect to review your project:
     1.4.3. The Director/VP who is "Sponsoring" this project:
         Greg.Lavender at Sun.COM
     1.4.4. The name of your business unit:
         Systems

    1.5. Email Aliases:
         1.5.1. Responsible Manager: darrin.johnson at sun.com
         1.5.2. Responsible Engineer: rafael.vanoni at sun.com
     1.5.3. Marketing Manager: mike.mulkey at sun.com
     1.5.4. Interest List: tickless-dev at opensolaris.org


2. Project Summary
    2.1. Project Description:
        The tickless project aims at implementing the services provided by the
        clock cyclic in an event driven fashion. The first sub-project is the
        decoupling of the lbolt and lbolt64 variables from clock(). These two
        variables are incremented at each firing of the clock cyclic and provide
        a time reference to the system. They are being replaced by two routines
        that are backed by gethrtime(), the existing ddi_get_lbolt() and
        the new ddi_get_lbolt64(), introduced as a migration path for existing
        non-DDI compliant consumers.

        This project also introduces new interfaces that reduce the need to
        call the DDI lbolt routines, and a method to avoid the performance
        impact of replacing inexpensive variable references with routine
        calls. These are described in detail in section 4.1.


4. Technical Description:
     4.1. Details:
     The lbolt and lbolt64 variables will be replaced by two routines,
     ddi_get_lbolt() and ddi_get_lbolt64(), which are backed by a hardware
     counter to provide the same service in an event-driven way.
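
     As a rough illustration of deriving a tick count from a
     high-resolution timestamp instead of a per-tick increment, consider
     the user-space sketch below. The helper name and the hz parameter
     are assumptions made for this example; it is not the project's
     implementation.

```c
#include <stdint.h>

#define SKETCH_NANOSEC 1000000000LL /* nanoseconds per second */

/*
 * Hypothetical helper: convert a nanosecond timestamp, such as one
 * returned by gethrtime(), into clock ticks at a frequency of hz.
 */
int64_t
sketch_hrtime_to_lbolt64(int64_t hrtime_ns, int64_t hz)
{
    int64_t nsec_per_tick = SKETCH_NANOSEC / hz;

    return (hrtime_ns / nsec_per_tick);
}
```

     With hz set to 100, one second of elapsed time maps to 100 ticks,
     and no periodic timer is needed to keep the value current.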

        Two of the major consumers of the lbolt service are the
        cv_timedwait() and cv_timedwait_sig() routines. Callers need lbolt
        to build one of the arguments (an absolute time), and the routines
        read lbolt again internally to decompose that argument into a
        relative time. This project introduces two new routines,
        cv_reltimedwait() and cv_reltimedwait_sig(), which provide the same
        service but take a relative time and do not require lbolt at all.
        These new routines also take an argument of type time_res_t that
        tells the underlying timeout system how accurately the given timeout
        must expire. This allows the kernel to anticipate or defer such
        timeouts when possible, letting the system stay idle for longer
        periods of time.
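
        The difference between the two calling styles can be sketched in
        user space as follows. All names here are stand-ins invented for
        the illustration (the simulated clock replaces ddi_get_lbolt());
        the functions only compute the interval a wait would use and do
        not block.

```c
typedef long sketch_clock_t;        /* stand-in for clock_t */

sketch_clock_t sketch_now = 5000;   /* simulated lbolt value */

/* Stand-in for ddi_get_lbolt(): returns the simulated tick count. */
sketch_clock_t
sketch_lbolt_now(void)
{
    return (sketch_now);
}

/*
 * Absolute-time style: the caller passes a deadline, and the routine
 * reads the clock again to turn it back into a relative interval.
 */
sketch_clock_t
sketch_timedwait_ticks(sketch_clock_t deadline)
{
    return (deadline - sketch_lbolt_now());
}

/* Relative-time style: the interval is used directly, no clock read. */
sketch_clock_t
sketch_reltimedwait_ticks(sketch_clock_t delta)
{
    return (delta);
}
```

        A caller of the absolute style passes sketch_lbolt_now() + 100,
        which costs two clock reads in total; the relative style passes
        100 and costs none.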

        Some consumers of the lbolt and lbolt64 variables may have implicit
        dependencies on the cheapness of reading a memory location, which
        will be exposed when they are migrated to a gethrtime() backed
        routine. In such cases, migrating references to lbolt and lbolt64 to
        ddi_get_lbolt() and ddi_get_lbolt64() will have a negative
        performance impact. To address this case, our project will provide
        the lbolt service in a hybrid way, switching from event driven to
        cyclic driven when the DDI lbolt routines are being heavily used. In
        cyclic mode, a timer is programmed to expire at each clock tick and
        increment an internal (lbolt-like) variable, whose value is returned
        to the consumer. This cyclic will only be activated during periods
        of heavy load, and will switch itself off when the activity
        subsides.
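
        One way to picture the hybrid behaviour is the small state machine
        below. It is a user-space sketch only: the threshold, the structure
        and every function name are hypothetical, and the actual switching
        policy is not specified in this document.

```c
#include <stdint.h>

/* Hypothetical threshold: calls per tick that count as heavy use. */
#define SKETCH_CALLS_THRESHOLD 4

enum sketch_mode { SKETCH_EVENT, SKETCH_CYCLIC };

struct sketch_hybrid {
    enum sketch_mode mode;  /* current way of producing lbolt */
    int64_t calls;          /* calls seen during the current tick */
    int64_t cached;         /* lbolt-like variable used in cyclic mode */
    int64_t window;         /* tick in which calls are being counted */
};

/* Reader: hw_ticks models the value derived from the hardware counter. */
int64_t
sketch_hybrid_lbolt(struct sketch_hybrid *lb, int64_t hw_ticks)
{
    if (lb->mode == SKETCH_CYCLIC) {
        lb->calls++;
        return (lb->cached);        /* cheap read, hw_ticks is ignored */
    }
    if (hw_ticks != lb->window) {   /* new tick: restart the count */
        lb->window = hw_ticks;
        lb->calls = 0;
    }
    if (++lb->calls > SKETCH_CALLS_THRESHOLD) {
        lb->mode = SKETCH_CYCLIC;   /* heavy use: start the cyclic */
        lb->cached = hw_ticks;
        lb->calls = 0;
    }
    return (hw_ticks);
}

/* Per-tick timer handler: only does work while in cyclic mode. */
void
sketch_hybrid_tick(struct sketch_hybrid *lb)
{
    if (lb->mode != SKETCH_CYCLIC)
        return;
    lb->cached++;
    if (lb->calls <= SKETCH_CALLS_THRESHOLD)
        lb->mode = SKETCH_EVENT;    /* activity subsided */
    lb->calls = 0;
}
```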

     The decision to remove the lbolt and lbolt64 variables was made during
     design review, and a consensus was reached on the basis that, since
     we're reaching the end of a major release, this is the right moment to
     obsolete these. The side effects and cost of maintaining such symbols
     outweigh the benefits. However, this decision can be re-evaluated if
     the negative impact on third-party modules during the development
     release is greater than expected. We're working with ISV and RPE to
     minimize the impact proactively.

     4.2. Bug/RFE Number(s):
          6860030 tickless clock requires a clock() decoupled lbolt / lbolt64

     4.5. Interfaces:
         This project is adding the following interfaces to the DDI:

         int64_t ddi_get_lbolt64(void);

         clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
             time_res_t res);

         clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t
             delta, time_res_t res);

         With time_res_t defined as

         enum time_res {
             TR_NANOSEC,
             TR_MICROSEC,
             TR_MILLISEC,
             TR_SEC,
             TR_CLOCK_TICK,
             TR_COUNT
         };

         typedef enum time_res time_res_t;
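
         To illustrate how a resolution hint could let the timeout system
         batch expirations, the sketch below rounds an expiration time up
         to the granularity implied by the hint. The mapping and both
         helper functions are assumptions made for this example; the
         document does not specify how the kernel uses the value.

```c
#include <stdint.h>

/* Mirrors the time_res_t definition given above. */
typedef enum time_res {
    TR_NANOSEC, TR_MICROSEC, TR_MILLISEC, TR_SEC, TR_CLOCK_TICK, TR_COUNT
} time_res_t;

/*
 * Hypothetical mapping from a resolution to a granularity in
 * nanoseconds. nsec_per_tick is the length of one clock tick.
 */
int64_t
sketch_res_to_nsec(time_res_t res, int64_t nsec_per_tick)
{
    switch (res) {
    case TR_MICROSEC:   return (1000LL);
    case TR_MILLISEC:   return (1000000LL);
    case TR_SEC:        return (1000000000LL);
    case TR_CLOCK_TICK: return (nsec_per_tick);
    default:            return (1LL);   /* TR_NANOSEC: no rounding */
    }
}

/*
 * Round a non-negative expiration time up to the next multiple of the
 * granularity, so that nearby timeouts land on the same instant.
 */
int64_t
sketch_align_expiry(int64_t expiry_ns, time_res_t res, int64_t nsec_per_tick)
{
    int64_t g = sketch_res_to_nsec(res, nsec_per_tick);

    return (((expiry_ns + g - 1) / g) * g);
}
```

         With TR_MILLISEC, timeouts due at 1234567 ns and 1900000 ns both
         move to 2000000 ns and can be processed in a single wakeup.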

         In addition, the lbolt and lbolt64 variables (which are
         *private* symbols known to be used by non-DDI compliant modules) are
         being removed. Third-party modules that still reference these
         symbols will fail to load.

     In summary:

     Interface                Commitment  Comments
     -----------------------------------------------------------------------
     ddi_get_lbolt64(9F)      Public/DDI  returns lbolt64
     cv_reltimedwait(9F)      Public/DDI  cv_timedwait(9F), relative time
     cv_reltimedwait_sig(9F)  Public/DDI  cv_timedwait_sig(9F), relative time
     lbolt                    Obsolete    commonly referenced kernel symbol
     lbolt64                  Obsolete    commonly referenced kernel symbol

     We also plan to backport the ddi_get_lbolt64() interface to Solaris
     10 Update 9 to extend the migration path for S10 users who would like
     to update their modules before moving to Solaris Nevada or the next
     version of Solaris. These users already have ddi_get_lbolt() but
     currently lack the 64-bit version of it. This backport will have
     patch release binding.


     4.6. Doc Impact:
         6868417 updates for tickless kernel/lbolt decoupling (6860030)

         Updates to the 'Writing Device Drivers' document are necessary. The
         project team is in contact with the documentation group to address
         these.


5. Reference Documents:
     This project is being developed through OpenSolaris. Our project pages
     and alias contain all the necessary information:
         http://opensolaris.org/os/project/tickless/
         http://opensolaris.org/os/project/tickless/tasks/lbolt/
         tickless-dev at opensolaris.org


6. Resources and Schedule:
     6.5. ARC review type: Fast track
     6.6. ARC Exposure: open





Updates to existing man pages:
------------------------------

drv_getparm.9f

PARAMETERS
      ...

      LBOLT     Read the value of lbolt. lbolt is a  clock_t  that    |
                represents the number of clock ticks since system     |
                boot. No special treatment is  applied  when          |
                this  value  overflows  the  maximum  value of the
                signed integral type clock_t.  When  this  occurs,
                its value will be negative, and its magnitude will
                be decreasing until it again passes zero.  It  can
                ...
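
      As an illustration of the wrap described above, the sketch below
      computes the ticks elapsed between two readings in a way that
      stays correct across the overflow. It assumes a 32-bit clock_t,
      modelled here with int32_t, and the helper name is hypothetical.

```c
#include <stdint.h>

/*
 * Elapsed ticks between two readings of a 32-bit signed tick counter.
 * The subtraction is done on unsigned values, where wraparound is well
 * defined, so the result is correct as long as fewer than 2^31 ticks
 * separate the two readings.
 */
int32_t
sketch_ticks_elapsed(int32_t start, int32_t end)
{
    return ((int32_t)((uint32_t)end - (uint32_t)start));
}
```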




drv_hztousec.9f

DESCRIPTION
      The drv_hztousec() function converts into  microseconds  the
      time expressed by hertz, which is in system clock ticks.

      The length of time the system has been up since boot can be    |
      retrieved by calling ddi_get_lbolt(9F), which will return a    |
      value of type clock_t containing the number of clock ticks
      since boot. Drivers often use this value before and after an
      I/O request to measure the amount of time it took the device to
      process the request. The drv_hztousec() function can be used
      by the driver to convert the reading from clock ticks  to  a
      known unit of time.
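
      The measurement pattern described above can be sketched in user
      space as follows. The conversion helper is a stand-in written for
      this example, with the tick frequency passed in explicitly; it is
      not the drv_hztousec() implementation.

```c
#include <stdint.h>

/* Stand-in for drv_hztousec(9F): ticks to microseconds at hz ticks/sec. */
int64_t
sketch_hztousec(int64_t ticks, int64_t hz)
{
    return (ticks * 1000000LL / hz);
}

/* Time taken by a request, given tick readings before and after it. */
int64_t
sketch_request_usec(int64_t before, int64_t after, int64_t hz)
{
    return (sketch_hztousec(after - before, hz));
}
```

      In a driver, the two readings would come from ddi_get_lbolt(9F)
      calls placed around the I/O request.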




Intro.9f

Kernel Functions for Drivers                            Intro(9F)

      ddi_get_instance                  Solaris DDI
      ddi_get_kt_did                    Solaris DDI
      ddi_get_lbolt                     Solaris DDI
      ddi_get_lbolt64                   Solaris DDI            +
      ddi_get_name                      Solaris DDI
      ...




Updated ddi_get_lbolt.9f:
-------------------------

Kernel Functions for Drivers                    ddi_get_lbolt(9F)

NAME
      ddi_get_lbolt - returns the number of clock ticks since boot    |

SYNOPSIS
      #include <sys/types.h>
      #include <sys/ddi.h>
      #include <sys/sunddi.h>

      clock_t ddi_get_lbolt(void);

INTERFACE LEVEL
      Solaris DDI specific (Solaris DDI).

DESCRIPTION
      ddi_get_lbolt() returns a value that represents the number        |
      of clock ticks since the system booted.  This value is        |
      used  as  a  counter  or timer  inside  the  system kernel.
      The tick frequency can be determined  by using drv_usectohz(9F)
      which converts microseconds into clock ticks.


RETURN VALUES
      ddi_get_lbolt() returns the number of clock ticks since boot    |
      in clock_t type.

CONTEXT
       This routine can be called from any context.

SEE ALSO
      ddi_get_lbolt64(9F), ddi_get_time(9F), drv_getparm(9F),
      drv_usectohz(9F)




New man page for ddi_get_lbolt64():
-----------------------------------

Kernel Functions for Drivers                    ddi_get_lbolt64(9F)

NAME
      ddi_get_lbolt64 - returns the number of clock ticks since boot
      in int64_t type

SYNOPSIS
      #include <sys/types.h>
      #include <sys/ddi.h>
      #include <sys/sunddi.h>

      int64_t ddi_get_lbolt64(void);

INTERFACE LEVEL
      Solaris DDI specific (Solaris DDI).

DESCRIPTION
      ddi_get_lbolt64() returns a value that represents the number
      of clock ticks since the system booted.  This value is
      used  as  a  counter  or timer  inside  the  system kernel. It is
      essentially the same value returned by ddi_get_lbolt(9F), but in a
      longer data type that will not wrap for 2.9 billion years.
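
      The arithmetic behind that figure can be checked with the short
      sketch below. The tick rate of 100 Hz is an assumption of this
      example; a higher rate shortens the time to wrap proportionally.

```c
#include <stdint.h>

#define SKETCH_SECS_PER_YEAR (365.25 * 24 * 60 * 60)

/* Years until a signed 64-bit tick counter starting at 0 wraps. */
double
sketch_years_to_wrap(double hz)
{
    return ((double)INT64_MAX / hz / SKETCH_SECS_PER_YEAR);
}
```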

RETURN VALUES
      ddi_get_lbolt64() returns the number of clock ticks since boot
      in int64_t type.

CONTEXT
       This routine can be called from any context.

SEE ALSO
      ddi_get_lbolt(9F), ddi_get_time(9F)

      Writing Device Drivers

       STREAMS Programming Guide

SunOS 5.11          Last change: 29 Jul 2009                    1


Updates to condvar(9f):
----------------------

Kernel Functions for Drivers                          condvar(9F)

NAME
      condvar,   cv_init,    cv_destroy,    cv_wait,    cv_signal,
      cv_broadcast,  cv_wait_sig, cv_timedwait, cv_timedwait_sig,
      cv_reltimedwait, cv_reltimedwait_sig - condition variable
      routines

SYNOPSIS
      #include <sys/ksynch.h>

      void cv_init(kcondvar_t *cvp, char *name, kcv_type_t type, void *arg);

      void cv_destroy(kcondvar_t *cvp);

      void cv_wait(kcondvar_t *cvp, kmutex_t *mp);

      void cv_signal(kcondvar_t *cvp);

      void cv_broadcast(kcondvar_t *cvp);

      int cv_wait_sig(kcondvar_t *cvp, kmutex_t *mp);

      clock_t cv_timedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t timeout);

      clock_t cv_timedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t timeout);

|    clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
|    time_res_t resolution);

|    clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp, clock_t delta,
|    time_res_t resolution);

INTERFACE LEVEL
      Solaris DDI specific (Solaris DDI).

PARAMETERS
      cvp        A pointer to an abstract data type kcondvar_t.

      mp         A pointer to a mutual exclusion lock  (kmutex_t),
                 initialized  by  mutex_init(9F)  and  held by the
                 caller.

      name       Descriptive string. This is obsolete  and  should
                 be NULL. (Non-NULL strings are legal, but they're
                 a waste of kernel memory.)

SunOS 5.11          Last change: 02 Aug 2009                    1

      type       The constant CV_DRIVER.

      arg        A type-specific argument, drivers should pass arg
                 as NULL.

      timeout    A  time,  in  absolute  ticks  since  boot,  when
                 cv_timedwait()   or   cv_timedwait_sig()   should
                 return.

|     delta      A time, in relative ticks, when cv_reltimedwait()
|                or cv_reltimedwait_sig() should return.
|
|     resolution A flag that specifies how accurately the relative
|                time interval must be honored. Possible values are
|                TR_NANOSEC, TR_MICROSEC, TR_MILLISEC, TR_SEC or
|                TR_CLOCK_TICK, the latter indicating that the interval
|                should be aligned to system clock ticks. This
|                information allows the system to anticipate or
|                defer the timeout expiration in order to batch process
|                similarly expiring events, allowing the system to
|                stay idle for longer periods of time and enhancing
|                its power efficiency.


DESCRIPTION
      Condition variables are a standard form of thread synchroni-
      zation.  They  are designed to be used with mutual exclusion
      locks (mutexes). The associated mutex is used to ensure that
      a  condition  can  be checked atomically and that the thread
      can block on the associated condition variable without miss-
      ing  either  a  change to the condition or a signal that the
      condition has changed. Condition variables must be  initial-
      ized  by  calling cv_init(), and must be deallocated by cal-
      ling cv_destroy().

      The usual use of condition variables is to check a condition
      (for  example, device state, data structure reference count,
      etc.) while holding a mutex which keeps other  threads  from
      changing  the  condition.  If the condition is such that the
      thread should block, cv_wait() is called with a related con-
      dition  variable and the mutex. At some later point in time,
      another thread would acquire the mutex,  set  the  condition
      such  that the previous thread can be unblocked, unblock the
      previous thread with cv_signal() or cv_broadcast(), and then
      release the mutex.

      cv_wait() suspends the calling thread and  exits  the  mutex
      atomically so that another thread which holds the mutex can-
      not signal on the  condition  variable  until  the  blocking
      thread  is  blocked.  Before  returning,  the mutex is reac-
      quired.

      cv_signal() signals the  condition  and  wakes  one  blocked
      thread.  All  blocked  threads  can  be unblocked by calling
      cv_broadcast(). cv_signal() and cv_broadcast() can be called
      by  a  thread even if it does not hold the mutex passed into
      cv_wait(), though holding the mutex is necessary  to  ensure
      predictable scheduling.

      The function  cv_wait_sig()  is  similar  to  cv_wait()  but
      returns  0  if a signal (for example, by kill(2)) is sent to
      the thread. In any case,  the  mutex  is  reacquired  before
      returning.

      The function cv_timedwait() is similar to cv_wait(),  except
      that  it  returns  -1  without  the condition being signaled
      after the timeout time has been reached.

      The function cv_timedwait_sig() is similar to cv_timedwait()
      and  cv_wait_sig(),  except  that  it returns -1 without the
      condition being signaled after the  timeout  time  has  been
      reached,  or 0 if a signal (for example, by kill(2)) is sent
      to the thread.

      For both cv_timedwait() and cv_timedwait_sig(), time  is  in
      absolute  clock  ticks  since  the  last  system reboot. The
      current time may be found by calling ddi_get_lbolt(9F).

|     The cv_reltimedwait() function is similar to cv_timedwait(),
|     except that it takes a relative time value as argument and
|     it also takes an additional argument to specify the accuracy
|     of such interval. cv_reltimedwait_sig() is analogous to
|     cv_timedwait_sig(), but takes the same arguments as
|     cv_reltimedwait().

RETURN VALUES
      0        For cv_wait_sig(), cv_timedwait_sig() and
               cv_reltimedwait_sig() indicates
               that the condition was not necessarily signaled and
               the function  returned  because  a  signal  (as  in
               kill(2)) was pending.

|     -1       For cv_timedwait(), cv_timedwait_sig(),
|              cv_reltimedwait() and cv_reltimedwait_sig() indicates
               that the condition was not necessarily signaled and
               the function returned because the timeout time  was
               reached.

|     >0       For cv_wait_sig(), cv_timedwait(), cv_timedwait_sig(),
|              cv_reltimedwait() or cv_reltimedwait_sig()
|              indicates that the condition was
               met and the function returned  due  to  a  call  to
               cv_signal()  or  cv_broadcast(), or due to a prema-
               ture wakeup (see NOTES).

CONTEXT
      These functions can be called from user, kernel or interrupt
      context.  In most cases, however, cv_wait(), cv_timedwait(),
|     cv_wait_sig(), cv_timedwait_sig(), cv_reltimedwait() and
|     cv_reltimedwait_sig() should not be called
      from  interrupt  context,  and cannot be called from a high-
      level interrupt context.

      If    cv_wait(),    cv_timedwait(),    cv_wait_sig(),
|     cv_timedwait_sig(), cv_reltimedwait() or cv_reltimedwait_sig()
|     are  used from interrupt context, lower-priority interrupts
      will not be serviced  during  the  wait.
      This  means  that if the thread that will eventually perform
      the wakeup becomes blocked on  anything  that  requires  the
      lower-priority interrupt, the system will hang.

      For example, the thread that will  perform  the  wakeup  may
      need  to  first  allocate memory. This memory allocation may
      require waiting  for  paging  I/O  to  complete,  which  may
      require  a  lower-priority  disk  or network interrupt to be
      serviced. In general,  situations  like  this  are  hard  to
      predict,  so  it  is advisable to avoid waiting on condition
      variables or semaphores in an interrupt context.

EXAMPLES
      Example 1 Waiting for a Flag Value in a Driver's Unit

      Here the condition being waited for is a  flag  value  in  a
      driver's  unit  structure. The condition variable is also in
      the unit structure, and the flag  word  is  protected  by  a
      mutex in the unit structure.

             mutex_enter(&un->un_lock);
             while (un->un_flag & UNIT_BUSY)
               cv_wait(&un->un_cv, &un->un_lock);
             un->un_flag |= UNIT_BUSY;
             mutex_exit(&un->un_lock);

      Example 2 Unblocking Threads Blocked by the Code in  Example
      1

      At some later point in time, another  thread  would  execute
      the  following  to  unblock any threads blocked by the above
      code.

        mutex_enter(&un->un_lock);
        un->un_flag &= ~UNIT_BUSY;
        cv_broadcast(&un->un_cv);
        mutex_exit(&un->un_lock);

NOTES
|     It is possible for cv_wait(), cv_wait_sig(), cv_timedwait(),
|     cv_timedwait_sig(), cv_reltimedwait() and cv_reltimedwait_sig()
|     to return prematurely, that is, not
      due to a call to cv_signal() or cv_broadcast(). This  occurs
      most   commonly   in   the   case   of   cv_wait_sig(),
|    cv_timedwait_sig() and cv_reltimedwait_sig() when the thread
|     is stopped and  restarted
      by  job  control signals or by a debugger, but can happen in
      other cases as well, even for  cv_wait().  Code  that  calls
      these  functions must always recheck the reason for blocking
      and call again if the reason for blocking is still true.
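
      With a relative timeout, rechecking after a premature wakeup also
      means recomputing the interval that remains, otherwise each
      wakeup would restart the full wait. The user-space sketch below
      models this with a simulated clock; the wait routine is a
      stand-in invented for the example and every name in it is
      hypothetical.

```c
/* Simulated clock and wakeup behaviour, for illustration only. */
long sketch_clock = 0;          /* current time in ticks */
long sketch_wake_every = 3;     /* premature wakeup after this many ticks */
int sketch_flag_busy = 1;       /* the condition being waited on */

/*
 * Stand-in for a relative timed wait: sleeps for at most delta ticks,
 * returns -1 on timeout and 1 on a (possibly premature) wakeup.
 */
long
sketch_reltimedwait(long delta)
{
    if (delta <= sketch_wake_every) {
        sketch_clock += delta;
        return (-1);
    }
    sketch_clock += sketch_wake_every;
    return (1);
}

/*
 * Wait until the flag clears or total ticks have passed. After every
 * wakeup the condition is rechecked and the remaining interval is
 * recomputed, so premature wakeups do not extend the overall wait.
 * Returns the number of wait calls made.
 */
int
sketch_wait_not_busy(long total)
{
    long deadline = sketch_clock + total;
    int waits = 0;

    while (sketch_flag_busy) {
        long remaining = deadline - sketch_clock;

        if (remaining <= 0)
            break;
        waits++;
        if (sketch_reltimedwait(remaining) == -1)
            break;
    }
    return (waits);
}
```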

|     If your driver needs to wait on  behalf  of  processes  that
|     have  real-time  constraints, use cv_timedwait() or cv_reltimedwait()
|     rather than
      delay(9F). The delay() function calls timeout(9F), which can
      be subject to priority inversions.

      Not  all  threads  can  receive  signals  from  user   level
      processes. In cases where such reception is impossible (such
      as  during  execution  of   close(9E)   due   to   exit(2)),
      cv_wait_sig()  behaves  as cv_wait(), cv_timedwait_sig()
|     behaves as cv_timedwait() and cv_reltimedwait_sig() behaves as
|     cv_reltimedwait().
      To  avoid  unkillable  processes,
      users of these functions may need to protect against waiting
      indefinitely  for  events  that   might   not   occur.   The
      ddi_can_receive_sig(9F)  function is provided to detect when
      signal reception is possible.

SEE ALSO
      kill(2),     ddi_can_receive_sig(9F),     ddi_get_lbolt(9F),
|     ddi_get_lbolt64(9F), mutex(9F), mutex_init(9F)

      Writing Device Drivers

