Rafael Vanoni wrote:
> Hi Garrett,
>
> Garrett D'Amore wrote:
>> I'm inclined to give this a +1, but I'm also concerned that the
>> changes here, while *good*, may exceed the scope of a fast track.
>> I'm close to derailing, only to ensure that the case is properly
>> reviewed. This is not something that we should let slip in under
>> the radar, even if I think the changes are all *good*. :-)
>>
>> If any other member is in agreement with me, let me know, and I'll
>> derail it. I'm happy to own the case and the opinion duties. :-) If,
>> however, everyone else feels that this should be left as a fast
>> track, then I'll leave it alone. I certainly don't have any
>> particular problem with the case as specified, beyond the minor
>> issues pointed out below.
>>
>> Other considerations:
>>
>> 1) I don't see any changes made to cv_wait and cv_timedwait. Does it
>> make sense to obsolete them at the same time that we're introducing
>> cv_reltimedwait()?
>
> There are still many consumers of cv_timedwait who need to pass in an
> absolute time. cv_wait serves a similar purpose but without the timed
> component, so its functionality doesn't overlap with cv_reltimedwait().
Really? In device drivers? Because darn near every time I've used
cv_timedwait (and I've used it a *lot* over my career), what I really
wanted was a relative time (usually for a timeout). I can't think of any
reason in normal kernel code why you'd want an *absolute* time. (Apart
from the kernel proper, where I can see absolute times being useful for
wall-clock kinds of things.)

>
>> 2) For enum time_res, it might be convenient to use shorter names,
>> using SI abbreviations. I'd suggest:
>>
>>     enum time_res {
>>         TR_NSEC,
>>         TR_USEC,
>>         TR_MSEC,
>>         TR_SEC,
>>         TR_TICK,
>>         TR_COUNT
>>     };
>>
>> (Btw, what is the difference between TR_COUNT and TR_CLOCK_TICK?)
>
> TR_CLOCK_TICK represents the duration of a clock tick in nanoseconds
> (nsec_per_tick). TR_COUNT is simply the number of elements in the
> enumeration.

Ah, so TR_COUNT is not an exported value then.

-- Garrett

>
> Thanks,
> Rafael
>
>> - Garrett
>>
>> Jerry Gilliam wrote:
>>>
>>> I am sponsoring the following fast-track on behalf of Rafael Vanoni,
>>> with a time-out of 09/09/2009. The project desires minor/major
>>> binding, plus micro/patch binding for one interface, as specified.
>>>
>>> -------------------------------------
>>>
>>> Template Version: @(#)onepager.txt 1.35 07/11/07 SMI
>>> Copyright 2007 Sun Microsystems
>>>
>>> 1. Introduction
>>>   1.1. Project/Component Working Name:
>>>        Tickless Kernel Architecture / lbolt decoupling
>>>
>>>   1.2. Name of Document Author/Supplier:
>>>        Rafael Vanoni Polanczyk (rafael.vanoni at sun.com)
>>>
>>>   1.3. Date of This Document:
>>>        08/04/09
>>>
>>>   1.3.1. Date this project was conceived:
>>>        07/01/09
>>>
>>>   1.4. Name of Major Document Customer(s)/Consumer(s):
>>>     1.4.1. The PAC or CPT you expect to review your project:
>>>            Solaris PAC
>>>     1.4.2. The ARC(s) you expect to review your project:
>>>     1.4.3. The Director/VP who is "Sponsoring" this project:
>>>            Greg.Lavender at Sun.COM
>>>     1.4.4. The name of your business unit:
>>>            Systems
>>>
>>>   1.5. Email Aliases:
>>>     1.5.1. Responsible Manager: darrin.johnson at sun.com
>>>     1.5.2. Responsible Engineer: rafael.vanoni at sun.com
>>>     1.5.3. Marketing Manager: mike.mulkey at sun.com
>>>     1.5.4. Interest List: tickless-dev at opensolaris.org
>>>
>>>
>>> 2. Project Summary
>>>   2.1. Project Description:
>>>        The tickless project aims at implementing the services
>>>        provided by the clock cyclic in an event-driven fashion. The
>>>        first sub-project is the decoupling of the lbolt and lbolt64
>>>        variables from clock(). These two variables are incremented
>>>        at each firing of the clock cyclic and provide a time
>>>        reference to the system. They are being replaced by two
>>>        routines that are backed by gethrtime(): the existing
>>>        ddi_get_lbolt() and the new ddi_get_lbolt64(), introduced as
>>>        a migration path for existing non-DDI-compliant consumers.
>>>
>>>        This project also presents a solution to minimize the usage
>>>        of the DDI lbolt routines through new interfaces, and a
>>>        method to prevent any performance impact from migrating
>>>        inexpensive variable references to routine calls. These are
>>>        described in detail in section 4.1.
>>>
>>>
>>> 4. Technical Description:
>>>   4.1. Details:
>>>        The lbolt and lbolt64 variables will be replaced by two
>>>        routines, ddi_get_lbolt() and ddi_get_lbolt64(), which are
>>>        backed by a hardware counter to provide the same service in
>>>        an event-driven way.
>>>
>>>        Among the major consumers of the lbolt service are the
>>>        cv_timedwait() and cv_timedwait_sig() routines, which require
>>>        lbolt to form one of their arguments (an absolute value of
>>>        time) and then again internally to decompose it into a
>>>        relative time. This project is introducing two new routines,
>>>        cv_reltimedwait() and cv_reltimedwait_sig(), which will
>>>        perform the same service as the previously mentioned
>>>        routines but simply receive a relative time, and do not
>>>        require lbolt at all.
>>>        These new routines will also have a new argument of type
>>>        time_res_t to inform the underlying timeout system of how
>>>        accurately the given timeout must expire. This will allow
>>>        the kernel to anticipate or defer such timeouts when
>>>        possible, allowing the system to stay idle for longer
>>>        periods of time.
>>>
>>>        Some consumers of the lbolt and lbolt64 variables may have
>>>        implicit dependencies on the cheapness of reading a memory
>>>        location, which will be exposed when they are migrated to a
>>>        gethrtime()-backed routine. In such cases, migrating
>>>        references from lbolt and lbolt64 to ddi_get_lbolt() and
>>>        ddi_get_lbolt64() will have a negative performance impact.
>>>        To address this, our project will provide the lbolt service
>>>        in a hybrid way, switching from event-driven to cyclic-driven
>>>        mode when the DDI lbolt routines are being heavily used. In
>>>        cyclic mode, a timer is programmed to expire at each clock
>>>        tick and increment an internal (lbolt-like) variable, whose
>>>        value is returned to the consumer. This cyclic will only be
>>>        activated during periods of heavy load, and will switch
>>>        itself off when the activity subsides.
>>>
>>>        The decision to remove the lbolt and lbolt64 variables was
>>>        made during design review, and a consensus was reached on
>>>        the basis that, since we're reaching the end of a major
>>>        release, this is the right moment to obsolete them. The side
>>>        effects and cost of maintaining such symbols outweigh the
>>>        benefits. However, this decision can be re-evaluated if the
>>>        negative impact on 3rd party modules during the development
>>>        release is greater than expected. We're working with ISV and
>>>        RPE to minimize the impact proactively.
>>>
>>>   4.2. Bug/RFE Number(s):
>>>        6860030 tickless clock requires a clock() decoupled lbolt /
>>>        lbolt64
>>>
>>>   4.5. Interfaces:
>>>        This project is adding the following interfaces to the DDI:
>>>
>>>        int64_t ddi_get_lbolt64(void);
>>>
>>>        clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp,
>>>            clock_t delta, time_res_t res);
>>>
>>>        clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp,
>>>            clock_t delta, time_res_t res);
>>>
>>>        With time_res_t defined as
>>>
>>>        enum time_res {
>>>            TR_NANOSEC,
>>>            TR_MICROSEC,
>>>            TR_MILLISEC,
>>>            TR_SEC,
>>>            TR_CLOCK_TICK,
>>>            TR_COUNT
>>>        };
>>>
>>>        typedef enum time_res time_res_t;
>>>
>>>        In addition, the lbolt and lbolt64 variables (which are
>>>        *private* symbols known to be used by non-DDI-compliant
>>>        modules) are being removed. 3rd party modules that are not
>>>        brought up to date will fail to load.
>>>
>>>        In summary:
>>>
>>>        Interface                Commitment   Comments
>>>        ------------------------------------------------------------------
>>>        ddi_get_lbolt64()        Public/DDI   return lbolt64
>>>        cv_reltimedwait(9F)      Public/DDI   cv_timedwait(9F), relative time
>>>        cv_reltimedwait_sig(9F)  Public/DDI   cv_timedwait_sig(9F), relative time
>>>        lbolt                    Obsolete     commonly referenced kernel symbol
>>>        lbolt64                  Obsolete     commonly referenced kernel symbol
>>>
>>>        We also plan to backport the ddi_get_lbolt64() interface to
>>>        Solaris 10 Update 9 to extend the migration path for S10
>>>        users who would like to update their modules before moving
>>>        to Solaris Nevada or the next version of Solaris. These
>>>        users already have ddi_get_lbolt() but currently lack its
>>>        64-bit version. Such a backport will have patch release
>>>        binding.
>>>
>>>
>>>   4.6. Doc Impact:
>>>        6868417 updates for tickless kernel/lbolt decoupling (6860030)
>>>
>>>        Updates to the 'Writing Device Drivers' document are
>>>        necessary; the project team is in contact with the
>>>        documentation group to address these.
>>>
>>>
>>> 5. Reference Documents:
>>>    This project is being developed through OpenSolaris; our project
>>>    pages and alias contain all the necessary information:
>>>        http://opensolaris.org/os/project/tickless/
>>>        http://opensolaris.org/os/project/tickless/tasks/lbolt/
>>>        tickless-dev at opensolaris.org
>>>
>>>
>>> 6. Resources and Schedule:
>>>   6.5. ARC review type: Fast track
>>>   6.6. ARC Exposure: open
>>>
>>>
>>> Updates to existing man pages:
>>> ------------------------------
>>>
>>> drv_getparm.9f
>>>
>>> PARAMETERS
>>>      ...
>>>
>>>      LBOLT   Read the value of lbolt. lbolt is a clock_t that      |
>>>              represents the number of clock ticks since system     |
>>>              boot. No special treatment is applied when            |
>>>              this value overflows the maximum value of the
>>>              signed integral type clock_t. When this occurs,
>>>              its value will be negative, and its magnitude will
>>>              be decreasing until it again passes zero. It can
>>>              ...
>>>
>>>
>>> drv_hztousec.9f
>>>
>>> DESCRIPTION
>>>      The drv_hztousec() function converts into microseconds the
>>>      time expressed by hertz, which is in system clock ticks.
>>>
>>>      The length of time the system has been up since boot can be   |
>>>      retrieved by calling ddi_get_lbolt(9F), which will return a   |
>>>      value of type clock_t containing the number of clock ticks
>>>      since boot. Drivers often use this value before and after an
>>>      I/O request to measure the amount of time it took the device
>>>      to process the request. The drv_hztousec() function can be
>>>      used by the driver to convert the reading from clock ticks to
>>>      a known unit of time.
>>>
>>>
>>> Intro.9f
>>>
>>> Kernel Functions for Drivers                              Intro(9F)
>>>
>>>      ddi_get_instance      Solaris DDI
>>>      ddi_get_kt_did        Solaris DDI
>>>      ddi_get_lbolt         Solaris DDI
>>>      ddi_get_lbolt64       Solaris DDI                            +
>>>      ddi_get_name          Solaris DDI
>>>      ...
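The gethrtime()-backed lbolt described in sections 2.1 and 4.1, and the before/after timing pattern described in drv_hztousec(9F), can be approximated in user space. The sketch below is only an analogy, assuming a 100 Hz tick rate; fake_gethrtime(), fake_ddi_get_lbolt64(), fake_drv_hztousec() and time_operation_usec() are hypothetical stand-ins built on POSIX clock_gettime(CLOCK_MONOTONIC), not the kernel's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * User-space sketch only. HZ and the fake_* helpers are hypothetical
 * stand-ins for the kernel's hz, gethrtime(), ddi_get_lbolt64() and
 * drv_hztousec(); they are not the Solaris DDI.
 */
#define HZ            100                    /* assumed tick rate */
#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_TICK (NSEC_PER_SEC / HZ)    /* 10 ms per tick */

typedef int64_t fake_hrtime_t;

/* Monotonic nanoseconds, standing in for gethrtime(). */
static fake_hrtime_t fake_gethrtime(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (fake_hrtime_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Ticks derived on demand from the high-resolution clock, instead of
 * being read from a variable that a periodic clock() routine increments.
 */
static int64_t fake_ddi_get_lbolt64(void)
{
    return fake_gethrtime() / NSEC_PER_TICK;
}

/* Convert a tick count to microseconds, in the spirit of drv_hztousec(). */
static int64_t fake_drv_hztousec(int64_t ticks)
{
    return ticks * (1000000 / HZ);
}

/* Time an operation in ticks, then report the elapsed time in microseconds. */
static int64_t time_operation_usec(void (*op)(void))
{
    int64_t before = fake_ddi_get_lbolt64();

    op();
    return fake_drv_hztousec(fake_ddi_get_lbolt64() - before);
}
```

Because the tick count is computed from a monotonic clock on every call, successive reads never go backwards; the cost, as section 4.1 points out, is a function call per read instead of a plain memory load. The measurement is also only as fine as one tick, so any result is a multiple of 10000 microseconds here.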
>>>
>>> Updated ddi_get_lbolt.9f:
>>> -------------------------
>>>
>>> Kernel Functions for Drivers                      ddi_get_lbolt(9F)
>>>
>>> NAME
>>>      ddi_get_lbolt - returns the number of clock ticks since boot  |
>>>
>>> SYNOPSIS
>>>      #include <sys/types.h>
>>>      #include <sys/ddi.h>
>>>      #include <sys/sunddi.h>
>>>
>>>      clock_t ddi_get_lbolt(void);
>>>
>>> INTERFACE LEVEL
>>>      Solaris DDI specific (Solaris DDI).
>>>
>>> DESCRIPTION
>>>      ddi_get_lbolt() returns a value that represents the number    |
>>>      of clock ticks since the system booted. This value is         |
>>>      used as a counter or timer inside the system kernel.
>>>      The tick frequency can be determined by using
>>>      drv_usectohz(9F), which converts microseconds into clock
>>>      ticks.
>>>
>>> RETURN VALUES
>>>      ddi_get_lbolt() returns the number of clock ticks since boot  |
>>>      in clock_t type.
>>>
>>> CONTEXT
>>>      This routine can be called from any context.
>>>
>>> SEE ALSO
>>>      ddi_get_lbolt64(9F), ddi_get_time(9F), drv_getparm(9F),
>>>      drv_usectohz(9F)
>>>
>>>
>>> New man page for ddi_get_lbolt64():
>>> -----------------------------------
>>>
>>> Kernel Functions for Drivers                    ddi_get_lbolt64(9F)
>>>
>>> NAME
>>>      ddi_get_lbolt64 - returns the number of clock ticks since boot
>>>      in int64_t type
>>>
>>> SYNOPSIS
>>>      #include <sys/types.h>
>>>      #include <sys/ddi.h>
>>>      #include <sys/sunddi.h>
>>>
>>>      int64_t ddi_get_lbolt64(void);
>>>
>>> INTERFACE LEVEL
>>>      Solaris DDI specific (Solaris DDI).
>>>
>>> DESCRIPTION
>>>      ddi_get_lbolt64() returns a value that represents the number
>>>      of clock ticks since the system booted. This value is
>>>      used as a counter or timer inside the system kernel. It is
>>>      essentially the same value returned by ddi_get_lbolt(9F), but
>>>      in a longer data type that will not wrap for 2.9 billion
>>>      years at the default tick rate of 100 Hz.
>>>
>>> RETURN VALUES
>>>      ddi_get_lbolt64() returns the number of clock ticks since boot
>>>      in int64_t type.
>>>
>>> CONTEXT
>>>      This routine can be called from any context.
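The wrap horizons behind ddi_get_lbolt64() are easy to check. Assuming a tick rate of 100 Hz, a signed tick counter that starts at zero overflows after its maximum value divided by 100 ticks per second. The hypothetical helpers below compute this for a 32-bit clock_t (as on a 32-bit kernel) and for int64_t, using a 365.25-day year.

```c
#include <stdint.h>

/*
 * Worked arithmetic only; HZ = 100 is an assumption, and both helpers
 * are hypothetical. A signed counter gaining HZ per second wraps after
 * (max value / HZ) seconds.
 */
#define HZ            100
#define SECS_PER_DAY  86400.0
#define SECS_PER_YEAR (365.25 * SECS_PER_DAY)   /* Julian year */

/* Days until a 32-bit signed tick counter wraps. */
static double wrap_days_32(void)
{
    return (double)INT32_MAX / HZ / SECS_PER_DAY;
}

/* Years until a 64-bit signed tick counter (int64_t) wraps. */
static double wrap_years_64(void)
{
    return (double)INT64_MAX / HZ / SECS_PER_YEAR;
}
```

wrap_days_32() evaluates to roughly 248.6 days, which is how the LBOLT overflow described in drv_getparm(9F) can occur on a system with a long uptime, while wrap_years_64() evaluates to roughly 2.92 billion years, in line with the figure quoted in the DESCRIPTION. At a 1000 Hz tick rate both horizons shrink by a factor of ten.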
>>>
>>> SEE ALSO
>>>      ddi_get_lbolt(9F), ddi_get_time(9F)
>>>
>>>      Writing Device Drivers
>>>
>>>      STREAMS Programming Guide
>>>
>>> SunOS 5.11           Last change: 29 Jul 2009                     1
>>>
>>>
>>> Updates to condvar(9F):
>>> -----------------------
>>>
>>> Kernel Functions for Drivers                            condvar(9F)
>>>
>>> NAME
>>>      condvar, cv_init, cv_destroy, cv_wait, cv_signal,
>>>      cv_broadcast, cv_wait_sig, cv_timedwait, cv_timedwait_sig,
>>>      cv_reltimedwait, cv_reltimedwait_sig - condition variable
>>>      routines
>>>
>>> SYNOPSIS
>>>      #include <sys/ksynch.h>
>>>
>>>      void cv_init(kcondvar_t *cvp, char *name, kcv_type_t type,
>>>          void *arg);
>>>
>>>      void cv_destroy(kcondvar_t *cvp);
>>>
>>>      void cv_wait(kcondvar_t *cvp, kmutex_t *mp);
>>>
>>>      void cv_signal(kcondvar_t *cvp);
>>>
>>>      void cv_broadcast(kcondvar_t *cvp);
>>>
>>>      int cv_wait_sig(kcondvar_t *cvp, kmutex_t *mp);
>>>
>>>      clock_t cv_timedwait(kcondvar_t *cvp, kmutex_t *mp,
>>>          clock_t timeout);
>>>
>>>      clock_t cv_timedwait_sig(kcondvar_t *cvp, kmutex_t *mp,
>>>          clock_t timeout);
>>>
>>> |    clock_t cv_reltimedwait(kcondvar_t *cvp, kmutex_t *mp,
>>> |        clock_t delta, time_res_t resolution);
>>>
>>> |    clock_t cv_reltimedwait_sig(kcondvar_t *cvp, kmutex_t *mp,
>>> |        clock_t delta, time_res_t resolution);
>>>
>>> INTERFACE LEVEL
>>>      Solaris DDI specific (Solaris DDI).
>>>
>>> PARAMETERS
>>>      cvp         A pointer to an abstract data type kcondvar_t.
>>>
>>>      mp          A pointer to a mutual exclusion lock (kmutex_t),
>>>                  initialized by mutex_init(9F) and held by the
>>>                  caller.
>>>
>>>      name        Descriptive string. This is obsolete and should
>>>                  be NULL. (Non-NULL strings are legal, but they're
>>>                  a waste of kernel memory.)
>>>
>>> SunOS 5.11           Last change: 02 Aug 2009                     1
>>>
>>>      type        The constant CV_DRIVER.
>>>
>>>      arg         A type-specific argument; drivers should pass arg
>>>                  as NULL.
>>>
>>>      timeout     A time, in absolute ticks since boot, when
>>>                  cv_timedwait() or cv_timedwait_sig() should
>>>                  return.
>>>
>>> |    delta       A time interval, in clock ticks relative to the
>>> |                time of the call, after which cv_reltimedwait()
>>> |                or cv_reltimedwait_sig() should return.
>>> |
>>> |    resolution  A flag that specifies how accurate the relative
>>> |                time interval should be. Possible values are
>>> |                TR_NANOSEC, TR_MICROSEC, TR_MILLISEC, TR_SEC, or
>>> |                TR_CLOCK_TICK, the latter indicating that the
>>> |                interval should be aligned to system clock ticks.
>>> |                This information allows the system to anticipate
>>> |                or defer the timeout expiration in order to
>>> |                batch-process similarly expiring events, allowing
>>> |                the system to stay idle for longer periods of
>>> |                time and enhancing its power efficiency.
>>>
>>>
>>> DESCRIPTION
>>>      Condition variables are a standard form of thread
>>>      synchronization. They are designed to be used with mutual
>>>      exclusion locks (mutexes). The associated mutex is used to
>>>      ensure that a condition can be checked atomically and that
>>>      the thread can block on the associated condition variable
>>>      without missing either a change to the condition or a
>>>      signal that the condition has changed. Condition variables
>>>      must be initialized by calling cv_init(), and must be
>>>      deallocated by calling cv_destroy().
>>>
>>>      The usual use of condition variables is to check a condition
>>>      (for example, device state, data structure reference count,
>>>      etc.) while holding a mutex which keeps other threads from
>>>      changing the condition. If the condition is such that the
>>>      thread should block, cv_wait() is called with a related
>>>      condition variable and the mutex. At some later point in
>>>      time, another thread would acquire the mutex, set the
>>>      condition such that the previous thread can be unblocked,
>>>      unblock the previous thread with cv_signal() or
>>>      cv_broadcast(), and then release the mutex.
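The resolution argument and the TR_COUNT sentinel discussed earlier in the thread can be illustrated together. This is a minimal sketch, not the tickless project's code: res_nsec[], the assumed 10 ms tick (NSEC_PER_TICK) and coalesce_expiration() are hypothetical. The lookup table is checked against TR_COUNT, which is the usual job of a trailing count member, and the rounding function shows one plausible batching policy that moves an expiration to the nearest multiple of the requested resolution.

```c
#include <stdint.h>

/* The enumeration as proposed in section 4.5 of the one-pager. */
typedef enum time_res {
    TR_NANOSEC,
    TR_MICROSEC,
    TR_MILLISEC,
    TR_SEC,
    TR_CLOCK_TICK,
    TR_COUNT            /* number of resolutions, not a resolution itself */
} time_res_t;

/* Hypothetical: assumes hz = 100, i.e. a 10 ms tick. */
#define NSEC_PER_TICK 10000000LL

/* Hypothetical table: granularity of each resolution in nanoseconds. */
static const int64_t res_nsec[] = {
    [TR_NANOSEC]    = 1LL,
    [TR_MICROSEC]   = 1000LL,
    [TR_MILLISEC]   = 1000000LL,
    [TR_SEC]        = 1000000000LL,
    [TR_CLOCK_TICK] = NSEC_PER_TICK,
};

/* Fails to compile if a resolution is added without a table entry. */
_Static_assert(sizeof (res_nsec) / sizeof (res_nsec[0]) == TR_COUNT,
    "res_nsec[] needs one entry per time_res_t value");

/*
 * Round a non-negative expiration time (ns) to the nearest multiple of
 * the requested resolution, so that timeouts landing in the same slot
 * could be serviced by one timer interrupt. Rounding to the nearest slot
 * may fire early ("anticipate") or late ("defer").
 */
static int64_t coalesce_expiration(int64_t expire_ns, time_res_t res)
{
    int64_t g = ((unsigned)res < (unsigned)TR_COUNT) ? res_nsec[res] : 1;

    return ((expire_ns + g / 2) / g) * g;
}
```

For example, with TR_SEC, expirations at 0.9 s and 1.2 s both land on 1.0 s, whereas with TR_NANOSEC they stay distinct. A policy that must never fire early would round up instead of to the nearest slot.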
>>>
>>>      cv_wait() suspends the calling thread and exits the mutex
>>>      atomically so that another thread which holds the mutex
>>>      cannot signal on the condition variable until the blocking
>>>      thread is blocked. Before returning, the mutex is
>>>      reacquired.
>>>
>>>      cv_signal() signals the condition and wakes one blocked
>>>      thread. All blocked threads can be unblocked by calling
>>>      cv_broadcast(). cv_signal() and cv_broadcast() can be called
>>>      by a thread even if it does not hold the mutex passed into
>>>      cv_wait(), though holding the mutex is necessary to ensure
>>>      predictable scheduling.
>>>
>>>      The function cv_wait_sig() is similar to cv_wait() but
>>>      returns 0 if a signal (for example, by kill(2)) is sent to
>>>      the thread. In any case, the mutex is reacquired before
>>>      returning.
>>>
>>>      The function cv_timedwait() is similar to cv_wait(), except
>>>      that it returns -1 without the condition being signaled
>>>      after the timeout time has been reached.
>>>
>>>      The function cv_timedwait_sig() is similar to cv_timedwait()
>>>      and cv_wait_sig(), except that it returns -1 without the
>>>      condition being signaled after the timeout time has been
>>>      reached, or 0 if a signal (for example, by kill(2)) is sent
>>>      to the thread.
>>>
>>>      For both cv_timedwait() and cv_timedwait_sig(), time is in
>>>      absolute clock ticks since the last system reboot. The
>>>      current time may be found by calling ddi_get_lbolt(9F).
>>>
>>> |    The cv_reltimedwait() function is similar to cv_timedwait(),
>>> |    except that it takes a relative time value as its argument,
>>> |    along with an additional argument that specifies the
>>> |    accuracy of that interval. cv_reltimedwait_sig() is
>>> |    analogous to cv_timedwait_sig(), but takes the same
>>> |    arguments as cv_reltimedwait().
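cv_reltimedwait() differs from cv_timedwait() mainly in how the deadline is expressed. POSIX condition variables take an absolute deadline, like cv_timedwait(), so a relative wrapper has to add the interval to the current time itself, much as a driver computes ddi_get_lbolt() + delta before calling cv_timedwait(). The sketch below is a user-space analogy built on pthread_cond_timedwait(); rel_timedwait() is a hypothetical helper whose return values loosely mirror RETURN VALUES (-1 on timeout, >0 otherwise), and it has no resolution argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical user-space analog of a relative timed wait; this is not
 * cv_reltimedwait(9F). pthread_cond_timedwait() expects an absolute
 * CLOCK_REALTIME deadline (the default clock of a condition variable),
 * so the relative interval is converted to "now + delta_ms" first.
 *
 * The caller must hold *mp. Returns -1 if the deadline passed and 1 if
 * the thread was woken (by a signal/broadcast or prematurely).
 */
static int rel_timedwait(pthread_cond_t *cvp, pthread_mutex_t *mp,
    long delta_ms)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += delta_ms / 1000;
    deadline.tv_nsec += (delta_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {      /* normalize the carry */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    if (pthread_cond_timedwait(cvp, mp, &deadline) == ETIMEDOUT)
        return -1;
    return 1;
}
```

As with the condvar(9F) routines, a return value of 1 only means the thread woke up; the caller still has to recheck its condition.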
>>>
>>> RETURN VALUES
>>>      0     For cv_wait_sig(), cv_timedwait_sig() and
>>>            cv_reltimedwait_sig(), indicates that the condition
>>>            was not necessarily signaled and the function
>>>            returned because a signal (as in kill(2)) was
>>>            pending.
>>>
>>> |    -1    For cv_timedwait(), cv_timedwait_sig(),
>>> |          cv_reltimedwait() and cv_reltimedwait_sig(), indicates
>>>            that the condition was not necessarily signaled and
>>>            the function returned because the timeout time was
>>>            reached.
>>>
>>> |    >0    For cv_wait_sig(), cv_timedwait(), cv_timedwait_sig(),
>>> |          cv_reltimedwait() or cv_reltimedwait_sig(), indicates
>>>            that the condition was met and the function returned
>>>            due to a call to cv_signal() or cv_broadcast(), or due
>>>            to a premature wakeup (see NOTES).
>>>
>>> CONTEXT
>>>      These functions can be called from user, kernel or interrupt
>>>      context. In most cases, however, cv_wait(), cv_timedwait(),
>>> |    cv_wait_sig(), cv_timedwait_sig(), cv_reltimedwait() and
>>> |    cv_reltimedwait_sig() should not be called from interrupt
>>>      context, and cannot be called from a high-level interrupt
>>>      context.
>>>
>>>      If cv_wait(), cv_timedwait(), cv_wait_sig(),
>>> |    cv_timedwait_sig(), cv_reltimedwait() or cv_reltimedwait_sig()
>>> |    are used from interrupt context, lower-priority interrupts
>>>      will not be serviced during the wait. This means that if the
>>>      thread that will eventually perform the wakeup becomes
>>>      blocked on anything that requires the lower-priority
>>>      interrupt, the system will hang.
>>>
>>>      For example, the thread that will perform the wakeup may
>>>      need to first allocate memory. This memory allocation may
>>>      require waiting for paging I/O to complete, which may
>>>      require a lower-priority disk or network interrupt to be
>>>      serviced.
>>>      In general, situations like this are hard to predict, so it
>>>      is advisable to avoid waiting on condition variables or
>>>      semaphores in an interrupt context.
>>>
>>> EXAMPLES
>>>      Example 1 Waiting for a Flag Value in a Driver's Unit
>>>
>>>      Here the condition being waited for is a flag value in a
>>>      driver's unit structure. The condition variable is also in
>>>      the unit structure, and the flag word is protected by a
>>>      mutex in the unit structure.
>>>
>>>          mutex_enter(&un->un_lock);
>>>          while (un->un_flag & UNIT_BUSY)
>>>              cv_wait(&un->un_cv, &un->un_lock);
>>>          un->un_flag |= UNIT_BUSY;
>>>          mutex_exit(&un->un_lock);
>>>
>>>      Example 2 Unblocking Threads Blocked by the Code in Example 1
>>>
>>>      At some later point in time, another thread would execute
>>>      the following to unblock any threads blocked by the above
>>>      code.
>>>
>>>          mutex_enter(&un->un_lock);
>>>          un->un_flag &= ~UNIT_BUSY;
>>>          cv_broadcast(&un->un_cv);
>>>          mutex_exit(&un->un_lock);
>>>
>>> NOTES
>>> |    It is possible for cv_wait(), cv_wait_sig(), cv_timedwait(),
>>> |    cv_timedwait_sig(), cv_reltimedwait() and cv_reltimedwait_sig()
>>> |    to return prematurely, that is, not due to a call to
>>>      cv_signal() or cv_broadcast(). This occurs most commonly in
>>>      the case of cv_wait_sig(), cv_timedwait_sig() and
>>> |    cv_reltimedwait_sig() when the thread is stopped and
>>> |    restarted by job control signals or by a debugger, but can
>>>      happen in other cases as well, even for cv_wait(). Code that
>>>      calls these functions must always recheck the reason for
>>>      blocking and call again if the reason for blocking is still
>>>      true.
>>>
>>> |    If your driver needs to wait on behalf of processes that
>>> |    have real-time constraints, use cv_timedwait() or
>>> |    cv_reltimedwait() rather than delay(9F). The delay() function
>>>      calls timeout(9F), which can be subject to priority
>>>      inversions.
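Premature wakeups have a consequence for relative timeouts: a loop that passes the same relative interval on every iteration restarts the full wait after each spurious wakeup, so the total wait can exceed what was intended. A common remedy is to fix an absolute deadline once and wait against it, rechecking the condition each time. The user-space sketch below uses POSIX threads; struct unit and wait_until_idle() are hypothetical, loosely modelled on Example 1. In kernel code, the same effect could be had by computing the remaining ticks before each cv_reltimedwait() call, or by passing the fixed deadline to cv_timedwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical unit structure, loosely modelled on Example 1. */
struct unit {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            busy;
};

/*
 * Wait up to timeout_ms for the unit to become idle. The absolute
 * deadline is computed once, before the loop, so a premature wakeup
 * does not restart the full interval. Returns 0 if the unit became
 * idle and -1 if the deadline passed while it was still busy.
 */
static int wait_until_idle(struct unit *un, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&un->lock);
    while (un->busy) {
        /* Any nonzero result (normally ETIMEDOUT) ends the wait. */
        if (pthread_cond_timedwait(&un->cv, &un->lock, &deadline) != 0) {
            rc = un->busy ? -1 : 0;  /* recheck: it may have just cleared */
            break;
        }
        /* Woken or spurious: loop back and recheck un->busy. */
    }
    pthread_mutex_unlock(&un->lock);
    return rc;
}
```

This is also one concrete case where an absolute deadline, as taken by cv_timedwait(), is the natural input, which bears on the question raised at the top of the thread.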
>>>
>>>      Not all threads can receive signals from user-level
>>>      processes. In cases where such reception is impossible (such
>>>      as during execution of close(9E) due to exit(2)),
>>>      cv_wait_sig() behaves as cv_wait(), cv_timedwait_sig()
>>> |    behaves as cv_timedwait(), and cv_reltimedwait_sig() behaves
>>> |    as cv_reltimedwait(). To avoid unkillable processes, users of
>>>      these functions may need to protect against waiting
>>>      indefinitely for events that might not occur. The
>>>      ddi_can_receive_sig(9F) function is provided to detect when
>>>      signal reception is possible.
>>>
>>> SEE ALSO
>>>      kill(2), ddi_can_receive_sig(9F), ddi_get_lbolt(9F),
>>> |    ddi_get_lbolt64(9F), mutex(9F), mutex_init(9F)
>>>
>>>      Writing Device Drivers