After discussions with Madhavan and code inspection, we have determined
that the CALLOUT_FLAG_HRESTIME introduced in nv_103 caused an unintended
change in behavior.  This is a bug that we will fix as soon as possible.

- Steve Sistare

Madhavan Venkataraman wrote:
>>
>> Date: Wed, 21 Jan 2009 09:17:28 -0800 (PST)
>> > From: Jürgen Keil <j...@tools.de>
>> > Subject: [osol-code] 6565503 callout changes, stime(2), and timeouts
>> > To: opensolaris-code@opensolaris.org
>> > 
>> > The putback for 6565503 "callout processing is single threaded, throttling
>> > applications that rely on scalable callouts" in build 103 apparently has
>> > changed the kernel function cv_waituntil_sig() to create callouts with
>> > a new flag CALLOUT_FLAG_HRESTIME.  This flag is described in
>> > uts/common/sys/callo.h as:
>> > 
>> >  * CALLOUT_FLAG_HRESTIME
>> >  * Normally, callouts are not affected by changes to system time
>> >  * (hrestime). This flag is used to create a callout that is affected
>> >  * by system time. If system time changes, these timers must expire
>> >  * at once. These are used by condition variables and LWP timers that
>> >  * need this behavior.
>> > 
>> > cv_waituntil_sig() is used by several system calls (poll(), select(),
>> > sigtimedwait(), semtimedop(), ...).  What I'm observing is that with
>> > build 103 or newer, all of these system calls return prematurely,
>> > before the timeout expires, when the system time is set (e.g. by
>> > ntp / ntpdate or rdate).
>> > 
>> > Test cases are
[...]
>> > 
>> > 
>> > If you start any one of these on build 103 or newer
>> > and run "rdate {time-server}" or set the date with
>> > "date HHMM" in another window, the system call
>> > times out early and the program terminates.
>> > Expected behavior would be that these programs
>> > wait 250 seconds for some event, and changing
>> > the system clock does not affect waiting.
>> > 
>> > Is this a bug or a feature of 6565503?
>> > 
>>   
> 
> It is a feature.
> 
> Here is the comment before cv_waituntil_sig().
> 
>  * As a special test, if someone abruptly resets the system time 
>  * (but not through adjtime(2); drifting of the clock is allowed and 
>  * expected [see timespectohz_adj()]), then we force a return of -1 
>  * so the caller can return a premature timeout to the calling process 
>  * so it can reevaluate the situation in light of the new system time. 
>  * (The system clock has been reset if timecheck != timechanged.)
> 
> So, when system time is changed, this function is supposed to
> return immediately. It has a check for that. However, the callout
> subsystem did not implement this semantic. So, once the wait was
> handed off to the callout subsystem, the thread had to wait for
> the timer to go off before the check could take effect.
> 
> This implementation corrects that. Now the callout subsystem
> implements the correct semantic, hence the premature return
> when system time is changed.
> 
> This should have been documented in stime(), sleep(), usleep(),
> nanosleep() and poll(). Either a documentation bug should be filed,
> or callers of cv_waituntil_sig() should check for a premature
> return and wait again. I will consult my team about the right
> action here.
> 
> To summarize, this is not a bug in the callout subsystem; rather,
> the legacy behavior was wrong.
> 
> 
> 
>> > 
>> > Btw., the changed poll() timeout behavior did break hald:
>> > 
>> >     Bug ID: 6792302
>> >     Synopsis: hald occasionally exits on startup with status 2
>> >     http://bugs.opensolaris.org/view_bug.do?bug_id=6792302
>> > -- 
>>   

_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
