Steve,
On 01/23/09 07:55, Steve Sistare wrote:
After discussions with Madhavan and code inspection, we have determined
that the CALLOUT_FLAG_HRESTIME flag introduced in snv_103 caused an unintended
change in behavior. This is a bug that we will fix as soon as possible.
Is there a CR we could track? I've seen this go in snv_107:
6784948 Bug fixes to the Callout implementation putback in SNV 103
but it doesn't seem to address this particular problem.
-Artem
Madhavan Venkataraman wrote:
Date: Wed, 21 Jan 2009 09:17:28 -0800 (PST)
From: Jürgen Keil <j...@tools.de>
Subject: [osol-code] 6565503 callout changes, stime(2), and timeouts
To: opensolaris-code@opensolaris.org
The putback for 6565503 "callout processing is single threaded, throttling
applications that rely on scalable callouts" in build 103 apparently has
changed the kernel function cv_waituntil_sig() to create callouts with
a new flag CALLOUT_FLAG_HRESTIME. This flag is described in
uts/common/sys/callo.h as:
* CALLOUT_FLAG_HRESTIME
* Normally, callouts are not affected by changes to system time
* (hrestime). This flag is used to create a callout that is affected
* by system time. If system time changes, these timers must expire
* at once. These are used by condition variables and LWP timers that
* need this behavior.
cv_waituntil_sig() is used by several system calls (poll(), select(),
sigtimedwait(), semtimedop(), ...). What I'm observing is that with
build 103 or newer all of these system calls return prematurely,
before the timeout expires, when the system time is set (e.g. by
ntp / ntpdate or rdate).
Test cases are
[...]
If you start any one of these on build 103 or newer
and run "rdate {time-server}" or set the date with
"date HHMM" in another window, the system call
gets a timeout and the program terminates.
Expected behavior would be that these programs
wait 250 seconds for some event, and changing
the system clock does not affect waiting.
Is this a bug or a feature of 6565503?
It is a feature.
Here is the comment before cv_waituntil_sig().
* As a special test, if someone abruptly resets the system time
* (but not through adjtime(2); drifting of the clock is allowed and
* expected [see timespectohz_adj()]), then we force a return of -1
* so the caller can return a premature timeout to the calling process
* so it can reevaluate the situation in light of the new system time.
* (The system clock has been reset if timecheck != timechanged.)
So, when system time is changed, this function is supposed to
return immediately. It has a check for that. However, the callout
subsystem did not implement this semantic. So, once a waiting
thread was handed to the callout subsystem, it had to wait for the
timer to go off.
This implementation corrects that. Now, the callout subsystem
implements the correct semantic. Hence, the premature return
when system time is changed.
This should have been documented in the man pages for stime(),
sleep(), usleep(), nanosleep() and poll(). Either a bug should be
filed against the documentation, or callers of cv_waituntil_sig()
should check for premature return and wait again. I will consult my
team about the right action for this.
To summarize, this is not a bug in the callout subsystem. But the
legacy behavior is wrong.
Btw, the changed poll() timeout behavior did break hald:
Bug ID: 6792302
Synopsis: hald occasionally exits on startup with status 2
http://bugs.opensolaris.org/view_bug.do?bug_id=6792302
--
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code