After discussions with Madhavan and code inspection, we have determined that the CALLOUT_FLAG_HRESTIME flag introduced in nv_103 caused an unintended change in behavior. This is a bug, and we will fix it as soon as possible.

- Steve Sistare

Madhavan Venkataraman wrote:
>> Date: Wed, 21 Jan 2009 09:17:28 -0800 (PST)
>> From: Jürgen Keil <j...@tools.de>
>> Subject: [osol-code] 6565503 callout changes, stime(2), and timeouts
>> To: opensolaris-code@opensolaris.org
>>
>> The putback for 6565503 "callout processing is single threaded, throttling
>> applications that rely on scalable callouts" in build 103 apparently has
>> changed the kernel function cv_waituntil_sig() to create callouts with
>> a new flag CALLOUT_FLAG_HRESTIME. This flag is described in
>> uts/common/sys/callo.h as:
>>
>>  * CALLOUT_FLAG_HRESTIME
>>  *      Normally, callouts are not affected by changes to system time
>>  *      (hrestime). This flag is used to create a callout that is affected
>>  *      by system time. If system time changes, these timers must expire
>>  *      at once. These are used by condition variables and LWP timers that
>>  *      need this behavior.
>>
>> cv_waituntil_sig() is used with several system calls (poll() / select() /
>> sigtimedwait(), semtimedop(), ...). What I'm observing is that with
>> build 103 or newer all of these system calls return prematurely -
>> before the timeout expires - when the system time is set (e.g. by
>> ntp / ntpdate or rdate).
>>
>> Test cases are [...]
>>
>> If you start any one of these on build 103 or newer
>> and run "rdate {time-server}" or set the date with
>> "date HHMM" in another window, the system call
>> gets a timeout and the program terminates.
>> Expected behavior would be that these programs
>> wait 250 seconds for some event, and changing
>> the system clock does not affect waiting.
>>
>> Is this a bug or a feature of 6565503?
>>
>
> It is a feature.
>
> Here is the comment before cv_waituntil_sig().
>
>  * As a special test, if someone abruptly resets the system time
>  * (but not through adjtime(2); drifting of the clock is allowed and
>  * expected [see timespectohz_adj()]), then we force a return of -1
>  * so the caller can return a premature timeout to the calling process
>  * so it can reevaluate the situation in light of the new system time.
>  * (The system clock has been reset if timecheck != timechanged.)
>
> So, when system time is changed, this function is supposed to
> return immediately. It has a check for that. However, the callout
> subsystem did not implement this semantic. So, once this code
> got into the callout subsystem, it had to wait for the timer to
> go off.
>
> This implementation corrects that. Now, the callout subsystem
> implements the correct semantic. Hence, the premature return
> when system time is changed.
>
> This should have been documented in stime(), sleep(), usleep(),
> nanosleep() and poll(). Either a bug should be filed for
> documentation or callers of cv_waituntil_sig() should check
> for premature return and wait again. I will consult my team about
> the right action for this.
>
> To summarize, this is not a bug in the callout subsystem. But the
> legacy behavior is wrong.
>
>>
>> Btw. the changed timeout for poll() behavior did break hald:
>>
>> Bug ID: 6792302
>> Synopsis: hald occasionally exits on startup with status 2
>> http://bugs.opensolaris.org/view_bug.do?bug_id=6792302
>> --
>>
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code