Madhavan T. Venkataraman wrote:
> It is a feature.
>
> Here is the comment before cv_waituntil_sig().
>
>  * As a special test, if someone abruptly resets the system time
>  * (but not through adjtime(2); drifting of the clock is allowed and
>  * expected [see timespectohz_adj()]), then we force a return of -1
>  * so the caller can return a premature timeout to the calling process
>  * so it can reevaluate the situation in light of the new system time.
>  * (The system clock has been reset if timecheck != timechanged.)
>
> So, when system time is changed, this function is supposed to
> return immediately. It has a check for that. However, the callout
> subsystem did not implement this semantic. So, once this code
> got into the callout subsystem, it had to wait for the timer to
> go off.
>
> This implementation corrects that. Now, the callout subsystem
> implements the correct semantic. Hence, the premature return
> when system time is changed.
>
> This should have been documented in stime(), sleep(), usleep(),
> nanosleep() and poll(). Either a bug should be filed for
> documentation or callers of cv_waituntil_sig() should check
> for premature return and wait again. I will consult my team about
> the right action for this.

Is the premature return from the system call (not caused by signal
delivery) allowed by any standard?  How is this case handled in other
operating systems?

E.g. the www.opengroup.org page for poll() tells me that
"... poll() shall wait at least timeout milliseconds for an event to
occur on any of the selected file descriptors."

OTOH, X/Open does not seem to include the concept of changing
system time...

Another thing that I've observed: when an interval timer is set with
setitimer(), the timer is not affected by stime().  E.g. with the new
ksh93 based /bin/sleep, a "sleep 30" is implemented with poll() and gets
the premature return; but "/bin/sleep 31" is implemented in ksh93 using
setitimer() and is not affected by stime().

> To summarize, this is not a bug in the callout subsystem. But the
> legacy behavior is wrong.

That is, kernel consumers of cv_waituntil_sig() may need more fixes;
the idea is that user level code shouldn't need changes?

> > Btw. the changed timeout for poll() behavior did break hald:
> >
> > Bug ID: 6792302
> > Synopsis: hald occasionally exits on startup with status 2
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6792302
>
> Same issue. One simple solution is to check for premature return and
> sleep again. This will only happen when someone changes the system time.

On my box the ntp service is enabled, and this runs ntpdate at boot
time, which sets the system clock.  This races with the startup of hald,
and in about 3 out of 4 cases hald does not start up correctly because
the poll() with a 250 second timeout returned too early.

/lib/svc/method/xntp:

    # Run ntpdate to sync system to peer before starting xntpd
    [ -n "$ARGS" ] && /usr/sbin/ntpdate $ARGS
    /usr/lib/inet/xntpd

And I suspect that users who have set up their systems to synchronize
clocks using cron + rdate might see the premature return from timeout
delays, too.
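
For reference, the "check for premature return and sleep again"
workaround would look roughly like the sketch below if it had to be done
in user level code. This is only an illustration, not what hald or any
other consumer actually does: poll_full_timeout() and now_ms() are
hypothetical names, and it assumes CLOCK_MONOTONIC is available and is
not stepped when the system time is set.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <poll.h>
#include <time.h>

/* Current time in milliseconds on the monotonic clock (assumed not to
 * jump when the wall clock is set). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical wrapper around poll(): if poll() reports a timeout
 * (returns 0) before timeout_ms has really elapsed, wait again for the
 * remaining time.  Only finite, non-negative timeouts are handled.
 */
static int poll_full_timeout(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    assert(timeout_ms >= 0);
    long long deadline = now_ms() + timeout_ms;

    for (;;) {
        long long remaining = deadline - now_ms();
        if (remaining < 0)
            remaining = 0;

        int rc = poll(fds, nfds, (int)remaining);
        if (rc != 0)
            return rc;      /* events ready, or error (including EINTR) */
        if (now_ms() >= deadline)
            return 0;       /* genuine timeout */
        /* premature timeout: loop and wait for the remainder */
    }
}
```

Errors and signal interruptions are passed straight back to the caller,
so only the "timed out too early" case is retried.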