Re: [osol-code] 6565503 callout changes, stime(2), and timeouts

Jürgen Keil Thu, 22 Jan 2009 03:45:29 -0800

Madhavan T. Venkataraman wrote:

> It is a feature.
>
> Here is the comment before cv_waituntil_sig().
>
> * As a special test, if someone abruptly resets the system time 
> * (but not through adjtime(2); drifting of the clock is allowed and 
> * expected [see timespectohz_adj()]), then we force a return of -1 
> * so the caller can return a premature timeout to the calling process 
> * so it can reevaluate the situation in light of the new system time. 
> * (The system clock has been reset if timecheck != timechanged.)
>
> So, when system time is changed, this function is supposed to
> return immediately. It has a check for that. However, the callout
> subsystem did not implement this semantic. So, once this code
> got into the callout subsystem, it had to wait for the timer to
> go off.
> 
> This implementation corrects that. Now, the callout subsystem
> implements the correct semantic. Hence, the premature return
> when system time is changed.
>
> This should have been documented in stime(), sleep(), usleep(),
> nanosleep() and poll(). Either a bug should be filed for
> documentation or callers of cv_waituntil_sig() should check
> for premature return and wait again. I will consult my team about
> the right action for this.


Is the premature return from the system call (not caused by signal
delivery) allowed by any standard?  How is this case handled in other
operating systems?

E.g. the www.opengroup.org page for poll() tells me
that "... poll() shall wait at least timeout milliseconds
for an event to occur on any of the selected file descriptors."
OTOH, X/Open does not seem to include the concept of
changing the system time...


Another thing that I've observed: when an interval timer
is set with setitimer(), the timer is not affected
by stime().  E.g. with the new ksh93-based /bin/sleep,
a sleep 30 is implemented with poll() and gets the
premature return; but /bin/sleep 31 is implemented in
ksh93 using setitimer() and is not affected by stime().


> To summarize, this is not a bug in the callout subsystem. But the
> legacy behavior is wrong.

That is, kernel consumers of cv_waituntil_sig()
may need more fixes; is the idea that user-level code
shouldn't need changes?


> > Btw. the changed timeout for poll() behavior did break hald:
> >  Bug ID: 6792302
> >  Synopsis: hald occasionally exits on startup with status 2
> > http://bugs.opensolaris.org/view_bug.do?bug_id=6792302

> Same issue. One simple solution is to check for premature return and
> sleep again. This will only happen when someone changes the system time.

On my box the ntp service is enabled, and it runs ntpdate
at boot time, which sets the system clock.  This races with the
startup of hald, and in about 3 out of 4 cases hald does not start up
correctly because the poll() with a 250 second timeout returned too
early.

/lib/svc/method/xntp:

    # Run ntpdate to sync system to peer before starting xntpd
    [ -n "$ARGS" ] && /usr/sbin/ntpdate $ARGS
    /usr/lib/inet/xntpd



And I suspect that users who have set up their systems to
synchronize clocks using cron + rdate might see the
premature return from timeout delays, too.
