Date: Wed, 21 Jan 2009 09:17:28 -0800 (PST)
> From: Jürgen Keil <j...@tools.de>
> Subject: [osol-code] 6565503 callout changes, stime(2), and timeouts
> To: opensolaris-code@opensolaris.org
>
> The putback for 6565503 "callout processing is single threaded, throttling
> applications that rely on scalable callouts" in build 103 apparently has
> changed the kernel function cv_waituntil_sig() to create callouts with
> a new flag CALLOUT_FLAG_HRESTIME.  This flag is described in
> uts/common/sys/callo.h as:
>
>  * CALLOUT_FLAG_HRESTIME
>  * Normally, callouts are not affected by changes to system time
>  * (hrestime). This flag is used to create a callout that is affected
>  * by system time. If system time changes, these timers must expire
>  * at once. These are used by condition variables and LWP timers that
>  * need this behavior.
>
> cv_waituntil_sig() is used with several system calls (poll(), select(),
> sigtimedwait(), semtimedop(), ...).  What I'm observing is that with
> build 103 or newer all of these system calls return prematurely -
> before the timeout expires - when the system time is set (e.g. by
> ntp / ntpdate or rdate).
>
> Test cases are:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/time.h>
>
> int
> main(int argc, char **argv)
> {
>    struct timeval  tv;
>    int             i;
>
>    tv.tv_sec = 250;
>    tv.tv_usec = 0;
>    i = select(0, NULL, NULL, NULL, &tv);
>    if (i < 0) {
>            perror("select");
>            exit(1);
>    }
>    printf("select returned %d\n", i);
>    exit(0);
> }
>
> -----------------------------------------------
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <signal.h>
> #include <time.h>
>
> int
> main(int argc, char **argv)
> {
>    sigset_t set;
>    siginfo_t info;
>    struct timespec tmout;
>
>    sigemptyset(&set);
>    sigaddset(&set, SIGALRM);
>    tmout.tv_sec = 250;
>    tmout.tv_nsec = 0;
>    /* on timeout sigtimedwait() fails with EAGAIN */
>    if (sigtimedwait(&set, &info, &tmout) < 0)
>            perror("sigtimedwait");
>    return (0);
> }
>
> -----------------------------------------------
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
> #include <sys/types.h>
> #include <sys/ipc.h>
> #include <sys/sem.h>
>
> int
> main(int argc, char **argv)
> {
>    int sem;
>    struct sembuf ops[1];
>    struct timespec ts;
>
>    sem = semget(IPC_PRIVATE, 1, 0600);
>    if (sem < 0) {
>            perror("semget");
>            exit(1);
>    }
>    ops[0].sem_num = 0;
>    ops[0].sem_op = -1;
>    ops[0].sem_flg = 0;
>
>    ts.tv_sec = 250;
>    ts.tv_nsec = 0;
>    /* on timeout semtimedop() fails with EAGAIN */
>    if (semtimedop(sem, ops, 1, &ts) < 0)
>            perror("semtimedop");
>    return (0);
> }
>
> If you start any one of these on build 103 or newer
> and run "rdate {time-server}" or set the date with
> "date HHMM" in another window, the system call
> gets a timeout and the program terminates.
> Expected behavior would be that these programs
> wait 250 seconds for some event, and changing
> the system clock does not affect waiting.
>
> Is this a bug or a feature of 6565503?
>

It is a feature.

Here is the comment before cv_waituntil_sig().

 * As a special test, if someone abruptly resets the system time
 * (but not through adjtime(2); drifting of the clock is allowed and
 * expected [see timespectohz_adj()]), then we force a return of -1
 * so the caller can return a premature timeout to the calling process
 * so it can reevaluate the situation in light of the new system time.
 * (The system clock has been reset if timecheck != timechanged.)

So, when the system time is changed, this function is supposed to
return immediately, and it has a check for that. However, the callout
subsystem did not implement this semantic: once a thread was waiting
on a callout, it slept until the timer went off, regardless of any
change to the system time.

The new implementation corrects that. The callout subsystem now
implements the intended semantic, hence the premature return
when the system time is changed.

This should have been documented in the man pages for stime(),
sleep(), usleep(), nanosleep() and poll(). Either a documentation
bug should be filed, or callers of cv_waituntil_sig() should check
for a premature return and wait again. I will consult my team about
the right action for this.

To summarize, this is not a bug in the callout subsystem. But the
legacy behavior is wrong.



>
> Btw. the changed timeout behavior for poll() did break hald:
>
>     Bug ID: 6792302
>     Synopsis: hald occasionally exits on startup with status 2
>     http://bugs.opensolaris.org/view_bug.do?bug_id=6792302
> --
Same issue. One simple solution is to check for premature return and
sleep again. This will only happen when someone changes the system time.


> This message posted from opensolaris.org
> _______________________________________________
> opensolaris-code mailing list
> opensolaris-code@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/opensolaris-code


----- End forwarded message -----

-- 
Chris J. Kiick - Perf Geek and I/O monkey | #include <disclaimer.h>
Sun Microsystems: SSG: SPARC Platform Software: Enterprise Workgroup Software
Austin TX 512-401-1408
