Timekeeping and scheduling

Martin Lucina Thu, 05 Mar 2015 08:35:19 -0800

Hi,

While trying to get MySQL working I ran into problems with timekeeping in
the rump kernel which I don't entirely understand. Basically it boils down
to the following code snippet not behaving as expected:


    usecs_start = get_usecs();
    for (i = 0; i < 1000000; ++i)
    {
        usecs_end = get_usecs();
        // sched_yield();
        if (usecs_end - usecs_start > 200)
            break;
    }

Full code is at: https://gist.github.com/mato/2caf693b2d339308825d

get_usecs() uses gettimeofday() internally to get the time as number of
microseconds since the epoch.
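
For readers without the gist to hand, a minimal sketch of what such a helper might look like. This is an assumption based on the description above, not a copy of the gist; in particular, returning 0 on failure is a choice made here for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Sketch of get_usecs(): wall-clock time as microseconds since the epoch,
 * obtained from gettimeofday(). Assumption: a failed call is reported as 0.
 */
static uint64_t get_usecs(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return 0;

    /* Widen before multiplying so the seconds field cannot overflow. */
    return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}
```

Note that the resolution of the returned value is only as good as the clock behind gettimeofday() on the platform in question.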

Observed behaviour:

native: Loop completes before limit, usecs_end - usecs_start == ~200.

rr-xen: Loop does not complete before limit, usecs_end - usecs_start is
zero, i.e. the time from gettimeofday() is NEVER updated during the loop.

rr-baremetal: Same as rr-xen.

rr-posix: Loop completes before limit, usecs_end - usecs_start is 10000
(100 Hz, matches the internal rump timecounter frequency).

If I uncomment the sched_yield() call, then the behaviour for -baremetal
and -xen matches -posix.
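
The variant with the yield enabled can be sketched as a self-contained function. The names now_usecs() and spin_with_yield() are hypothetical and chosen for this illustration; the loop body follows the snippet above, with sched_yield() called on every iteration.

```c
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds since the epoch via gettimeofday(). */
static uint64_t now_usecs(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return 0;
    return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}

/*
 * Poll the clock until more than limit_usecs have elapsed or max_iters
 * iterations have run, yielding the CPU after every clock read.
 * Returns the last observed elapsed time in microseconds; a result of 0
 * means the clock never advanced during the loop.
 */
static uint64_t spin_with_yield(uint64_t limit_usecs, long max_iters)
{
    uint64_t start = now_usecs();
    uint64_t end = start;
    long i;

    for (i = 0; i < max_iters; ++i) {
        end = now_usecs();
        sched_yield();  /* gives other threads a chance to run */
        if (end - start > limit_usecs)
            break;
    }
    return end - start;
}
```

Calling spin_with_yield(200, 1000000) corresponds to the loop above with the sched_yield() line uncommented.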

Looking at the code paths, rump_schedule() and rump_unschedule() are called
around each syscall in rump_syscall(), so this should cause the clock softint
to run *eventually*, thus updating the time values inside the rump kernel.
However, in practice that never happens.

What am I missing here? Why is the extra call to sched_yield() necessary?

Background: MySQL uses a similar loop to measure the frequency of RDTSC
during bootstrap. If gettimeofday() is not working as expected, then the
resulting "time passed" is zero, which causes a division by zero in the
computation after the loop.
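
To illustrate the failure mode, here is a sketch of a frequency estimate that guards against a zero elapsed time. This is NOT MySQL's actual code; estimate_hz() and its parameters are hypothetical names used only for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from MySQL: estimate a counter's
 * frequency in Hz from a tick delta and the elapsed wall-clock time in
 * microseconds. Returns 0 when no time appears to have passed, instead
 * of dividing by zero. Assumes ticks is small enough that
 * ticks * 1000000 fits in 64 bits.
 */
static uint64_t estimate_hz(uint64_t ticks, uint64_t elapsed_usecs)
{
    if (elapsed_usecs == 0)
        return 0;  /* clock did not advance; no estimate possible */

    return ticks * 1000000ULL / elapsed_usecs;
}
```

A guard like this only avoids the crash; the caller still gets no usable frequency if the clock never advances during the measurement loop.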

As discussed with Antti yesterday on IRC, a proper solution for
*accurate* timekeeping is a better timecounter driver for -xen and
-baremetal. I am not disputing that, but before I start developing one I'd
like to fully understand why code like this is not working *at all* with
the current arrangement.

-mato
