Hi,
While trying to get MySQL working I ran into problems with timekeeping in
the rump kernel which I don't entirely understand. Basically it boils down
to the following code snippet not behaving as expected:
    for (i = 0; i < 1000000; ++i)
    {
        usecs_end = get_usecs();
        // sched_yield();
        if (usecs_end - usecs_start > 200)
            break;
    }
Full code is at: https://gist.github.com/mato/2caf693b2d339308825d
get_usecs() uses gettimeofday() internally to get the time as the number of
microseconds since the epoch.
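For readers who don't want to open the gist, a minimal sketch of what such a
helper looks like. This is an approximation, not necessarily the exact code in
the gist; in particular, returning 0 on failure is an assumption made here
for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/*
 * Sketch: wall-clock time as microseconds since the epoch.
 * Assumption: a gettimeofday() failure is reported as 0.
 */
static uint64_t get_usecs(void)
{
    struct timeval tv;

    if (gettimeofday(&tv, NULL) != 0)
        return 0;

    /* Widen before multiplying so the result cannot overflow 32 bits. */
    return (uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec;
}
```

Note that the resolution of the value returned is only as good as the
clock behind gettimeofday() on the platform in question.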
Observed behaviour:
native: Loop completes before limit, usecs_end - usecs_start == ~200.
rr-xen: Loop does not complete before limit, usecs_end - usecs_start is
zero, i.e. the time from gettimeofday is NEVER updated during the loop.
rr-baremetal: As for -xen.
rr-posix: Loop completes before limit, usecs_end - usecs_start is 10000
(100 Hz, matches the internal rump timecounter frequency).
If I uncomment the sched_yield() call, then the behaviour for -baremetal
and -xen matches -posix.
Looking at the code paths, rump_schedule() and rump_unschedule() are called
around each syscall in rump_syscall() so this should cause the clock softint to
run *eventually* thus updating the time values inside the rump kernel. However,
in practice that never happens.
What am I missing here? Why is the extra call to sched_yield() necessary?
Background: MySQL uses a similar loop to measure the frequency of RDTSC
during bootstrap. If gettimeofday() is not working as expected then the
resulting "time passed" is zero, which then causes a divide by zero in the
computation after the loop.
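To illustrate the failure mode, here is a hypothetical sketch of the kind of
computation involved. It is NOT MySQL's actual code; the function name
ticks_per_sec() and the choice to return 0 when no time has elapsed are
assumptions made purely for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: estimate a counter frequency in ticks per second
 * from a tick delta and the elapsed wall-clock time in microseconds.
 * Without the guard, elapsed_usecs == 0 (the clock never advanced)
 * results in a division by zero.
 */
static uint64_t ticks_per_sec(uint64_t ticks, uint64_t elapsed_usecs)
{
    if (elapsed_usecs == 0)
        return 0; /* assumption: 0 means "could not measure" */

    return ticks * 1000000ULL / elapsed_usecs;
}
```

With such a guard the caller can at least detect that the measurement failed
instead of crashing, although it does nothing to fix the underlying clock.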
As discussed with Antti yesterday on IRC, a proper solution for
*accurate* timekeeping is a better timecounter driver for -xen and
-baremetal. I am not disputing that but before I start developing one I'd
like to fully understand why code like this is not working *at all* with
the current arrangement.
-mato