Re: Understanding PR kern/43997 (kernel timing problems / qemu)
k...@munnari.oz.au (Robert Elz) writes:

> | This is not to be confused with the kernel idea of wall-clock time
> | (i.e. what date reports). wall-clock time is usually maintained
> | by hardware separated from the interrupt timers. The 'date; sleep 5; date'
> | sequence therefore can show that 10 seconds passed.

>But that is totally broken.

Broken is the HZ=100 configuration that doesn't match the broken
(emulated) hardware. Given a sane time reference you could adjust HZ
automatically. Without one you could make it a boot time parameter.

For the Anita test harness you could probably just boot into ddb and
set the hz variable.

--
--
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Date:        Sun, 30 Jul 2017 16:04:38 - (UTC)
From:        mlel...@serpens.de (Michael van Elst)
Message-ID:

  | There are slower emulated systems that don't have these issues. (*)

Yes, that it is not qemu's execution speed was (really, always) becoming
obvious.

  | If the host misses interrupts, time in the guest just passes slower
  | than real-time. But inside the guest it is consistent.

If we could achieve that (which changing the timecounter in qemu
apparently achieves) it would at least make the world become rational.

Of course, keeping the timing running faster would be better - if we
were able to get to a state where the client/guest were actually able
to talk to the outside world (that part is easy) and run NTP, and act
as a time server that others could trust, that would be ideal.

  | This is not to be confused with the kernel idea of wall-clock time
  | (i.e. what date reports). wall-clock time is usually maintained
  | by hardware separated from the interrupt timers. The 'date; sleep 5; date'
  | sequence therefore can show that 10 seconds passed.

But that is totally broken.  While there is no guarantee that a sleep
will wake up after exactly the time requested, it should be as close as
is reasonably possible - and on an unloaded system, where there is
sufficient RAM, and nothing swapped out, and nothing competing for cpu
cycles, that sequence should (always) show something between 5 and 5
and a bit seconds have passed.

If the cpu is busy, or things are getting swapped/paged out, then we
can expect slower (not only for processes waiting upon timer signals,
but for everything), and that's acceptable.  But otherwise,
inconsistent timing is not acceptable.  All kinds of applications
(including network protocols) require time to be kept in a way that is
at least close to what others observe, even if not identical.
One easy (poor) fix is simply to do as used to be done, and have kernel
wall clock time maintained by the tick interrupt - that makes things
consistent, but without any real expectation of accuracy.

The alternative is to make the tick counts depend upon the external
wall clock time source, so they keep in sync - much the same as the
power companies do with frequency: over any short period, the nominal
50/60 Hz frequency can drift around a lot, but when measured over any
reasonable period, those things are highly accurate (which is why old
AC frequency based tick systems used to have very good long term time
stability, provided they never lost clock interrupts.)

  | The problem with qemu is that it's running on a NetBSD host and
  | therefore cannot issue interrupts based on host time unless the
  | host has a larger HZ value.

In the system of most interest, the host, and the guest, are the exact
same system (the exact same binary kernel) - unless we alter the config
of one of them explicitly to avoid this issue, they cannot help but
have the same HZ value.

As long as the emulated qemu client has access to a reasonably accurate
ToD value (which it obviously does, as the host's time is available to
qemu, and can be, and is it seems, made available to the guest) there's
no reason at all the guest cannot produce the correct number of ticks.

And doing so (since it is just a generic NetBSD) would solve the
similar, but less blatant, issue for any other system using ticks,
where the occasional clock interrupt might get lost, and where there is
some other ToD source available.

  | With host and guest running at HZ=100, it's obvious that interrupts
  | mostly come just too late and require two ticks on the host, thus
  | slowing down guest time by a factor of two.
Yes, that is a very good explanation for the observed behaviour, and I
cannot help but be grateful that simply beginning to discuss this issue
has provided so many insights into what is happening, and what we can
do to fix things.

When there is no alternative to tick interrupts, we can, and do, use
those to measure time, and everything works - just if the ticks are not
received at the expected rate, time keeping drifts away from real time
(but invisibly when considered only within the system.)

When there is some better measure of real time we can use, we can make
sure that keeps all time keeping synchronised better, regardless of
whether the system is "tickless" or still tick based - it isn't
required that every single tick be 1/HZ apart (they never are precisely
anyway) just that over the long term (which in computing is a half
second or so) the correct number of ticks have occurred.

I think it should be possible to make that happen, and that is what I
am going to see if I can do.  Then we can see if we can find a (good
enough) way to make nanosleep() less ticky - whether by giving up on
ticks altogether (which is probably not the best solution - even if we
don't use ticks for timing, we'd end up emulating them for other
things, if only to avoid needing to rewrite too much of the kernel in
one step).
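[Editorial note: the catch-up idea sketched above - deriving the tick
count from an independent ToD/timecounter reading rather than counting
interrupts - can be modelled in a few lines of userland C.  All names
here are invented for illustration; this is not the NetBSD kernel API,
just a sketch of the arithmetic.]

```c
#include <assert.h>
#include <stdint.h>

#define HZ 100	/* nominal tick rate, as in the discussion */

/* Ticks that should have occurred after `elapsed_ns' of reference time. */
static uint64_t
expected_ticks(uint64_t elapsed_ns)
{
	return elapsed_ns * HZ / 1000000000ULL;
}

/*
 * Called from the (possibly late) clock interrupt: how many ticks to
 * process now so the long-term tick count stays correct, even if some
 * interrupts were lost.
 */
static uint64_t
ticks_to_run(uint64_t elapsed_ns, uint64_t ticks_done)
{
	uint64_t want = expected_ticks(elapsed_ns);

	return want > ticks_done ? want - ticks_done : 0;
}
```

Individual ticks may then arrive late or in bursts, but over kre's
"half second or so" the count converges on the reference time.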
Re: kmem_alloc(0, f)
On Sun, Jul 30, 2017 at 03:23:50PM -, Michael van Elst wrote:
> So what does kmem_alloc(0, KM_SLEEP) do? fail where KM_SLEEP says it
> cannot fail? I don't think that it can return a zero sized allocation
> (i.e. ptr != NULL that cannot be dereferenced).

Sure it could, return a pointer inside some red zone unmapped (but
reserved kva) page.  On typical setups and modulo sysctl
vm.user_va0_disable e.g. "return (void*)16;" just as a simple example.

Martin
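[Editorial note: Martin's suggestion - a distinguished non-NULL pointer
for zero-size allocations that must never be dereferenced - can be
modelled in userland.  In the kernel the sentinel would point into a
reserved but unmapped red-zone page; here `ZERO_SIZE_PTR' is an
invented name and malloc stands in for kmem_alloc.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Non-NULL sentinel for zero-size allocations; never dereferenceable. */
#define ZERO_SIZE_PTR ((void *)16)

static void *
alloc_maybe_zero(size_t n)
{
	if (n == 0)
		return ZERO_SIZE_PTR;	/* succeeds, but don't touch it */
	return malloc(n);
}

static void
free_maybe_zero(void *p, size_t n)
{
	if (n == 0) {
		assert(p == ZERO_SIZE_PTR);	/* catches mismatched sizes */
		return;
	}
	free(p);
}
```

This lets kmem_alloc(0, KM_SLEEP) "succeed" without returning NULL,
while any dereference of the result faults immediately.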
Re: kmem API to allocate arrays
On Sun, Jul 30, 2017 at 03:30:59PM -, Michael van Elst wrote:
> Reallocation is usually a reason for memory fragmentation. I would
> rather try to avoid it instead of making it easier.

Agreed.  Also for kernel drivers, resizing an array allocation is a
very rare operation and no good reason to overcomplicate the api.

Martin
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
g...@gson.org (Andreas Gustafsson) writes:

>Frank Kardel wrote:
>> Fixing that requires some more work. But I am surprised that the qemu
>> interrupt rate is seemingly somewhat around 50Hz.

It shouldn't have a problem on Linux.

--
--
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."
Re: kmem API to allocate arrays
On 30.07.2017 16:51, Taylor R Campbell wrote:
>> Date: Sun, 30 Jul 2017 16:24:07 +0200
>> From: Kamil Rytarowski
>>
>> I would allow size to be 0, like with the original reallocarr(3). It
>> might be less pretty, but more compatible with the original model and
>> less vulnerable to accidental panics for no good reason.
>
> Hard to imagine a legitimate use case for size = 0.  Almost always,
> the parameter will be sizeof(struct foo), or some kind of blocksize
> which necessarily has to be nonzero.
>
> I started writing some example code, and I'm not too keen on having to
> write kmem_reallocarr for initial allocation and final freeing, so if
> we adopted this, I'd like to have
>
> int	kmem_allocarr(void *ptrp, size_t size, size_t count, km_flag_t flags);
> int	kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt,
> 	    km_flag_t flags);
> void	kmem_freearr(void *ptrp, size_t size, size_t count);
>
> ...at which point it's actually not clear to me that we have much of a
> use for kmem_reallocarr.  Maybe we do -- I haven't surveyed many
> users.
>
> This still doesn't address the question of whether or how we should
> express bounds on the allowed sizes of the arrays.
>

I see, perhaps it's legitimate to avoid realloc due to fragmentation.
Without this, reallocarr has little point.
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
k...@munnari.oz.au (Robert Elz) writes:

>kern/43997 is the "qemu is too slow, clock interrupts get lost, timing
>gets all messed up" problem that plagues many of the ATF tests that kind
>of expect time to be maintained rationally.

There are slower emulated systems that don't have these issues. (*)

>The problem is really (again from the PR)
> The routines sleep(3), usleep(3), and nanosleep(2) wake-up based on the
> occurrence of clock ticks. However, the timer interrupt routine
> determines the actual absolute time.

If the host misses interrupts, time in the guest just passes slower
than real-time. But inside the guest it is consistent.

This is not to be confused with the kernel idea of wall-clock time
(i.e. what date reports). wall-clock time is usually maintained
by hardware separated from the interrupt timers. The 'date; sleep 5; date'
sequence therefore can show that 10 seconds passed.

The problem with qemu is that it's running on a NetBSD host and
therefore cannot issue interrupts based on host time unless the
host has a larger HZ value.

With host and guest running at HZ=100, it's obvious that interrupts
mostly come just too late and require two ticks on the host, thus
slowing down guest time by a factor of two.

(*) This emulator derives timer information from the emulation itself.
I.e. after N emulated cycles, the timer advances accordingly. If the
emulation is too slow, it may even skip timer values to keep pace. Only
when it is too slow to even adjust the timers once per tick do you see
similar issues as with qemu.

dummy# date; sleep 5; date
Sun Jul 30 18:01:23 CEST 2017
Sun Jul 30 18:01:28 CEST 2017
dummy# atf-run t_ldp_regen | atf-report
Tests root: /usr/tests/net/mpls

t_ldp_regen (1/1): 1 test cases
    ldp_regen: [48.646148s] Passed.
[48.646421s]

Summary for 1 test programs:
    1 passed test cases.
    0 failed test cases.
    0 expected failed test cases.
    0 skipped test cases.
--
--
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."
Re: kmem API to allocate arrays
campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) writes:

>Initially I was reluctant to do that because (a) we don't even have a
>kmem_realloc, perhaps for some particular reason, and (b) it requires
>an extra parameter for the old size.  But I don't know any particular
>reason in (a), and perhaps (b) not so bad after all.  Here's a draft:

Reallocation is usually a reason for memory fragmentation. I would
rather try to avoid it instead of making it easier.

--
--
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."
Re: kmem_alloc(0, f)
mar...@duskware.de (Martin Husemann) writes:

>On Sat, Jul 29, 2017 at 02:04:42PM +, Taylor R Campbell wrote:
>> This seems like a foot-oriented panic gun, and it's been a source of
>> problems in the past.  Can we change it?

>I think it is a valuable tool to catch driver bugs early during
>development, but wouldn't mind to reduce it to a KASSERT.

So what does kmem_alloc(0, KM_SLEEP) do? fail where KM_SLEEP says it
cannot fail? I don't think that it can return a zero sized allocation
(i.e. ptr != NULL that cannot be dereferenced).

--
--
Michael van Elst
Internet: mlel...@serpens.de
                "A potential Snark may lurk in every tree."
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
>> # time sleep 10
>>    10.02 real         0.00 user         0.00 sys
>> This actually took 20 seconds of real time (manually timed with a
>> stopwatch).
> [...], but an error of a factor 2 looks suspicious.

This is tickling old memories.  I think I ran into a case where
requesting timer ticks at 100Hz actually got them at 50Hz instead, even
though the kernel was running with 100Hz ticks.  I've done some
searching and completely failed to find either the program exhibiting
the symptom (I _think_ it was userland) or the fix, but it might be
worth looking into the possibility that this is another manifestation
of the same underlying problem, whatever it was.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML		mo...@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Re: kmem API to allocate arrays
> Date: Sun, 30 Jul 2017 16:24:07 +0200
> From: Kamil Rytarowski
>
> I would allow size to be 0, like with the original reallocarr(3). It
> might be less pretty, but more compatible with the original model and
> less vulnerable to accidental panics for no good reason.

Hard to imagine a legitimate use case for size = 0.  Almost always,
the parameter will be sizeof(struct foo), or some kind of blocksize
which necessarily has to be nonzero.

I started writing some example code, and I'm not too keen on having to
write kmem_reallocarr for initial allocation and final freeing, so if
we adopted this, I'd like to have

int	kmem_allocarr(void *ptrp, size_t size, size_t count, km_flag_t flags);
int	kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt,
	    km_flag_t flags);
void	kmem_freearr(void *ptrp, size_t size, size_t count);

...at which point it's actually not clear to me that we have much of a
use for kmem_reallocarr.  Maybe we do -- I haven't surveyed many
users.

This still doesn't address the question of whether or how we should
express bounds on the allowed sizes of the arrays.
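[Editorial note: the overflow check such an array allocator would
centralize can be sketched in userland C.  `allocarray' is an invented
name standing in for whichever kmem API is adopted; malloc stands in
for kmem_alloc.  The `(n|size) >= SQRT_SIZE_MAX' trick skips the
division in the common case: if both operands fit in half the bits of
size_t, the product cannot overflow.]

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* 2^(half the bits of size_t): below this, n*size cannot overflow. */
#define SQRT_SIZE_MAX ((size_t)1 << (sizeof(size_t) * CHAR_BIT / 2))

static void *
allocarray(size_t n, size_t size)
{
	if (n == 0 || size == 0)
		return NULL;	/* reject empty arrays, per the thread */
	if ((n | size) >= SQRT_SIZE_MAX && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow size_t */
	return malloc(n * size);
}
```

A caller then writes `allocarray(iocmd->ncookies, sizeof(struct
xyz_cookie))' and gets the overflow check for free, instead of
open-coding the SIZE_MAX division at every call site.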
Re: kmem API to allocate arrays
On 30.07.2017 15:45, Taylor R Campbell wrote:
>> Date: Sun, 30 Jul 2017 10:22:11 +0200
>> From: Kamil Rytarowski
>>
>> I think we should go for kmem_reallocarr(). It has been designed for
>> overflows like reallocarray(3) with an option to be capable to resize a
>> table from 1 to N elements and back from N to 0 including freeing.
>
> Initially I was reluctant to do that because (a) we don't even have a
> kmem_realloc, perhaps for some particular reason, and (b) it requires
> an extra parameter for the old size.  But I don't know any particular
> reason in (a), and perhaps (b) not so bad after all.  Here's a draft:
>
> int
> kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt, int flags)
> {
> 	void *optr, *nptr;
>
> 	KASSERT(size != 0);
> 	if (__predict_false((size|ncnt) >= SQRT_SIZE_MAX &&
> 		ncnt > SIZE_MAX/size))
> 		return ENOMEM;
>
> 	memcpy(&optr, ptrp, sizeof(void *));
> 	KASSERT((ocnt == 0) == (optr == NULL));
> 	if (ncnt == 0) {
> 		nptr = NULL;
> 	} else {
> 		nptr = kmem_alloc(size*ncnt, flags);
> 		KASSERT(nptr != NULL || flags == KM_NOSLEEP);
> 		if (nptr == NULL)
> 			return ENOMEM;
> 	}
> 	KASSERT((ncnt == 0) == (nptr == NULL));
> 	if (ocnt != 0 && ncnt != 0)
> 		memcpy(nptr, optr, size*MIN(ocnt, ncnt));
> 	if (ocnt != 0)
> 		kmem_free(optr, size*ocnt);
> 	memcpy(ptrp, &nptr, sizeof(void *));
>
> 	return 0;
> }
>

I would allow size to be 0, like with the original reallocarr(3). It
might be less pretty, but more compatible with the original model and
less vulnerable to accidental panics for no good reason.
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Frank Kardel wrote:
> Fixing that requires some more work. But I am surprised that the qemu
> interrupt rate is seemingly somewhat around 50Hz.
> Could it be a bug in qemu getting the frequency not right? qemu should
> read the clock to get the frequencies right and possibly skip
> usleeps less than 1/HZ, possibly managing an error-budget. I haven't
> looked into qemu at all, but an error of a factor 2 looks suspicious.

I fully agree.
--
Andreas Gustafsson, g...@gson.org
Re: kmem API to allocate arrays
> Date: Sun, 30 Jul 2017 10:22:11 +0200
> From: Kamil Rytarowski
>
> I think we should go for kmem_reallocarr(). It has been designed for
> overflows like reallocarray(3) with an option to be capable to resize a
> table from 1 to N elements and back from N to 0 including freeing.

Initially I was reluctant to do that because (a) we don't even have a
kmem_realloc, perhaps for some particular reason, and (b) it requires
an extra parameter for the old size.  But I don't know any particular
reason in (a), and perhaps (b) not so bad after all.  Here's a draft:

int
kmem_reallocarr(void *ptrp, size_t size, size_t ocnt, size_t ncnt, int flags)
{
	void *optr, *nptr;

	KASSERT(size != 0);
	if (__predict_false((size|ncnt) >= SQRT_SIZE_MAX &&
		ncnt > SIZE_MAX/size))
		return ENOMEM;

	memcpy(&optr, ptrp, sizeof(void *));
	KASSERT((ocnt == 0) == (optr == NULL));
	if (ncnt == 0) {
		nptr = NULL;
	} else {
		nptr = kmem_alloc(size*ncnt, flags);
		KASSERT(nptr != NULL || flags == KM_NOSLEEP);
		if (nptr == NULL)
			return ENOMEM;
	}
	KASSERT((ncnt == 0) == (nptr == NULL));
	if (ocnt != 0 && ncnt != 0)
		memcpy(nptr, optr, size*MIN(ocnt, ncnt));
	if (ocnt != 0)
		kmem_free(optr, size*ocnt);
	memcpy(ptrp, &nptr, sizeof(void *));

	return 0;
}
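[Editorial note: the draft's grow/shrink semantics can be exercised
outside the kernel with a userland transliteration - malloc/free stand
in for kmem_alloc/kmem_free, assert for KASSERT, and the overflow check
is elided (see the draft's SQRT_SIZE_MAX test).  `reallocarr_model' is
an invented name for this sketch.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

static int
reallocarr_model(void *ptrp, size_t size, size_t ocnt, size_t ncnt)
{
	void *optr, *nptr;

	/* ptrp points at the caller's array pointer, as in the draft. */
	memcpy(&optr, ptrp, sizeof(void *));
	assert((ocnt == 0) == (optr == NULL));

	if (ncnt == 0) {
		nptr = NULL;		/* shrink-to-zero frees everything */
	} else {
		nptr = malloc(size * ncnt);
		if (nptr == NULL)
			return ENOMEM;
	}
	if (ocnt != 0 && ncnt != 0)	/* preserve the surviving elements */
		memcpy(nptr, optr, size * MIN(ocnt, ncnt));
	if (ocnt != 0)
		free(optr);
	memcpy(ptrp, &nptr, sizeof(void *));
	return 0;
}
```

The model shows the API covering Taylor's three cases in one entry
point: initial allocation (ocnt 0), resize, and final freeing (ncnt 0).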
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Hi Andreas !

On 07/30/17 15:20, Andreas Gustafsson wrote:
> Frank Kardel wrote:
>> Could you check which timecounter is used under qemu?
>>
>> sysctl kern.timecounter.hardware
>
> # sysctl kern.timecounter.hardware
> kern.timecounter.hardware = hpet0
>
>> Usually the timecounters are hardware-based and have no relation
>> to the clockinterrupt. In case of qemu you might get a good
>> emulated timecounter, but a suboptimal clockinterrupt.
>> If this is the case it helps to use the clockinterrupt
>> itself as timecounter for the wall clock time to avoid a discrepancy
>> between clockinterrupt-driven timeout handling and wall-clock time tracking.
>>
>> sysctl -w kern.timecounter.hardware=clockinterrupt
>
> # sysctl -w kern.timecounter.hardware=clockinterrupt
> kern.timecounter.hardware: hpet0 -> clockinterrupt
> # time sleep 10
>    10.02 real         0.00 user         0.00 sys
>
> This actually took 20 seconds of real time (manually timed with a
> stopwatch).
>
>> This is the opposite of deducing the missed clock interrupts
>> from the wall clock time, and keeps timeout handling and the
>> wall-time observed in the emulation synchronized no matter how slow
>> the clock-interrupts are - the emulated wall clock time will be
>> at the same rate.
>
> Right, but I would still rather see the bug fixed than worked around
> this way.

Fixing that requires some more work. But I am surprised that the qemu
interrupt rate is seemingly somewhat around 50Hz.
Could it be a bug in qemu getting the frequency not right? qemu should
read the clock to get the frequencies right and possibly skip
usleeps less than 1/HZ, possibly managing an error-budget. I haven't
looked into qemu at all, but an error of a factor 2 looks suspicious.

Frank
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Frank Kardel wrote:
> Could you check which timecounter is used under qemu?
>
> sysctl kern.timecounter.hardware

# sysctl kern.timecounter.hardware
kern.timecounter.hardware = hpet0

> Usually the timecounters are hardware-based and have no relation
> to the clockinterrupt. In case of qemu you might get a good
> emulated timecounter, but a suboptimal clockinterrupt.
> If this is the case it helps to use the clockinterrupt
> itself as timecounter for the wall clock time to avoid a discrepancy
> between clockinterrupt-driven timeout handling and wall-clock time tracking.
>
> sysctl -w kern.timecounter.hardware=clockinterrupt

# sysctl -w kern.timecounter.hardware=clockinterrupt
kern.timecounter.hardware: hpet0 -> clockinterrupt
# time sleep 10
   10.02 real         0.00 user         0.00 sys

This actually took 20 seconds of real time (manually timed with a
stopwatch).

> This is the opposite of deducing the missed clock interrupts
> from the wall clock time, and keeps timeout handling and the
> wall-time observed in the emulation synchronized no matter how slow
> the clock-interrupts are - the emulated wall clock time will be
> at the same rate.

Right, but I would still rather see the bug fixed than worked around
this way.
--
Andreas Gustafsson, g...@gson.org
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Could you check which timecounter is used under qemu?

sysctl kern.timecounter.hardware

Usually the timecounters are hardware-based and have no relation
to the clockinterrupt. In case of qemu you might get a good
emulated timecounter, but a suboptimal clockinterrupt.
If this is the case it helps to use the clockinterrupt
itself as timecounter for the wall clock time to avoid a discrepancy
between clockinterrupt-driven timeout handling and wall-clock time tracking.

sysctl -w kern.timecounter.hardware=clockinterrupt

This is the opposite of deducing the missed clock interrupts
from the wall clock time, and keeps timeout handling and the
wall-time observed in the emulation synchronized no matter how slow
the clock-interrupts are - the emulated wall clock time will be
at the same rate.

This might be a workaround for the current qemu issue and does not
affect any discussion about improving sleep timing or migrating to a
tick-less kernel.

BTW: even a tick-less kernel will need to have a minimum interrupt
frequency in order to avoid undetected timecounter wrapping.

Frank

On 07/30/17 14:22, Robert Elz wrote:

Date:        Sun, 30 Jul 2017 13:01:50 +0300
From:        Andreas Gustafsson
Message-ID:  <22909.44686.188004.117...@guava.gson.org>

  | I don't think the slowness of qemu's emulation is the actual cause of
  | its inability to simulate clock interrupts at 100 Hz.

Yes, I was wondering about that, as if it was, there'd often be no time
left for anything else...

  | If my theory is correct, there are at least three ways the problem
  | could be fixed:
  |
  | - Improve the time resolution of sleeps on the host system,
  | - Make qemu deal better with hosts unable to sleep for short periods

Either, or both, of those should be fixed, and I might get to take a
look at the first one (the insides of qemu are not all that
appealing...) but

  | - Make the guest system deal better with missed timer interrupts.

This one needs to be fixed.  An idle system that says it takes 13
seconds to do a sleep 10 is simply broken.

Fixing the other issues (or either one of them) would make it much
harder to work on this one - that is, keeping the qemu/host
relationship stable allows a platform where the timekeeping issues in
the kernel are known to occur, so a good way to verify any fix, so I
think this should be fixed first.

kre
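[Editorial note: Frank's wrapping caveat has simple arithmetic behind
it - an N-bit timecounter running at f Hz wraps every 2^N / f seconds,
so the kernel must read it at least that often (in practice much more
often) or a whole wrap goes unnoticed.  Illustrative sketch only.]

```c
#include <stdint.h>

/* Seconds until an N-bit counter at freq_hz wraps around. */
static double
wrap_seconds(unsigned bits, double freq_hz)
{
	return (double)((uint64_t)1 << bits) / freq_hz;
}
```

For example, the 24-bit ACPI PM timer at 3.579545 MHz wraps in under
five seconds, which is why a kernel using it as timecounter must take
at least an occasional interrupt even when otherwise idle.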
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Date:        Sun, 30 Jul 2017 13:01:50 +0300
From:        Andreas Gustafsson
Message-ID:  <22909.44686.188004.117...@guava.gson.org>

  | I don't think the slowness of qemu's emulation is the actual cause of
  | its inability to simulate clock interrupts at 100 Hz.

Yes, I was wondering about that, as if it was, there'd often be no time
left for anything else...

  | If my theory is correct, there are at least three ways the problem
  | could be fixed:
  |
  | - Improve the time resolution of sleeps on the host system,
  | - Make qemu deal better with hosts unable to sleep for short periods

Either, or both, of those should be fixed, and I might get to take a
look at the first one (the insides of qemu are not all that
appealing...) but

  | - Make the guest system deal better with missed timer interrupts.

This one needs to be fixed.  An idle system that says it takes 13
seconds to do a sleep 10 is simply broken.

Fixing the other issues (or either one of them) would make it much
harder to work on this one - that is, keeping the qemu/host
relationship stable allows a platform where the timekeeping issues in
the kernel are known to occur, so a good way to verify any fix, so I
think this should be fixed first.

kre
Re: kmem_alloc(0, f)
On Sat, Jul 29, 2017 at 02:04:42PM +, Taylor R Campbell wrote:
> This seems like a foot-oriented panic gun, and it's been a source of
> problems in the past.  Can we change it?

I think it is a valuable tool to catch driver bugs early during
development, but wouldn't mind to reduce it to a KASSERT.

Martin
Re: Understanding PR kern/43997 (kernel timing problems / qemu)
Robert Elz wrote:
> I want to leave /bin/sh to percolate for a while, make sure there are
> no issues with it as it is, before starting on the next round of
> cleanups and bug fixes, so I was looking for something else to poke
> my nose into ...
>
> [Aside: the people I added to the cc of this message are those who have
> added text to PR kern/43997 and so I thought might be interested, if you're
> not, just say...]
>
> kern/43997 is the "qemu is too slow, clock interrupts get lost, timing
> gets all messed up" problem that plagues many of the ATF tests that kind
> of expect time to be maintained rationally.

Thank you for looking into this.

> Now there's no question that qemu is slow, for example, on my amd64 Xen
> DomU test system, the shell arithmetic test of ++x (etc) takes:
>     var_preinc: [0.077617s] Passed.
> whereas from the latest completed b5 (qemu) test run (as of this e-mail)
>     var_preinc    Passed    N/A    6.200489s
>
> That's about 80 times slower (and most of the other tests show similar
> factors).  I don't think we can blame qemu for that, given what it is
> doing.
>
> So, it is hardly surprising that, to borrow Paul's words from the PR:
>     On (at least) amd64 architecture, qemu cannot simulate clock
>     interrupts at 100Hz.

I don't think the slowness of qemu's emulation is the actual cause of
its inability to simulate clock interrupts at 100 Hz.  Rather, I think
it is more likely caused by the inability of qemu to sleep for periods
shorter than 10 ms due to limitations of the underlying host OS, such
as that documented in the BUGS section of nanosleep(2).

That this is at least partly a host system issue is supported by the
observation that when qemu is hosted on a Linux system, the timing in
the NetBSD guest is much more accurate than when qemu is hosted on
NetBSD, on similar hardware:

NetBSD-on-qemu-on-NetBSD# time sleep 10
   13.00 real         0.00 user         0.03 sys

NetBSD-on-qemu-on-Linux# time sleep 10
   10.13 real         0.02 user         0.02 sys

If my theory is correct, there are at least three ways the problem
could be fixed:

- Improve the time resolution of sleeps on the host system, as
  recently discussed on tech-kern in a thread starting with
  http://mail-index.netbsd.org/tech-kern/2017/07/02/msg022024.html
- Make qemu deal better with hosts unable to sleep for short periods
  of time, or
- Make the guest system deal better with missed timer interrupts.
--
Andreas Gustafsson, g...@gson.org
Re: kmem API to allocate arrays
On 29.07.2017 16:19, Taylor R Campbell wrote:
> It's stupid that we have to litter drivers with
>
> 	if (SIZE_MAX/sizeof(struct xyz_cookie) < iocmd->ncookies) {
> 		error = EINVAL;
> 		goto out;
> 	}
> 	cookies = kmem_alloc(iocmd->ncookies*sizeof(struct xyz_cookie),
> 	    KM_SLEEP);
> 	...
>
> and as you can tell from some recent commits, it hasn't been done
> everywhere.  It's been a consistent source of problems in the past.
>
> This multiplication overflow check, which is all that most drivers do,
> also doesn't limit the amount of wired kmem that userland can request,
> and there's no way for kmem to say `sorry, I can't satisfy this
> request: it's too large' other than to panic or wedge.
>
> In userland we now have reallocarr(3).  I propose that we add
> something to the kernel, but I'm not immediately sure what it should
> look like because kernel is a little different.  Solaris/illumos
> doesn't seem to have anything we could obviously parrot, from a
> cursory examination.
>
> We could add something like
>
> void	*kmem_allocarray(size_t n, size_t size, int flags);
> void	kmem_freearray(void *ptr, size_t n, size_t size);
>
> That wouldn't address bounding the amount of wired kmem userland can
> request.  Perhaps that's OK: perhaps it's enough to have drivers put
> limits on the number of (say) array elements at the call site,
> although then there's not as much advantage to having a new API.
> Instead, we could make it
>
> void	*kmem_allocarray(size_t n, size_t size, size_t maxn, int flags);
> or
> void	*kmem_allocarray(size_t n, size_t size, size_t maxbytes, int flags);
>
> It's not clear that the call site is exactly the right place to
> compute a bound on the number of bytes a user can allocate.  On the
> other hand, if it's not clear up front what the bound is, then that
> makes a foot-oriented panic gun, or an instawedge, if the kernel
> decides at run-time that some number of bytes is more than it can
> possibly ever satisfy, which is not so great either.  If you specify
> up front in the source, at least you can say by examination of the
> source whether it has a chance of working or not on some particular
> platform.  And maybe we can make it easier to write an expression for
> `no more than 10% of the machine's current RAM' or something.
>
> Either way, kmem_allocarray would have to have the option of returning
> NULL, unlike kmem_alloc(..., KM_SLEEP), which is a nontrivial change
> to the contract now that chuq@ recently dove in deep to make sure it
> never returns NULL.
>
> Thoughts?
>

I think we should go for kmem_reallocarr(). It has been designed for
overflows like reallocarray(3) with an option to be capable to resize a
table from 1 to N elements and back from N to 0 including freeing.