Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>Either direction mstohz or hztoms should better always round up to
>guarantee a minimal delay.

And both should be replaced by hztous()/ustohz().

Microseconds allow a time value of ~35 minutes as 32bit signed
integer, which should be a safe range for delays.

A higher resolution might have its use too, but then you have
to assure that such values are always carried in 64bit integers
or handled by a structure. There is already a hz2bintime()
macro in that realm, which is unsafe (truncates to milliseconds)
but AFAIK also unused.



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Michael van Elst
mo...@rodents-montreal.org (Mouse) writes:

>How heavily is hztoms used?  (I would expect mstohz to be used far more
>heavily.)

sys/dev/acpi/acpi_cpu_cstate.c:   sc->sc_cstate_sleep = 
hztoms(acpitimer_delta(end, start)) * 1000;
sys/dev/spkr_audio.c: audiobell(sc->sc_audiodev, xhz, hztoms(ticks), 
sc->sc_spkr.sc_vol, 0);
sys/dev/i2c/tsllux.c: if (ms < hztoms(1)) {
sys/dev/ic/mvsata.c:  timeout = mstohz(timeout + hztoms(1) - 1);
sys/dev/ic/mvsata.c:  ata_delay(chp, hztoms(1), "mvsata_edma2", 
wflags);
sys/dev/pci/ixgbe/ixgbe_netbsd.c: else if ((us / 1000) >= hztoms(1)) {
sys/dev/sdmmc/if_bwfm_sdio.c: sdmmc_pause(hztoms(1)*1000, NULL);
sys/dev/usb/if_axe.c: usbd_delay_ms(sc->axe_un.un_udev, hztoms(y)); 
  \
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/external/bsd/drm2/include/linux/jiffies.h:return hztoms(j);
sys/external/bsd/drm2/include/linux/sched.h:  unsigned ms = 
hztoms(MIN((unsigned long)timeout,
sys/kern/sched_4bsd.c:int rttsms = hztoms(sched_rrticks);
sys/kern/sched_m2.c:  int rttsms = hztoms(sched_rrticks);
sys/kern/sched_m2.c:  newsize = hztoms(min_ts);
sys/kern/sched_m2.c:  newsize = hztoms(max_ts);


>> 0 can sometimes mean "never block" and sometimes can mean "block
>> forever".

>What does hztoms's return value get used for?  Is it actually used to
>compute delay values?  It sounds dodgy to me to do so, for this among
>other reasons.

Mostly to compute a delay.

Either direction mstohz or hztoms should better always round up to
guarantee a minimal delay.



Re: Expected behavior when returning from a SIGFPE handler

2021-05-26 Thread Mouse
> But the x86_64 code appears to return to the same instruction, banging its h$

> It's my belief that the alpha behavior is more desirable.

> Please, discuss.

I could argue that either way.

In some cases, you want to re-execute the instruction.  A simple
example is "FPU disabled" on architectures that have such a notion, eg
for lazy FPU switching.

In some cases, you don't want to.  An example might be soft-float, or
partial soft-float (such as, emulation of cases the hardware doesn't
handle).

On architectures like SPARC or, I think - you'd know better than I -
Alpha, where there are very few possible instruction sizes, sometimes
as few as just one, advancing past the instruction in the trap handler
is easy even if you have to do it in software.  On others, like the
VAX, it's a right pain to do in software.  On the latter sort, ideally,
the hardware would give you both the PC to use to re-execute the
instruction and the PC to use to skip the instruction, letting the trap
handler choose, but I'm not aware of any architecture that does that.
(The closest I'm aware of is, ironically, the SPARC, on which advancing
past an instruction in software is about as simple as it gets - but it
has both PC and next-PC in hardware, though admittedly for other
reasons.)

I see no clear single right answer here.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Expected behavior when returning from a SIGFPE handler

2021-05-26 Thread Paul Goyette

On Wed, 26 May 2021, Jason Thorpe wrote:


...
The alpha code has, for a very long time, always advanced the PC past
the faulting instruction on an arithmetic trap[1].  This, in essence,
makes it behave exactly as if the exception were disabled, while still
giving the handler a chance to "do something").

But the x86_64 code appears to return to the same instruction,
banging its head against the proverbial wall.

It's my belief that the alpha behavior is more desirable.


I agree.  To me, the x86_64 behaviour as reported sounds quite "sub-
optimal" and rather unuseful.


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


Expected behavior when returning from a SIGFPE handler

2021-05-26 Thread Jason Thorpe
I've been working on fixing some test case failures on the Alpha port, and I'm 
elbow-deep in FP-land right now.  The Alpha has a somewhat complicated FP story 
because it has architecture-mandated software completion for essentially 
anything outside the happy path in hardware, and to support that it has a 
virtual floating point control register called "FP_C" that is completely 
orthogonal to the hardware floating point control register (called "FPCR").  
(Don't get me started on the "trap shadow"...)

I wrote a reduced test case program to exercise the different code paths around 
the DZE exception.  I can run this program in 2 different modes: DZE traps 
disabled (the default), DZE traps enabled.

The test program sets up a SIGFPE handler, and the handler make a copy of the 
siginfo, sets a global flag, and returns.  The program then does "1.0 / 0.0" 
and prints the result.  It checks to ensure that the DZE exception is set via 
fpgetsticky().  It then does "1.0 / 0.0" again, and then verifies that the 
SIGFPE handler was not called a second time (because I never cleared DZE with 
fpsetsticky()).

In the "disabled" case, the test program performs the same way on x86_64 and 
alpha (modulo the result of dividing by zero... on x86_64 I consistently get 
+inf, and on alpha sometimes I get +inf, sometimes I get 0; the AARM officially 
states that the results are UNPREDICTABLE).

In the "enabled" case, the alpha and x86_64 behavior differ!  On the alpha, the 
program progresses all the way to the end.  But on x86_64, it hangs... spinning 
around the SIGFPE handler.

The alpha code has, for a very long time, always advanced the PC past the 
faulting instruction on an arithmetic trap[1].  This, in essence, makes it 
behave exactly as if the exception were disabled, while still giving the 
handler a chance to "do something").

But the x86_64 code appears to return to the same instruction, banging its head 
against the proverbial wall.

It's my belief that the alpha behavior is more desirable.

Please, discuss.

[1] The PALcode, in fact, does this when delivering the exception.  The alpha 
FP trap handler has to do some shenanigans to find the actual faulting PC.  
These shenanigans are simple (PC += 4) for EV6 and later CPUs, and somewhat 
complicated for pre-EV6 (which has to deal with the "trap shadow").  In any 
case, even when software completion is not possible because trap shadow rules 
were violated by either the program-writer or the compiler, you're pretty much 
guaranteed to not execute the faulting instruction again.

-- thorpej



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Mouse
> #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

> when hz > 1000, this returns 0 for input of 1.

True, which is correct - when hz > 1000, one tick is less than one
millisecond.

If, that is, it's defined to round down (and, if not, the
implementation you quote is broken).  What does its interface contract
say?  (I don't have easy access to 9.1 at the moment, or I'd go
looking.)

How heavily is hztoms used?  (I would expect mstohz to be used far more
heavily.)

> 0 can sometimes mean "never block" and sometimes can mean "block
> forever".

What does hztoms's return value get used for?  Is it actually used to
compute delay values?  It sounds dodgy to me to do so, for this among
other reasons.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
> > Me too.  I was - am - rather puzzled by it.
>
> Right. That was my issue. Claiming that you'd get more problems with 
> rounding and issues with coarsness when running at a higher frequency, 
> when it in fact is the exact opposite. With a higher frequency you have 
> more accuracy and less errors from truncations.
>
> Anyway, I have no idea what the actual problem is. Good luck with that part.

as i posted earlier in this thread:

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

when hz > 1000, this returns 0 for input of 1.  0 can sometimes
mean "never block" and sometimes can mean "block forever".


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Johnny Billquist

On 2021-05-26 12:59, Mouse wrote:

I don't fully understand the discussion here.



Initially people claimed that HZ at 8000 would be a problem,


Well, my experience indicates that it _is_ a problem, at least when
using disks at piixide (or pciide).


Right. But not for the reason suggested. But obviously there is some 
kind of a problem somewhere.



which for me seems a bit backwards.


Me too.  I was - am - rather puzzled by it.


Right. That was my issue. Claiming that you'd get more problems with 
rounding and issues with coarsness when running at a higher frequency, 
when it in fact is the exact opposite. With a higher frequency you have 
more accuracy and less errors from truncations.


Anyway, I have no idea what the actual problem is. Good luck with that part.

  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Mouse
> I don't fully understand the discussion here.

> Initially people claimed that HZ at 8000 would be a problem,

Well, my experience indicates that it _is_ a problem, at least when
using disks at piixide (or pciide).

> which for me seems a bit backwards.

Me too.  I was - am - rather puzzled by it.

> I can only see two real problems with a high clock frequency:

> 1. The overhead of just dealing with clock interrupts increase.

Yes.  This is a potential issue.  When I set HZ=8000 it was one of the
first things I looked at.  The extra interrupt handling load is
measurable, but small enough that it was an acceptable price for making
the application work.  IIRC it was on the order of 1-2% of one core.

> 2. If there are things that just give time in ticks, then ticks
> become very short.  And if the assumption is that just one tick is
> enough, such an assumption can fail.

Yes.  But I have trouble seeing how such a failure could lead to a
delay that is (approximately) linear in HZ.  The only thing I've been
able to think of is that some delay somehow is having mstohz(), or
moral equivalent, applied to it twice.  That is, mstohz is applied, and
then some other code assumes the result is ms rather than ticks and
applies mstohz to it again.  At HZ=100 this means you get a tenth the
delay you think you're getting.  At HZ=1000 it's what it should be.  At
HZ=8000 it's eight times what it should be.  Based on that theory, I
would suspect someone has written a delay number that should be three
seconds and is actually getting .3 seconds at HZ=100, going up to 6s at
HZ=2000, 12s at 4000, and 24s at 8000, but about 2s of other stuff
happens in parallel with it, so I see the delay as 2s shorter than it
is.  I could easily see myself thinking "weird, I have to ask for
3000ms to get a delay long enough for this to work, doesn't _look_ like
three seconds, must investigate sometime" and never getting around to
actually investigating.

My principal reason for thinking this is so is that when I disabled
USB, the delay (at HZ=8000) went from 22s to about 24s - but the
absolute time at which the delay ended, the number printed in brakcets,
stayed (approximately) constant at about 25.3.  This matches well with
the theory that a delay of about 24s starts early and runs in parallel
with the rest of autoconf, and USB takes about two seconds, so I see
22s of delay.  But then when I disable USB, it doesn't hide 2s of that
24s, so I see the whole thing.  (I'd suspect 25s of delay except that
the numbers in brackets seem to start at 1.  Also, 24s matches better
with something getting multiplied by 80 than 25 does.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Johnny Billquist

On 2021-05-26 11:12, matthew green wrote:

Manuel Bouyer writes:

On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:

Manuel Bouyer writes:

On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:

bou...@antioche.eu.org (Manuel Bouyer) writes:


Another issue could be mstohz() called with a delay too short;
mstohz() will round it up to 1 tick.



#  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))


actually, this one is problematic as well.

mstohz(1) here gives 0 for hz < 1000.  "(1 + 0) * hz / 1000"
for any 'hz' less than 1000 will give 0.


I don't fully understand the discussion here.
Initially people claimed that HZ at 8000 would be a problem, which for 
me seems a bit backwards. And this comment should make that even more 
obvious.


With hz at 8000, you actually get some usable value for mstohz(1), while 
with low hz definitions, you do not.


So why would a high frequency at a clock be a problem? Seems the person 
who claimed that must have gotten things a bit backwards.


I can only see two real problems with a high clock frequency:
1. The overhead of just dealing with clock interrupts increase.
2. If there are things that just give time in ticks, then ticks become 
very short. And if the assumption is that just one tick is enough, such 
an assumption can fail.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
Manuel Bouyer writes:
> On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> > Manuel Bouyer writes:
> > > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > > 
> > > > >Another issue could be mstohz() called with a delay too short;
> > > > >mstohz() will round it up to 1 tick.
> > > > 
> > > > 
> > > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))

actually, this one is problematic as well.

mstohz(1) here gives 0 for hz < 1000.  "(1 + 0) * hz / 1000"
for any 'hz' less than 1000 will give 0.

> > it's hztoms() that is the problem here.
> > 
> > #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> > 
> > ah... this is the new one.  the old one was:
> > 
> > #define hztoms(t) \
> >  (__predict_false((t) >= 0x2) ? \
> >  ((t +0u) / hz) * 1000u : \
> >  ((t +0u) * 1000u) / hz)
> > 
> > looks like christos fixed it in 2019.
>
> I'm not sure how christos's change could be a fix. I introduced hztoms()
> and mstohz() to avoid interger overflow for large values, and it looks like
> christos reintroduced it for 32bits platforms :(

there's an inline for 32 bit now, not the above code.

i realise the problem i recall here:  it happens with both
of the above with hz is > 1000.  "1 * 1000 / 1024" is 0.


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> Manuel Bouyer writes:
> > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > 
> > > >Another issue could be mstohz() called with a delay too short;
> > > >mstohz() will round it up to 1 tick.
> > > 
> > > 
> > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > > 
> > > If mstohz() would round up to full ticks, it could actually avoid
> > > some pitfalls. But it doesn't.
> >
> > indeed. But in this case the problem will show up with smaller hz, not
> > larger. So I think we can rule out this case.
> 
> it's hztoms() that is the problem here.
> 
> #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> 
> ah... this is the new one.  the old one was:
> 
> #define hztoms(t) \
>  (__predict_false((t) >= 0x2) ? \
>  ((t +0u) / hz) * 1000u : \
>  ((t +0u) * 1000u) / hz)
> 
> looks like christos fixed it in 2019.

I'm not sure how christos's change could be a fix. I introduced hztoms()
and mstohz() to avoid interger overflow for large values, and it looks like
christos reintroduced it for 32bits platforms :(

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
Manuel Bouyer writes:
> On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > 
> > >Another issue could be mstohz() called with a delay too short;
> > >mstohz() will round it up to 1 tick.
> > 
> > 
> > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > 
> > If mstohz() would round up to full ticks, it could actually avoid
> > some pitfalls. But it doesn't.
>
> indeed. But in this case the problem will show up with smaller hz, not
> larger. So I think we can rule out this case.

it's hztoms() that is the problem here.

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

ah... this is the new one.  the old one was:

#define hztoms(t) \
 (__predict_false((t) >= 0x2) ? \
 ((t +0u) / hz) * 1000u : \
 ((t +0u) * 1000u) / hz)

looks like christos fixed it in 2019.

revision 1.615
date: 2019-09-28 08:10:58 -0700;  author: christos;  state: Exp;  lines: +25 
-12;  commitid: QCMKDk7BwqyE4NEB;
For 32 bit the mstohz and hztoms functions evaluate their parameter multiple
times. This is inefficient for cases like:
unsigned ms = hztoms(MIN(timeout, mstohz(INT_MAX)));
Make them inline functions; also provide the 64 bit versions for them here
so all the LP64 machines can use them (before only amd64 and sparc64
specialized mstohz).
Make them both return unsigned int.

which explains why the change i made in mid-2019 that seemed to no
longer be necessary was no longer necessary!


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> bou...@antioche.eu.org (Manuel Bouyer) writes:
> 
> >Another issue could be mstohz() called with a delay too short;
> >mstohz() will round it up to 1 tick.
> 
> 
> #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> 
> If mstohz() would round up to full ticks, it could actually avoid
> some pitfalls. But it doesn't.

indeed. But in this case the problem will show up with smaller hz, not
larger. So I think we can rule out this case.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--