Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Michael van Elst
On Fri, May 28, 2021 at 05:13:02AM +0700, Robert Elz wrote:

> But it isn't, you can't convert 60 ticks/second into some number of
> milliseconds, the two are different units.

Sure you can. period = 1/frequency.

But HZ (and now hz) in the code is also used as a period, thus the idioms:

delay(5*HZ)wait 5 seconds
timeout(ttrstrt, tp, HZ/4) trigger ttrstrt in 1/4 second

That's the simplicity when your time units are 1/HZ, then the same
number (called HZ) can stand for a frequency and a one second period.


If the programming language would use measures (number + specific
unit) instead of (integer) numbers, it became more explicit. But
I guess a modern language would still allow to use the same value
as a frequency or a period with some automatic type conversion.


Can we go back now to the original problem ?


Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Johnny Billquist

On 2021-05-28 00:46, Joerg Sonnenberger wrote:

On Fri, May 28, 2021 at 03:14:24AM +0700, Robert Elz wrote:

 Date:Thu, 27 May 2021 05:05:15 - (UTC)
 From:mlel...@serpens.de (Michael van Elst)
 Message-ID:  

   | mlel...@serpens.de (Michael van Elst) writes:
   |
   | >Either direction mstohz or hztoms should better always round up to
   | >guarantee a minimal delay.
   |
   | And both should be replaced by hztous()/ustohz().

While changing ms to us is probably a good idea, when a change happens,
the "hz" part should be changed too.

hz is (a unit of) a measure of frequency, ms (or us) is (a unit of) a
measure of time (duration) - converting one to the other makes no sense.


"hz" in this context comes from HZ - it is used here as dimension of
1s/HZ. Just like "ms" here is used as "1s/1000". It's a pretty sensible
naming compared to much longer naming variants.


Well, Robert have a point.

If you say hztoms(4), we are not asking for a conversion from 4Hz to ms, 
but in fact four ticks at the current HZ, converted to ms.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Joerg Sonnenberger
On Fri, May 28, 2021 at 03:14:24AM +0700, Robert Elz wrote:
> Date:Thu, 27 May 2021 05:05:15 - (UTC)
> From:mlel...@serpens.de (Michael van Elst)
> Message-ID:  
> 
>   | mlel...@serpens.de (Michael van Elst) writes:
>   |
>   | >Either direction mstohz or hztoms should better always round up to
>   | >guarantee a minimal delay.
>   |
>   | And both should be replaced by hztous()/ustohz().
> 
> While changing ms to us is probably a good idea, when a change happens,
> the "hz" part should be changed too.
> 
> hz is (a unit of) a measure of frequency, ms (or us) is (a unit of) a
> measure of time (duration) - converting one to the other makes no sense.

"hz" in this context comes from HZ - it is used here as dimension of
1s/HZ. Just like "ms" here is used as "1s/1000". It's a pretty sensible
naming compared to much longer naming variants.

Joerg


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Johnny Billquist

On 2021-05-28 00:13, Robert Elz wrote:

 Date:Thu, 27 May 2021 20:19:06 +
 From:"Koning, Paul" 
 Message-ID:  <8765ae3a-b5b7-4b67-82ce-93473a5b9...@dell.com>

   | In this particular case it's converting frequency to period,
   | that is a sensible conversion.

But it isn't, you can't convert 60 ticks/second into some number of
milliseconds, the two are different units.


Not that much.
To quote wikipedia:

"The dimension of the unit hertz is 1/time (1/T). Expressed in base SI 
units it is 1/second (1/s)."


So basically, it's just the inverse of time. Based on that, it's pretty 
clear that conversion to/from time is very valid.



And in another reply:

Johnny Billquist  said:
   | Frequency essentially means a counting of the number of  time something
   | happens over a specific time period. With hertz, the time  period is one
   | second.
  
Sure.


   | So then converting the number of times an event
   | happens in a second into how long it is between two events makes total
   | sense.

It would, but that's not what the functions do.   What they do is tell
how many ticks occur in a specific number of milliseconds (or vice
versa).   Your calculation is just (in milliseconds) 1000/hz, and assuming
hz isn't varying, is a constant.


Good point. It's a conversion from ms (or whatever time) to/from the 
number of cycles (or ticks of you want) based on the frequency given.


I didn't think this through enough. While I certainly believe it's 
perfectly valid to convert between a frequency and a time for a single 
cycle, it do become weird to talk about frequency if we are in fact 
talking about some specific number of cycles, although those cycles can 
only be converted to a time if we have a frequency.


I guess you convinced me. We should really call it something like 
tickstoms and mstoticks. But that do rely on people then understanding 
that it is the ticks defined by HZ, and not any random ticks.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Robert Elz
Date:Thu, 27 May 2021 20:19:06 +
From:"Koning, Paul" 
Message-ID:  <8765ae3a-b5b7-4b67-82ce-93473a5b9...@dell.com>

  | In this particular case it's converting frequency to period,
  | that is a sensible conversion.

But it isn't, you can't convert 60 ticks/second into some number of
milliseconds, the two are different units.   That's just the same as
you can't convert metres/second (velocity) into seconds.   Given a
particular velocity, and some number of metres, you can calculate the
time it takes to move that far, but that isn't converting velocity
into seconds.

What it is happening is that (in one direction of the other, depending
upon which function) it is converting between the number of ticks that
occur and the duration of an interval (which of course depends upon the
frequency, but it is not converting the frequency).

The hztoms() function is no different than a ustoms() function, except
that in the former we have a semi-variable (the frequency) which is simply
a constant (1000) in the second - but that's only a variable because we
allow HZ to vary (by architecture, and sometimes, configuration).   Calling
ustoms() thousandtoms() would be absurd.   So is calling this one hztoms().

  | You could say "hztoperiodinus" but that's rather verbose.

That doesn't help, we're still not converting a frequency to a period.

And in another reply:

Johnny Billquist  said:
  | Frequency essentially means a counting of the number of  time something
  | happens over a specific time period. With hertz, the time  period is one
  | second.
 
Sure.

  | So then converting the number of times an event 
  | happens in a second into how long it is between two events makes total 
  | sense.

It would, but that's not what the functions do.   What they do is tell
how many ticks occur in a specific number of milliseconds (or vice
versa).   Your calculation is just (in milliseconds) 1000/hz, and assuming
hz isn't varying, is a constant.

  | A tick is not a duration. A tick is a specific event at a specific time.  It
  | has no duration. You have a duration between two ticks.

Sure, reasonable point, but as Mouse said, when we're dealing with this
stuff the number of ticks is counted as a representation of the number
of those durations, and we just say how many ticks happened.  The ticks
represent the duration between them - that might be slightly sloppy, but
it isn't outright wrong.

kre




Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Johnny Billquist

On 2021-05-27 22:50, Mouse wrote:

A tick is not a duration.  A tick is a specific event at a specific
time.  It has no duration.  You have a duration between two ticks.


At least as I use it and have heard it used, `tick' can also be used to
refer to the interval between two of those events.  "How long are timer
ticks on this hardware?" or "For the next few ticks, we crunch on
this...".


I think that is mostly a bit sloppy terminology where they actually want 
to know the time between two ticks. :-)


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Mouse
> A tick is not a duration.  A tick is a specific event at a specific
> time.  It has no duration.  You have a duration between two ticks.

At least as I use it and have heard it used, `tick' can also be used to
refer to the interval between two of those events.  "How long are timer
ticks on this hardware?" or "For the next few ticks, we crunch on
this...".

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Johnny Billquist

On 2021-05-27 22:14, Robert Elz wrote:

 Date:Thu, 27 May 2021 05:05:15 - (UTC)
 From:mlel...@serpens.de (Michael van Elst)
 Message-ID:  

   | And both should be replaced by hztous()/ustohz().

While changing ms to us is probably a good idea, when a change happens,
the "hz" part should be changed too.


Not sure I agree with that.


hz is (a unit of) a measure of frequency, ms (or us) is (a unit of) a
measure of time (duration) - converting one to the other makes no sense.


It sure does. Frequency essentially means a counting of the number of 
time something happens over a specific time period. With hertz, the time 
period is one second. So then converting the number of times an event 
happens in a second into how long it is between two events makes total 
sense.



What these functions/macros do is convert between ms (or us) and ticks
(another measure of a duration), not hz, so the misleading "hz" part of
the name should be removed (changed) if a new macro/function is to be invented.
(The benefit isn't worth it to justify changing the current existing names,
but we shouldn't persist with nonsense if we're doing something new.)


A tick is not a duration. A tick is a specific event at a specific time. 
It has no duration. You have a duration between two ticks.


And commonly at every tick you get an interrupt, and you do something, 
and then resume your normal work. The tick has passed. Waiting for the 
next one to happen.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Koning, Paul



> On May 27, 2021, at 4:14 PM, Robert Elz  wrote:
> 
> ...
> While changing ms to us is probably a good idea, when a change happens,
> the "hz" part should be changed too.
> 
> hz is (a unit of) a measure of frequency, ms (or us) is (a unit of) a
> measure of time (duration) - converting one to the other makes no sense.

In this particular case it's converting frequency to period, that is a sensible 
conversion.  You could say "hztoperiodinus" but that's rather verbose.

paul




Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Robert Elz
Date:Thu, 27 May 2021 05:05:15 - (UTC)
From:mlel...@serpens.de (Michael van Elst)
Message-ID:  

  | mlel...@serpens.de (Michael van Elst) writes:
  |
  | >Either direction mstohz or hztoms should better always round up to
  | >guarantee a minimal delay.
  |
  | And both should be replaced by hztous()/ustohz().

While changing ms to us is probably a good idea, when a change happens,
the "hz" part should be changed too.

hz is (a unit of) a measure of frequency, ms (or us) is (a unit of) a
measure of time (duration) - converting one to the other makes no sense.

What these functions/macros do is convert between ms (or us) and ticks
(another measure of a duration), not hz, so the misleading "hz" part of
the name should be removed (changed) if a new macro/function is to be invented.
(The benefit isn't worth it to justify changing the current existing names,
but we shouldn't persist with nonsense if we're doing something new.)

kre

ps: note that the variable "hz" (and the macro HZ) are used correctly --
their values are frequencies (ticks/second).



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-27 Thread Mouse
>> How heavily is hztoms used?

> [18 uses]

That's...almost none, seems to me.  And these

> sys/dev/ic/mvsata.c:  timeout = mstohz(timeout + hztoms(1) - 1);
> sys/dev/ic/mvsata.c:  ata_delay(chp, hztoms(1), "mvsata_edma2", 
> wflags);
> sys/dev/sdmmc/if_bwfm_sdio.c: sdmmc_pause(hztoms(1)*1000, NULL);

look to me like bugs waiting to happen - and the first one looks as
though it might even be related to the bug that got us talking about
this; it changes the units of the number in timeout (from ms to hz) in
just the sort of way that could have led to what I saw.

The last, it seems to me, really should be hztoms(1000).

And these

> sys/dev/i2c/tsllux.c: if (ms < hztoms(1)) {
> sys/dev/pci/ixgbe/ixgbe_netbsd.c: else if ((us / 1000) >= hztoms(1)) {

might be as well; I'd have to read more context.

> sys/dev/usb/if_axe.c: usbd_delay_ms(sc->axe_un.un_udev, hztoms(y));   
> \
> sys/external/bsd/drm2/include/linux/jiffies.h:return hztoms(j);
> sys/external/bsd/drm2/include/linux/sched.h:  unsigned ms 
> = hztoms(MIN((unsigned long)timeout,
> sys/kern/sched_4bsd.c:int rttsms = hztoms(sched_rrticks);
> sys/kern/sched_m2.c:  int rttsms = hztoms(sched_rrticks);
> sys/kern/sched_m2.c:  newsize = hztoms(min_ts);
> sys/kern/sched_m2.c:  newsize = hztoms(max_ts);

These might be as well, but they look at least slightly more
reasonable.

> sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
> sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
> sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
> sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));

And these are not only bugs waiting to happen (consider HZ=25), they're
extremely peculiar.  Why hztoms(hz/32) rather than just 30, or 31, or
whatever?  Weird.

Would it be useful for me to send-pr this, or is it more likely to be
something nobody would do anything with?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>Either direction mstohz or hztoms should better always round up to
>guarantee a minimal delay.

And both should be replaced by hztous()/ustohz().

Microseconds allow a time value of ~35 minutes as 32bit signed
integer, which should be a safe range for delays.

A higher resolution might have its use too, but then you have
to assure that such values are always carried in 64bit integers
or handled by a structure. There is already a hz2bintime()
macro in that realm, which is unsafe (truncates to milliseconds)
but AFAIK also unused.



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Michael van Elst
mo...@rodents-montreal.org (Mouse) writes:

>How heavily is hztoms used?  (I would expect mstohz to be used far more
>heavily.)

sys/dev/acpi/acpi_cpu_cstate.c:   sc->sc_cstate_sleep = 
hztoms(acpitimer_delta(end, start)) * 1000;
sys/dev/spkr_audio.c: audiobell(sc->sc_audiodev, xhz, hztoms(ticks), 
sc->sc_spkr.sc_vol, 0);
sys/dev/i2c/tsllux.c: if (ms < hztoms(1)) {
sys/dev/ic/mvsata.c:  timeout = mstohz(timeout + hztoms(1) - 1);
sys/dev/ic/mvsata.c:  ata_delay(chp, hztoms(1), "mvsata_edma2", 
wflags);
sys/dev/pci/ixgbe/ixgbe_netbsd.c: else if ((us / 1000) >= hztoms(1)) {
sys/dev/sdmmc/if_bwfm_sdio.c: sdmmc_pause(hztoms(1)*1000, NULL);
sys/dev/usb/if_axe.c: usbd_delay_ms(sc->axe_un.un_udev, hztoms(y)); 
  \
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/dev/usb/if_axe.c: usbd_delay_ms(un->un_udev, hztoms(hz / 32));
sys/external/bsd/drm2/include/linux/jiffies.h:return hztoms(j);
sys/external/bsd/drm2/include/linux/sched.h:  unsigned ms = 
hztoms(MIN((unsigned long)timeout,
sys/kern/sched_4bsd.c:int rttsms = hztoms(sched_rrticks);
sys/kern/sched_m2.c:  int rttsms = hztoms(sched_rrticks);
sys/kern/sched_m2.c:  newsize = hztoms(min_ts);
sys/kern/sched_m2.c:  newsize = hztoms(max_ts);


>> 0 can sometimes mean "never block" and sometimes can mean "block
>> forever".

>What does hztoms's return value get used for?  Is it actually used to
>compute delay values?  It sounds dodgy to me to do so, for this among
>other reasons.

Mostly to compute a delay.

Either direction mstohz or hztoms should better always round up to
guarantee a minimal delay.



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Mouse
> #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

> when hz > 1000, this returns 0 for input of 1.

True, which is correct - when hz > 1000, one tick is less than one
millisecond.

If, that is, it's defined to round down (and, if not, the
implementation you quote is broken).  What does its interface contract
say?  (I don't have easy access to 9.1 at the moment, or I'd go
looking.)

How heavily is hztoms used?  (I would expect mstohz to be used far more
heavily.)

> 0 can sometimes mean "never block" and sometimes can mean "block
> forever".

What does hztoms's return value get used for?  Is it actually used to
compute delay values?  It sounds dodgy to me to do so, for this among
other reasons.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
> > Me too.  I was - am - rather puzzled by it.
>
> Right. That was my issue. Claiming that you'd get more problems with 
> rounding and issues with coarsness when running at a higher frequency, 
> when it in fact is the exact opposite. With a higher frequency you have 
> more accuracy and less errors from truncations.
>
> Anyway, I have no idea what the actual problem is. Good luck with that part.

as i posted earlier in this thread:

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

when hz > 1000, this returns 0 for input of 1.  0 can sometimes
mean "never block" and sometimes can mean "block forever".


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Johnny Billquist

On 2021-05-26 12:59, Mouse wrote:

I don't fully understand the discussion here.



Initially people claimed that HZ at 8000 would be a problem,


Well, my experience indicates that it _is_ a problem, at least when
using disks at piixide (or pciide).


Right. But not for the reason suggested. But obviously there is some 
kind of a problem somewhere.



which for me seems a bit backwards.


Me too.  I was - am - rather puzzled by it.


Right. That was my issue. Claiming that you'd get more problems with 
rounding and issues with coarsness when running at a higher frequency, 
when it in fact is the exact opposite. With a higher frequency you have 
more accuracy and less errors from truncations.


Anyway, I have no idea what the actual problem is. Good luck with that part.

  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Mouse
> I don't fully understand the discussion here.

> Initially people claimed that HZ at 8000 would be a problem,

Well, my experience indicates that it _is_ a problem, at least when
using disks at piixide (or pciide).

> which for me seems a bit backwards.

Me too.  I was - am - rather puzzled by it.

> I can only see two real problems with a high clock frequency:

> 1. The overhead of just dealing with clock interrupts increase.

Yes.  This is a potential issue.  When I set HZ=8000 it was one of the
first things I looked at.  The extra interrupt handling load is
measurable, but small enough that it was an acceptable price for making
the application work.  IIRC it was on the order of 1-2% of one core.

> 2. If there are things that just give time in ticks, then ticks
> become very short.  And if the assumption is that just one tick is
> enough, such an assumption can fail.

Yes.  But I have trouble seeing how such a failure could lead to a
delay that is (approximately) linear in HZ.  The only thing I've been
able to think of is that some delay somehow is having mstohz(), or
moral equivalent, applied to it twice.  That is, mstohz is applied, and
then some other code assumes the result is ms rather than ticks and
applies mstohz to it again.  At HZ=100 this means you get a tenth the
delay you think you're getting.  At HZ=1000 it's what it should be.  At
HZ=8000 it's eight times what it should be.  Based on that theory, I
would suspect someone has written a delay number that should be three
seconds and is actually getting .3 seconds at HZ=100, going up to 6s at
HZ=2000, 12s at 4000, and 24s at 8000, but about 2s of other stuff
happens in parallel with it, so I see the delay as 2s shorter than it
is.  I could easily see myself thinking "weird, I have to ask for
3000ms to get a delay long enough for this to work, doesn't _look_ like
three seconds, must investigate sometime" and never getting around to
actually investigating.

My principal reason for thinking this is so is that when I disabled
USB, the delay (at HZ=8000) went from 22s to about 24s - but the
absolute time at which the delay ended, the number printed in brakcets,
stayed (approximately) constant at about 25.3.  This matches well with
the theory that a delay of about 24s starts early and runs in parallel
with the rest of autoconf, and USB takes about two seconds, so I see
22s of delay.  But then when I disable USB, it doesn't hide 2s of that
24s, so I see the whole thing.  (I'd suspect 25s of delay except that
the numbers in brackets seem to start at 1.  Also, 24s matches better
with something getting multiplied by 80 than 25 does.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Johnny Billquist

On 2021-05-26 11:12, matthew green wrote:

Manuel Bouyer writes:

On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:

Manuel Bouyer writes:

On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:

bou...@antioche.eu.org (Manuel Bouyer) writes:


Another issue could be mstohz() called with a delay too short;
mstohz() will round it up to 1 tick.



#  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))


actually, this one is problematic as well.

mstohz(1) here gives 0 for hz < 1000.  "(1 + 0) * hz / 1000"
for any 'hz' less than 1000 will give 0.


I don't fully understand the discussion here.
Initially people claimed that HZ at 8000 would be a problem, which for 
me seems a bit backwards. And this comment should make that even more 
obvious.


With hz at 8000, you actually get some usable value for mstohz(1), while 
with low hz definitions, you do not.


So why would a high frequency at a clock be a problem? Seems the person 
who claimed that must have gotten things a bit backwards.


I can only see two real problems with a high clock frequency:
1. The overhead of just dealing with clock interrupts increase.
2. If there are things that just give time in ticks, then ticks become 
very short. And if the assumption is that just one tick is enough, such 
an assumption can fail.


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
Manuel Bouyer writes:
> On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> > Manuel Bouyer writes:
> > > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > > 
> > > > >Another issue could be mstohz() called with a delay too short;
> > > > >mstohz() will round it up to 1 tick.
> > > > 
> > > > 
> > > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))

actually, this one is problematic as well.

mstohz(1) here gives 0 for hz < 1000.  "(1 + 0) * hz / 1000"
for any 'hz' less than 1000 will give 0.

> > it's hztoms() that is the problem here.
> > 
> > #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> > 
> > ah... this is the new one.  the old one was:
> > 
> > #define hztoms(t) \
> >  (__predict_false((t) >= 0x2) ? \
> >  ((t +0u) / hz) * 1000u : \
> >  ((t +0u) * 1000u) / hz)
> > 
> > looks like christos fixed it in 2019.
>
> I'm not sure how christos's change could be a fix. I introduced hztoms()
> and mstohz() to avoid interger overflow for large values, and it looks like
> christos reintroduced it for 32bits platforms :(

there's an inline for 32 bit now, not the above code.

i realise the problem i recall here:  it happens with both
of the above with hz is > 1000.  "1 * 1000 / 1024" is 0.


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> Manuel Bouyer writes:
> > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > 
> > > >Another issue could be mstohz() called with a delay too short;
> > > >mstohz() will round it up to 1 tick.
> > > 
> > > 
> > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > > 
> > > If mstohz() would round up to full ticks, it could actually avoid
> > > some pitfalls. But it doesn't.
> >
> > indeed. But in this case the problem will show up with smaller hz, not
> > larger. So I think we can rule out this case.
> 
> it's hztoms() that is the problem here.
> 
> #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> 
> ah... this is the new one.  the old one was:
> 
> #define hztoms(t) \
>  (__predict_false((t) >= 0x2) ? \
>  ((t +0u) / hz) * 1000u : \
>  ((t +0u) * 1000u) / hz)
> 
> looks like christos fixed it in 2019.

I'm not sure how christos's change could be a fix. I introduced hztoms()
and mstohz() to avoid interger overflow for large values, and it looks like
christos reintroduced it for 32bits platforms :(

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread matthew green
Manuel Bouyer writes:
> On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > 
> > >Another issue could be mstohz() called with a delay too short;
> > >mstohz() will round it up to 1 tick.
> > 
> > 
> > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > 
> > If mstohz() would round up to full ticks, it could actually avoid
> > some pitfalls. But it doesn't.
>
> indeed. But in this case the problem will show up with smaller hz, not
> larger. So I think we can rule out this case.

it's hztoms() that is the problem here.

#  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))

ah... this is the new one.  the old one was:

#define hztoms(t) \
 (__predict_false((t) >= 0x2) ? \
 ((t +0u) / hz) * 1000u : \
 ((t +0u) * 1000u) / hz)

looks like christos fixed it in 2019.

revision 1.615
date: 2019-09-28 08:10:58 -0700;  author: christos;  state: Exp;  lines: +25 
-12;  commitid: QCMKDk7BwqyE4NEB;
For 32 bit the mstohz and hztoms functions evaluate their parameter multiple
times. This is inefficient for cases like:
unsigned ms = hztoms(MIN(timeout, mstohz(INT_MAX)));
Make them inline functions; also provide the 64 bit versions for them here
so all the LP64 machines can use them (before only amd64 and sparc64
specialized mstohz).
Make them both return unsigned int.

which explains why the change i made in mid-2019 that seemed to no
longer be necessary was no longer necessary!


.mrg.


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> bou...@antioche.eu.org (Manuel Bouyer) writes:
> 
> >Another issue could be mstohz() called with a delay too short;
> >mstohz() will round it up to 1 tick.
> 
> 
> #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> 
> If mstohz() would round up to full ticks, it could actually avoid
> some pitfalls. But it doesn't.

indeed. But in this case the problem will show up with smaller hz, not
larger. So I think we can rule out this case.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Michael van Elst
bou...@antioche.eu.org (Manuel Bouyer) writes:

>Another issue could be mstohz() called with a delay too short;
>mstohz() will round it up to 1 tick.


#  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))

If mstohz() would round up to full ticks, it could actually avoid
some pitfalls. But it doesn't.



Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Manuel Bouyer
On Tue, May 25, 2021 at 04:04:56PM -0400, Mouse wrote:
> > I suppose it's not possible to configure ahcisata in the BIOS on the
> > long-delay machines?
> 
> Thank you very much!  Yes.  That is possible - and it fixes the delay.
> I would not have thought to look for that; I would not have expected
> piixide and ahcisata to be similar enough that a BIOS setting could
> personality-swap between them.
> 
> > I'm guessing this is some quirk of the pciide(4) and piixide(4)
> > drivers.
> 
> Sounds like; they presumably have a bug somewhere in some delay
> calculation.  But at least I have something approaching a workaround.

The reset and probe procedure is different bewteen ide and ahci. 
The problem is probably in this area.

Actually the root cause may be a delay too short, not too long.
AFAIK the code uses mstohz() everywhere to compute tick values, exept
maybe a few cases where we want a really short delay and we use 1.
This is where a very high HZ may cause a too short delay.
Another issue could be mstohz() called with a delay too short;
mstohz() will round it up to 1 tick.

But I think you will need to instrument the ide probe in dev/ic/wdc.c.
It's been a while since I last looked at this, but I think the code
you want to look at is wdc_drvprobe().

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay?

2021-05-25 Thread Manuel Bouyer
On Tue, May 25, 2021 at 10:21:40PM +0200, Christoph Badura wrote:
> On Wed, May 26, 2021 at 05:02:52AM +1000, matthew green wrote:
> > > > +optionsHZ=8000
> > this can become a problem due to integer division.
> > 
> > any number of ticks less than hz (8000) will be rounded
> > down to 0 in a number of places now, where as before it
> > was only less than 100.  i've seen this trip up in the
> > kernel before, and sometimes that '0' means 'poll', and
> > sometimes it means 'sleep forever'.
> 
> > a lot of places in the kernel *do* avoid (eg, with adding
> > hz-1 and then dividing by hz) but there are a number that
> > do not...
> 
> So, should we introduce a CPP macro or an inline function that abstracts
> the common code away in a way that avoids such rounding down and
> produces correct results?

We already have mstohz() and hztoms()

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Mouse
>>> I suppose it's not possible to configure ahcisata in the BIOS on
>>> the long-delay machines?
>> I would not have thought to look for that; I would not have expected
>> piixide and ahcisata to be similar enough that a BIOS setting could
>> personality-swap between them.
> IIRC, they are not really that similar.  But the dmesg excerpt you
> cited had piixide at *Intel 6 Series Serial ATA Controller*.  So,
> clearly, the device had to support ahci mode.  Well, "clearly" for
> those in the know.

Indeed!

Still, I wouldn't've thought to look for it because I wasn't aware
there was *any* hardware that could personality-flip like that.  For
someone familiar with that hardware, or with other hardware that can
do it, it'd be a reasonable thing - I guess those are your "those in
the know"!

Which didn't include me.  At least until now. :-)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-25 Thread Christoph Badura
On Wed, May 26, 2021 at 05:02:52AM +1000, matthew green wrote:
> > > +options  HZ=8000
> this can become a problem due to integer division.
> 
> any number of ticks less than hz (8000) will be rounded
> down to 0 in a number of places now, where as before it
> was only less than 100.  i've seen this trip up in the
> kernel before, and sometimes that '0' means 'poll', and
> sometimes it means 'sleep forever'.

> a lot of places in the kernel *do* avoid (eg, with adding
> hz-1 and then dividing by hz) but there are a number that
> do not...

So, should we introduce a CPP macro or an inline function that abstracts
the common code away in a way that avoids such rounding down and
produces correct results?

--chris


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Christoph Badura
On Tue, May 25, 2021 at 04:04:56PM -0400, Mouse wrote:
> > I suppose it's not possible to configure ahcisata in the BIOS on the
> > long-delay machines?
> 
> Thank you very much!  Yes.  That is possible - and it fixes the delay.
> I would not have thought to look for that; I would not have expected
> piixide and ahcisata to be similar enough that a BIOS setting could
> personality-swap between them.

IIRC, they are not really that similar.  But the dmesg excerpt you cited
had piixide at *Intel 6 Series Serial ATA Controller*.  So, clearly, the
device had to support ahci mode.  Well, "clearly" for those in the know.

--chris


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Mouse
> I suppose it's not possible to configure ahcisata in the BIOS on the
> long-delay machines?

Thank you very much!  Yes.  That is possible - and it fixes the delay.
I would not have thought to look for that; I would not have expected
piixide and ahcisata to be similar enough that a BIOS setting could
personality-swap between them.

> I'm guessing this is some quirk of the pciide(4) and piixide(4)
> drivers.

Sounds like; they presumably have a bug somewhere in some delay
calculation.  But at least I have something approaching a workaround.

> Not to be too flip, but do you expect these machines to reboot
> frequently in production?

Often enough that my colleague on the customer-interface side of things
thinks twenty seconds of delay on reboot is a problem.

> If not, then I'd probably live with the delay on reboot as at this
> point, [...]

Me too.  I didn't even realize it was there until they pointed it out
to me; I don't really notice twenty seconds more delay when the typical
peecee BIOS imposes anywhere from thirty seconds to five minutes before
it even starts to boot.  I'm obviously not part of their target market
for this thing.

> I'd be concerned that any fix I came up with for it would have
> implications down the road which might be more serious and more
> impactful on operations.

Fair concern.

> I certainly understand the need to know what's going on, but if a
> machine only reboots once or twice a year in production, then ...

Also fair.  But I think once a day is the very fewest reboots we're
likely to see for this, though I'm not really all that in touch with
the user base, so

I'll push this over to them and see if it's an acceptable workaround to
poke the BIOS settings on the existing installed base.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-25 Thread Brian Buhrow
Hello.  I suppose it's not possible to configure ahcisata in the BIOS 
on the long-delay
machines?  I'm guessing this is some quirk of the pciide(4) and piixide(4) 
drivers.  Not to be too 
flip, but do you expect these machines to reboot frequently in production?  If 
not, then I'd 
probably live with the delay on reboot as at this point, I'd be concerned that 
any fix I came up with
for it would have implications down the road which might be more serious and 
more impactful on
operations.  I certainly understand the need to know what's going on, but if a 
machine only
reboots once or twice a year in production, then ...

Any newer hardware should have ahcisaata capable SATA controllers, so 
this problem should
go away as the hardware ages out.

-thanks
-Brian

On May 25,  3:33pm, Mouse wrote:
} Subject: Re: 9.1: boot-time delay?
} >> Will HZ=1000 be sufficient and does that reduce the boot time?
} > The latter is a good question which is likely to hint at possible
} > causes.  I'll experiment with various HZ values and see what happens.
} 
} At HZ=8000, the delay (based on the bracketed numbers) is almost
} exactly 22 seconds.
} 
} At HZ=4000, it's almost exactly 10 seconds.
} 
} At HZ=2000, it's almost exactly 4.3 seconds.
} 
} Based on a quadratic fit to those three data points, all I need to do
} is set HZ to 0 and it will reach the rest of the boot 1.2 seconds
} before it gets to uhub3.  Clearly, this machine includes resublimated
} thiotimoline somewhere in its hardware makeup.
} 
} Seriously, though...
} 
} I've just heard from one of the other people working with this.  I was
} told that, on a different hardware platform, the delay is gone even
} with HZ=8000.  I have an instance of that platform among my development
} machines; I tried it and I see the same thing, even with a bit-for-bit
} identical kernel.
} 
} I spent a little time playing with boot -c and disabling various
} things, as suggested by Martin Husemann upthread.
} 
} Disabling ehci* (what that hardware uses for USB) shuts off all USB
} support, of course.  It does nothing for the delay.
} 
} Disabling piixide* still gets the delay, but it also still finds wd0;
} it just attaches pciide* instead of piixide*.
} 
} Disabling both piixide* and pciide* gets rid of wd* _and_ gets rid of
} the delay.  (It doesn't boot fully, of course, beause it has no root
} device.  But it reaches the root device prompt after some four seconds
} instead of 25-plus.)
} 
} Disabling wd does not fix the delay, but it doesn't completely
} eliminate wd; I still see "wd at ... not configured" messages, so it's
} found in some sense.  (I haven't tried completely removing wd from the
} kernel config.)
} 
} The machine without the delay has wd drives, but they attach at
} ahcisata instead of piixide/pciide.  The piixide attachment (the
} machine that exhibits the delay) is
} 
} piixide0 at pci0 dev 31 function 2: Intel 6 Series Serial ATA Controller 
(rev. 0x05)
} piixide0: bus-master DMA support present
} piixide0: primary channel configured to native-PCI mode
} piixide0: using ioapic0 pin 19 for native-PCI interrupt
} atabus0 at piixide0 channel 0
} piixide0: secondary channel configured to native-PCI mode
} atabus1 at piixide0 channel 1
} ...
} piixide1 at pci0 dev 31 function 5: Intel 6 Series Serial ATA Controller 
(rev. 0x05)
} piixide1: bus-master DMA support present
} piixide1: primary channel wired to native-PCI mode
} piixide1: using ioapic0 pin 19 for native-PCI interrupt
} atabus2 at piixide1 channel 0
} piixide1: secondary channel wired to native-PCI mode
} atabus3 at piixide1 channel 1
} ...
} wd0 at atabus0 drive 0
} wd0: 
} wd0: drive supports 16-sector PIO transfers, LBA48 addressing
} wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors 
(0 bytes/physsect; first aligned sector: 8)
} wd0: 32-bit data port
} wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100), 
WRITE DMA FUA, NCQ (32 tags)
} wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using 
DMA), WRITE DMA FUA EXT
} wd1 at atabus2 drive 0
} wd1: 
} wd1: drive supports 1-sector PIO transfers, LBA48 addressing
} wd1: 7641 MB, 15525 cyl, 16 head, 63 sec, 512 bytes/sect x 15649200 sectors
} wd1: 32-bit data port
} wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), 
WRITE DMA FUA, NCQ (32 tags)
} wd1(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using 
DMA), WRITE DMA FUA EXT
} 
} In contrast, the ahcisata attachment, on the machine with no delay, is
} 
} ahcisata0 at pci0 dev 23 function 0: vendor 8086 product a102 (rev. 0x31)
} ahcisata0: 64-bit DMA
} ahcisata0: AHCI revision 1.31, 4 ports, 32 slots, CAP 
0xe734ff43
} ahcisata0: interrupting at msi1 vec 0
} atabus0 at ahcisata0 channel 0
} atabus1 at ahcisata0 channel 1
} atabus2 at ahcisata0 channel 2
} atabus3 at ahcisata0 channel 3
} ...
} ahcisata0 port 0: device present, speed: 6.0Gb/s
} ahcisat

Re: 9.1: boot-time delay?

2021-05-25 Thread Mouse
>>> +optionsHZ=8000

> this can become a problem due to integer division.

> any number of ticks less than hz (8000) will be rounded down to 0 in
> a number of places now, where as before it was only less than 100.

Dividing by hz would be used to...all I can think of is, to convert a
tick count to seconds.  I have trouble imagining why/when that would be
a useful thing to do, nor who would do it yielding an integer and then
doing something that would affect autoconf timing with the result.

Since you've seen it happen, though, obviously someone has found a use
for it.  What am I missing?  (See also my latest update; it seems to be
related somehow to piixide/pciide.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-25 Thread Mouse
>> Will HZ=1000 be sufficient and does that reduce the boot time?
> The latter is a good question which is likely to hint at possible
> causes.  I'll experiment with various HZ values and see what happens.

At HZ=8000, the delay (based on the bracketed numbers) is almost
exactly 22 seconds.

At HZ=4000, it's almost exactly 10 seconds.

At HZ=2000, it's almost exactly 4.3 seconds.

Based on a quadratic fit to those three data points, all I need to do
is set HZ to 0 and it will reach the rest of the boot 1.2 seconds
before it gets to uhub3.  Clearly, this machine includes resublimated
thiotimoline somewhere in its hardware makeup.

Seriously, though...

I've just heard from one of the other people working with this.  I was
told that, on a different hardware platform, the delay is gone even
with HZ=8000.  I have an instance of that platform among my development
machines; I tried it and I see the same thing, even with a bit-for-bit
identical kernel.

I spent a little time playing with boot -c and disabling various
things, as suggested by Martin Husemann upthread.

Disabling ehci* (what that hardware uses for USB) shuts off all USB
support, of course.  It does nothing for the delay.

Disabling piixide* still gets the delay, but it also still finds wd0;
it just attaches pciide* instead of piixide*.

Disabling both piixide* and pciide* gets rid of wd* _and_ gets rid of
the delay.  (It doesn't boot fully, of course, beause it has no root
device.  But it reaches the root device prompt after some four seconds
instead of 25-plus.)

Disabling wd does not fix the delay, but it doesn't completely
eliminate wd; I still see "wd at ... not configured" messages, so it's
found in some sense.  (I haven't tried completely removing wd from the
kernel config.)

The machine without the delay has wd drives, but they attach at
ahcisata instead of piixide/pciide.  The piixide attachment (the
machine that exhibits the delay) is

piixide0 at pci0 dev 31 function 2: Intel 6 Series Serial ATA Controller (rev. 
0x05)
piixide0: bus-master DMA support present
piixide0: primary channel configured to native-PCI mode
piixide0: using ioapic0 pin 19 for native-PCI interrupt
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to native-PCI mode
atabus1 at piixide0 channel 1
...
piixide1 at pci0 dev 31 function 5: Intel 6 Series Serial ATA Controller (rev. 
0x05)
piixide1: bus-master DMA support present
piixide1: primary channel wired to native-PCI mode
piixide1: using ioapic0 pin 19 for native-PCI interrupt
atabus2 at piixide1 channel 0
piixide1: secondary channel wired to native-PCI mode
atabus3 at piixide1 channel 1
...
wd0 at atabus0 drive 0
wd0: 
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors (0 
bytes/physsect; first aligned sector: 8)
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100), WRITE 
DMA FUA, NCQ (32 tags)
wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA), 
WRITE DMA FUA EXT
wd1 at atabus2 drive 0
wd1: 
wd1: drive supports 1-sector PIO transfers, LBA48 addressing
wd1: 7641 MB, 15525 cyl, 16 head, 63 sec, 512 bytes/sect x 15649200 sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE 
DMA FUA, NCQ (32 tags)
wd1(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA), 
WRITE DMA FUA EXT

In contrast, the ahcisata attachment, on the machine with no delay, is

ahcisata0 at pci0 dev 23 function 0: vendor 8086 product a102 (rev. 0x31)
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.31, 4 ports, 32 slots, CAP 
0xe734ff43
ahcisata0: interrupting at msi1 vec 0
atabus0 at ahcisata0 channel 0
atabus1 at ahcisata0 channel 1
atabus2 at ahcisata0 channel 2
atabus3 at ahcisata0 channel 3
...
ahcisata0 port 0: device present, speed: 6.0Gb/s
ahcisata0 port 3: PHY offline
ahcisata0 port 1: PHY offline
ahcisata0 port 2: PHY offline
...
wd0 at atabus0 drive 0
wd0: 
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors 
(0 bytes/physsect; first aligned sector: 8)
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), NCQ 
(32 tags) w/PRIO
wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) 
(using DMA), NCQ (31 tags) w/PRIO

I don't suppose this gives anyone any helpful thoughts?

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


re: 9.1: boot-time delay?

2021-05-25 Thread matthew green
> > +optionsHZ=8000

this can become a problem due to integer division.

any number of ticks less than hz (8000) will be rounded
down to 0 in a number of places now, where as before it
was only less than 100.  i've seen this trip up in the
kernel before, and sometimes that '0' means 'poll', and
sometimes it means 'sleep forever'.

a lot of places in the kernel *do* avoid (eg, with adding
hz-1 and then dividing by hz) but there are a number that
do not...


.mrg.


Re: 9.1: boot-time delay?

2021-05-25 Thread Mouse
> Is there a reason you need a frequency that high?

Yes.  I'm not just cranking HZ up to be difficult.  (I might do it just
to smoke out bugs...on my _own_ machines.  Except, of course, that I
don't use 9.x except at work, where the tradeoffs are different.)

> Will HZ=1000 be sufficient and does that reduce the boot time?

The former is "no", in the sense that it will not be suitable for
production use on the turnkey system in question.  See below.

The latter is a good question which is likely to hint at possible
causes.  I'll experiment with various HZ values and see what happens.

> Also,while I don't entirely understand all the timing mechanisms
> inside NetBSD, it seems if you do need a higher frequency clock, I'd
> suggest HZ=1, since it's a factor of 100 times the default,
> rather than an odd 80 times the default.

Unfortunately a value that is not a multiple of 8000 is not suitable.
(Why would 100 be better than 80, by the way?  I see no reason other
than a human-layer bias towards powers of 10, so I'm wondering if
that's all it is or if there's something I'm missing.)

In this application, userland computes values and pushes them, via a
parallel port, to external D->A hardware.  (Among other things.
Userland does quite a lot else, too; I am outlining just the stuff
relevant to HZ=8000.)  They need to be pushed at fairly precisely 8KHz
to get the desired result out of the D->A, but they are produced
burstily (IIRC, 4KHz bursts of two values each are most common).

The cleanest solution would probably be an external FIFO with an
external 8KHz clock.  But, for reasons such as installed hardware base,
addressing it in software is preferred if it works.  (Getting customers
to just install new software is much cheaper for the company than
forklift upgrades would be.)  I added code to the lp driver to buffer
values in the kernel and push them out based on clock ticks.  This
works relatively well; the only downside we have uncovered is the
boot-time delay, and even that surprises me.  Doing it in software
isn't perfect.  There is still the occasional glitch in timing when the
relevant interrupts are locked out.  But, so far, those glitches have
been a lower price than the external hardware.

I was rather grasping at straws when I tried changing HZ to see if it
fixed it; I was in somewhat of a "when you've eliminated the
impossible..." place after removing all three video drivers
(i915drmkms/intelfb, radeon/radeondrmkmsfb, and nouveau/nouveaufb) and
finding the delay still present.  I tried HZ next because there were
very few changes between the two kernels, and HZ looked like the most
likely of the ones remaining after the video drivers didn't fix it.  I
then tried the HZ change with all the video drivers present and it
still was fast with no HZ option but slow with HZ=8000.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-25 Thread Brian Buhrow
Hello.  Is there a reason you need a frequency that high?  Will HZ=1000 
be sufficient and
does that reduce the boot time?  
Also,while I don't entirely understand all the timing mechanisms inside NetBSD, 
it seems if you
do need a higher frequency clock, I'd suggest HZ=1, since it's a factor of 
100 times the
default, rather than an odd 80 times the default.
-thanks
-Brian

On May 25,  9:44am, Mouse wrote:
} Subject: Re: 9.1: boot-time delay?
} Last week, I wrote, here, of a delay when booting 9.1
} 
} >>> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
} >>> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
} >>> [25.272567] wd0 at atabus0 drive 0
} >>> [25.273568] wd0: 
} 
} and, in a later mail,
} 
} > [A]s soon as I sent my mail and started looking at subsetting diffs,
} > I discovered the diffs between the installer kernel and the
} > operational kernel were far smaller than I remembered (and mostly
} > irrelevant):
} 
} It looks as though this, of all things, is the relevant line:
} 
} > +optionsHZ=8000
} 
} My first reaction is that something somewhere is delaying by a fixed
} number of ticks rather than doing arithmetic with hz, but that would
} make the difference go in the other direction.  I'm having trouble
} coming up with a plausible scenario that would explain the delay
} getting _longer_ with HZ=8000.  The most plausible thing I've thought
} of is that some delay is being computed based on hz and then getting
} treated as milliseconds instead of ticks or some such and getting
} multiplied by hz again.
} 
} Anyone have any thoughts on possible ways to track this one down?  I'm
} going to be doing what I can, but any thoughts anyone has would be
} welcomed; I am only minimally familiar with 9.1's kernel internals.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: 9.1: boot-time delay?

2021-05-25 Thread Mouse
Last week, I wrote, here, of a delay when booting 9.1

>>> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
>>> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
>>> [25.272567] wd0 at atabus0 drive 0
>>> [25.273568] wd0: 

and, in a later mail,

> [A]s soon as I sent my mail and started looking at subsetting diffs,
> I discovered the diffs between the installer kernel and the
> operational kernel were far smaller than I remembered (and mostly
> irrelevant):

It looks as though this, of all things, is the relevant line:

> +options  HZ=8000

My first reaction is that something somewhere is delaying by a fixed
number of ticks rather than doing arithmetic with hz, but that would
make the difference go in the other direction.  I'm having trouble
coming up with a plausible scenario that would explain the delay
getting _longer_ with HZ=8000.  The most plausible thing I've thought
of is that some delay is being computed based on hz and then getting
treated as milliseconds instead of ticks or some such and getting
multiplied by hz again.

Anyone have any thoughts on possible ways to track this one down?  I'm
going to be doing what I can, but any thoughts anyone has would be
welcomed; I am only minimally familiar with 9.1's kernel internals.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-18 Thread Rhialto
On Tue 18 May 2021 at 15:40:05 -0400, Mouse wrote:
> This leads me to the depressing conclusion that it is probably one of
> the video drivers, which are not optional for the operational kernel.

I was going to suggest that even before reading this mail. I noticed in
my new machine that with an nVidia card there was a pause similar to
what you described. It went away when I put in a Radeon card instead.
The nVidia was also not-quite-functional, so I wanted to get rid of it
anyway.

-Olaf.
-- 
___ Q: "What's an anagram of Banach-Tarski?"  -- Olaf "Rhialto" Seibert
\X/ A: "Banach-Tarski Banach-Tarski." -- rhialto at falu dot nl


signature.asc
Description: PGP signature


Re: 9.1: boot-time delay?

2021-05-18 Thread Mouse
>> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
>> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
>> [25.272567] wd0 at atabus0 drive 0
>> [25.273568] wd0: 

> There are multiple things with long timeouts happening here, but it
> should be simple to find them.

> Most obvious ones are:

>  - all disk controllers probe their devices and
>e.g. ATA disks have 30s timeouts for the identification.
>  - USB devices are enumerated

I doubt it's either of those, because the installer kernel and the
operational kernel have identical configurations in those respects, and
the issue manifests without any hardware changes at all between the two
boots.

However, as soon as I sent my mail and started looking at subsetting
diffs, I discovered the diffs between the installer kernel and the
operational kernel were far smaller than I remembered (and mostly
irrelevant):

--- kconf/INSTALLER 2020-11-24 13:27:50.336171910 -0500
+++ kconf/RUN   2021-05-18 15:28:52.898200745 -0400
@@ -1,9 +1,7 @@
 include"arch/amd64/conf/std.amd64"
 
 optionsINCLUDE_CONFIG_FILE # embed config file in kernel binary
-
-optionsHALTED_MESSAGE="\"\\n\\nInstaller done.\\nPlease remove flash 
drive and restart your system.\\n\""
-optionsSILENT_DETACH
+optionsHZ=8000
 
 #ident "GENERIC-$Revision: 1.531.2.10 $"
 
@@ -243,13 +241,13 @@
 #options   WSEMUL_SUN  # sun terminal emulation
 #options   WSEMUL_DEFAULT="\"vt100\""  # NB: default is "sun" if enabled
 # different kernel output - see dev/wscons/wsdisplayvar.h
-optionsWS_KERNEL_FG=WSCOL_RED
+optionsWS_KERNEL_FG=WSCOL_BLACK
 optionsWS_KERNEL_BG=WSCOL_BLACK
 # compatibility to other console drivers
-#options   WSDISPLAY_COMPAT_PCVT   # emulate some ioctls
-#options   WSDISPLAY_COMPAT_SYSCONS# emulate some ioctls
-#options   WSDISPLAY_COMPAT_USL# wsconscfg VT handling
-#options   WSDISPLAY_COMPAT_RAWKBD # can get raw scancodes
+optionsWSDISPLAY_COMPAT_PCVT   # emulate some ioctls
+optionsWSDISPLAY_COMPAT_SYSCONS# emulate some ioctls
+optionsWSDISPLAY_COMPAT_USL# wsconscfg VT handling
+optionsWSDISPLAY_COMPAT_RAWKBD # can get raw scancodes
 # don't attach pckbd as the console if no PS/2 keyboard is found
 optionsPCKBD_CNATTACH_MAY_FAIL
 # see dev/pckbport/wskbdmap_mfii.c for implemented layouts
@@ -263,14 +261,14 @@
 # work around a hardware bug that loaded fonts don't work; found on ATI cards
 #options   VGA_CONSOLE_ATI_BROKEN_FONTSEL
 # console scrolling support.
-optionsWSDISPLAY_SCROLLSUPPORT
+#options   WSDISPLAY_SCROLLSUPPORT
 # enable VGA raster mode capable of displaying multilingual text on console
 #options   VGA_RASTERCONSOLE
 # enable splash screen support; requires genfb or radeonfb
 #options   SPLASHSCREEN
 
 # Kernel root file system and dump configuration.
-config netbsd  root on ? type ?
+config netbsd  root on wd0e type ?
 #confignetbsd  root on sd0a type ffs
 #confignetbsd  root on ? type nfs
 
@@ -405,7 +403,7 @@
 wsdisplay* at wsemuldisplaydev?
 wskbd* at pckbd? console ?
 wsmouse*   at pms? mux 2
-wsmouse*   at wsmousedev?
+wsmouse*   at wsmousedev? mux 2
 
 attimer0   at isa?
 pcppi0 at isa?
@@ -422,17 +420,17 @@
 #tdfxdrm*  at drm? # 3dfx (voodoo) DRM driver
 
 # DRMKMS drivers
-#i915drmkms*   at pci? dev ? function ?
-#intelfb*  at intelfbbus?
+i915drmkms*at pci? dev ? function ?
+intelfb*   at intelfbbus?
 
-#radeon*   at pci? dev ? function ?
-#radeondrmkmsfb* at radeonfbbus?
+radeon*at pci? dev ? function ?
+radeondrmkmsfb* at radeonfbbus?
 
 #amdgpu*   at pci? dev ? function ?
 #amdgpufb* at amdgpufbbus?
 
-#nouveau*  at pci? dev ? function ?
-#nouveaufb*at nouveaufbbus?
+nouveau*   at pci? dev ? function ?
+nouveaufb* at nouveaufbbus?
 
 # DRMUMS drivers
 #viadrmums*at drm?
@@ -572,7 +570,7 @@
 
 # I2C Mice
 ims*   at ihidev? reportid ?
-wsmouse* at ims? mux 0
+wsmouse* at ims? mux 2
 
 # I2O devices
 iop*   at pci? dev ? function ?# I/O processor

(HALTED_MESSAGE and SILENT_DETACH are options I added; HALTED_MESSAGE
just replaces the "halted, press any key to reboot" message on halt,
while SILENT_DETACH just does flags|=DETACH_QUIET on entry to
config_detach; the idea is to avoid spamming the console with detach
messages on halt).

This leads me to the depressing conclusion that it is probably one of
the video drivers, which are not optional for the operational kernel.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.1: boot-time delay?

2021-05-18 Thread Martin Husemann
On Tue, May 18, 2021 at 03:01:55PM -0400, Mouse wrote:
> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
> [25.272567] wd0 at atabus0 drive 0
> [25.273568] wd0: 

There are multiple things with long timeouts happening here, but it
should be simple to find them.

Most obvious ones are:

 - all disk controllers probe their devices and
   e.g. ATA disks have 30s timeouts for the identification.
 - USB devices are enumerated

I would try to kill all usb controllers (e.g. via userconf or special test
kernel) and see if that fixes things, or all disk controllers. Then
disable individual USB devices one by one untill you have the culprit (or all
disk controllers).

For the USB part you also could try  kernhist for options USBHIST
and check vmstat output right after booting single user (or from ddb).

Martin

Martin


Re: 9.1: boot-time delay?

2021-05-18 Thread David Brownlee
On Tue, 18 May 2021 at 20:02, Mouse  wrote:
>
> I'm dealing with a turnkey product running under 9.1/amd64.  On certain
> hardware, there is a pause, almost exactly 22 seconds, during autoconf.
> I'm trying to eliminate it.  A sufficiently cut-down kernel does the
> job, but another cut-down kernel doesn't.  I'm trying to track down
> what's responsible.  (The kernel that eliminates the pause is used by
> the installer; the one that doesn't is the one that's used in normal
> operation.  Unless it turns out to be something essential for
> operation, I'd like to cut it out of the operational kernel.)
[...]
> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
> [25.272567] wd0 at atabus0 drive 0
> [25.273568] wd0: 

I'd take a long hard look at what ata or atapi devices were configured
in the kernel - smells like a timeout (though I would have expected 30
seconds...) Though that seems obvious enough to have already been
checked :-p

David