Re: continuous ffs_blkfree_common panic

2024-05-04 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

>Did you run a forced fsck on the file system?

I just got this panic running 10.99.10 in qemu.

[   4.0895156] panic: ffs_blkfree_common: freeing free frag: dev = 0x1300, block = 135894, fs = /

This happened after a clean reboot, after the rc message "Starting file
system checks:".

After another reboot (same qemu process), there was no more error.

If that's a hardware problem, it also affects virtualized hardware
(in this case virtio-block).



Re: disklabel change?

2024-04-22 Thread Michael van Elst
pr...@welche.eu (Patrick Welche) writes:

>In fact, the difference is between "-t" and "-rt":

>I deem "-t" output to be correct (and matches what I had in /etc/diskpart)


The in-kernel disklabel gets RAW_PART from the disk geometry,
and if RAW_PART == 3, it gets d_partitions[2] from the MBR partition
table.

That explains why 'disklabel -t' looks correct, it shows the in-kernel
disklabel.

It doesn't explain why the on-disk label has the entries swapped.
When you edit the disklabel, the kernel writes it to the disk. If
that corrects the error, the bug is in the disklabel program;
otherwise it's in the kernel.



Re: raidframe and gpt

2024-03-16 Thread Michael van Elst
p...@whooppee.com (Paul Goyette) writes:

>> Does anyone have an example of how to configure raid0 on a GPT disk?

For a manual setup, you can just reference the wedges like:

# raidctl config file for /dev/rraid0

START array
# numRow numCol numSpare
1 2 0

START disks
NAME=raid0.0
NAME=raid0.1

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100


Auto detection of a RAID works by collecting disks with valid
RAID labels into RAID sets. This also works with wedges, the
actual device or unit number is irrelevant, the RAID set is
identified by the unique serial number in the RAID label.
The serial number is specified with the -I option of raidctl.
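A minimal sketch of that grouping step, assuming a simplified component label that carries only a device name and the serial number set with raidctl -I. The struct names, fields and the MAXCOMP limit are hypothetical, and the real RAIDframe autoconfiguration likely compares more label fields than just the serial number.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXCOMP 8	/* hypothetical per-set limit for this sketch */

/* Hypothetical, simplified view of a RAID component label. */
struct component {
	const char *dev;	/* e.g. "dk3"; the unit number does not matter */
	uint32_t serial;	/* serial number set with raidctl -I */
};

struct raidset {
	uint32_t serial;
	size_t ncomp;
	const struct component *comp[MAXCOMP];
};

/*
 * Group components into RAID sets by serial number.
 * Returns the number of sets found; components that do not fit
 * into the provided arrays are ignored.
 */
static size_t
group_by_serial(const struct component *c, size_t n,
    struct raidset *sets, size_t maxsets)
{
	size_t nsets = 0;

	for (size_t i = 0; i < n; i++) {
		size_t s;

		/* Look for an existing set with the same serial. */
		for (s = 0; s < nsets; s++)
			if (sets[s].serial == c[i].serial)
				break;

		if (s == nsets) {
			/* New serial: start a new set, if there is room. */
			if (nsets == maxsets)
				continue;
			sets[s].serial = c[i].serial;
			sets[s].ncomp = 0;
			nsets++;
		}
		if (sets[s].ncomp < MAXCOMP)
			sets[s].comp[sets[s].ncomp++] = &c[i];
	}
	return nsets;
}
```

Whether a component shows up as dk1 or wd0e does not change the result; only the serial number decides which set it joins.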

Even booting from a RAID on GPT is possible with a recent bootloader
and an autoconfigured RAID set.


Here is more about RAID on GPT:

https://wiki.netbsd.org/users/spz/moderndisk/



>> I can easily set the partition type with gpt, but how do I reserve
>> space for the raid component label?  Do I need to reserve that space?

You don't, the components are the GPT partitions.


>> Also, does raidframe understand the NAME=gpt-label syntax in the
>> config file?  Or does it require me to specify the particular dk ?
>> (And what happens if something moves and  changes?)

I added support for the NAME= syntax some time ago.


>One more question: the raidctl man page talks about partitioning the
>raid device using mbr partitions.  Is it possible to use GPT here?
>Will the resulting wedges show up automatically?

Whenever you create a raid device (raidN attaches), it will be
scanned for wedges like a regular disk device.

# dkctl dk4 getwedgeinfo
dk4 at raid2: tank
dk4: 4294967296 blocks at 128, type: ffs

# grep tank /etc/fstab
NAME=tank   /tank   ffs rw,log  1 2

This system configures a RAID1 of two RAID0 sets. That's not supported
by the RAID autoconfiguration, so it is created manually at boot time
and obviously the system can't boot from it.




Re: dwiic errors

2024-03-14 Thread Michael van Elst
p...@whooppee.com (Paul Goyette) writes:

>as soon as you proceed past this point (including normal non-single-
>user boot), the dwiic starts spewing time-out messages.  These
>messages come every 0.5 second or so, and there's usually a hundred
>or more messages before they stop;  in some cases the messages have
>continued to stream by for several minutes (at which point I pressed
>the reset button).  The value for %d is always 0 or 1.

Probably the result of

GENERIC:ihidev* at iic?

that is probing for a modern laptop touchpad.

Can you disable ihidev instead of dwiic and see what happens then?



Re: Problem with umass/scsibus/wd0

2024-03-13 Thread Michael van Elst
On Tue, Mar 12, 2024 at 11:00:02PM -0700, Paul Goyette wrote:
> 
> ``scsictl sd0 start'' makes a little bit of progress, and claims
> to be "fabricating a geometry".  ``gpt show -a sd0'' shows two
> partitions (one for NetBSD backups, and one for Windoze backups)
> 
>   # gpt show sd0
>         start        size  index  contents
>             0           1         PMBR
>             1           1         Pri GPT header
>             2          32         Pri GPT table
>            34        2014         Unused
>          2048  4294967296      1  GPT part - NetBSD FFSv1/FFSv2
>    4294969344  3518951424      2  GPT part - Windows basic data
>    7813920768       49119         Unused
>    7813969887          32         Sec GPT table
>    7813969919           1         Sec GPT header

That looks fine.

> But it does not seem to progress to the discover-wedges process,
> and no wedges seem to exist:
> 
>   # dkctl sd0 listwedges
>   /dev/rsd0: no wedges configured

The wedge autodetection happens when the device attaches (and failed
since the disk was offline). This is different from disklabels that
are fetched by the first opener (and are usually dropped with
the last close, except traditionally for vnd).

You can manually trigger autodetection with

dkctl sd0 makewedges
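The attach-time versus open-time difference described above can be modeled in a few lines. This is a toy model under stated assumptions: struct disk_model and the model_* functions are hypothetical and are not NetBSD's disk(9) interface; they only capture "wedges are found at attach, the disklabel is loaded by the first opener and dropped by the last close".

```c
#include <stdbool.h>

/* Hypothetical model of one disk; names are illustrative, not disk(9). */
struct disk_model {
	int opencnt;		/* number of current openers */
	bool label_loaded;	/* disklabel: fetched by the first opener */
	int nwedges;		/* wedges: discovered when the device attaches */
};

/* Attach: wedge discovery happens now, and finds nothing if offline. */
static void
model_attach(struct disk_model *d, bool online, int gpt_parts)
{
	d->opencnt = 0;
	d->label_loaded = false;
	d->nwedges = online ? gpt_parts : 0;
}

/* First open reads the disklabel. */
static void
model_open(struct disk_model *d)
{
	if (d->opencnt++ == 0)
		d->label_loaded = true;
}

/* Last close drops it again (traditional vnd behaviour differs). */
static void
model_close(struct disk_model *d)
{
	if (d->opencnt > 0 && --d->opencnt == 0)
		d->label_loaded = false;
}

/* In this model, "dkctl sd0 makewedges" simply rescans for wedges now. */
static void
model_makewedges(struct disk_model *d, int gpt_parts)
{
	d->nwedges = gpt_parts;
}
```

An offline disk at attach time ends up with no wedges even after it is opened later; only an explicit rescan fixes that.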



Greetings,
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Problem with umass/scsibus/wd0

2024-03-12 Thread Michael van Elst
p...@whooppee.com (Paul Goyette) writes:


>[ 29641.773703] umass0 at uhub11 port 4 configuration 1 interface 0
>[ 29641.773703] umass0: Western Digital (0x1058) Elements 2621 (0x2621), rev 3.20/10.34, addr 4
>[ 29641.773703] umass0: using SCSI over Bulk-Only
>[ 29641.793714] scsibus0 at umass0: 2 targets, 1 lun per target
>[ 29641.793714] sd0 at scsibus0 target 0 lun 0:  disk fixed
>[ 29641.793714] sd0(umass0:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
>[ 29641.793714] SENSE KEY:  Not Ready
>[ 29641.793714]  ASC/ASCQ:  Logical Unit Is In Process Of Becoming Ready
>[ 29641.793714] sd0: drive offline


Sounds like that drive isn't spinning up.

The "Elements" product doesn't exactly tell what it is, some units
either come with their own power supply or require non-standard
USB power.

Maybe 'scsictl sd0 start' helps to get the disk online. If that
has an effect you may need 'dkctl sd0 makewedges' if you use a GPT
label.



Re: ccd error with two large components

2024-03-03 Thread Michael van Elst
roland.il...@gmx.de (Roland Illig) writes:

>That's this line:

>> unit = *(const int *)newp;

>I don't know at which point newp is validated; maybe that validation is
>missing in this case, although I'd expect it to be in the common sysctl
>infrastructure code.


newp is valid, but it's a userland pointer that gets dereferenced
directly by the kernel. Without SMAP that probably even worked on x86.
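For illustration only: a typical fix is to copy the value from userland into a kernel variable before using it. In-tree NetBSD sysctl handlers usually let sysctl_lookup(9) do that copy; the sketch below instead uses a userland stand-in for copyin(9) so it can run outside the kernel. set_unit and sketch_copyin are hypothetical names, and the negative-unit check is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the kernel's copyin(9), so the sketch runs in userland.
 * The real copyin() validates the user address and returns EFAULT
 * instead of faulting; here we can only reject NULL.
 */
static int
sketch_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return EFAULT;
	memcpy(kaddr, uaddr, len);
	return 0;
}

/*
 * Hypothetical handler fragment: copy the new value into kernel memory
 * first, instead of "unit = *(const int *)newp;".
 */
static int
set_unit(const void *newp, size_t newlen, int *unitp)
{
	int unit, error;

	if (newlen != sizeof(unit))
		return EINVAL;
	error = sketch_copyin(newp, &unit, sizeof(unit));
	if (error)
		return error;
	if (unit < 0)
		return EINVAL;
	*unitp = unit;
	return 0;
}
```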






Re: bug in ftp(1)?

2024-02-18 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>ftp: Receiving HTTP reply: Input line is too long

#define   FTPBUFLEN   (4 * MAXPATHLEN)
char buf[FTPBUFLEN];

That's 4kB.

>curl -v https://sourceforge.net/projects/courier/files/courier-unicode/2.3.0/courier-unicode-2.3.0.tar.bz2

This returns a 5kB HTTP header "content-security-policy".

There is no protocol limit, but common server implementations do limit header
lines to somewhere between 4k (some nginx versions) and 48k (tomcat).
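A rough sketch of why a 5kB header line fails with a 4kB line buffer. This is not ftp(1)'s actual code: read_header_line and SKETCH_MAXPATHLEN are hypothetical, and MAXPATHLEN is hard-coded to NetBSD's value of 1024 because other systems differ (on Linux it is 4096, which would make the buffer 16kB).

```c
#include <stdio.h>
#include <string.h>

/* NetBSD's MAXPATHLEN; spelled out because other systems differ. */
#define SKETCH_MAXPATHLEN	1024
#define FTPBUFLEN	(4 * SKETCH_MAXPATHLEN)	/* 4096 bytes */

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/*
 * Read one header line into a fixed buffer. If the buffer fills up
 * before a newline is seen, report the line as too long.
 */
static enum line_status
read_header_line(FILE *fp, char *buf, size_t len)
{
	if (fgets(buf, (int)len, fp) == NULL)
		return LINE_EOF;
	if (strchr(buf, '\n') == NULL && !feof(fp))
		return LINE_TOO_LONG;
	return LINE_OK;
}
```

In this sketch a line of exactly 4095 characters plus newline is also rejected, since fgets() needs room for the terminating NUL.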



Re: Experience with Epyc 8/9004 series CPUs?

2023-11-29 Thread Michael van Elst
kar...@netbsd.org (Frank Kardel) writes:

>As you said a "couple of years ago" I assume those CPUs were not 
>Zen4-architecture.

The new pkgbuilder is Zen3, so still too old.



Re: NetBSD-current on thinkpad T495, sound issue (was "audio issue on -current")

2023-11-26 Thread Michael van Elst
shev.vt1...@gmail.com (Vitaly Shevtsov) writes:

>When I'm listening to music I get this error after some time:
>audio1(hdafg1): audio_write: device timeout, seq=16987,
>usrbuf=60224/H60224, outbuf=8192/8192

You get timeouts when the backend driver (hdafg1) doesn't
finish playing buffers. So that's probably a bug there or
maybe a bug in interrupt routing.

Here, audio1/hdafg1 is a digital output (HDMI) which also has
problems but with different symptoms. But the analog output
audio0/hdafg0 works fine.



Re: sys/dev/usb/if_axen.c

2023-11-19 Thread Michael van Elst
On Sun, Nov 19, 2023 at 06:40:19AM +, sc.dy...@gmail.com wrote:

There are regular Ethernet headers (padded/aligned with 0x).

> 1700366741.414702 axen_uno_rx_loop#0@0:  207b d26f4276 00e04c53 44580800
>      <-- pkt #1

>   0ec0pkt #1--> 
> 1700366741.414722 axen_uno_rx_loop#0@0:  207b d26f4276 00e04c53 44580800
>   0ed0   <-- pkt #2


No idea what this is, but it doesn't really look like garbage.

>   0bd0pkt #2--> 
> 1700366741.414739 axen_uno_rx_loop#0@0:  0008ec05 0080 0008ec05 0080


What is the format of these (64bit wide?) headers?

>   0be0   <  pkt_hdr #1   > <  pkt_hdr #2   >
> 1700366741.414739 axen_uno_rx_loop#0@0:   0400e00b 400657b9 c0a80104
>   0bf0  <--  garbage  -->




Re: sys/dev/usb/if_axen.c

2023-11-13 Thread Michael van Elst
mak...@ki.nu (Makoto Fujiwara) writes:

>I've compared to openbsd: if_axen.c 
>   https://raw.githubusercontent.com/openbsd/src/master/sys/dev/usb/if_axen.c
>to N, and there are so many differencies.

>Does this (N) if_axen.c works on any installation ?


axen seems to work, but I can see that the code does nonsense if
you receive a buffer with pkt_count == 0.

I suggest dumping the whole buffer as it was received.
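A sketch of both suggestions: dump the raw buffer, and refuse to walk it when the trailer reports zero packets. The trailer layout is an assumption based on my reading of the driver (a 32-bit little-endian word at the end of the buffer, low 16 bits = packet count, high 16 bits = offset of the per-packet header array) and should be checked against if_axen.c; hexdump and rx_trailer_check are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Dump a received buffer in hex, 16 bytes per line, with offsets. */
static void
hexdump(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (i % 16 == 0)
			printf("%s%04zx:", i ? "\n" : "", i);
		printf(" %02x", buf[i]);
	}
	printf("\n");
}

/*
 * Validate the 32-bit little-endian trailer at the end of an rx buffer.
 * ASSUMED layout: low 16 bits = packet count, high 16 bits = offset of
 * the per-packet header array. Returns 0 if usable, -1 otherwise.
 */
static int
rx_trailer_check(const uint8_t *buf, size_t len, uint16_t *count,
    uint16_t *hdr_off)
{
	uint32_t trailer;

	if (len < 4)
		return -1;
	trailer = (uint32_t)buf[len - 4] |
	    (uint32_t)buf[len - 3] << 8 |
	    (uint32_t)buf[len - 2] << 16 |
	    (uint32_t)buf[len - 1] << 24;
	*count = (uint16_t)(trailer & 0xffff);
	*hdr_off = (uint16_t)(trailer >> 16);

	/* Nothing to process: bail out instead of walking garbage. */
	if (*count == 0)
		return -1;
	/* The header array must start before the trailer itself. */
	if (*hdr_off >= len - 4)
		return -1;
	return 0;
}
```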



Re: file-backed cgd backup question

2023-10-22 Thread Michael van Elst
g...@lexort.com (Greg Troxel) writes:

>> vnd opens the backing file when the unit is created and closes
>> the backing file when the unit is destroyed. Then you can access
>> the file again.

>Is there a guarantee of cache consistency for writes before and reads
>after?

Before the unit is created you can access the file and after the
unit is destroyed you can access the file. That's always safe.

I also think that when the unit is configured but not opened
(by device access or mounts) it is safe to access the file.


>> The data is written directly to the allocated blocks of the file.
>> So exclusively opening  the backing file _or_ the vnd unit should
>> also be safe. But that's not much different from accessing any file
>> concurrently, which also leads to "corrupt", inconsistent backups.

>That's a different kind of corrupt.

Yes, but in the end it's the same: the "backup" isn't usable.

You cannot access the backing file to get a consistent state of the
data while a unit is in use. And that's independent of how vnd accesses
the bits.

N.B. if you want to talk about dangers, think about fdiscard(). I
doubt that it is safe in the context of the vnd optimization.




Re: cpuctl ucode: no patch available

2023-10-21 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>I read about a new microcode update for the AMD Zen family, downloaded
>the linux firmware repository and tried to apply it.

We don't support microcode updates for Zen (just some older AMD models).




Re: file-backed cgd backup question

2023-10-20 Thread Michael van Elst
g...@lexort.com (Greg Troxel) writes:

>I dimly knew this, but keep forgetting.  Reading vndconfig(8), it does
>not explain that the normal path leads to incorrect behavior (stale
>reads from file cache even after closing the vnd, mtime).

vnd opens the backing file when the unit is created and closes
the backing file when the unit is destroyed. Then you can access
the file again.

The data is written directly to the allocated blocks of the file.
So exclusively opening  the backing file _or_ the vnd unit should
also be safe. But that's not much different from accessing any file
concurrently, which also leads to "corrupt", inconsistent backups.

Updating the backing file mtime on close sounds useful. I'm not sure
what effect updating atime/mtime on every access would have.

> This
>optimization is sufficiently dangerous and not expected that it needs to
>be documented clearly and loudly.  I just added a note to the man page.

I think the reference to "ciphertext" should be adjusted and the
text should use a more neutral tone when describing the functionality.

Pointing to the -i option to disable the optimization unconditionally
might also be helpful.



Re: file-backed cgd backup question

2023-10-19 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>For a cgd in a file that I mount via vnd+cgd, the file system contents
>inside may change, but the actual file on the hard disk outside only
>has 'access' time changes. So "smart" backup programs that check
>timestamps to find out if they need to re-hash files don't notice it
>was changed. How do you handle this? Manually touch it?

vnd has an optimization where the backing file isn't touched, but
the underlying device is accessed directly. Then file cache and
device aren't in sync and a backup program reading the file might
read stale data. vnd should probably update the file when
unconfiguring, but so far it does not.

The optimization is disabled under some conditions, and explicitly
if you use 'vnconfig -i'. Then all operations are done by file
I/O and the timestamps of the backing file are maintained.
The extra caching of course affects performance.
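If you do touch the file manually after unconfiguring the unit, the C equivalent of touch(1) is a single utimensat() call. A minimal sketch, where touch_backing_file is a hypothetical helper name; note that it only updates the timestamps and does nothing about the stale file cache described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Set atime and mtime of the vnd backing file to "now", so that
 * timestamp-based backup tools notice that the contents changed.
 * Intended to run after the vnd unit has been unconfigured.
 */
static int
touch_backing_file(const char *path)
{
	/* A NULL times argument means "use the current time" for both. */
	if (utimensat(AT_FDCWD, path, NULL, 0) == -1) {
		perror(path);
		return -1;
	}
	return 0;
}
```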



Re: cgd questions

2023-10-02 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>Follow up question because it just happened to me:

>I have a USB Disk with ffs-on-cgd.  I unmounted the ffs but forgot
>unconfiguring the cgd before unplugging the disk.

>Can this cause problems? What kinds?

Shouldn't do any harm, there is no state on the disk but the data
itself and FFS has already flushed everything and waited for completion.

The cgd device probably doesn't detach immediately and an attempt
to use it may crash the system. But unconfiguring it immediately
after should be safe.



Re: heartbeat panic by heavy traffic

2023-09-15 Thread Michael van Elst
bou...@antioche.eu.org (Manuel Bouyer) writes:

>But the clock softint shouldn't be locked out for 16s, ever.

Then the clock softint must have a higher priority than
everything else including hard interrupts.

Obviously that's not how the system is designed, there
are no limits on how long specific events may take and
thus no guarantee for lower priority tasks to actually
execute with a certain time. That would be some kind
of real-time system.

Such systems also rarely panic if they detect a violation
of their rules.

In any case, locking out lower priority tasks by an
overwhelmed network layer probably isn't the bug that
we look for.



Re: heartbeat panic by heavy traffic

2023-09-15 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

>On Fri, Sep 15, 2023 at 12:17:58PM +0900, Masanobu SAITOH wrote:
>> I think it would be good to change the default behavior from
>> panic to something others because GENERIC kernel enables HEARTBEAT.
>> by default. One of idea is to print warning message at sufficient intervals.

>I disagree. It is very important that we fix the underlying problem
>instead. Without heartbeat, this behaviour is still visible (but undiagnosable).

The crash here comes from how the network stack operates. Running at
a higher priority, it locks out the lower priority clock softint
and heartbeat detects that and crashes the system intentionally.

I don't consider that useful even in a test environment.
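The detection described above boils down to a timestamp check. The following is a conceptual model only, not the real heartbeat(9) implementation: struct hb_cpu, hb_softint and hb_starved are hypothetical, and max_period is a parameter rather than the kernel's actual setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; not the real heartbeat(9) code. */
struct hb_cpu {
	uint64_t softint_stamp;	/* uptime (seconds) when the softint last ran */
};

/* Runs at low priority: records that the softint got CPU time. */
static void
hb_softint(struct hb_cpu *c, uint64_t now)
{
	c->softint_stamp = now;
}

/*
 * Runs from the (high priority) clock interrupt: reports whether the
 * softint has been starved for longer than max_period seconds.
 * The real mechanism crashes the system at this point.
 */
static bool
hb_starved(const struct hb_cpu *c, uint64_t now, uint64_t max_period)
{
	return now - c->softint_stamp > max_period;
}
```

A network stack that keeps the CPU busy above softint level never lets hb_softint run, so the check fires even though nothing is actually hung.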



Re: NetBSD 10.0 timeline and branch status

2023-09-10 Thread Michael van Elst
bsieg...@gmail.com (Benny Siegert) writes:

>>
>> Unfortunately the additional shared library changes require another round
>> of package rebuilds from scratch. Everyone building packages against
>> netbsd-10: please start a new round from scratch.

>Does that mean the pkgsrc-2023Q2 binary packages for 10_BETA 2 that have
>been published recently are useless on a new 10_BETA install?

Yes.


>That's too bad. I was looking forward to using those packages to
>set up some new CI build machines. Should I wait for the 2023Q3 builds
>then?

I hope that the library changes are completed soon and that we can quickly
rebuild 2023Q2.



Re: cpu temperature readings

2023-07-02 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>So my current guess (and it is no more than that) would be that if
>powerd happens to notice that happening, which would require it to
>look at just the right time, then powerd does a system shutdown.
>If powerd doesn't notice quickly enough, the CPU (or BIOS perhaps)
>sees that things are getting too hot, and no-one is taking any corrective
>action, and simply kills the power.

The overheating flag persists, i.e. it means "overheated at some point in the past".


>I suppose it is possible that when the temp rises very rapidly, very
>quickly, as in when doing CPU intensive work, at high speed, the cooler
>also ramps up, and cools at max rate, whereas when the CPU temp just
>creeps up slowly, the cooler doesn't notice it happening, and fails to
>react - but that seems kind of unlikely to me,   It certainly doesn't
>seem like the kind of problem that can be caused by thermal paste (or
>the lack thereof).

The cooler is much slower than the (small) die. Any rapid temperature
change is buffered by the heat sink (better if there is good
conductivity). Air or water carries away the energy, but slowly.

I'd rather think that an idle core that is put to "turbo" speeds
is clocked higher and produces heat faster than can be buffered by
the heat sink. But if the machine is busy, none of the cores actually
reaches the extreme clock rates and there are no short term heat peaks
that trigger the "critical" flag.

That's all just guessing. But if a better connection between CPU
and heat sink helps, it might be right.



Re: cpu temperature readings

2023-07-02 Thread Michael van Elst
On Sun, Jul 02, 2023 at 04:16:51PM +0700, Robert Elz wrote:
> 
> if ((msr & MSR_THERM_STATUS_CRIT_STA) != 0)
> edata->state = ENVSYS_SCRITICAL;
> 
> that is, rather than reaching some configured limit, simply being told
> by the cpu that the status is critical ?

Yes. That bit also triggers powerd.


>   | or something completely
>   | different (motherboard power regulators or even the PSU?).
> 
> Certainly.   Anything is possible.   I suspect something changed
> (broke, or wore out) about a month ago - clearly it is marginal, and
> only seems to affect things in turbo mode (higher power draw), as
> this is a new phenomenon in the past month or so.

It's also possible that the heat sink needs attention (re-apply thermal
paste or similar), this might prevent the CPU from reaching a critical
temperature.




Re: cpu temperature readings

2023-07-02 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>  | You can probably avoid this, if you limit the chip to performance of the
>  | non-selected die (in real applications it will probably lose 1-5%). The
>  | BIOS should have a setting for the cTDP value that you can play with.

>If I am understanding you, which I might not be, you mean slow down the
>fastest cores from 5.5GHz (two cores are currently allowed to run that
>fast, I found the settings for that) to (probably) 5.2GHz - the other
>6 performance cores are currently limited to that (and I think that's
>as fast as they're normally expected to run).

Turbo speed is controlled by the power (dissipation) budget. On
some CPUs you don't have to control the clock itself, but the
available power, and that should also be possible with the
i9-12900.

In the end that means the chip either won't reach its maximum turbo
speed, or only for a shorter time, or only when cooled better. The
value that corresponds to this is called cTDP (usually used to raise
the value for extreme overclocking, but it can also be reduced).

I haven't seen such a setting in the Asrock Z690 BIOS though.


>[Aside: I also noticed that the BIOS claims that the min available
>frequency is 400MHz ... NetBSD thinks 800MHz is as slow as it should go,
>that's the min value in machdep.cpu.frequency.available].

The values probably come from ACPI. I first thought there was a limit
of 16 states, but we (arbitrarily) have a limit of 256. So either
ACPI doesn't show all states that you can see in the BIOS interface
or we have a bug.


>I got to look at all that as the system shut itself down again in the early
>hours of this morning (here) - A/C was on, so room was cool, I had turbo
>mode enabled, just to see if it would still cause a problem, and it seems
>that it did (at the minute, as long as I leave that off, the system is
>stable).

coretemp doesn't have thresholds, so it cannot trigger powerd to shut down.


>  Note that I am still just guessing that thermal issues are what
>is causing this, almost always the system is just running fine, with
>envstat reporting elevated temperatures, but nothing close to 100 - the
>highest I saw before the shutdown were in the low 60's - but I wasn't
>actually watching those numbers at the time), and then it is off.
>No warning (that I saw anyway) - just off.   This time I restarted
>immediately, and as soon as I could, looked at the BIOS's cpu temp
>value - that was about 36C.   But the BIOS doesn't use turbo mode I
>don't believe, so it would have been running slower, and the BIOS spends
>quite a bit of time doing whatever it does, before it allows any kind of
>interaction.

Immediate power off also doesn't suggest that this is a shutdown. I would
guess it's either the CPU reaching its limit (unlikely given your description,
but the temperature can change very, very quickly) or something completely
different (motherboard power regulators or even the PSU?).

On server motherboards you would often have some BMC logging the issue.
The Z690 Taichi BIOS seems to have an event log, not sure what it actually
logs.



Re: cpu temperature readings

2023-07-01 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>I have been running that kernel now for approaching 18 hours.   At boot
>time (when coretemp is being attached) Tjmax was read as 115 (on all cores,
>I don't know if that's supposed to be a per-core value, or not, but that
>doesn't matter), and nothing I have done since (including changing the
>cpu frequency target (and actual frequency) has made any difference, it
>seems to simply be a constant 115 no matter what (and the effort to read
>it every time it is used, rather than just remember what was read first,
>seems to be unnecessary - at least on this processor (and perhaps BIOS).

That's what I found first, the i9-12900KS (KS for the selected die)
actually has a Tjmax of 115C (but would report 100C by default, a value
that the BIOS may change).


>I have just set the cpu freq to 3401 (enabled the "turbo boost" - though
>I am not convinced there's an actual turbocharger in the CPU anywhere)
>and the temps more or less immediately rose to the low 50's (a 15 degree
>increase).   I suppose that is possible, but it seems a bit extreme, just
>for enabling higher speed on a system which is really doing nothing that
>matters.

To support the "turbo" speeds, you need higher voltages and it is plausible
that the voltages need to be set for the worst case because switching the
clock to "turbo" doesn't control the voltages (or not fast/precise enough).

That effect is usually not that noticeable, but my guess is that the bias is
so much higher for the selected die.

You can probably avoid this, if you limit the chip to performance of the
non-selected die (in real applications it will probably lose 1-5%). The
BIOS should have a setting for the cTDP value that you can play with.



Re: cpu temperature readings

2023-06-29 Thread Michael van Elst
[...] (except for the bound system threads).


On Intel however (at least on this i5), the mapping alternates between
both threads:

cpu0: Cluster/Package ID 0
cpu0: Core ID 0
cpu0: SMT ID 0

cpu1: Cluster/Package ID 0
cpu1: Core ID 0
cpu1: SMT ID 1

cpu2: Cluster/Package ID 0
cpu2: Core ID 1
cpu2: SMT ID 0

cpu3: Cluster/Package ID 0
cpu3: Core ID 1
cpu3: SMT ID 1
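Restated as arithmetic, the numbering in the listing above is an interleaved decomposition of the cpu index. This hypothetical helper only models the pattern observed on this i5; the kernel derives the real IDs from the hardware rather than computing them this way.

```c
/* Decomposition matching the Intel listing above (2 threads per core). */
struct cpu_topo {
	unsigned core;
	unsigned smt;
};

static struct cpu_topo
interleaved_topo(unsigned cpu_index, unsigned threads_per_core)
{
	struct cpu_topo t;

	t.core = cpu_index / threads_per_core;	/* cpu0,cpu1 -> core 0 */
	t.smt = cpu_index % threads_per_core;	/* alternates 0,1,0,1 */
	return t;
}
```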



I expect this to be replaced with something much more bizarre. There
are already data structures that describe the CPU topology.




Re: cpu temperature readings

2023-06-29 Thread Michael van Elst
On Thu, Jun 29, 2023 at 06:01:28PM +0700, Robert Elz wrote:

> It is, and I'm aware of it.  I'm not sure why Michael wanted to know
> whether the speed was actually being altered or not,

One possibility would be that the 3401 mode didn't enable turbo frequencies
but actually throttled the CPU (e.g. due to a faulty BIOS). Then the low
temperature readings would have been only a logical consequence.




Re: cpu temperature readings

2023-06-29 Thread Michael van Elst
On Thu, Jun 29, 2023 at 12:03:33PM +0200, Rhialto wrote:
> On Thu 29 Jun 2023 at 16:50:27 +0700, Robert Elz wrote:
> > And then for fun, at 3401 ... this one I needed to run the test several
> > times until the kernel picked one of the fastest processors to run it on
> 
> When I was muddling with estd to dynamically slow down my cpus when not
> in use, I was told that the xx01 frequency on modern (Intel) processors
> will do that, even though in many sources that setting is still called
> "turbo boost" or similar. The other frequencies would actually be fixed.
> In your cpu this may be the case too, which would give confusing results
> if you're not aware of the possibility.


The xx01 frequency sets the maximum base clock and enables turbo mode...
on systems that support such a setting.

On "modern CPUs" however, it is often sufficient to stay on that setting
as "racing to completion" needs the least power. To some degree however
this assumes that the CPU enters a low-power idle mode when "complete",
so YMMV on NetBSD.

The Haswell CPU here (room temperature about 27C) runs idle at about 40C
when clocked at minimum 800, but heats up to 47C when idling at 3300 and
there is no difference to 3301.
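A small sketch for reading the xx01 convention out of machdep.cpu.frequency.available, assuming the value is a space-separated list of MHz values as printed by sysctl -n. summarize_freqs is a hypothetical helper, and treating "highest entry ends in 01" as the turbo marker is a heuristic based on the convention described above.

```c
#include <stdbool.h>
#include <stdlib.h>

struct freq_summary {
	long min;	/* lowest available setting, MHz */
	long max;	/* highest available setting, MHz */
	bool turbo;	/* highest entry looks like an "xx01" turbo setting */
};

/*
 * Summarize a space-separated list such as "3401 3400 3300 ... 800".
 * Returns false if no number was found.
 */
static bool
summarize_freqs(const char *list, struct freq_summary *out)
{
	const char *p = list;
	char *end;
	bool any = false;

	out->min = out->max = 0;
	out->turbo = false;
	for (;;) {
		long f = strtol(p, &end, 10);

		if (end == p)		/* no more numbers */
			break;
		if (!any || f < out->min)
			out->min = f;
		if (!any || f > out->max)
			out->max = f;
		any = true;
		p = end;
	}
	/* Assumed heuristic: turbo is advertised as max base clock + 1. */
	out->turbo = any && out->max % 100 == 1;
	return any;
}
```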




Re: cpu temperature readings

2023-06-29 Thread Michael van Elst
On Thu, Jun 29, 2023 at 04:50:27PM +0700, Robert Elz wrote:

> It looks to me as if the frequency adjustments are working properly,

Then it gets really strange what the temperature sensor would see.

One possibility would be that the Tjmax value is actually changed
dynamically (maybe some SMM code) and that the patch isn't complete
to handle this.


> though NetBSD's cpu selection algorithm doesn't (yet anyway) really
> understand processors like this.

The scheduler used to pick the first cores first. With the performance
cores using low cpu numbers, they should be utilized first, but not
necessarily for the important workloads.

It now handles big.LITTLE configurations independent of cpu numbers,
but probably only on arm.





Re: cpu temperature readings

2023-06-29 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>  | When this happens, is the machine actually running at 3400 MHz?
>How do I tell?

You could run a benchmark like 'openssl speed sha256' and compare
the results at the 3400 MHz target and at the next lower step.
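If openssl isn't at hand, a crude alternative is to time a fixed CPU-bound loop once per frequency target and compare. time_workload is a hypothetical helper; the xorshift loop is just an arbitrary integer workload, absolute numbers mean nothing, and only the ratio between two runs is of interest. A single-threaded run also lands on whatever core the scheduler picks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Time a fixed, CPU-bound integer workload. Compare the result at two
 * machdep.cpu.frequency.target settings; only the ratio is meaningful.
 */
static double
time_workload(uint64_t iterations, uint64_t *sink)
{
	struct timespec t0, t1;
	uint64_t x = 88172645463325252ULL;	/* any nonzero seed */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (uint64_t i = 0; i < iterations; i++) {
		/* xorshift64 step: cheap, dependent arithmetic */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	*sink = x;	/* publish the result so the loop is not optimized away */
	return (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```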


>  | >The motherboard is an AsRock Z690 Taichi.
>  | Any deviation from factory settings ?

>Several, but nothing which should affect the CPU operation, I'm not
>into overclocking or anything like that (so no voltage changes, or
>anything like that).   I have disabled hyperthreading, and adjusted
>some of the fan thresholds to make them run faster sooner.

ok.



Re: cpu temperature readings

2023-06-28 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>When the
>cpu frequency target is changed to 3400, all the core temp values drop
>to lower than room air temp (which according to my probably inaccurate
>desk lamp, is currently 22.5, the coretemp values are all in the 15-18
>range at the minute).

When this happens, is the machine actually running at 3400 MHz?


>The motherboard is an AsRock Z690 Taichi.

Any deviation from factory settings ?




Re: cpu temperature readings

2023-06-27 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>cpu0: "12th Gen Intel(R) Core(TM) i9-12900KS"

The chip apparently reports a Tjmax of 100 C (as for the non-selected chip)
but actually has a real Tjmax of 115 C.

There are two caveats:

Our driver ignores a Tjmax of > 110 C (and uses 100 C as default). If the
chip reported the real value, we would ignore it.

Intel recommends that the BIOS fakes the value and configures the MSR ten
degrees lower (so you see Tjmax of 90 C).


The temperature sensor reading is relative to Tjmax.

/*
 * The temperature is computed by
 * subtracting the reading by Tj(max).
 */
edata->value_cur = sc->sc_tjmax;
edata->value_cur -= __SHIFTOUT(msr, MSR_THERM_STATUS_READOUT);


So the reading could be 15C lower than reality (if the default of 100
instead of 115 is used) or even 25C lower (if the Intel recommendation
is followed).



Re: cpu temperature readings

2023-06-27 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>sysctl -w machdep.cpu.frequency.target=2500

>(reducing from the apparent max, 3401) the temps dropped (almost
>instantly) from upper 30's (C) to low 40's, down to the high teens
>or very low 20's.

coretemp temperatures in that range are unlikely to be true.

But you didn't tell what sensors you were reporting. Is that coretemp?
Some ACPI value? A motherboard sensor (e.g. lm0)?


>cpu0: "12th Gen Intel(R) Core(TM) i9-12900KS"

That's a selected 241W chip that may heat up to > 100 C (Tjmax = 115 C)
and usually requires a liquid cooler. Idle temperatures between 50C and 60C
are normal.



Re: ssh client_loop send disconnnect from Dom0 -> DomU (NetBSD 10.0_BETA/Xen)

2023-06-21 Thread Michael van Elst
r...@sdf.org (RVP) writes:

>I don't get that: there's no pipe there when you do `> file'. So how come
>a Broken pipe still?

It's the communication between ssh and sshd where ssh can no longer write
to a network connection closed by sshd. The problem is to find out why
the connection got closed.
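The "Broken pipe" text is simply strerror(EPIPE): write() fails that way once the peer has closed the connection, whether the descriptor is a pipe or a socket. The sketch below reproduces it with a local socketpair(2) standing in for the ssh/sshd connection; write_after_peer_close is a hypothetical helper, and the sketch says nothing about why sshd closed the connection. Over TCP the first write after the peer's close may still succeed before the error shows up.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the errno from writing to a stream socket whose peer has
 * already closed its end (0 if the write unexpectedly succeeded).
 */
static int
write_after_peer_close(void)
{
	int sv[2], err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return errno;
	/* Without this, the default SIGPIPE action terminates the process. */
	signal(SIGPIPE, SIG_IGN);
	close(sv[1]);			/* the "sshd" side goes away */
	if (write(sv[0], "x", 1) == -1)
		err = errno;		/* EPIPE, i.e. "Broken pipe" */
	close(sv[0]);
	return err;
}
```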




Re: scp/sftp -R broken?

2023-06-05 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>Hi!

>When I try to recursively copy a directory with "scp -r" or sftp's
>"put -Rp" between a -current and a NetBSD 9, I see:

># scp -r a netbsd-9:
>scp: realpath ./a: No such file
>scp: upload "./a": path canonicalization failed
>scp: failed to upload directory a to .

That was a known issue with sftp when the target directory (in this
case that's netbsd-9:./a) does not exist and the OpenSSH 9 server
apparently has fixed it.

I doubt that anyone will fix OpenSSH 8, so using -O to fall back to the
legacy scp protocol is the only thing you can do.



Re: Raspberry Pi 3 aarch64 -current fails to find root fs label and boot.

2023-04-05 Thread Michael van Elst
On Wed, Apr 05, 2023 at 05:16:17PM +0200, Bartek Krawczyk wrote:

> I see both get root=NAME=netbsd-root and -current also detects this dk1
> wedge with the same "netbsd-root" name.
> I got the same issue 2-3 weeks ago but disregarded it thinking "oh well,
> -current" but seems something is not right. Does this qualify for a PR or am
> I doing something wrong?

Something surely goes wrong.




Re: Raspberry Pi 3 aarch64 -current fails to find root fs label and boot.

2023-04-05 Thread Michael van Elst
bbartlomiej.m...@gmail.com (Bartek Krawczyk) writes:

>My /boot/cmdline.txt has only:
>root=NAME=netbsd-root

You can use 'ofctl /chosen' to find out what bootargs
are passed to the kernel.




Re: Raspberry Pi 3 aarch64 -current fails to find root fs label and boot.

2023-04-05 Thread Michael van Elst
bbartlomiej.m...@gmail.com (Bartek Krawczyk) writes:

>[   1.4967611] ld0: 117 GB, 15371 cyl, 255 head, 63 sec, 512 bytes/sect 
>x 246947840 sectors
>[   1.5157040] dk0 at ld0: "EFI", 163840 blocks at 32768, type: msdos
>[   1.5157040] dk1 at ld0: "netbsd-root", 246743040 blocks at 196608, 


>[   3.2258697] boot device: ld0
>[   3.2369263] root on ld0a dumps on ld0b
>[   3.2369263] vfs_mountroot: can't open root device
>[   3.2369263] cannot mount root, error = 16
>[   3.2497924] root device (default ld0a):
>[   4.1352268] dump device (default ld0b):

The bootloader (EFI?) tells the kernel to use ld0a and ld0b.
No idea why, maybe it doesn't support GPT?

The kernel produces wedges, and access to ld0 is forbidden
(error = 16 == EBUSY), but you can select a wedge as dkN (or by name).

To find the problem, you'd need to analyse your image and bootloader
version.



Re: LLONG_MAX not available from c++

2023-03-31 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

> > c++ -dM -E - < /dev/null | fgrep __STDC_VERSION__
>#define __STDC_VERSION__ 201710L
> > c++ -dM -E - < /dev/null | fgrep __ISO

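There is magic involved: apparently the c++ driver preprocesses standard
input (-) as C, where __STDC_VERSION__ is defined, but preprocesses a
named file like c.c as C++, where it is not.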

% touch c.c
% ls -l c.c
-rw-r--r--  1 mlelstv  staff  0 Mar 31 17:28 c.c

% c++ -dM -E - < c.c | grep STDC
#define __STDC_HOSTED__ 1
#define __STDC_UTF_16__ 1
#define __STDC_VERSION__ 201710L
#define __GNUC_STDC_INLINE__ 1
#define __STDC_UTF_32__ 1
#define __STDC__ 1

% c++ -dM -E c.c | grep STDC
#define __STDC_HOSTED__ 1
#define __STDC_UTF_16__ 1
#define __GNUC_STDC_INLINE__ 1
#define __STDC_UTF_32__ 1
#define __STDC__ 1




Re: LLONG_MAX not available from c++

2023-03-31 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>> Make sure c++ with using at least -std=c++11?

>Same error, also with c++17 and gnu++17. Probably lua does something
>weird.

lua defines _XOPEN_SOURCE in lprefix.h:

#if !defined(_XOPEN_SOURCE)
#define _XOPEN_SOURCE   600
#elif _XOPEN_SOURCE == 0
#undef _XOPEN_SOURCE  /* use -D_XOPEN_SOURCE=0 to undefine it */
#endif

That's why _NETBSD_SOURCE is no longer defined (see <sys/featuretest.h>).
c++ also doesn't define __STDC_VERSION__ or _ISOC99_SOURCE.

And so limits.h doesn't define LLONG_MAX.


From what I found, LLONG_MAX should also be defined for C++11 and
later, so limits.h should also check for __cplusplus >= 201100L.
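The conditions discussed above can be summarized as a single preprocessor test. This is a hypothetical sketch, not the text of NetBSD's <limits.h>; SKETCH_LLONG_MAX_VISIBLE and llong_max_visible() are made-up names. It uses 201103L, the value C++11 compilers define for __cplusplus; the 201100L threshold behaves the same.

```c
/* Hypothetical visibility test for LLONG_MAX. Under lua's
 * _XOPEN_SOURCE=600, _NETBSD_SOURCE is not defined, and C++ defines
 * neither __STDC_VERSION__ nor _ISOC99_SOURCE, so only the
 * __cplusplus clause would make LLONG_MAX visible to c++. */
#if defined(_NETBSD_SOURCE) || defined(_ISOC99_SOURCE) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) || \
    (defined(__cplusplus) && __cplusplus >= 201103L)
#define SKETCH_LLONG_MAX_VISIBLE 1
#else
#define SKETCH_LLONG_MAX_VISIBLE 0
#endif

/* Report the result at run time, e.g. for a quick test program. */
int llong_max_visible(void)
{
    return SKETCH_LLONG_MAX_VISIBLE;
}
```

Compiled as C99 or later, the __STDC_VERSION__ clause alone makes it true; compiled as C++11 with _XOPEN_SOURCE=600, only the __cplusplus clause does.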




Re: CARP misbehaving on RPi2+ / RPI3

2023-03-26 Thread Michael van Elst
rela...@gmail.com (Greywolf) writes:

>Greetings; I have a couple RPi that I will refer to here as 'thing1'
>and 'thing2'.  Both are running evbarm-earmv7h, NetBSD 10-BETA.

The carp driver sent advertising packets through the carp interface
which sets the source MAC address to the CARP virtual MAC address
(computed from the vhid value).

A normal ethernet switch will then learn that this MAC address
exists on multiple ports and change its forwarding table all
the time, resulting in lots of packet drops. The CARP election
protocol then doesn't work.

I have committed a change to send advertisements through the backing
'carpdev' interface instead. This uses its own distinct MAC
address, and the switch stays happy.

With the consumer switches here, it still takes up to 20 seconds to notice
when CARP moves the virtual MAC address to the other machine.
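For reference, the virtual MAC in the BSD carp implementations is, as far as I know, the VRRP-style address 00:00:5e:00:01 followed by the vhid byte. A minimal sketch of that computation (carp_vmac is a hypothetical helper, not the driver's code):

```c
#include <stdint.h>
#include <stdio.h>

/* Format the CARP virtual MAC for a given vhid into buf, which must
 * hold at least 18 bytes (17 characters plus NUL). */
void carp_vmac(uint8_t vhid, char buf[18])
{
    const uint8_t mac[6] = { 0x00, 0x00, 0x5e, 0x00, 0x01, vhid };

    snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
        mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Both CARP hosts sending advertisements from this one address on different switch ports is what keeps the switch's forwarding table flapping.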



Re: ipmi0: incorrect critical max

2023-03-18 Thread Michael van Elst
net...@precedence.co.uk (Stephen Borrill) writes:

>   Current  CritMax  WarnMax  WarnMin  CritMin  Unit
>[ipmi0]
>11-LOM-CORE:    59.253    0.000  110.471                    degC

>Seen on 9.3_STABLE, but also in 10 BETA.

>I suppose one simple fix would be to ensure that if CritMax is lower 
>than WarnMax, it should be set to the value of WarnMax.

IPMI reports 3 upper and 3 lower limits (each as an unsigned byte)
and a bitmask to show which value is valid.

lower non-recoverable threshold
-> configures CritMin
lower critical threshold
-> configures CritMin
lower non-critical threshold
-> configures WarnMin

Lower limits of 0 are ignored, because a reading cannot fall below them.


upper non-recoverable threshold
-> configures CritMax
upper critical threshold
-> configures CritMax
upper non-critical threshold
-> configures WarnMax

Upper limits of 255 are ignored, because a reading cannot exceed them.


Apparently your system says that the upper critical or the
non-recoverable threshold exists but reports a value of zero.

The code could do some more sanity checking and then just
skip the invalid limits.

Something like:

@@ -1582,6 +1684,16 @@ ipmi_get_sensor_limits(struct ipmi_softc
break;
}
 
+   if ((data[0] & 0x28) == 0x28 && data[6] < data[4])
+   data[0] ^= 0x20;
+   if ((data[0] & 0x18) == 0x18 && data[5] < data[4])
+   data[0] ^= 0x10;
+
+   if ((data[0] & 0x05) == 0x05 && data[3] > data[1])
+   data[0] ^= 0x04;
+   if ((data[0] & 0x03) == 0x03 && data[2] > data[1])
+   data[0] ^= 0x02;
+
if (data[0] & 0x20 && data[6] != 0xff) {
*pcritmax = ipmi_convert_sensor(&data[6], psensor);
*props |= prop_critmax;


As an alternative you could also override the limit in /etc/envsys.conf.
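For illustration, a minimal user-space sketch of the same sanity check, assuming the 7-byte layout used in the patch above (data[0] is the valid mask, and bit n flags data[n + 1] as valid). The names sanitize_thresholds, LNC, LC, and so on are made up for this example and do not come from ipmi.c. It first drops limits that can never trigger, so a zero lower non-critical limit does not invalidate a real lower critical one.

```c
#include <stdint.h>

/* Bits of the "readable threshold" mask in data[0]. */
enum {
    LNC = 0x01, /* lower non-critical    -> data[1] */
    LC  = 0x02, /* lower critical        -> data[2] */
    LNR = 0x04, /* lower non-recoverable -> data[3] */
    UNC = 0x08, /* upper non-critical    -> data[4] */
    UC  = 0x10, /* upper critical        -> data[5] */
    UNR = 0x20  /* upper non-recoverable -> data[6] */
};

/* Return the threshold mask with unusable limits cleared. */
uint8_t sanitize_thresholds(const uint8_t data[7])
{
    uint8_t mask = data[0];

    /* Limits that can never be crossed: 0 for lower, 0xff for upper. */
    for (int bit = 0; bit < 3; bit++)
        if ((mask & (1u << bit)) && data[bit + 1] == 0x00)
            mask &= (uint8_t)~(1u << bit);
    for (int bit = 3; bit < 6; bit++)
        if ((mask & (1u << bit)) && data[bit + 1] == 0xff)
            mask &= (uint8_t)~(1u << bit);

    /* Upper critical/non-recoverable below the warning limit is bogus. */
    if ((mask & (UNR | UNC)) == (UNR | UNC) && data[6] < data[4])
        mask &= (uint8_t)~UNR;
    if ((mask & (UC | UNC)) == (UC | UNC) && data[5] < data[4])
        mask &= (uint8_t)~UC;

    /* Likewise for lower limits above the warning limit. */
    if ((mask & (LNR | LNC)) == (LNR | LNC) && data[3] > data[1])
        mask &= (uint8_t)~LNR;
    if ((mask & (LC | LNC)) == (LC | LNC) && data[2] > data[1])
        mask &= (uint8_t)~LC;

    return mask;
}
```

For a sensor like the one reported (upper critical flagged valid with value 0, upper non-critical 110), the mask reduces to the upper non-critical bit, so only WarnMax would be set.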




Re: GENERIC64 aarch64 failure to autoboot

2023-03-05 Thread Michael van Elst
On Sun, Mar 05, 2023 at 10:56:31PM +0100, Michael van Elst wrote:
> On Mon, Mar 06, 2023 at 07:44:20AM +1030, Brett Lymn wrote:
> > On Sun, Mar 05, 2023 at 03:01:02PM -0000, Michael van Elst wrote:
> > >  
> > > - if (guid != NULL && len == 16)
> > > + if (guid == NULL || len == 16)
> > > +
> > 
> > Shouldn't that be "len != 16"?
> 
> Yes, and another error. The wedge device is 'dv' not 'dev'.
> 
> Here is a patch that works for me:

The first hunk was another local change. Please ignore it.
Here is the patch without it:


Index: sys/arch/evbarm/fdt/fdt_machdep.c
===
RCS file: /cvsroot/src/sys/arch/evbarm/fdt/fdt_machdep.c,v
retrieving revision 1.100
diff -p -u -r1.100 fdt_machdep.c
--- sys/arch/evbarm/fdt/fdt_machdep.c   5 Feb 2023 22:42:39 -0000   1.100
+++ sys/arch/evbarm/fdt/fdt_machdep.c   5 Mar 2023 21:59:49 -0000
@@ -743,9 +743,6 @@ fdt_detect_root_device(device_t dev)
 {
int error, len;
 
-   if (booted_device)
-   return;
-
const int chosen = OF_finddevice("/chosen");
if (chosen < 0)
return;
@@ -801,8 +798,15 @@ fdt_detect_root_device(device_t dev)
const struct uuid *guid =
fdtbus_get_prop(chosen, "netbsd,gpt-guid", &len);
 
-   if (guid != NULL && len == 16)
-   booted_device = dev;
+   if (guid == NULL || len != 16)
+   return;
+
+   char guidstr[UUID_STR_LEN];
+   uuid_snprintf(guidstr, sizeof(guidstr), guid);
+
+   device_t dv = dkwedge_find_by_wname(guidstr);
+   if (dv != NULL)
+   booted_device = dv;
 
return;
}
@@ -895,8 +899,7 @@ fdt_cpu_rootconf(void)
if (device_class(dev) != DV_DISK)
continue;
 
-   if (device_is_a(dev, "ld") || device_is_a(dev, "sd") || 
device_is_a(dev, "wd"))
-   fdt_detect_root_device(dev);
+   fdt_detect_root_device(dev);
 
if (booted_device != NULL)
break;


Re: GENERIC64 aarch64 failure to autoboot

2023-03-05 Thread Michael van Elst
On Mon, Mar 06, 2023 at 07:44:20AM +1030, Brett Lymn wrote:
> On Sun, Mar 05, 2023 at 03:01:02PM -0000, Michael van Elst wrote:
> >  
> > -   if (guid != NULL && len == 16)
> > +   if (guid == NULL || len == 16)
> > +
> 
> Shouldn't that be "len != 16"?

Yes, and another error. The wedge device is 'dv' not 'dev'.

Here is a patch that works for me:


Index: sys/arch/evbarm/fdt/fdt_machdep.c
===
RCS file: /cvsroot/src/sys/arch/evbarm/fdt/fdt_machdep.c,v
retrieving revision 1.100
diff -p -u -r1.100 fdt_machdep.c
--- sys/arch/evbarm/fdt/fdt_machdep.c   5 Feb 2023 22:42:39 -0000   1.100
+++ sys/arch/evbarm/fdt/fdt_machdep.c   5 Mar 2023 21:54:09 -0000
@@ -209,6 +209,15 @@ fdt_add_boot_physmem(const struct fdt_me
bp->bp_pages = atop(eaddr) - bp->bp_start;
bp->bp_freelist = VM_FREELIST_DEFAULT;
 
+#ifdef VM_FREELIST_4GB
+   if (eaddr < (paddr_t)4 * 1024 * 1024 * 1024)
+   bp->bp_freelist = VM_FREELIST_4GB;
+#endif
+#ifdef VM_FREELIST_1GB
+   if (eaddr < (paddr_t)1 * 1024 * 1024 * 1024)
+   bp->bp_freelist = VM_FREELIST_1GB;
+#endif
+
 #ifdef PMAP_NEED_ALLOC_POOLPAGE
const uint64_t memory_size = *(uint64_t *)arg;
if (atop(memory_size) > bp->bp_pages) {
@@ -743,9 +752,6 @@ fdt_detect_root_device(device_t dev)
 {
int error, len;
 
-   if (booted_device)
-   return;
-
const int chosen = OF_finddevice("/chosen");
if (chosen < 0)
return;
@@ -801,8 +807,15 @@ fdt_detect_root_device(device_t dev)
const struct uuid *guid =
fdtbus_get_prop(chosen, "netbsd,gpt-guid", &len);
 
-   if (guid != NULL && len == 16)
-   booted_device = dev;
+   if (guid == NULL || len != 16)
+   return;
+
+   char guidstr[UUID_STR_LEN];
+   uuid_snprintf(guidstr, sizeof(guidstr), guid);
+
+   device_t dv = dkwedge_find_by_wname(guidstr);
+   if (dv != NULL)
+   booted_device = dv;
 
return;
}
@@ -895,8 +908,7 @@ fdt_cpu_rootconf(void)
if (device_class(dev) != DV_DISK)
continue;
 
-   if (device_is_a(dev, "ld") || device_is_a(dev, "sd") || 
device_is_a(dev, "wd"))
-   fdt_detect_root_device(dev);
+   fdt_detect_root_device(dev);
 
if (booted_device != NULL)
break;




Re: GENERIC64 aarch64 failure to autoboot

2023-03-05 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>On Sun, Mar 05, 2023 at 12:56:29PM +0000, Chavdar Ivanov wrote:
>[   1.3797015] dk0 at sd0: "EFI system", 262144 blocks at 2048, type:
>msdos
>[   1.3897890] dk1 at sd0: "cc8f4a89-edc0-48d1-b9ce-b40d227a4a07",
>> netbsd,gpt-guid 894a8fcc c0edd148 b9ceb40d 227a4a07   
>> .J.H"zJ.

>Means, the bootloader passes dk1 as the boot device.
>But the code only checks "ld", "sd" and "wd" devices:


This might help (compile tested only):


Index: sys/arch/evbarm/fdt/fdt_machdep.c
===
RCS file: /cvsroot/src/sys/arch/evbarm/fdt/fdt_machdep.c,v
retrieving revision 1.100
diff -p -u -r1.100 fdt_machdep.c
--- sys/arch/evbarm/fdt/fdt_machdep.c   5 Feb 2023 22:42:39 -0000   1.100
+++ sys/arch/evbarm/fdt/fdt_machdep.c   5 Mar 2023 14:55:41 -0000
@@ -743,9 +743,6 @@ fdt_detect_root_device(device_t dev)
 {
int error, len;
 
-   if (booted_device)
-   return;
-
const int chosen = OF_finddevice("/chosen");
if (chosen < 0)
return;
@@ -801,7 +798,14 @@ fdt_detect_root_device(device_t dev)
const struct uuid *guid =
fdtbus_get_prop(chosen, "netbsd,gpt-guid", &len);
 
-   if (guid != NULL && len == 16)
+   if (guid == NULL || len == 16)
+   return;
+
+   char guidstr[UUID_STR_LEN];
+   uuid_snprintf(guidstr, sizeof(guidstr), guid);
+
+   device_t dv = dkwedge_find_by_wname(guidstr);
+   if (dv != NULL)
booted_device = dev;
 
return;
@@ -895,8 +899,7 @@ fdt_cpu_rootconf(void)
if (device_class(dev) != DV_DISK)
continue;
 
-   if (device_is_a(dev, "ld") || device_is_a(dev, "sd") || 
device_is_a(dev, "wd"))
-   fdt_detect_root_device(dev);
+   fdt_detect_root_device(dev);
 
if (booted_device != NULL)
break;




Re: GENERIC64 aarch64 failure to autoboot

2023-03-05 Thread Michael van Elst
On Sun, Mar 05, 2023 at 12:56:29PM +0000, Chavdar Ivanov wrote:

[   1.3797015] dk0 at sd0: "EFI system", 262144 blocks at 2048, type:
msdos
[   1.3897890] dk1 at sd0: "cc8f4a89-edc0-48d1-b9ce-b40d227a4a07",

> netbsd,gpt-guid 894a8fcc c0edd148 b9ceb40d 227a4a07   .J.H"zJ.


This means the bootloader passes dk1 as the boot device.

But the code only checks "ld", "sd" and "wd" devices:

if (device_is_a(dev, "ld") || device_is_a(dev, "sd") || 
device_is_a(dev, "wd")) 
   fdt_detect_root_device(dev);

Wedges are skipped, but even if a "dk" device were checked, this code:

if (of_hasprop(chosen, "netbsd,gpt-guid")) {
const struct uuid *guid =
fdtbus_get_prop(chosen, "netbsd,gpt-guid", &len);

if (guid != NULL && len == 16)
booted_device = dev;

return;
}

matches any device.

In this case, the first "ld", "sd" or "wd" device matches as soon
as a netbsd,gpt-guid is passed by the bootloader.
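To match the GUID against a wedge instead, the 16 bytes have to be formatted the way the wedge is named. A minimal user-space sketch, assuming (from the ofctl dump and the dk1 name quoted above) that netbsd,gpt-guid holds the raw GPT byte order, with the first three fields stored little-endian. gpt_guid_to_str is a hypothetical name; the kernel patches in this thread use uuid_snprintf() for this.

```c
#include <stdint.h>
#include <stdio.h>

/* Format 16 raw GPT GUID bytes as the canonical lower-case string.
 * out must hold at least 37 bytes (36 characters plus NUL). */
void gpt_guid_to_str(const uint8_t b[16], char out[37])
{
    /* time_low, time_mid and time_hi are little-endian in GPT. */
    uint32_t time_low = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
        (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    unsigned time_mid = (unsigned)b[4] | (unsigned)b[5] << 8;
    unsigned time_hi  = (unsigned)b[6] | (unsigned)b[7] << 8;

    /* The last 8 bytes (clock_seq and node) are printed as stored. */
    snprintf(out, 37,
        "%08lx-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x",
        (unsigned long)time_low, time_mid, time_hi,
        b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

Fed the bytes 89 4a 8f cc c0 ed d1 48 b9 ce b4 0d 22 7a 4a 07, it yields cc8f4a89-edc0-48d1-b9ce-b40d227a4a07, the name of dk1.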


Greetings,


Re: GENERIC64 aarch64 failure to autoboot

2023-03-04 Thread Michael van Elst
ci4...@gmail.com (Chavdar Ivanov) writes:

>Since my last aarch64 build yesterday, 03/03/2023, my machine no
>longer boots automatically,

sys/arch/evbarm/fdt/fdt_machdep.c 1.100

changed how the boot disk is determined. Apparently it now fails for you.





Re: bta2dpd plays too fast

2023-02-19 Thread Michael van Elst
nathanialsl...@yahoo.com.au (Nat Sloss) writes:

>The solution is to use bta2dpd with the pad(4) device which is throttled 
>sending only the required amount of audio data in the right time.

In current or in netbsd-9?



Re: 10.99.2 panic in kern_timeout.c

2023-02-03 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>> The biggest change recently is probably that my bulk build switched
>> from ghc92 to ghc94, but I don't know if that could cause this.

>Next bulk build, next panic, quite reliably. Has anyone else seen this?

Not yet. Maybe this is the first use of timerfd from multiple threads.



Re: Does X11 user need permission to /dev/dri/card0

2023-01-03 Thread Michael van Elst
mayur...@acm.org (Mayuresh) writes:

>What do I miss if I don't give dri permission? E.g. would firefox be
>terribly slower?

It is slower, and much slower for anything using video content or
things like WebGL.


>I can already see that I can't even run xscreensaver-demo. I can
>understand performance implications of not having access to dri, but this
>seems a functional restriction as well.

xscreensaver-demo doesn't depend on DRI, but it might depend on
capabilities of your X server, in particular the GLX extension
and double-buffering.



Re: Does X11 user need permission to /dev/dri/card0

2023-01-03 Thread Michael van Elst
mayur...@acm.org (Mayuresh) writes:

>Can someone throw some light on whether wheel membership is mandatory for
>dri? And about the crash if such permission is given?


It's the opposite: the DRI device needs to be accessible to the users
that should be allowed to use it. You could create a 'drm' group
and make the device node owned by group 'drm', or you could set
the device node to mode 666 and allow access to everyone.

Obviously such users have control over the GPU and thus the
current desktop session (of any other user). That's why it
is best practice to have access restricted.


If the system crashes when you allow access, this means that
the DRM code is buggy.



Re: 10.0 BETA : Status of Wireless N

2023-01-03 Thread Michael van Elst
mayur...@acm.org (Mayuresh) writes:

>On Tue, Dec 27, 2022 at 07:35:56PM +0530, Mayuresh wrote:
>> Is N supported on 10.0 BETA? Is some configuration required to enable the
>> same?

>Which component decides this - firmware? driver? kernel? something else?

All of them.


>Is Wireless N generally not supported, or does it depend on the device?

Merging the FreeBSD wifi subsystem that supports 802.11n is work in
progress and is developed on a separate branch.


>Are there any other devices for which on NetBSD I can get wireless N
>support?

The Broadcom "full mac" chips (e.g. used by RaspberryPi) run most of
the Wifi protocol inside the firmware and will work with 802.11n networks.
That doesn't mean it is fast.

If you are lucky, you can find the same chip with a USB interface (only seen
as the original RPI Wifi dongle) or a PCIe interface (some Mini-PCIe cards).




Re: ECONNREFUSED no longer works

2022-10-30 Thread Michael van Elst
t...@netbsd.org (Tobias Nygren) writes:

>$ nc -n -v 127.0.0.1 1234
># hangs forever in connect(2) instead of exiting w/ connection refused.

The logic in tcp_drop() got reversed:

@@ -1042,17 +1017,12 @@ tcp_newtcpcb(int family, void *aux)
 struct tcpcb *
 tcp_drop(struct tcpcb *tp, int errno)
 {
-   struct socket *so = NULL;
+   struct socket *so;
 
-   KASSERT(!(tp->t_inpcb && tp->t_in6pcb));
+   KASSERT(tp->t_inpcb != NULL);
 
-   if (tp->t_inpcb)
-   so = tp->t_inpcb->inp_socket;
-#ifdef INET6
-   if (tp->t_in6pcb)
-   so = tp->t_in6pcb->in6p_socket;
-#endif
-   if (!so)
+   so = tp->t_inpcb->inp_socket;
+   if (so != NULL)<-
return NULL;
 
if (TCPS_HAVERCVDSYN(tp->t_state)) {
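The nc test can also be done from C. A minimal sketch, assuming a POSIX sockets environment; closed_loopback_port and connect_loopback are hypothetical helpers. It picks a loopback port that nobody listens on and connects to it: on a working kernel connect() fails with ECONNREFUSED, while with the reversed check in tcp_drop() it is expected to hang like nc does.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return a TCP port on 127.0.0.1 that currently has no listener:
 * bind to port 0, read back the port the kernel picked, then close
 * the socket without calling listen(). Returns 0 on failure. */
unsigned short closed_loopback_port(void)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);
    unsigned short port = 0;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return 0;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
        getsockname(s, (struct sockaddr *)&sin, &len) == 0)
        port = ntohs(sin.sin_port);
    close(s);
    return port;
}

/* Blocking connect() to 127.0.0.1:port. Returns 0 on success or the
 * errno value from socket()/connect(). */
int connect_loopback(unsigned short port)
{
    struct sockaddr_in sin;
    int err = 0;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return errno;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(port);
    if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) != 0)
        err = errno;    /* save errno before close() can change it */
    close(s);
    return err;
}
```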




Re: 9.99.104: panic in tcp_shutdown_wrapper

2022-10-29 Thread Michael van Elst
ozak...@netbsd.org (Ryota Ozaki) writes:

>I've committed a possible fix.  Could you try it?

>Thanks,
>  ozaki-r


I just got a NULL pointer dereference in tcp_ctloutput where
the previous check for inp == NULL is also missing.

[ 24837.756043] fp c0016794db70 tcp_ctloutput() at c02ec4b4 
netbsd:tcp_ctloutput+0x94
[ 24837.756043] fp c0016794dcc0 tcp_ctloutput_wrapper() at c02d2680 
netbsd:tcp_ctloutput_wrapper+-0x31150
[ 24837.756043] fp c0016794dcf0 sosetopt() at c0603cbc 
netbsd:sosetopt+0x78
[ 24837.756043] fp c0016794ddb0 sys_setsockopt() at c060b0fc 
netbsd:sys_setsockopt+0x7c
[ 24837.766041] fp c0016794de20 syscall() at c00b30fc 
netbsd:syscall+0x19c

That's:

int   
tcp_ctloutput(int op, struct socket *so, struct sockopt *sopt)
{
...
s = splsoftnet();
inp = sotoinpcb(so);
...
}
tp = intotcpcb(inp); <-

switch (op) { 





Re: Using the audio(4) driver for recording under -current?

2022-10-27 Thread Michael van Elst
buh...@nfbcal.org (Brian Buhrow) writes:

>   hello.  The hdaudio driver I'm using is a locally patched version to 
> work around an issue
>where the driver doesn't configure the headphone jack correctly.  
>Specifically, it seems 
>the default configuration configures the jack for use with a microphone, 
>rather than with just
>a headset, which is how I use it.

It probably should allow for both, for example:

[ 1.005556] hdafg0 at hdaudio0: Conexant CX20671
[ 1.005556] hdafg0: DAC00 2ch: Speaker [Built-In]
[ 1.005556] hdafg0: DAC01 2ch: HP Out [Jack]
[ 1.005556] hdafg0: ADC02 2ch: Mic In [Jack]
[ 1.005556] hdafg0: ADC03 2ch: Mic In [Built-In]

DAC01 and ADC02 are output and input for the headphone jack.


>The linux driver seems to have some knobs  which permit
>selecting the various operating modes for the jack.  Because I needed 
>something working
>quickly, I reworked the driver to disable the headphone jack and internal 
>speaker.  It has been
>my intention to go back and fix it so we have the knobs as well, but that 
>hasn't happened yet.
>So, if you know how to add the appropriate controls, I'm very interested.

>hdaudio0 at pci0 dev 31 function 3: HD Audio Controller
>hdaudio0: interrupting at msi2 vec 0
>hdaudio0: HDA ver. 1.0, OSS 9, ISS 7, BSS 0, SDO 1, 64-bit
>hdafg0 at hdaudio0: Realtek product 0255
>hdafg0: DAC00 2ch: Speaker [Jack]
>hdafg0: 2ch/0ch 44100Hz 48000Hz 96000Hz 192000Hz PCM16 PCM20 PCM24 AC3
>audio0 at hdafg0: playback, capture, full duplex, independent
>audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback
>audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording
>hdafg1 at hdaudio0: Intel product 2809
>hdafg1: DP00 8ch: Digital Out [Jack]
>hdafg1: 8ch/0ch 48000Hz PCM16*

audio0 is configured for playback and recording, but you only have outputs.



>uaudio0 at uhub1 port 3 configuration 1 interface 0
>uaudio0: C-Media Electronics Inc. (0x0d8c) C-Media USB Headphone Set (0x000c), 
>rev 1.10/1.00, addr 1
>uaudio0: audio rev 1.00
>audio1 at uaudio0: playback, capture, full duplex, independent
>audio1: slinear_le:16 2ch 48000Hz, blk 11520 bytes (60ms) for playback
>audio1: slinear_le:16 1ch 48000Hz, blk 5760 bytes (60ms) for recording

audio1 should allow playback and recording.



>uaudio1 at uhub1 port 5 configuration 1 interface 0
>uaudio1: Burr-Brown from TI (0x08bb) USB Audio CODEC (0x2902), rev 1.10/1.00, 
>addr 5
>uaudio1: audio rev 1.00
>audio2 at uaudio1: playback, capture, full duplex, independent
>audio2: slinear_le:16 2ch 48000Hz, blk 11520 bytes (60ms) for playback
>audio2: slinear_le:16 2ch 11025Hz, blk 2640 bytes (59.8ms) for recording

audio2 should also allow playback and recording.



My guess is that USB audio fails due to a bug in the xhci driver's handling
of isochronous transfers.

>xhci0 at pci0 dev 20 function 0: vendor 8086 product a2af (rev. 0x00)
>xhci0: 64-bit DMA
>xhci0: interrupting at msi0 vec 0
>xhci0: xHCI version 1.0
>uhub0 at usb0: NetBSD (0x0000) xHCI root hub (0x0000), class 9/0, rev 
>3.00/1.00, addr 0
>uhub0: 10 ports with 10 removable, self powered
>uhub1 at usb1: NetBSD (0x0000) xHCI root hub (0x0000), class 9/0, rev 
>2.00/1.00, addr 0
>uhub1: 16 ports with 16 removable, self powered





Re: Using the audio(4) driver for recording under -current?

2022-10-27 Thread Michael van Elst
buh...@nfbcal.org (Brian Buhrow) writes:

>Then, to record:
>cat /dev/sound2 > rawrecordingfile

Can you just try the audiorecord command instead of 'cat' ?




Re: Using the audio(4) driver for recording under -current?

2022-10-26 Thread Michael van Elst
buh...@nfbcal.org (Brian Buhrow) writes:

>   hello.  thanks for the feedback.  In my case, recording doesn't work 
> for hdaudio or uaudio
>devices.  What's strange is that I would expect data to be taken from the 
>wrong input,
>resulting in a file of silence, but I wouldn't expect kernel messages telling 
>me the audio device
>timed out.

When there is an audio timeout, the hardware has stopped producing (or consuming)
data. Strange that this happens for hdaudio and uaudio at the same time.

I get a timeout for "recording" from the second hdaudio device (which
here is the display port).

N.B. if you test recording, check the hardware audio parameters with
audiocfg list and tell audiorecord to use exactly these values. This
removes the various converters from the stream.



Re: Using the audio(4) driver for recording under -current?

2022-10-26 Thread Michael van Elst
r...@sdf.org (RVP) writes:

>Perhaps the correct ADC was not selected? See Section 10.6.1 in the Guide[1].


If I follow this and use "all available sources", I get twice the
amount of data; the sources are interleaved in the input stream.

hdafg0: ADC02 2ch: Mic In [Jack]
hdafg0: ADC03 2ch: Mic In [Built-In]

With record.source=ADC02,ADC03 this records 4 samples per frame:
ADC02.left, ADC02.right, ADC03.left, ADC03.right.

But audio(4) doesn't know about it. As the hardware sends data at twice
the rate, the result sounds like a half-speed recording with half
the volume (when one input is silent).
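A minimal sketch of what such an interleaved stream looks like and how it could be split in user space, assuming 16-bit samples and the ADC02,ADC03 order described above; split_two_sources is a hypothetical helper. Read as plain 2-channel data, every 4-sample frame counts as two frames, which is where the half-speed playback comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a stream with two stereo sources interleaved per frame
 * (a.left, a.right, b.left, b.right, ...) into two stereo streams.
 * frames is the number of 4-sample frames in `in`; a and b each
 * receive 2 * frames samples. */
void split_two_sources(const int16_t *in, size_t frames,
    int16_t *a, int16_t *b)
{
    for (size_t i = 0; i < frames; i++) {
        a[2 * i]     = in[4 * i];     /* ADC02.left  */
        a[2 * i + 1] = in[4 * i + 1]; /* ADC02.right */
        b[2 * i]     = in[4 * i + 2]; /* ADC03.left  */
        b[2 * i + 1] = in[4 * i + 3]; /* ADC03.right */
    }
}
```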



Re: Working usb audio device on current?

2022-10-05 Thread Michael van Elst
a...@sdf.org (adr) writes:

>On Mon, 3 Oct 2022, Michael van Elst wrote:
>>>> Yes, almost all USB audio devices work. We don't support the audio 2.0
>>>> standard (that's the >= 192kHz 24bit devices).
>>>> 
>>> Ok, thanks for the info.
>> 
>> A patch, slightly modified by me, for xhci (NetBSD7-current) that helps
>> for RPI4 USB audio is:
>> 
>> http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/xhci.patch

>Thanks, I'll take a look. Could you share where does it come from
>and why it hasn't been commited?

It's from a user who calls himself sc.dying.

The patch helps the VL805 USB controller, and so far has no impact on
the few other xhci devices I have, but it should get some more exposure.



Re: Working usb audio device on current?

2022-10-03 Thread Michael van Elst
a...@sdf.org (adr) writes:

>On Mon, 3 Oct 2022, Michael van Elst wrote:

>> a...@sdf.org (adr) writes:
>> 
>>> Are usb audio devices working in other machines? I've been looking
>>> at the tech-kern archives and I don't see any recent discussion on
>>> this.
>> 
>> Yes, almost all USB audio devices work. We don't support the audio 2.0
>> standard (that's the >= 192kHz 24bit devices).
>> 
>Ok, thanks for the info.

A patch, slightly modified by me, for xhci (NetBSD7-current) that helps
with RPI4 USB audio is:

http://ftp.netbsd.org/pub/NetBSD/misc/mlelstv/xhci.patch



Re: Working usb audio device on current?

2022-10-03 Thread Michael van Elst
a...@sdf.org (adr) writes:

>Are usb audio devices working in other machines? I've been looking
>at the tech-kern archives and I don't see any recent discussion on
>this.

Yes, almost all USB audio devices work. We don't support the audio 2.0
standard (that's the >= 192kHz 24bit devices).



Re: Working usb audio device on current?

2022-10-02 Thread Michael van Elst
a...@sdf.org (adr) writes:

>By the way, this is on an rpi4.

xhci(4) still has issues with isochronous transfers, in particular with
the USB3 chip used by rpi4.



Re: How to limit amount of virtual memory used for files (was: Re: Tuning ZFS memory usage on NetBSD - call for advice)

2022-09-22 Thread Michael van Elst
ll...@must-have-coffee.gen.nz (Lloyd Parkes) writes:

>Håvard's email about his G4 Mac Mini is an excellent example 
>of a problem. A problem I have experienced in the past was a program 
>failing with out of memory errors while processing 128MB of data on a 
>system with 256MB of RAM. A problem doesn't have to be a crash. It could 
>simply be unnecessary swap being used leading to terrible performance.

Swapping might be cheaper than trashing the file cache, it all depends
on your particular workload.

I don't think there is a smart heuristic that covers all (or just most)
cases. Even the current default tuning values used to be good (and are
probably still mostly good for a G4 Mac Mini).


There is a second issue that occurs more frequently and is independent
of the VM tuning.

The system will buffer all output until it runs out of memory (short
of hitting secondary limits like the pager map on some systems).
We neither reserve enough memory nor react in a smart way to handle
this situation, and we don't put any pressure on applications
wasting memory that way.



Re: How to BIOS-boot from NVMe device?

2022-09-07 Thread Michael van Elst
p...@whooppee.com (Paul Goyette) writes:

>I have completely disconnected the wd0 and wd1 hard drives, and now
>the motherboard/BIOS can't find something (primary boot?).  It does
>not give any helpful messages (no messages at all), but just goes
>back to the interactivve BIOS screens.

It's likely that the BIOS doesn't know about the NVMe device
and you must use UEFI.


>PS I _do_ have a msdos/efi partition on the nvme, but I don't know
>what to put there!  :)

You need:

./efi/boot/bootx64.efi

and you may need to tell the machine that it should do a UEFI boot.



Re: Switching to the new DHCP from ISC?

2022-09-03 Thread Michael van Elst
jo...@bec.de (Joerg Sonnenberger) writes:

>On Sat, Sep 03, 2022 at 10:00:04AM +1200, Lloyd Parkes wrote:
>> Does anyone know of a maintained DHCP relay implementation?

>The better question for me is: are DHCP relayer server still in use?

Yes.



Re: ssh, HPN extension and TCP auto-tuning

2022-08-29 Thread Michael van Elst
clays.sh...@sdf.org (Clay Daniels) writes:

>On 8/29/22 12:32 AM, Michael van Elst wrote
>> It should work, but how does it perform?
>Are there any specific tests I can perform? I have working NetBSD install.

Any effects should be mostly visible when copying large enough files with
scp; a transfer that takes 20-30 seconds is good enough.
You would need to compare ssh before and after the patch, with and without
HPN, for upload and download.

The client always allows HPN, you can use 'ssh -oHPNDisabled=yes' or
'scp -oHPNDisabled=yes' to test without HPN even when the server allows it.



Re: ssh, HPN extension and TCP auto-tuning

2022-08-28 Thread Michael van Elst
On Mon, Aug 29, 2022 at 04:57:44AM +, Clay Daniels wrote:
> On Sat, 27 Aug 2022, Michael van Elst wrote:
> 
> > Date: Sat, 27 Aug 2022 22:21:17 -0000 (UTC)
> > From: Michael van Elst 
> > To: current-users@netbsd.org
> > Newsgroups: lists.netbsd.current-users
> > Subject: Re: ssh, HPN extension and TCP auto-tuning
> > 
> > clays.sh...@sdf.org (Clay Daniels) writes:
> > 
> > > home. I would love to help test what I can. Do I need to get a more
> > > recent snapshot? I see one on the server that is 27 Aug 07:34. Would
> > > that one work?
> > 
> > The next snapshot will have it.
> > 
> > 
> 
> Installed Aug 28 19:33 snapshot. ssh'd into sdf.org email. ssh seems to work
> fine.

It should work, but how does it perform?



Greetings,


Re: ssh, HPN extension and TCP auto-tuning

2022-08-27 Thread Michael van Elst
clays.sh...@sdf.org (Clay Daniels) writes:

>home. I would love to help test what I can. Do I need to get a more 
>recent snapshot? I see one on the server that is 27 Aug 07:34. Would 
>that one work?

The next snapshot will have it.



Re: ssh, HPN extension and TCP auto-tuning

2022-08-27 Thread Michael van Elst
buh...@nfbcal.org (Brian Buhrow) writes:

>   hello.  Refresh my memory.  Is it the case that the HPN code 
> only runs if both ends
>of the ssh connection support HPN and have it turned on?

With only the client using HPN, you can tune the client receive buffer,
which may or may not help.

With client and server using HPN, buffers will automatically grow with
the TCP window size.

The HPN mode in the NetBSD ssh has probably been buggy since netbsd-5 and
could make the connection slower. People would mostly notice that
copying from or to another NetBSD box was slower than from or to e.g.
a Linux box.


>Are there other OS's that support HPN natively?  

I don't think so. FreeBSD dropped it, I think in -10, but has it in ports.
There is also

https://freebsd.pkgs.org/13/freebsd-amd64/openssh-portable-hpn-9.0.p1,1.pkg.html



ssh, HPN extension and TCP auto-tuning

2022-08-27 Thread Michael van Elst
In https://mail-index.netbsd.org/current-users/2017/09/20/msg032361.html
there was a discussion about effectiveness of the "High Performance
Networking" patch to OpenSSH that we keep in our tree.

Details about the HPN patch can be found at:
https://www.psc.edu/hpn-ssh-home/hpn-ssh-faq/

This finally led to the decision to disable HPN mode by default
for netbsd-10, by setting the option HPNDisabled to yes in
/etc/ssh/sshd_config.

I have now committed part of the latest version of the HPN changes
that handle TCP window auto-scaling, which avoids the issue we saw
in the past, at least in the network environments I can test.

I'd like to see tests from other people that may help with the decision
to re-enable it, or to abandon the HPN changes.




Re: Weird clock behaviour with current (amd64) kernel

2022-08-14 Thread Michael van Elst
On Mon, Aug 15, 2022 at 01:09:55AM +0700, Robert Elz wrote:
>   | N.B. It would be nice if there were a (MI) boot option that could be used
>   | to influence HZ without recompiling. Booting into ddb and patching hz
>   | and a few derived variables isn't that comfortable.
> 
> It would be nice...
> 
> I doubt patching is feasible, there are quite a few places in the code
> which use HZ rather than hz (quite a lot in the DRM code), there also
> appear to be some ports that make assumptions about what HZ is going to
> be - which can be checked at compile time, but which patching would destroy.

Our code should be safe; the old DRM code used:

./drm/dist/bsd-core/drmP.h:#define DRM_HZ   hz

and DRM2 uses:

./drm2/include/asm/param.h:#define  HZ  hz

but obviously, this needs to be verified and there can be
weak assumptions about possible HZ values.

But I doubt these issues are difficult to find and to fix.
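A minimal sketch of why mapping HZ onto hz keeps such code patchable. The variable and the ms_to_ticks() helper are stand-ins for illustration, not the kernel's definitions:

```c
/* Stand-in for the kernel's run-time tick rate. */
int hz = 100;

/* As in the DRM2 compatibility header quoted above. */
#define HZ hz

/* Convert milliseconds to ticks, rounding up. Because HZ expands to
 * the variable hz, changing hz (e.g. from ddb) changes the result
 * without recompiling. */
int ms_to_ticks(int ms)
{
    return (ms * HZ + 999) / 1000;
}
```

Compile-time tests are where this breaks: in `#if HZ == 100` the identifier hz is not a macro, so the preprocessor evaluates it as 0 and the test is silently false.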


Providing a common boot parameter across all ports and port masters,
on the other hand, starts with the question of what color it should be
painted.

Greetings,


Re: Weird clock behaviour with current (amd64) kernel

2022-08-14 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>Pity this won't help with PR 43997 - but I conclude from your response about
>that, that if the host running qemu had HZ set significantly higher than 100
>then qemu (hosting a kernel with HZ==100) would probably work just fine?

Yes.

FreeBSD has the following comment (in sys/conf/NOTES):

# The granularity of operation is controlled by the kernel option HZ (default
# frequency of 1000 Hz or a period 1ms between calls). Virtual machine guests
# use a value of 100.

N.B. It would be nice if there were a (MI) boot option that could be used
to influence HZ without recompiling. Booting into ddb and patching hz
and a few derived variables isn't that comfortable.



Re: Weird clock behaviour with current (amd64) kernel

2022-08-14 Thread Michael van Elst
s...@stix.id.au (Paul Ripke) writes:

>This is likely somewhat similar to what I reported here:
>http://mail-index.netbsd.org/current-users/2019/07/29/msg036293.html

>tl;dr: weird clock behaviour on GCE micro instances. This at least
>provides a nice easy testbed.

| ACPI-Safe: ntp syncs fine, clock maintains time, but "sleep 5" sleeps about 20s.

This means that the system does not deliver 100 clock interrupts per
second, but the emulated ACPI timer is stable.

You could try whether a kernel with HZ=10 keeps proper time.



Re: Weird clock behaviour with current (amd64) kernel

2022-08-13 Thread Michael van Elst
On Sun, Aug 14, 2022 at 09:00:20AM +0700, Robert Elz wrote:
> Date:Sun, 14 Aug 2022 00:28:38 +0200
> From:Joerg Sonnenberger 
> Message-ID:  
> 
>   | I'm more wondering about the LAPIC frequency here. That one is normally
>   | used to drive the clockintr and if that frequency is off, interrupt rate
>   | would be off too. Does the interrupt rate match HZ?
> 
> That's a very good question, I never thought to check that, and should have.
> I will do later today, when I also test Michael's latest patch for HPET
> overflow.
> 
> Thanks both.
> 
> Do you (either of you, or anyone else) consider that what is happening
> here might be related to PR 43997?  If it is, then this might not be
> quite so unimportant as I had been considering it.


PR 43997 is more of a bug in qemu than anything else. You cannot emulate
a 100Hz interrupt when your clock granularity for sleep is 10ms. The best
you can do is to catch up on interrupts when you are late, but that has
other problems. Qemu doesn't catch up, and so the emulated interrupt
effectively runs at 50Hz.
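Here is a toy model of that timing argument. It assumes the host rounds every sleep up as described: a sleep of d ms ends d/tick + 1 host ticks later, because the partially elapsed current tick cannot count toward "at least d". host_sleep and emulate are illustrative names, not qemu code. Without catch-up, each 10ms guest period costs 20ms of host time. With catch-up, late interrupts are delivered in bursts and the average rate is restored.

```c
#define HOST_TICK_MS	10	/* host clock granularity (host HZ=100) */
#define GUEST_PERIOD_MS	10	/* guest HZ=100 */

/* Toy host sleep: returns the host time at which a sleep of d ms ends. */
static long
host_sleep(long now, long d)
{
	return now + (d / HOST_TICK_MS + 1) * HOST_TICK_MS;
}

/* Number of guest clock interrupts delivered in span_ms of host time. */
static long
emulate(long span_ms, int catch_up)
{
	long now = 0, deadline = GUEST_PERIOD_MS, delivered = 0;

	for (;;) {
		now = host_sleep(now, deadline - now);
		if (now > span_ms)
			break;
		if (catch_up) {
			/* deliver every interrupt whose deadline has passed */
			while (deadline <= now) {
				delivered++;
				deadline += GUEST_PERIOD_MS;
			}
		} else {
			/* deliver one interrupt and re-arm relative to now */
			delivered++;
			deadline = now + GUEST_PERIOD_MS;
		}
	}
	return delivered;
}
```

Over one second the model delivers 50 interrupts without catch-up and 100 with it. Changing HOST_TICK_MS to 1 (a host running at HZ=1000) raises the no-catch-up count to 90, which is consistent with a host at a higher HZ doing better.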

Linux (tickless kernel) has a clock granularity of ideally zero (in reality
limited by clock resolution and CPU speed), so you don't see such a problem
there.

You can still get a discrepancy between sleep time and wall clock
time as these clocks run independently (starting with the fact that
you sleep for "at least" some interval), but that's a different problem.





Re: Weird clock behaviour with current (amd64) kernel

2022-08-13 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>In your case, you say it takes ~6 minutes between attachment and
>calibration and your hpet runs at 19.2MHz.

>This is enough for HPET_MCOUNT_LO to overflow.



This patch no longer reuses the attach-time samples. It reads HPET and
TSC, waits a separate ~0.1 seconds, and reads them again to calibrate
the timers. This should avoid any overflow.


Index: sys/dev/ic/hpet.c
===
RCS file: /cvsroot/src/sys/dev/ic/hpet.c,v
retrieving revision 1.17
diff -p -u -r1.17 hpet.c
--- sys/dev/ic/hpet.c   16 May 2020 23:06:40 -  1.17
+++ sys/dev/ic/hpet.c   13 Aug 2022 21:24:58 -
@@ -54,8 +54,6 @@ static u_int  hpet_get_timecount(struct t
 static boolhpet_resume(device_t, const pmf_qual_t *);
 
 static struct hpet_softc *hpet0 __read_mostly;
-static uint32_t hpet_attach_val;
-static uint64_t hpet_attach_tsc;
 
 int
 hpet_detach(device_t dv, int flags)
@@ -147,14 +145,6 @@ hpet_attach_subr(device_t dv)
eval = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
val = eval - sval;
sc->sc_adj = (int64_t)val * sc->sc_period / 1000;
-
-   /* Store attach-time values for computing TSC frequency later. */
-   if (cpu_hascounter() && sc == hpet0) {
-   (void)cpu_counter();
-   val = bus_space_read_4(sc->sc_memt, sc->sc_memh, 
HPET_MCOUNT_LO);
-   hpet_attach_tsc = cpu_counter();
-   hpet_attach_val = val;
-   }
 }
 
 static u_int
@@ -214,33 +204,37 @@ uint64_t
 hpet_tsc_freq(void)
 {
struct hpet_softc *sc;
-   uint64_t td, val, freq;
-   uint32_t hd;
+   uint64_t td0, td, val, freq;
+   uint32_t hd0, hd;
int s;
 
if (hpet0 == NULL || !cpu_hascounter())
return 0;
 
-   /* Slow down if we got here from attach in under 0.1s. */
sc = hpet0;
-   hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
-   hd -= hpet_attach_val;
-   if (hd < (uint64_t)10 * 10 / sc->sc_period)
-   hpet_delay(10);
+
+   s = splhigh();
+   (void)cpu_counter();
+   (void)bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
+   hd0 = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
+   td0 = cpu_counter();
+   splx(s);
+
+   hpet_delay(10);
 
/*
 * Determine TSC freq by comparing how far the TSC and HPET have
-* advanced since attach time.  Take the cost of reading HPET
-* register into account and round result to the nearest 1000.
+* advanced and round result to the nearest 1000.
 */
s = splhigh();
(void)cpu_counter();
+   (void)bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
td = cpu_counter();
splx(s);
-   hd -= hpet_attach_val;
-   val = ((uint64_t)hd * sc->sc_period - sc->sc_adj) / 1;
-   freq = (td - hpet_attach_tsc) * 1000 / val;
+
+   val = (uint64_t)(hd - hd0) * sc->sc_period / 1;
+   freq = (td - td0) * 1000 / val;
return rounddown(freq + 500, 1000);
 }
 




Re: Weird clock behaviour with current (amd64) kernel

2022-08-13 Thread Michael van Elst
On Sun, Aug 14, 2022 at 02:38:07AM +0700, Robert Elz wrote:
> Date:Sat, 13 Aug 2022 17:41:05 +0200
> From:    Michael van Elst 
> Message-ID:  
> 
>   | If you boot the kernel in debug mode (netbsd -x),
> 
> I did.
> 
>   | you may see output like:
> 
> which was:
> 
> [ 1.03] cpu0: TSC freq CPUID 341760 Hz
> [ 1.03] cpu0: TSC freq from CPUID 341760 Hz
> [ 1.064451] xhci0: hcc2=0x1fd
> [ 1.064451] xhci3: hcc2=0xfd
> [ 1.064451] cpu0: TSC freq from HPET 9007294000 Hz
> [ 1.064451] cpu0: TSC freq CPUID 341760 Hz
> [ 1.064451] cpu0: TSC freq calibrated 9007294000 Hz


So it's the HPET calibration that goes wrong.


The calibration works like:

Fetch hpet and tsc at attach time.

(void)cpu_counter();
val = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
hpet_attach_tsc = cpu_counter();
hpet_attach_val = val;


When calibrating, make sure that hpet has counted for at
least 0.1 seconds:

hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
hd -= hpet_attach_val;
if (hd < (uint64_t)10 * 10 / sc->sc_period)
hpet_delay(10);


Fetch hpet and tsc again:

s = splhigh();
(void)cpu_counter();
hd = bus_space_read_4(sc->sc_memt, sc->sc_memh, HPET_MCOUNT_LO);
td = cpu_counter();
splx(s);


Compute tsc frequency from hpet frequency.

hd -= hpet_attach_val;
val = ((uint64_t)hd * sc->sc_period - sc->sc_adj) / 1;
freq = (td - hpet_attach_tsc) * 1000 / val;
return rounddown(freq + 500, 1000);


In your case, you say it takes ~6 minutes between attachment and
calibration and your hpet runs at 19.2MHz.

This is enough for HPET_MCOUNT_LO to overflow.
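Here is the arithmetic as a sketch. HPET_MCOUNT_LO is 32 bits wide, so at 19.2MHz it wraps after 2^32 / 19.2e6, roughly 223.7 s, which is well under 6 minutes. The model below uses hypothetical names and assumes the TSC delta is otherwise exact. It shows how the unsigned 32-bit difference silently drops whole wraps and so inflates the computed TSC frequency. A 0.1 s window, as in the new patch, stays far from a wrap.

```c
#include <stdint.h>

/* Seconds until a free-running 32-bit counter at hpet_hz wraps. */
static double
wrap_seconds(double hpet_hz)
{
	return 4294967296.0 / hpet_hz;
}

/*
 * Model: both counters start at arbitrary values and run for `seconds`;
 * the TSC rate is then derived from the 32-bit HPET difference.
 */
static double
estimate_tsc_hz(double hpet_hz, double tsc_hz, double seconds)
{
	uint32_t hd0 = 0x89abcdefu;		/* arbitrary HPET_MCOUNT_LO start */
	uint64_t td0 = 1000;			/* arbitrary TSC start */
	uint64_t hpet_ticks = (uint64_t)(seconds * hpet_hz);
	uint32_t hd = hd0 + (uint32_t)hpet_ticks;	/* low 32 bits wrap */
	uint64_t td = td0 + (uint64_t)(seconds * tsc_hz);

	/* unsigned subtraction only recovers the elapsed count modulo 2^32 */
	double hpet_seconds = (uint32_t)(hd - hd0) / hpet_hz;

	return (double)(td - td0) / hpet_seconds;
}
```

With a 0.1 s window the model returns the true TSC rate. With a 6 minute gap it overestimates by a factor of about 2.6.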





Re: Weird clock behaviour with current (amd64) kernel

2022-08-13 Thread Michael van Elst
On Sat, Aug 13, 2022 at 10:15:30PM +0700, Robert Elz wrote:
> 
> The result:  "not much" if anything at all.

If you boot the kernel in debug mode (netbsd -x), you may see
output like:


[ 1.03] cpu0: TSC freq from delay 2521276800 Hz

maybe also something like:
[ ] cpu0: TSC freq from CPUID XX Hz

[ 1.057594] cpu0: TSC freq from HPET 2491906000 Hz
[ 1.957885] cpu1: TSC skew=8 drift=0
[ 2.014612] cpu2: TSC skew=34 drift=4
[ 2.181611] cpu3: TSC skew=34 drift=4
[ 2.291306] timecounter: Timecounter "TSC" frequency 2491906000 Hz quality 3000


"from delay" is the first calibration against the i8254 timer.
"from CPUID" is a value that the CPU reports.
"from HPET" is a second calibration against the HPET timer.


The patch should improve the accuracy of the "from delay" value. It's
also the only place that could have been influenced by e.g. console
output.

If you have a working HPET, the second calibration should be better.
Here it always returns exactly the same number.




Re: specfs/spec_vnops.c diagnostic assertion panic

2022-08-13 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>vpanic()
>kern_assert()
>_bus_dmamem_unmap.constprop.0() at +0x157

That panic should be fixed by now; it was an inverted assertion.



Re: Weird clock behaviour with current (amd64) kernel

2022-08-07 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>Date:Thu, 4 Aug 2022 12:49:35 - (UTC)
>From:mlel...@serpens.de (Michael van Elst)
>Message-ID:  

>  | The measurement runs with enabled interrupts. If you have lots of interrupts
>  | or interrupts that take some time, the measurement is biased.
>  |
>  | Console output can do this.

>That is what I suspected.   A normal boot dmesg on this system is about
>50KB.   With PCI_CONFIG_DUMP it is about 1MB (just a bit over).  That's
>a lot of work for wscons scrolling (via the BIOS the whole time - the
>dump all happens before the console switches to graphics mode) a fairly
>large screen.


Does this help?

Index: sys/arch/x86/x86/cpu.c
===
RCS file: /cvsroot/src/sys/arch/x86/x86/cpu.c,v
retrieving revision 1.203
diff -p -u -r1.203 cpu.c
--- sys/arch/x86/x86/cpu.c  1 Apr 2022 19:57:22 -   1.203
+++ sys/arch/x86/x86/cpu.c  7 Aug 2022 09:17:12 -
@@ -1336,9 +1336,16 @@ cpu_get_tsc_freq(struct cpu_info *ci)
 */
if (ci->ci_data.cpu_cc_freq == 0)
freq = freq_from_cpuid = cpu_tsc_freq_cpuid(ci);
+   if (freq != 0)
+   aprint_debug_dev(ci->ci_dev, "TSC freq "
+   "from CPUID %" PRIu64 " Hz\n", freq);
 #if NHPET > 0
-   if (freq == 0)
+   if (freq == 0) {
freq = hpet_tsc_freq();
+   if (freq != 0)
+   aprint_debug_dev(ci->ci_dev, "TSC freq "
+   "from HPET %" PRIu64 " Hz\n", freq);
+   }
 #endif
if (freq == 0) {
/*
@@ -1348,20 +1355,33 @@ cpu_get_tsc_freq(struct cpu_info *ci)
 */
overhead = 0;
for (int i = 0; i <= 8; i++) {
+   const int s = splhigh();
t0 = cpu_counter();
delay_func(0);
t1 = cpu_counter();
+   splx(s);
if (i > 0) {
overhead += (t1 - t0);
}
}
overhead >>= 3;
 
-   /* Now do the calibration. */
-   t0 = cpu_counter();
-   delay_func(10);
-   t1 = cpu_counter();
-   freq = (t1 - t0 - overhead) * 10;
+   /*
+* Now do the calibration.
+*/
+   freq = 0;
+   for (int i = 0; i < 1000; i++) {
+   const int s = splhigh();
+   t0 = cpu_counter();
+   delay_func(100);
+   t1 = cpu_counter();
+   splx(s);
+   freq += t1 - t0 - overhead;
+   }
+   freq = freq * 10;
+
+   aprint_debug_dev(ci->ci_dev, "TSC freq "
+   "from delay %" PRIu64 " Hz\n", freq);
}
if (ci->ci_data.cpu_cc_freq != 0) {
freq_from_cpuid = cpu_tsc_freq_cpuid(ci);
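The patch multiplies by 10 because 1000 samples of delay_func(100) cover 1000 × 100 µs = 0.1 s. The toy model below reproduces that arithmetic. Its names are hypothetical, and it assumes the bias comes from time a delay overruns while an interrupt handler runs. It shows how a single overrun skews a one-shot measurement, while chunks run at splhigh() see no overrun.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the calibration loop: `samples` delays of `us` microseconds,
 * each timed with the TSC.  extra_us[i] is time that sample i overran its
 * requested delay (e.g. an interrupt handler the delay loop did not
 * account for); NULL means no overrun, as with interrupts blocked.
 * `overhead` is the fixed cost of reading the counters, subtracted again.
 */
static uint64_t
calibrate(uint64_t true_hz, int samples, uint64_t us,
    const uint64_t *extra_us, uint64_t overhead)
{
	uint64_t sum = 0;

	for (int i = 0; i < samples; i++) {
		uint64_t extra = extra_us != NULL ? extra_us[i] : 0;
		/* TSC ticks observed across one delay, plus read overhead */
		uint64_t ticks = true_hz * (us + extra) / 1000000 + overhead;

		sum += ticks - overhead;
	}
	/* samples * us microseconds were requested in total */
	return sum * 1000000 / ((uint64_t)samples * us);
}
```

A 5ms overrun during a single 0.1 s delay gives a result 5% too high. The same 0.1 s split into 1000 chunks with no overrun returns the true rate exactly.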




Re: Weird clock behaviour with current (amd64) kernel

2022-08-04 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>The issue only occurs when I boot a kernel with options PCI_CONFIG_DUMP
>enabled, which is (one way or the other) almost certainly responsible for
>the issue - though the problem you mention may be involved in the kernel's
>failure to detect that the TSC measurement is being messed up by that option.

The measurement runs with enabled interrupts. If you have lots of interrupts
or interrupts that take some time, the measurement is biased.

Console output can do this.



Re: Weird clock behaviour with current (amd64) kernel

2022-08-04 Thread Michael van Elst
dholland-curr...@netbsd.org (David Holland) writes:

>On Thu, Jul 14, 2022 at 08:59:25PM +0700, Robert Elz wrote:
> > I just booted a kernel that I built (from up to date at the time)
> > HEAD sources about 24 hours ago.
> > 
> > Everything seemed to be working fine - until I noticed that all of
> > my clocks (there are several, gkrellm, window manager, a dclock,
> > and an xtu) were all wildly wrong (as in, were moving time forwards
> > incredibly slowly).

>Probably not related, but see PR 56322. I have a machine where the TSC
>is apparently bad, and somewhere in -current a bit more than a year
>ago we stopped detecting that during boot, with negative consequences.


The bad TSC is probably a known erratum that we don't (yet) check for.



Re: Weird clock behaviour with current (amd64) kernel

2022-07-16 Thread Michael van Elst
r...@sdf.org (RVP) writes:

>Unsurprisingly, EFI also has a colour-index similar to VGA (see:
>/usr/src/sys/external/bsd/gnu-efi/dist/inc/eficon.h). I tried fixing the
>indexes like this, but, it doesn't for some (autoconfig?) reason. Can
>only look into this after I come back from my road-trip.

That color index is used by text mode, but booting from EFI uses
a graphics framebuffer (nowadays mostly 24bit or 32bit per pixel).


But this color shift is unrelated to the color indexes; it comes from
how the framebuffer pixels are organized.


The early console code has no information about byte order in
the framebuffer. rasops then initializes e.g. for 32bit pixels:

if (ri->ri_rnum == 0) {
ri->ri_rnum = ri->ri_gnum = ri->ri_bnum = 8;

ri->ri_rpos = 0;
ri->ri_gpos = 8;
ri->ri_bpos = 16;
}

which is 0x00BBGGRR.

When genfb actually attaches it carries information about
the byte ordering and rasops gets initialized with the
right values.

For the green color it doesn't matter if the order is BGR or RGB.
For cyan, the wrong order gives "brown" which is a dark yellow.
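Here is a small sketch of the byte-order mix-up. It assumes the firmware framebuffer actually stores 0x00RRGGBB (rpos 16, bpos 0), while the early rasops setup uses the 0x00BBGGRR default quoted above. pack, unpack and struct pixfmt are illustrative, not the rasops API.

```c
#include <stdint.h>

/* Bit positions of the 8-bit red, green and blue fields in a 32-bit pixel. */
struct pixfmt {
	int rpos, gpos, bpos;
};

static uint32_t
pack(struct pixfmt f, uint8_t r, uint8_t g, uint8_t b)
{
	return (uint32_t)r << f.rpos | (uint32_t)g << f.gpos |
	    (uint32_t)b << f.bpos;
}

static void
unpack(struct pixfmt f, uint32_t px, uint8_t *r, uint8_t *g, uint8_t *b)
{
	*r = (uint8_t)(px >> f.rpos);
	*g = (uint8_t)(px >> f.gpos);
	*b = (uint8_t)(px >> f.bpos);
}

/* Early rasops default: 0x00BBGGRR */
static const struct pixfmt early_fmt = { 0, 8, 16 };
/* Assumed real framebuffer layout: 0x00RRGGBB */
static const struct pixfmt fb_fmt = { 16, 8, 0 };
```

Packing VGA cyan (0x00, 0xAA, 0xAA) with the default positions and reading it back with the real ones yields (0xAA, 0xAA, 0x00), a dark yellow. Green round-trips unchanged.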



Re: Weird clock behaviour with current (amd64) kernel

2022-07-14 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>  | Heh. It's not just Cyan/Yellow; Red and Blue are swapped too, because:
>  |
>  | /usr/src/sys/dev/wscons/wsdisplayvar.h and
>  | /usr/src/sys/dev/ic/pcdisplay.h have different values for those colour=
>s.

>If that is all it is, it is barely worth fixing ... though this
>must have happened sometime in the 9.99.9[78] series (sometime
>after early last Dec) - up to then I was building and running
>custom cyan text console systems (I kept building after that but
>didn't boot them... or not on real hardware) I have been mostly
>running GENERIC (green, which seems unaffected, or everyone would
>be noticing) since then, until very recently...  The yellow was
>just a quirk I didn't bother mentioning until I had another reason
>to send a (semi-related) message.


wsdisplayvar.h has ANSI color codes.
pcdisplay.h has VGA color codes.

wscons drivers need to interpret ANSI color codes in their allocattr
function. The VGA driver translates between ANSI and VGA codes
(dev/ic/vga.c, see the fgansitopc/bgansitopc tables). 24bit
framebuffers using rasops use a colormap (dev/rasops/rasops.c, see
the rasops_cmap).
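Here is a reduced sketch of the translation the driver has to do: the 3-bit ANSI and VGA color codes are bit-reversed relative to each other. This is not the actual fgansitopc/bgansitopc table from dev/ic/vga.c, only the idea behind it.

```c
/*
 * ANSI order: black, red,  green, yellow, blue, magenta, cyan,  white
 * VGA order:  black, blue, green, cyan,   red,  magenta, brown, light grey
 */
static const unsigned char ansi_to_vga[8] = { 0, 4, 2, 6, 1, 5, 3, 7 };

static const char *const vga_names[8] = {
	"black", "blue", "green", "cyan", "red", "magenta", "brown", "light grey"
};

/* Correct: translate the ANSI code before indexing the VGA palette. */
static const char *
translated(unsigned ansi)
{
	return vga_names[ansi_to_vga[ansi & 7]];
}

/* The bug pattern: using an ANSI code directly as a VGA index. */
static const char *
untranslated(unsigned ansi)
{
	return vga_names[ansi & 7];
}
```

Without the translation, ANSI cyan shows up as VGA brown and red and blue are swapped, while green and magenta look right. This matches the symptoms quoted above.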

Whatever driver you use either doesn't translate correctly or badly
assumes some hardware configuration (e.g. color palette) when booting.

Does the color shift also happen after a cold boot?



Re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot

2022-07-10 Thread Michael van Elst
m...@eterna.com.au (matthew green) writes:

>FWIW, i've been using 64K block *and frag size FFS for over
>a decade without any problem, on a file system that almost
>always has extremely large files on it.

>so, this should be fixed in the manual i guess.

The manual just lists the default values as is and does
not mention an upper limit.



Re: iscsi target on a zfs zvol?

2022-07-10 Thread Michael van Elst
ha...@espresso.rhein-neckar.de (Hauke Fath) writes:

>Jul 10 22:56:18 pizza istgt[9108]: istgt_iscsi.c:4165:istgt_iscsi_op_nopout: ***ERROR*** CmdSN(24873146) error ExpCmdSN=24873145
>Jul 10 22:56:18 pizza istgt[9108]: istgt_iscsi.c:5045:istgt_iscsi_execute: ***ERROR*** iscsi_op_nopout() failed
>Jul 10 22:56:18 pizza istgt[9108]: istgt_iscsi.c:5731:worker: ***ERROR*** iscsi_execute() failed on iqn.2007-09.jp.ne.peach.istgt:disk1,t,0x0001(iqn.20

>(initiator is daemon-tools.cc).

That is at least something completely different: an iSCSI protocol error.



Re: iscsi target on a zfs zvol?

2022-07-10 Thread Michael van Elst
ha...@espresso.rhein-neckar.de (Hauke Fath) writes:

>I would like to set up an iscsi target backed by a zfs zvol, to serve
>as a Mac time machine volume.

Independent of your problem, you should use 'istgt' from pkgsrc.


>863:/u/sources/netbsd-developer/src/external/bsd/iscsi/lib/../dist/src/lib/disk.c:720:
>***ERROR*** error reading "target0"
># hexdump -C -n 512 /dev/zvol/rdsk/tank/time_machine_1
>  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>at which point
>DISK: LUN 0: 2097152 MB disk storage for "target0"


>Looks like an initialization issue on behalf of iscsi-target. Does that
>ring a bell for anyone?

Probably unrelated to ZFS, this reminds me of:

# stat -s /dev/rvnd0a
st_dev=43010 st_ino=65103 st_mode=020640 st_nlink=1 st_uid=0 st_gid=5 
st_rdev=10496 st_size=0 st_atime=1590051866 st_mtime=1590051866 
st_ctime=1590051866 st_birthtime=1590051866 st_blksize=65536 st_blocks=0 
st_flags=0

# hexdump -C /dev/rvnd0a | head -1
  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |

# stat -s /dev/rvnd0a
st_dev=43010 st_ino=65103 st_mode=020640 st_nlink=1 st_uid=0 st_gid=5 
st_rdev=10496 st_size=10485760 st_atime=1657481715 st_mtime=1590051866 
st_ctime=1590051866 st_birthtime=1590051866 st_blksize=65536 st_blocks=0 
st_flags=0

Before opening the device with hexdump, the st_size field is 0.

The size is cached in the vnode, but only determined when you open the
device. The information gets lost again when the vnode is expired,
which only happens when there is a memory shortage.
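Here is a minimal reproducer of the stat / open / stat sequence above in C, using only POSIX calls. size_before_after_open is a made-up name. On a regular file both sizes agree. On an affected raw device, the first value would be 0, assuming the vnode has not been opened since it last expired.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Record st_size before and after opening `path` and reading one block,
 * mirroring the stat / hexdump / stat commands above.
 * Returns 0 on success, -1 on error.
 */
static int
size_before_after_open(const char *path, off_t *before, off_t *after)
{
	struct stat st;
	char buf[512];
	ssize_t n;
	int fd;

	if (stat(path, &st) == -1)
		return -1;
	*before = st.st_size;

	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	n = read(fd, buf, sizeof(buf));		/* like hexdump, touch the data */
	(void)n;
	(void)close(fd);

	if (stat(path, &st) == -1)
		return -1;
	*after = st.st_size;
	return 0;
}
```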



Re: i386/amd64 image generated trough mkimage stuck on primary bootsrap at boot

2022-07-10 Thread Michael van Elst
r...@sdf.org (RVP) writes:

>@@ -255,7 +255,7 @@
>   echo ${bar} Populating ffs filesystem ${bar}
>   ${MAKEFS} -rx ${endian} -N ${release}/etc -t ffs \
>   -O ${ffsoffset} \
>-  -o d=4096,f=8192,b=65536 -b $((${extra}))m \
>+  -o d=8192,f=2048,b=16384 -b $((${extra}))m \
>   -F "$tmp/selected_sets" ${image} "${release}" "${mnt}"


Sounds like the disklabel is incorrect then. FFS requires that
the fragment size (not so much the blocksize) is correct, but the
scripts seem to be inconsistent.

N.B. unset fsize (== 0) defaults to fsize = BLKDEV_IOSIZE (== 2048).
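For reference, here is a sketch of the block/fragment rules FFS enforces, as commonly documented for newfs(8). It is not the actual newfs or makefs code, and the 4 KiB to 64 KiB block range is an assumption about current limits. The disklabel's fragment size, and its count of fragments per block, should come from the same values that were passed to makefs.

```c
static int
is_pow2(unsigned x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Both sizes are powers of two, a fragment is at least one 512-byte
 * sector, blocks are between 4 KiB and 64 KiB, and a block holds
 * 1, 2, 4 or 8 fragments.
 */
static int
ffs_geometry_ok(unsigned bsize, unsigned fsize)
{
	unsigned frag;

	if (!is_pow2(bsize) || !is_pow2(fsize))
		return 0;
	if (fsize < 512 || bsize < 4096 || bsize > 65536 || fsize > bsize)
		return 0;
	frag = bsize / fsize;	/* fragments per block, as a label records it */
	return frag == 1 || frag == 2 || frag == 4 || frag == 8;
}
```

Both the original f=8192,b=65536 and the patched f=2048,b=16384 pass. The geometry itself is fine; what matters is that the label agrees with it.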



Re: raidframeparity and /etc/defaults/rc.conf

2022-07-03 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>Does someone know of a reason for a setting for the rc.conf
>(rc.d/*) variable raidframeparity to be omitted from rc.defaults/rc.conf ?

>To me that looks like an oversight.

raidframeparity has no rcvar switch; it's always started.



Re: pgdaemon high CPU consumption

2022-07-01 Thread Michael van Elst
m...@petermann-it.de (Matthias Petermann) writes:

>since some time I noticed that on several of my systems with=20
>NetBSD/amd64 9.99.97/98 after longer usage the kernel process pgdaemon=20
>completely claims a CPU core for itself, i.e. constantly consumes 100%.
>The affected systems do not have a shortage of RAM and the problem does=20
>not disappear even if all workloads are stopped, and thus no RAM is=20
>actually used by application processes.

There is a shortage, either of free RAM pages or of kernel address space.

The page daemon gets triggered, but if it cannot resolve the situation
it will just spin until it succeeds.


>I noticed this especially in connection with accesses to the ZFS set up=20
>on the respective machines - for example after checkout from the local=20
>CVS relic hosted on ZFS.

Resource exhaustion could be caused by ZFS, but also something else.

If you can still operate the system, a common workaround is to
reduce kern.maxvnodes with sysctl (and bump it up later).

If the system is not responding but you can enter DDB, setting
the kernel variable desiredvnodes does the same.



Re: kernel deadlock on fstchg with vnd

2022-05-31 Thread Michael van Elst
campbell+netbsd-current-us...@mumble.net (Taylor R Campbell) writes:

>This would need to be integrated into autoconf/specfs to avoid races
>in config_pseudo_spawn, sc->sc_configured, and config_pseudo_unspawn,

vnd is peculiar in that it spawns a unit whenever it is opened and
removes it again when it is closed.

VNDIOCSET for the same unit marks it VNF_INITED on success, and then
it persists.

VNDIOCSET for a different unit spawns that unit first and then marks
it VNF_INITED.

It would be simpler if there were a static dedicated control unit that
is used only for creating / purging other units.



Re: kernel deadlock on fstchg with vnd

2022-05-29 Thread Michael van Elst
bou...@antioche.eu.org (Manuel Bouyer) writes:

>Hello,
>do you have an idea on the problem in this thread:
>http://mail-index.netbsd.org/port-xen/2022/05/27/msg010213.html
[...]
>I can't reproduce this when using vnd from userland.

You can replicate it by addressing the block device with vnconfig.

A workaround would be to modify the Xen block script to select the
raw device:

vnconfig /dev/r${disk}d $xparams >/dev/null; then

or just the disk name:

vnconfig ${disk} $xparams >/dev/null; then



Re: Radeon HD 5450?

2022-05-11 Thread Michael van Elst
k...@munnari.oz.au (Robert Elz) writes:

>My suspicion when I first saw this was that in legacy mode
>the BIOS is doing some console graphics init (and leaving it
>that way) which NetBSD is depending upon, which at least
>some firmware either does not do at all, or undoes, or
>fails to pass along to the OS, or similar, in EFI mode.

It would be interesting to see what happens when, in EFI mode, you
switch to a graphics mode with the gop command before booting
the kernel.



Re: Configure Serial Adapter

2022-05-11 Thread Michael van Elst
rnes...@mac.com (Robert Nestor) writes:

>Following Michael's advice I added this line to sys/dev/usb/usbdevs
>where other Prolific devices were defined:

>   product PROLIFIC PL2303Y 0x23c3 PL2303 Serial adapter (Null modem)


I've now added a few models to usbdevs...



>I regenerated the header files and added these lines to the device table
>in sys/dev/usb/uplcom.c:

>/* Prolific USB to serial null modem cable */
>{ USB_VENDOR_PROLIFIC, USB_PRODUCT_PROLIFIC_PL2303Y },

>Rebuilt the GENERIC kernel and copied to to / and rebooted my system.  I
>get this result:

>[ 4.235870] uplcom0 at uhub1 port 1
>[ 4.235870] uplcom0: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x23c3), rev 2.00/3.05, addr 1
>[ 4.245877] uplcom0: autoconfiguration error: reset failed, NOMEM

>Did I miss something that is leading to this error?


No. A failed reset suggests that the chip isn't compatible with an old
PL2303. But NOMEM suggests that this is a generic issue in the USB stack.
Does any other USB device work on that port? Maybe a reboot helps.



Re: Configure Serial Adapter

2022-05-10 Thread Michael van Elst
rnes...@mac.com (Robert Nestor) writes:

>capture system).  The USB-Serial device shows up on boot on the capture
>system as:

>ugen0: Prolific Technology Inc. (0x67b) USB-Serial Controller (0x23c3), rev 2.00/3.05, addr 1

That means it doesn't really show up: ugen is the generic driver that
attaches when no specific driver claims the device.

>I'm thinking I need to build a new kernel to get it to show up as a
>com device before I can capture the serial input.  Can someone give me
>the lines that need to be include in the GENERIC config file to
>accomplish this?


The driver has a vendor/product list builtin, and that device doesn't
match anything in that list.

You should
- add this to sys/dev/usb/usbdevs
- Regenerate the header files with 'make -f Makefile.usbdevs TOOL_AWK=awk'
- add the device to the table in sys/dev/usb/uplcom.c
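The table is why ugen attached: the driver only claims devices listed in its match table. Below is a reduced sketch of that lookup, using the 0x067b/0x23c3 IDs from the dmesg line above. devno, lookup and the table are hypothetical stand-ins. The real table uses the USB_VENDOR_*/USB_PRODUCT_* constants generated from usbdevs, like the USB_VENDOR_PROLIFIC, USB_PRODUCT_PROLIFIC_PL2303Y entry shown in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver's vendor/product match table. */
struct devno {
	uint16_t vendor;
	uint16_t product;
};

static const struct devno uplcom_like_devs[] = {
	{ 0x067b, 0x2303 },	/* Prolific PL2303 */
	{ 0x067b, 0x23c3 },	/* the adapter from this thread, once added */
};

/* Return the matching entry, or NULL (in which case ugen would attach). */
static const struct devno *
lookup(const struct devno *tbl, size_t n, uint16_t vendor, uint16_t product)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].vendor == vendor && tbl[i].product == product)
			return &tbl[i];
	return NULL;
}
```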


>And since I've never done this before can someone tell me what I need
>to do on the test system to direct console output over the serial and
>what need to do on the capture system to get the results?

That depends a bit on the system. For a traditional PC booting with BIOS
you'd use 'fdisk -c /usr/mdec/mbr_com0' to install a bootblock for
serial console.


Greetings,


Re: cmake hang solution?

2022-05-09 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>I'm currently testing:
>Index: lib/libpthread/pthread.c
>===
>RCS file: /cvsroot/src/lib/libpthread/pthread.c,v
>retrieving revision 1.153.2.1
>diff -p -u -r1.153.2.1 pthread.c
>--- lib/libpthread/pthread.c26 Jan 2020 10:55:16 -  1.153.2.1
>+++ lib/libpthread/pthread.c3 May 2022 09:22:58 -
>@@ -430,6 +430,8 @@ pthread_create(pthread_t *thread, const 
>   * only be one thread before it becomes true.
>   */
>   if (pthread__started == 0) {
>+  _lwp_park(CLOCK_REALTIME, 0, NULL,
>+  pthread__self()->pt_lid, NULL, NULL);
>   pthread__start();
>   pthread__started = 1;
>   }


With this patch, there has so far been no cmake/guile-type hang in
several pbulk runs.

The samba / python hangup seems to be caused by something else.



Re: panic: wapbl transaction too big to flush

2022-05-04 Thread Michael van Elst
w...@netbsd.org (Thomas Klausner) writes:

>With a quite recent 9.99.96/amd64 kernel, when deleting ~800GB of data
>on an ffs2 on a mostly empty 8TB file system, I just saw this panic:

>panic: kernel diagnostic assertion “(wapbl_transaction_len(wl) <= (wl->wl_circ_size - wl->wl_reserved_bytes))” failed: file “..sys/kern/vfs/vfs_wapbl.c”, line 1265 wapbl_end: current transaction too big to flush

See kern/54504.





Re: cmake hang solution?

2022-05-03 Thread Michael van Elst
mlel...@serpens.de (Michael van Elst) writes:

>c...@chuq.com (Chuck Silvers) writes:

>>> would this apply to netbsd-9 too ? The hang I'm seeing is on a system
>>> with a HEAD kernel and a netbsd-9 userland 

>>it looks like the diff won't apply as-is, but I think the concept still 
>>applies.

>I'm currently testing:


Not really successful. I got a hang with all threads but one parked,
and the remaining one waiting for kqueue.

Parked threads look like:

#0  0x7d0ce8ca220a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7d0ce9e0addf in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7d0ce9aa1412 in 
std::condition_variable::wait(std::unique_lock&) () from 
/usr/lib/libstdc++.so.9
#3  0xa48f7685 in cmWorkerPoolInternal::Work(unsigned int) ()
#4  0x7d0ce9a9f5aa in ?? () from /usr/lib/libstdc++.so.9
#5  0x7d0ce9e0c757 in ?? () from /usr/lib/libpthread.so.1
#6  0x7d0ce8c87e10 in ?? () from /usr/lib/libc.so.12
#7  0x in ?? ()

except for one:

#0  0x7d0ce8ca220a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7d0ce9e0addf in pthread_cond_timedwait () from 
/usr/lib/libpthread.so.1
#2  0x7d0ce9aa1412 in 
std::condition_variable::wait(std::unique_lock&) () from 
/usr/lib/libstdc++.so.9
#3  0xa48f812e in 
cmWorkerPoolWorker::RunProcess(cmWorkerPool::ProcessResultT&, 
std::vector, 
std::allocator >, std::allocator, std::allocator > > > const&, 
std::__cxx11::basic_string, std::allocator > 
const&) ()
#4  0xa489731c in (anonymous 
namespace)::cmQtAutoMocUicT::JobT::RunProcess(cmQtAutoGen::GenT, 
cmWorkerPool::ProcessResultT&, std::vector, std::allocator >, 
std::allocator, 
std::allocator > > > const&, std::__cxx11::basic_string, std::allocator >*) ()
#5  0xa48aadc8 in (anonymous 
namespace)::cmQtAutoMocUicT::JobCompileMocT::Process() ()
#6  0xa48f75b6 in cmWorkerPoolInternal::Work(unsigned int) ()
#7  0x7d0ce9a9f5aa in ?? () from /usr/lib/libstdc++.so.9
#8  0x7d0ce9e0c757 in ?? () from /usr/lib/libpthread.so.1
#9  0x7d0ce8c87e10 in ?? () from /usr/lib/libc.so.12
#10 0x in ?? ()

The kqueue wait looks like:

#0  0x7d0ce8c42e6a in _sys___kevent50 () from /usr/lib/libc.so.12
#1  0x7d0ce9e07cb9 in __kevent50 () from /usr/lib/libpthread.so.1
#2  0x7d0ceac1c650 in uv.io_poll () from /usr/pkg/lib/libuv.so.1
#3  0x7d0ceac0de48 in uv_run () from /usr/pkg/lib/libuv.so.1
#4  0xa48f70a8 in cmWorkerPoolInternal::Process() ()
#5  0xa48f7102 in cmWorkerPool::Process(void*) ()
#6  0xa48ae2bf in (anonymous namespace)::cmQtAutoMocUicT::Process() ()
#7  0xa4b8b2ea in cmQtAutoGenerator::Run(std::basic_string_view >, std::basic_string_view 
>) ()
#8  0xa489b150 in cmQtAutoMocUic(std::basic_string_view >, std::basic_string_view 
>) ()
#9  0xa4838c3c in 
cmcmd::ExecuteCMakeCommand(std::vector, std::allocator >, 
std::allocator, 
std::allocator > > > const&, std::unique_ptr >) ()
#10 0xa4c3c6bc in main ()




Re: cmake hang solution?

2022-05-03 Thread Michael van Elst
c...@chuq.com (Chuck Silvers) writes:

>> would this apply to netbsd-9 too ? The hang I'm seeing is on a system
>> with a HEAD kernel and a netbsd-9 userland 

>it looks like the diff won't apply as-is, but I think the concept still 
>applies.

I'm currently testing:

Index: lib/libpthread/pthread.c
===
RCS file: /cvsroot/src/lib/libpthread/pthread.c,v
retrieving revision 1.153.2.1
diff -p -u -r1.153.2.1 pthread.c
--- lib/libpthread/pthread.c26 Jan 2020 10:55:16 -  1.153.2.1
+++ lib/libpthread/pthread.c3 May 2022 09:22:58 -
@@ -430,6 +430,8 @@ pthread_create(pthread_t *thread, const 
* only be one thread before it becomes true.
*/
if (pthread__started == 0) {
+   _lwp_park(CLOCK_REALTIME, 0, NULL,
+   pthread__self()->pt_lid, NULL, NULL);
pthread__start();
pthread__started = 1;
}




Re: uvideo uvm_fault panic

2022-05-02 Thread Michael van Elst
sc.dy...@gmail.com writes:

>+uvideo: truncated CS subtype-0x7 descriptor, length 30 < 38
>uvideo: unimplemented VS CS descriptor len=30 type=0x24 subtype=0x07

>bLength30
>bDescriptorType36
>bDescriptorSubtype  7 (FRAME_MJPEG)
>bFrameIndex 1
>bmCapabilities   0x01
>  Still image supported
>wWidth   1280
>wHeight   720
>dwMinBitRate442368000
>dwMaxBitRate442368000
>dwMaxVideoFrameBufferSize 1843200
>dwDefaultFrameInterval 33
>bFrameIntervalType  1
>dwFrameInterval( 0)33


The descriptors are pretty ugly to parse and the sanity checks
added are neither correct nor sufficient.

In this case, there is a

typedef union {
uvideo_frame_interval_continuous_t  continuous;  
uvideo_frame_interval_discrete_tdiscrete;
} uvideo_frame_interval_t;

as a last element, where uvideo_frame_interval_discrete_t is even
a variable-length array, and everything depends on bFrameIntervalType.

The descriptor isn't padded for all possible types like a C union,
so validating against a sizeof doesn't work.
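Here is a sketch of a length check that follows bFrameIntervalType instead of sizeof. It assumes the standard UVC frame descriptor layout: 26 bytes up to and including bFrameIntervalType, followed by either three continuous 32-bit values or one 32-bit entry per discrete interval. The helper names are hypothetical, not uvideo.c code.

```c
#include <stddef.h>
#include <stdint.h>

#define UVC_FRAME_HDR_LEN	26	/* bLength .. bFrameIntervalType */
#define UVC_FRAME_TYPE_OFF	25	/* offset of bFrameIntervalType */

/* Minimum descriptor length implied by bFrameIntervalType. */
static size_t
uvc_frame_desc_min_len(uint8_t interval_type)
{
	if (interval_type == 0)		/* continuous: min, max, step */
		return UVC_FRAME_HDR_LEN + 3 * 4;
	return UVC_FRAME_HDR_LEN + (size_t)interval_type * 4;	/* discrete */
}

/* d points at a descriptor with `avail` bytes remaining in the buffer. */
static int
uvc_frame_desc_ok(const uint8_t *d, size_t avail)
{
	if (avail < UVC_FRAME_HDR_LEN || d[0] < UVC_FRAME_HDR_LEN ||
	    d[0] > avail)
		return 0;
	return d[0] >= uvc_frame_desc_min_len(d[UVC_FRAME_TYPE_OFF]);
}
```

For the descriptor above, bFrameIntervalType = 1 gives 26 + 4 = 30 bytes, exactly bLength. If the union is sized for the continuous form, a sizeof check demands 26 + 12 = 38 instead, which would explain the "30 < 38" in the log.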


The result of course is that the formats are considered invalid
and there is no default format which the open routine assumes
to be valid without further checks...



Re: cmake hang solution?

2022-05-02 Thread Michael van Elst
On Sun, May 01, 2022 at 01:24:01PM -0700, Chuck Silvers wrote:
> On Tue, Apr 05, 2022 at 02:10:36PM -0000, Michael van Elst wrote:
> > I see both in almost every pbulk run.
> 
> please try this patch for the cmake variation of this hang:
> 
> http://www.netbsd.org/~chs/diff.pthread-park-stuck.1

The bulk builds use the latest release, i.e. netbsd-9, but that
patch is for -current. Do you think that netbsd-9 has the same
issue and the patch could be reworked for the older code ?




Re: Stable names for USB serial adapters

2022-05-01 Thread Michael van Elst
m...@eterna.com.au (matthew green) writes:

>this works great!  if i have serialnumbers in my ucoms :-(
>out of 20 devices, i have 3 with serial numbers, leaving me
>with 22 ucoms without a stable name (5 dual port devices.)

>tempted to suggest we include something like this in src,

This here works with a few enhancements to the USB stack:

lrwxr-xr-x  1 root  wheel  10 May  1 10:12 ttyU.1.4.1@ -> /dev/ttyU0
lrwxr-xr-x  1 root  wheel  10 May  1 10:12 ttyU.1.4.2@ -> /dev/ttyU1
lrwxr-xr-x  1 root  wheel  10 May  1 10:12 ttyU.1.4.3@ -> /dev/ttyU2

root hub Port 1
hub Port 4
device Port 1 / 2 / 3

Still not sufficient if you have multiple controllers and therefore
multiple root hubs.

You can prepend the bus name like:

lrwxr-xr-x  1 root  wheel  10 May  1 10:56 ttyU.usb1.1.4.1@ -> /dev/ttyU0
lrwxr-xr-x  1 root  wheel  10 May  1 10:56 ttyU.usb1.1.4.2@ -> /dev/ttyU1
lrwxr-xr-x  1 root  wheel  10 May  1 10:56 ttyU.usb1.1.4.3@ -> /dev/ttyU2

but the bus isn't necessarily stable once you get hot-plug USB controllers.

There should be a generic concept of a device path for every bus.
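Here is a sketch of how such a path-based name could be built from the bus name plus the port numbers from the root hub down to the device. usb_path_name is a hypothetical helper, not part of the USB stack.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build e.g. "ttyU.usb1.1.4.1" from a prefix, a bus name and the hub
 * ports leading to the device.  Like snprintf, returns the length the
 * full name needs (even if buf was too small), or -1 on error.
 */
static int
usb_path_name(char *buf, size_t len, const char *prefix, const char *bus,
    const unsigned *ports, size_t nports)
{
	int n = snprintf(buf, len, "%s.%s", prefix, bus);

	for (size_t i = 0; i < nports && n >= 0; i++) {
		/* append after what fit so far; size 0 writes nothing */
		size_t used = (size_t)n < len ? (size_t)n : len;
		int m = snprintf(buf + used, len - used, ".%u", ports[i]);

		n = m < 0 ? -1 : n + m;
	}
	return n;
}
```

For bus usb1 and ports 1, 4, 1 this yields "ttyU.usb1.1.4.1". Keeping the bus name as a separate argument makes it easy to swap in a more stable controller identifier later.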


