Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard



On Apr 29, 2024, at 20:11, Mark Millard  wrote:

> On Apr 29, 2024, at 19:54, Mark Millard  wrote:
> 
>> On Apr 28, 2024, at 18:06, Philip Paeps  wrote:
>> 
>>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>>>> void  wrote on
>>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>>> 
>>>>>> Not sure where to post this..
>>>>>> 
>>>>>> The last bulk build for arm64 appears to have happened around
>>>>>> mid-March on ampere2. Is it broken?
>>>>> 
>>>>> main-armv7 building is broken and the last completed build
>>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>>> gets stuck making no progress until manually forced to stop,
>>>>> which leads to huge elapsed times for the incomplete builds:
>>>>> 
>>>>> [...]
>>>>> 
>>>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>>>> 
>>>> 
>>>> One thing of possible note:
>>>> 
>>>> Failing . . .
>>>> 
>>>> Host OSVERSION: 156
>>>> Jail OSVERSION: 1500014
>>> 
>>> I have finished a package builder refresh this morning.  All our builder 
>>> hosts (except PowerPC - I don't touch those) are now on 
>>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>>> 
>>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
>>> looks like the problem with stuck builds was limited to ampere2 building 
>>> main-armv7.  I'll keep a close eye on this one when it starts its next 
>>> build.
>>> 
>> 
>> I see that main-armv7 started.
>> 
>> It queued only 31935 instead of the prior 34528 (or more): it is doing an
>> incremental build instead of a full build. For example, pkg was not built
>> but instead the prior build is in use. Thus bad results from the prior
>> build might be involved in this new build.
>> 
>> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
>> build for the purposes of the main-armv7 test.
> 
> Actually the test is not going to provide the information we are
> after as things are.
> 
> giflib-5.2.2 failed to build, which leads to devel/doxygen being
> skipped. devel/doxygen was the first one to hang up in the prior
> 2 failing attempts, if I remember right.
> 
> giflib-5.2.2 also causes graphics/graphviz to be skipped.
> graphics/graphviz was installed just before the hangup in all of
> the example hangups. So the context will not be replicated.
> 
> We need graphics/giflib to build to actually do the test.

Looks like:

https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f

is the fix for the graphics/giflib build failure.

===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard
On Apr 29, 2024, at 19:54, Mark Millard  wrote:

> On Apr 28, 2024, at 18:06, Philip Paeps  wrote:
> 
>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>>> void  wrote on
>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>> 
>>>>> Not sure where to post this..
>>>>> 
>>>>> The last bulk build for arm64 appears to have happened around
>>>>> mid-March on ampere2. Is it broken?
>>>> 
>>>> main-armv7 building is broken and the last completed build
>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>> gets stuck making no progress until manually forced to stop,
>>>> which leads to huge elapsed times for the incomplete builds:
>>>> 
>>>> [...]
>>>> 
>>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>>> 
>>> 
>>> One thing of possible note:
>>> 
>>> Failing . . .
>>> 
>>> Host OSVERSION: 156
>>> Jail OSVERSION: 1500014
>> 
>> I have finished a package builder refresh this morning.  All our builder 
>> hosts (except PowerPC - I don't touch those) are now on 
>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>> 
>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
>> looks like the problem with stuck builds was limited to ampere2 building 
>> main-armv7.  I'll keep a close eye on this one when it starts its next build.
>> 
> 
> I see that main-armv7 started.
> 
> It queued only 31935 instead of the prior 34528 (or more): it is doing an
> incremental build instead of a full build. For example, pkg was not built
> but instead the prior build is in use. Thus bad results from the prior
> build might be involved in this new build.
> 
> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
> build for the purposes of the main-armv7 test.

Actually the test is not going to provide the information we are
after as things are.

giflib-5.2.2 failed to build, which leads to devel/doxygen being
skipped. devel/doxygen was the first one to hang up in the prior
2 failing attempts, if I remember right.

giflib-5.2.2 also causes graphics/graphviz to be skipped.
graphics/graphviz was installed just before the hangup in all of
the example hangups. So the context will not be replicated.

We need graphics/giflib to build to actually do the test.


===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard
On Apr 28, 2024, at 18:06, Philip Paeps  wrote:

> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>> void  wrote on
>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>> 
>>>> Not sure where to post this..
>>>> 
>>>> The last bulk build for arm64 appears to have happened around
>>>> mid-March on ampere2. Is it broken?
>>> 
>>> main-armv7 building is broken and the last completed build
>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>> gets stuck making no progress until manually forced to stop,
>>> which leads to huge elapsed times for the incomplete builds:
>>> 
>>> [...]
>>> 
>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>> 
>> 
>> One thing of possible note:
>> 
>> Failing . . .
>> 
>> Host OSVERSION: 156
>> Jail OSVERSION: 1500014
> 
> I have finished a package builder refresh this morning.  All our builder 
> hosts (except PowerPC - I don't touch those) are now on 
> main-n269671-feabaf8d5389 (OSVERSION 1500018).
> 
> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
> looks like the problem with stuck builds was limited to ampere2 building 
> main-armv7.  I'll keep a close eye on this one when it starts its next build.
> 

I see that main-armv7 started.

It queued only 31935 instead of the prior 34528 (or more): it is doing an
incremental build instead of a full build. For example, pkg was not built
but instead the prior build is in use. Thus bad results from the prior
build might be involved in this new build.

I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
build for the purposes of the main-armv7 test.
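
As a rough illustration of the numbers above, here is a minimal Python sketch. It assumes the jail is named "main-armv7" and the ports tree "default" (guessed from the "main-armv7-default" mastername, not confirmed), and the helper names are made up. The drop in the queued count is only an approximate indicator of how many prior packages are being reused.

```python
def reused_packages(prior_queued, now_queued):
    """Approximate number of packages taken from the prior build."""
    return max(prior_queued - now_queued, 0)


def full_bulk_command(jail, ports_tree):
    """argv for a from-scratch poudriere run.

    -c removes all previously built packages first, -a builds every port.
    The jail and ports tree names are supplied by the caller.
    """
    return ["poudriere", "bulk", "-c", "-a", "-j", jail, "-p", ports_tree]


if __name__ == "__main__":
    print(reused_packages(34528, 31935))  # 2593
    # Jail and tree names below are assumptions, not taken from the builder.
    print(" ".join(full_bulk_command("main-armv7", "default")))
```

The command is only printed here, not executed; whether the official builders are driven this way is not something this thread confirms.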

===
Mark Millard
marklmi at yahoo.com




Request for non-GENERIC kernel diff

2024-04-29 Thread Vladislav V. Prodan
Hello!

If you use your own kernel config, please share the diff between the 14 and
13.x branches.

Thanks.

--
 Vladislav V. Prodan
 System & Network Administrator
 support.od.ua



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-28 Thread Philip Paeps

On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:

> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>> void  wrote on
>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>
>>> Not sure where to post this..
>>>
>>> The last bulk build for arm64 appears to have happened around
>>> mid-March on ampere2. Is it broken?
>>
>> main-armv7 building is broken and the last completed build
>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>> gets stuck making no progress until manually forced to stop,
>> which leads to huge elapsed times for the incomplete builds:
>>
>> [...]
>>
>> My guess is that FreeBSD has something that broke after bd45bbe440
>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>
> One thing of possible note:
>
> Failing . . .
>
> Host OSVERSION: 156
> Jail OSVERSION: 1500014


I have finished a package builder refresh this morning.  All our builder 
hosts (except PowerPC - I don't touch those) are now on 
main-n269671-feabaf8d5389 (OSVERSION 1500018).


ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
looks like the problem with stuck builds was limited to ampere2 building 
main-armv7.  I'll keep a close eye on this one when it starts its next 
build.


Philip



Re: serial/ulscom: response timeout using pySerial/esptool.py

2024-04-27 Thread FreeBSD User
On Sat, 27 Apr 2024 11:28:55 +0200
FreeBSD User  wrote:

Just for the record: on a small "victim NAS" based on an HP EliteDesk 800 G2
mini running XigmaNAS (latest official version, kernel see below), with
packages installed from an official FreeBSD site for FBSD 13.2-RELEASE, an
ESP32 D1 mini that does not work with the aforementioned host gives this (I
looped the esptool.py command 100 times and detected no issues):

[...]
nas01: ~# esptool.py --chip esp32 --port /dev/cuaU0 --baud 115200 read_mac
esptool.py v4.5
Serial port /dev/cuaU0
Connecting..
Chip is ESP32-D0WD-V3 (revision v3.1)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: XX:XX:XX:XX:XX:XX
Uploading stub...
Running stub...
Stub running...
MAC: XX:XX:XX:XX:XX:XX
Hard resetting via RTS pin...
[...]

.. and those from AZdelivery (larger and older chips):
[...]
nas01: ~# esptool.py --chip esp32 --port /dev/cuaU0 --baud 115200 read_mac
esptool.py v4.5
Serial port /dev/cuaU0
Connecting.
Chip is ESP32-D0WDQ6 (revision v1.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: XX:XX:XX:XX:XX:XX
Uploading stub...
Running stub...
Stub running...
MAC: XX:XX:XX:XX:XX:XX
Hard resetting via RTS pin...

[...]

or

[... considered a different revision, but in fact the same old ESP32, as it
reveals itself ...]
nas01: ~# esptool.py --chip esp32 --port /dev/cuaU0 --baud 115200 read_mac
esptool.py v4.5
Serial port /dev/cuaU0
Connecting...
Chip is ESP32-D0WDQ6 (revision v1.0)
Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
Crystal is 40MHz
MAC: XX:XX:XX:XX:XX:XX
Uploading stub...
Running stub...
Stub running...
MAC: XX:XX:XX:XX:XX:XX
Hard resetting via RTS pin...


Big question is: is this an issue introduced with FBSD 14? In 2020 I made my
first attempts using the Arduino IDE, which worked pretty well with some minor
issues (I needed several attempts to get connected, using 12- and 13-STABLE at
that time). But the Arduino IDE doesn't work either.


> On Thu, 25 Apr 2024 21:51:21 +0200
> Tomek CEDRO  wrote:
> 
> > CP2102 are pretty good ones and never let me down :-)
> > 
> > Is your UART connection to ESP32 working correctly? Can you see the
> > boot message and whatever happens next in terminal (cu / minicom)? Are
> > RX TX pins not swapped? Power supply okay?  
> 
> The ESP32 boards used are:
> - ESP32-WROOM32 D1 mini: I have 10 of those, and every single one shows the
>   same behaviour on the same host
> - ESP32-WROOM32 sold by Chinese distributor AZdelivery in Germany: I got a
>   bunch of them, Revision 1 (bought 2020) and a more recent revision V4,
>   bought a couple of months ago.
> 
> AGAIN: NONE of the chips communicate with my private host (dmesg: CPU
> microcode: updated from 0x1f to 0x21 CPU: Intel(R) Core(TM) i5-3470 CPU @
> 3.20GHz (3200.18-MHz K8-class CPU)), OS: FreeBSD 15.0-CURRENT #39
> main-n269723-4ba444de708b: Sat Apr 27 06:42:44 CEST 2024 amd64; the
> mainboard is a crappy Z77 Pro4 ASrock.
> 
> pciconf excerpts:
> [...]
> ichsmb0@pci0:0:31:3:class=0x0c0500 rev=0x04 hdr=0x00 vendor=0x8086 
> device=0x1e22
> subvendor=0x1849 subdevice=0x1e22 vendor = 'Intel Corporation'
> device = '7 Series/C216 Chipset Family SMBus Controller'
> class  = serial bus
> subclass   = SMBus
> bar   [10] = type Memory, range 64, base 0xf7c15000, size 256, enabled
> bar   [20] = type I/O Port, range 32, base 0xf040, size 32, enabled
> ..
> ehci1@pci0:0:29:0:  class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 
> device=0x1e26
> subvendor=0x1849 subdevice=0x1e26 vendor = 'Intel Corporation'
> device = '7 Series/C216 Chipset Family USB Enhanced Host Controller'
> class  = serial bus
> subclass   = USB
> bar   [10] = type Memory, range 32, base 0xf7c17000, size 1024, enabled
> cap 01[50] = powerspec 2  supports D0 D3  current D0
> cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
> cap 13[98] = PCI Advanced Features: FLR TP
> ..
> xhci0@pci0:0:20:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 
> device=0x1e31
> subvendor=0x1849 subdevice=0x1e31 vendor = 'Intel Corporation'
> device = '7 Series/C210 Series Chipset Family USB xHCI Host 
> Controller'
> class  = serial bus
> subclass   = USB
> bar   [10] = type Memory, range 64, base 0xf7c0, size 65536, enabled
> cap 01[70] = powerspec 2  supports D0 D3  current D0
> cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 message
> 
> 
> 
> > 
> > Are boards generic devkits of custom hardware? ESP32 in addition to RX
> > TX needs two control lines Reset and Boot that will switch the chip to
> > bootloader / flashing mode. Most USB-to-UART use RTS/CTS lines for
> > that. Are you sure these lines are available on your board and
> > connected to the target correctly? Do you have Reset and 

Re: serial/ulscom: response timeout using pySerial/esptool.py

2024-04-27 Thread FreeBSD User
On Thu, 25 Apr 2024 21:51:21 +0200
Tomek CEDRO  wrote:

> CP2102 are pretty good ones and never let me down :-)
> 
> Is your UART connection to ESP32 working correctly? Can you see the
> boot message and whatever happens next in terminal (cu / minicom)? Are
> RX TX pins not swapped? Power supply okay?

The ESP32 boards used are:
- ESP32-WROOM32 D1 mini: I have 10 of those, and every single one shows the
  same behaviour on the same host
- ESP32-WROOM32 sold by Chinese distributor AZdelivery in Germany: I got a
  bunch of them, Revision 1 (bought 2020) and a more recent revision V4,
  bought a couple of months ago.

AGAIN: NONE of the chips communicate with my private host (dmesg: CPU
microcode: updated from 0x1f to 0x21 CPU: Intel(R) Core(TM) i5-3470 CPU @
3.20GHz (3200.18-MHz K8-class CPU)), OS: FreeBSD 15.0-CURRENT #39
main-n269723-4ba444de708b: Sat Apr 27 06:42:44 CEST 2024 amd64; the
mainboard is a crappy Z77 Pro4 ASrock.

pciconf excerpts:
[...]
ichsmb0@pci0:0:31:3: class=0x0c0500 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e22 subvendor=0x1849 subdevice=0x1e22
    vendor     = 'Intel Corporation'
    device     = '7 Series/C216 Chipset Family SMBus Controller'
    class      = serial bus
    subclass   = SMBus
    bar   [10] = type Memory, range 64, base 0xf7c15000, size 256, enabled
    bar   [20] = type I/O Port, range 32, base 0xf040, size 32, enabled
..
ehci1@pci0:0:29:0:  class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e26 subvendor=0x1849 subdevice=0x1e26
    vendor     = 'Intel Corporation'
    device     = '7 Series/C216 Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xf7c17000, size 1024, enabled
    cap 01[50] = powerspec 2  supports D0 D3  current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
..
xhci0@pci0:0:20:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e31 subvendor=0x1849 subdevice=0x1e31
    vendor     = 'Intel Corporation'
    device     = '7 Series/C210 Series Chipset Family USB xHCI Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 64, base 0xf7c0, size 65536, enabled
    cap 01[70] = powerspec 2  supports D0 D3  current D0
    cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 message



> 
> Are boards generic devkits of custom hardware? ESP32 in addition to RX
> TX needs two control lines Reset and Boot that will switch the chip to
> bootloader / flashing mode. Most USB-to-UART use RTS/CTS lines for
> that. Are you sure these lines are available on your board and
> connected to the target correctly? Do you have Reset and Boot buttons
> on the board so you could trigger bootloader by hand (hold Boot, press
> and release Reset, device will be in bootloader upload mode, retry
> esptool flashing now). You can also play with the buttons with active
> terminal attached (i.e. minicom) to see if they work as expected.
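
The manual button sequence described above (hold Boot, pulse Reset, release Boot) can also be driven from the host through the adapter's modem-control outputs. The following is a minimal sketch, assuming the common devkit wiring in which RTS pulls EN (reset) low and DTR pulls IO0 (boot) low; that wiring is an assumption, so check the board schematic. The function name and timings are illustrative, and `port` is any object with pySerial-style `dtr` and `rts` attributes.

```python
import time


def enter_bootloader(port, hold=0.1, settle=0.05, sleep=time.sleep):
    """Toggle DTR/RTS to put an ESP32 into download mode.

    Assumes RTS asserted holds EN low and DTR asserted holds IO0 low,
    which matches many devkits but is not guaranteed for every board.
    """
    port.dtr = False   # release IO0 (Boot not pressed)
    port.rts = True    # EN low: chip held in reset
    sleep(hold)
    port.dtr = True    # IO0 low: Boot "pressed"
    port.rts = False   # EN high: chip leaves reset and samples IO0
    sleep(settle)
    port.dtr = False   # release IO0 again
```

With pySerial this could be called as `enter_bootloader(serial.Serial("/dev/cuaU1", 115200))`; the device path is only an example. If the serial console then prints "waiting for download", the control lines reach the chip.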

I tried minicom, but I have to confess I'm a novice in that matter, so do not
expect professional debugging info.

The following command was unsuccessful on the three different types of ESP32
described above. I used at least two of them, and even one (a D1 mini) freshly
unpacked from its sealed anti-static bag, while observing minicom attached via
-D /dev/cuaU1:

[...]
[ohartmann]: esptool.py --chip esp32 --baud 115200 --connect-attempts 400 --port /dev/cuaU1 read_mac
esptool.py v4.7.0
Loaded custom configuration from /pool/home/ohartmann/esptool.cfg
Serial port /dev/cuaU1
Connecting...

A serial exception error occurred: device reports readiness to read but 
returned no data
(device disconnected or multiple access on port?) Note: This error originates 
from pySerial.
It is likely not a problem with esptool, but with the hardware connection or 
drivers. For
troubleshooting steps visit:
https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html

[...]

On the reference minicom terminal I observed with the D1 mini
(minicom -b 115200 -D /dev/cuaU1):
[...]

Welcome to minicom 2.8

OPTIONS: I18n 
Compiled on Apr 27 2024, 09:04:50.
Port /dev/cuaU1, 10:50:53

Press CTRL-A Z for help on special keys

ts Jul 29 2019 12:21:46

rst:x1 (POWERON_RESET),boot:0x3 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))
waiting for download
 U� U� U� U� U� U� U� U


[... the older ESP32 from 2020 ...]

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:1044
load:0x40078000,len:10124
load:0x40080400,len:5828
entry 0x400806a8
�un  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x3 (DOWNLOAD_BOOT(UART0/UART1/SDIO_REI_REO_V2))
waiting for download
es Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH]�(�:���   �


[... and the one purchased last year, called 

Re: TXT Kernel linking failed on -CURRENT

2024-04-26 Thread BSD USER

Konstantin, good day!

On 25.04.2024 0:09, Konstantin Belousov wrote:

> On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote:
>
>> linking kernel
>> ld: error: undefined symbol: ktrcapfail
>>
>> referenced by vfs_lookup.c
>>     vfs_lookup.o:(namei)
>> referenced by vfs_lookup.c
>>     vfs_lookup.o:(namei_setup)
>> referenced by vfs_lookup.c
>>     vfs_lookup.o:(vfs_lookup)
>> referenced 3 more times
>>
>> *** [kernel] Error code 1
>
> Try
> https://reviews.freebsd.org/D44931


Yes, now the system and kernel build fine.

Thanks!



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-26 Thread Mark Millard
On Apr 26, 2024, at 18:55, Philip Paeps  wrote:

> On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:
>> void  wrote on
>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>> 
>>> Not sure where to post this..
>>> 
>>> The last bulk build for arm64 appears to have happened around
>>> mid-March on ampere2. Is it broken?
>> 
>> main-armv7 building is broken and the last completed build
>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>> gets stuck making no progress until manually forced to stop,
>> which leads to huge elapsed times for the incomplete builds:
>> 
>> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390 
>>  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>> 
>> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395 
>>  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>> 
>> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
>> main-armv7 being stuck blocks main-arm64 from building.
>> 
>> One can see that all 13 job ID's show over 570 hours:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>> 
>> It is not random which packages are building when this happens. Compare:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>> 
>> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>> 
>> My guess is that FreeBSD has something that broke after bd45bbe440
>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
> 
> It looks like ampere2 is going to end up in this state again:
> 
> https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca
> 
> It's got a couple of things stuck in -depends already.  I'll keep an eye on 
> it for the next hour or two.  If no progress is made, I'll kill this build 
> and force an upgrade.  The next build will start at 01:01 UTC Sunday.  So we 
> won't have long to wait before it tries again.
> 
> ampere1 is chewing away at llvm, and doesn't look stuck.
> 
> ampere3 has been upgraded.

Output from the likes of:

# ps -axldww

could be interesting. As might be output from:

# procstat -k -k PIDs_OF_STUCK_PROCESSES

(kernel stack backtraces).
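
A small sketch of how the two commands could be assembled for a list of stuck PIDs. It assumes procstat(1), whose -k option given twice prints kernel stacks with function offsets; the helper name and the example PIDs are made up for illustration.

```python
import shlex


def diagnostic_commands(stuck_pids):
    """Return argv lists: the full process tree, then kernel stacks per PID."""
    pids = [str(int(p)) for p in stuck_pids]  # int() rejects non-numeric input
    commands = [["ps", "-axldww"]]
    if pids:
        commands.append(["procstat", "-k", "-k", *pids])
    return commands


if __name__ == "__main__":
    # Example PIDs only; substitute the PIDs of the stuck processes.
    for argv in diagnostic_commands([1234, 5678]):
        print(shlex.join(argv))
```

The sketch only prints the commands so they can be reviewed before being run as root on the builder.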


===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-26 Thread Philip Paeps

On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:

> void  wrote on
> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>
>> Not sure where to post this..
>>
>> The last bulk build for arm64 appears to have happened around
>> mid-March on ampere2. Is it broken?
>
> main-armv7 building is broken and the last completed build
> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
> gets stuck making no progress until manually forced to stop,
> which leads to huge elapsed times for the incomplete builds:
>
> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>
> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>
> ampere2 alternates between trying to build main-arm64 and main-armv7,
> so main-armv7 being stuck blocks main-arm64 from building.
>
> One can see that all 13 job ID's show over 570 hours:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>
> It is not random which packages are building when this happens. Compare:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>
> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>
> My guess is that FreeBSD has something that broke after bd45bbe440
> that was broken as of f5f08e41aa and was still broken at 75464941dc .


It looks like ampere2 is going to end up in this state again:

https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca

It's got a couple of things stuck in -depends already.  I'll keep an eye 
on it for the next hour or two.  If no progress is made, I'll kill this 
build and force an upgrade.  The next build will start at 01:01 UTC 
Sunday.  So we won't have long to wait before it tries again.


ampere1 is chewing away at llvm, and doesn't look stuck.

ampere3 has been upgraded.

Philip



Re: mysterious setting of B_DIRECT?

2024-04-25 Thread Rick Macklem
On Thu, Apr 25, 2024 at 8:51 PM Rick Macklem  wrote:
>
> On Thu, Apr 25, 2024 at 8:09 PM Konstantin Belousov  wrote:
> >
> > On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> > > Hi,
> > >
> > > This week I have been doing active testing as a part of an IETF
> > > bakeathon for NFSv4. During the week I had a NFSv4 client
> > > crash. On the surface, it is straightforward, in that it called
> > > ncl_doio_directwrite() and the field called b_caller1 was NULL.
> > >
> > > Now, here's the weird part...
> > > ncl_doio_directwrite() should never be called because B_DIRECT
> > > should never be set. (The only place B_DIRECT gets set in the code
> > > is never currently executed.)
> > Do you mean the place in nfs_directio_write()?  And the fact that
> > IO_SYNC is always set.
> Yes.
>
> >
> > >
> > > I have a patch that clears out the "never to be executed" code and
> > > this seems to avoid the crash, since with the patch,
> > > ncl_doio_directwrite() no longer exists.
> > >
> > > What I cannot figure out is how B_DIRECT got set?
> > > I can note that UFS was under heavy load when the client crashed,
> > > but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> > > without b_flags being set to 0.
> >
> > There are also vfs_bio_brelse()/vfs_bio_setflags() functions which can
> > set B_DIRECT.  On the other hand, they are not used by nfs client.
> Yes, again.
>
> >
> > What was the overall state of the buffer with the B_DIRECT flag?  Which
> > vnode was it assigned to?
> Unfortunately I was in a hurry and didn't get more info.
> And, since I have never seen this crash before, I doubt I'll be able
> to reproduce it.
Oh, and I will put the cleanup patch on phabricator. I didn't see the
crash again during a few days of testing with the patch. This makes sense,
since it gets rid of ncl_doio_directwrite().

>
> Thanks, rick



Re: mysterious setting of B_DIRECT?

2024-04-25 Thread Rick Macklem
On Thu, Apr 25, 2024 at 8:09 PM Konstantin Belousov  wrote:
>
> On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> > Hi,
> >
> > This week I have been doing active testing as a part of an IETF
> > bakeathon for NFSv4. During the week I had a NFSv4 client
> > crash. On the surface, it is straightforward, in that it called
> > ncl_doio_directwrite() and the field called b_caller1 was NULL.
> >
> > Now, here's the weird part...
> > ncl_doio_directwrite() should never be called because B_DIRECT
> > should never be set. (The only place B_DIRECT gets set in the code
> > is never currently executed.)
> Do you mean the place in nfs_directio_write()?  And the fact that
> IO_SYNC is always set.
Yes.

>
> >
> > I have a patch that clears out the "never to be executed" code and
> > this seems to avoid the crash, since with the patch, ncl_doio_directwrite()
> > no longer exists.
> >
> > What I cannot figure out is how B_DIRECT got set?
> > I can note that UFS was under heavy load when the client crashed,
> > but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> > without b_flags being set to 0.
>
> There are also vfs_bio_brelse()/vfs_bio_setflags() functions which can
> set B_DIRECT.  On the other hand, they are not used by nfs client.
Yes, again.

>
> What was the overall state of the buffer with the B_DIRECT flag?  Which
> vnode was it assigned to?
Unfortunately I was in a hurry and didn't get more info.
And, since I have never seen this crash before, I doubt I'll be able
to reproduce it.

Thanks, rick



Re: mysterious setting of B_DIRECT?

2024-04-25 Thread Konstantin Belousov
On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> Hi,
> 
> This week I have been doing active testing as a part of an IETF
> bakeathon for NFSv4. During the week I had a NFSv4 client
> crash. On the surface, it is straightforward, in that it called
> ncl_doio_directwrite() and the field called b_caller1 was NULL.
> 
> Now, here's the weird part...
> ncl_doio_directwrite() should never be called because B_DIRECT
> should never be set. (The only place B_DIRECT gets set in the code
> is never currently executed.)
Do you mean the place in nfs_directio_write()?  And the fact that
IO_SYNC is always set.

> 
> I have a patch that clears out the "never to be executed" code and
> this seems to avoid the crash, since with the patch, ncl_doio_directwrite()
> no longer exists.
> 
> What I cannot figure out is how B_DIRECT got set?
> I can note that UFS was under heavy load when the client crashed,
> but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> without b_flags being set to 0.

There are also vfs_bio_brelse()/vfs_bio_setflags() functions which can
set B_DIRECT.  On the other hand, they are not used by nfs client.

What was the overall state of the buffer with the B_DIRECT flag?  Which
vnode was it assigned to?



mysterious setting of B_DIRECT?

2024-04-25 Thread Rick Macklem
Hi,

This week I have been doing active testing as a part of an IETF
bakeathon for NFSv4. During the week I had a NFSv4 client
crash. On the surface, it is straightforward, in that it called
ncl_doio_directwrite() and the field called b_caller1 was NULL.

Now, here's the weird part...
ncl_doio_directwrite() should never be called because B_DIRECT
should never be set. (The only place B_DIRECT gets set in the code
is never currently executed.)

I have a patch that clears out the "never to be executed" code and
this seems to avoid the crash, since with the patch, ncl_doio_directwrite()
no longer exists.

What I cannot figure out is how B_DIRECT got set?
I can note that UFS was under heavy load when the client crashed,
but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
without b_flags being set to 0.

Anyone have any ideas? rick



Re: serial/ulscom: response timeout using pySerial/esptool.py

2024-04-25 Thread Tom Jones
Can you isolate out the extraneous stuff and loop tx and rx on a CP2101 board 
and send bytes through? 
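
One way such a loopback test might look with pySerial, assuming TX is jumpered to RX on the adapter; the device path, baud rate, payload and function names are illustrative only, not taken from the report:

```python
def loopback_check(port, payload=b"\x55\xaa" * 16):
    """Write payload to port and compare it with what comes back.

    `port` is any object offering pySerial's reset_input_buffer, write,
    flush and read methods. With TX wired to RX, every byte written
    should be read back unchanged.
    """
    port.reset_input_buffer()         # drop stale bytes from earlier attempts
    port.write(payload)
    port.flush()                      # wait until the bytes have left the host
    echoed = port.read(len(payload))  # returns fewer bytes if the timeout hits
    return echoed == payload, echoed


def run_on_device(device="/dev/cuaU1", baud=115200):
    """Hypothetical entry point; device path and baud rate are assumptions."""
    import serial  # pySerial (comms/py-pyserial), imported lazily

    with serial.Serial(device, baud, timeout=1) as ser:
        ok, echoed = loopback_check(ser)
        print("loopback ok" if ok else f"mismatch, got {len(echoed)} bytes: {echoed!r}")
        return ok
```

If this passes repeatedly on the hosts that fail with esptool, the USB serial path itself is probably fine and the problem is more likely in the reset/boot handshake.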

I did a bunch of development on an esp8266 board in the last few weeks and had
no issues, but I’ve no idea whether it was the same USB serial chip.

I’ll have a dig around and see if I have something matching 

On Thu, Apr 25, 2024, at 20:17, FreeBSD User wrote:
> Hello,
>
> Host: 15.0-CURRENT FreeBSD 15.0-CURRENT #36 main-n269703-54c3aa02e926: 
> Thu Apr 25 18:48:56
> CEST 2024 amd64 or 14-STABLE recently compiled (dmesg/uname not at 
> hand).
>
> Hardware: oldish Z77Pro 4 based Asrock mainboard, a Lenovo T560 
> notebook, Fujitsu Esprimo Q5XX
> (simple desktop, Pentium Gold) or an oldish Fujitsu Celsius 7XX 
> workstation, 6 core Haswell
> XEON.
>
> Phenomenon: for a couple of weeks now I have been trying to connect to
> several Xtensa ESP32 dev boards (ESP32-WROOM32 with CP2101 or CP2104 UART)
> via comms/py-esptool (it doesn't matter whether it is the port's
> py39-esptool 4.5 or the latest py-esptool 4.7.0, or whether it is a pkg
> package or self-compiled on CURRENT and 14-STABLE; the result is the same
> on all hardware platforms).
>
> Attaching the ESP devel module via a Micro USB cable (several types from
> different vendors tried ...) shows
>
> dmesg:
> [...]
> ugen0.4:  at usbus0
> uslcom0 on uhub3
> uslcom0:  rev 1.10/1.00, addr 4>
> on usbus0
> [...]
>
> When trying to connect to the ESP32 via the command shown below (--trace
> was not given every time), I get no connection:
>
> [ohartmann]: esptool.py --trace --chip esp32 --baud 115200 --port /dev/cuaU1 flash_id
> esptool.py v4.7.0
> Loaded custom configuration from /pool/home/ohartmann/esptool.cfg
> Serial port /dev/cuaU1
> Connecting...TRACE +0.000 command op=0x08 data len=36 wait_response=1 
> timeout=0.100 data=
> 07071220  | ... 
>   | 
>   | 
> TRACE +0.000 Write 46 bytes: 
> c824 000707122055 | ...$ UUU
>   | 
>  55c0 | U.
> TRACE +0.102 No serial data received.
> TRACE +0.052 command op=0x08 data len=36 wait_response=1 timeout=0.100 
> data=
> 07071220  | ... 
>   | 
>   | 
> TRACE +0.000 Write 46 bytes: 
> c824 000707122055 | ...$ UUU
>   | 
>  55c0 | U.
> TRACE +0.107 No serial data received.
> TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 
> data=
> 07071220  | ... 
>   | 
>   | 
> TRACE +0.000 Write 46 bytes: 
> c824 000707122055 | ...$ UUU
>   | 
>  55c0 | U.
> TRACE +0.107 No serial data received.
> TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 
> data=
> 07071220  | ... 
>   | 
>   | 
> TRACE +0.000 Write 46 bytes: 
> c824 000707122055 | ...$ UUU
>   | 
>  55c0 | U.
>
>
> A serial exception error occurred: device reports readiness to read but 
> returned no data
> (device disconnected or multiple access on port?) Note: This error 
> originates from pySerial.
> It is likely not a problem with esptool, but with the hardware 
> connection or drivers. For
> troubleshooting steps visit:
> https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html
> [...]
>
>
> Whatever baud rate is used, in most cases, on all tested OS versions and
> almost all hardware platforms except the Fujitsu Celsius 7XX (2015 model),
> I do not get any connection! And it gets weirder: to avoid out-of-sync
> software I recompiled everything via "portmaster -df comms/py-pyserial
> comms/py-esptool" and after that everything was fine, the connection was
> made, and I got results out of the chip. Seconds later the same problems
> returned.
>
> I exchanged cables and exchanged the ESP32 model and vendor. The invariants
> are 14-STABLE and CURRENT, both compiled daily. On my private box (old Z77
> based IvyBridge ASRock crap), a couple of Lenovo T560 running 14-STABLE and
> several Fujitsu Esprimo Q5XX boxes there is always this weird error
> message, but in very rare cases I get a connection.
>
> Only exception: the Fujitsu Celsius 7XX workstation (14-STABLE, last
> compiled today at noon). No matter what ESP32, no 

Re: serial/ulscom: response timeout using pySerial/esptool.py

2024-04-25 Thread Tomek CEDRO
CP2102 are pretty good ones and never let me down :-)

Is your UART connection to ESP32 working correctly? Can you see the
boot message and whatever happens next in terminal (cu / minicom)? Are
RX TX pins not swapped? Power supply okay?

Are the boards generic devkits or custom hardware? In addition to RX and
TX, the ESP32 needs two control lines, Reset and Boot, that switch the chip
to bootloader / flashing mode. Most USB-to-UART adapters use the DTR and
RTS lines for that. Are you sure these lines are available on your board
and connected to the target correctly? Do you have Reset and Boot buttons
on the board so you could trigger the bootloader by hand? (Hold Boot, press
and release Reset; the device will then be in bootloader upload mode, so
retry esptool flashing.) You can also play with the buttons with an active
terminal attached (e.g. minicom) to see if they work as expected.

Minicom serial terminal is pretty cool as it allows you to watch UART
behavior on adapter (un)plug. In minicom you can also enable/disable
hardware flow control lines (Ctrl+A O -> Serial Port Setup -> (F)
Hardware Flow Control). You can switch that easily and watch the
target behavior. If this is the problem you may want to use stty(1)
to enable/disable hardware flow control on the port.
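
As a rough illustration of what "Reset and Boot via the adapter's control
lines" means in practice, here is a minimal Python sketch of a reset
sequence modelled on the classic auto-reset that esptool performs. It
assumes the common devkit wiring (DTR drives Boot/GPIO0, RTS drives
Reset/EN); custom boards may be wired differently. The function name
enter_bootloader and the FakePort class are made up for this example;
FakePort only records the line changes so the sketch runs without
hardware.

```python
import time


def enter_bootloader(port, reset_delay=0.05):
    """Toggle DTR/RTS on a serial-like object to request bootloader mode.

    Assumes devkit wiring: asserting RTS holds the chip in reset and
    asserting DTR pulls the Boot pin low. Not valid for every board.
    """
    port.dtr = False  # Boot released
    port.rts = True   # Reset asserted, chip held in reset
    time.sleep(0.1)
    port.dtr = True   # Boot held low while ...
    port.rts = False  # ... Reset is released
    time.sleep(reset_delay)
    port.dtr = False  # Boot released again


class FakePort:
    """Hypothetical stand-in for a serial port; records dtr/rts writes."""

    def __init__(self):
        self.log = []

    def __setattr__(self, name, value):
        if name in ("dtr", "rts"):
            self.log.append((name, value))
        object.__setattr__(self, name, value)


fake = FakePort()
enter_bootloader(fake)
print(fake.log)
```

With pySerial installed, the same function could be handed a real port
object, for example serial.Serial("/dev/cuaU1", 115200), since pySerial
exposes dtr and rts as settable attributes. If the chip only enters the
bootloader when the buttons are pressed by hand, the control lines (or
the driver's handling of them) become the prime suspect.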

Can you try with another board? ESP32 has fuses that may permanently
disable and/or mess up some hardware features.

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info



serial/ulscom: response timeout using pySerial/esptool.py

2024-04-25 Thread FreeBSD User
Hello,

Host: 15.0-CURRENT FreeBSD 15.0-CURRENT #36 main-n269703-54c3aa02e926: Thu Apr 
25 18:48:56
CEST 2024 amd64 or 14-STABLE recently compiled (dmesg/uname not at hand).

Hardware: oldish Z77Pro 4 based Asrock mainboard, a Lenovo T560 notebook, 
Fujitsu Esprimo Q5XX
(simple desktop, Pentium Gold) or an oldish Fujitsu Celsius 7XX workstation, 6 
core Haswell
XEON.

Phenomenon: for a couple of weeks now I have been trying to connect to
several Xtensa ESP32 dev boards (ESP32-WROOM32 with CP2101 or CP2104 UART)
via comms/py-esptool (it doesn't matter whether it is the port's
py39-esptool 4.5 or the latest py-esptool 4.7.0, or whether it is the pkg
package or self-compiled on CURRENT and 14-STABLE; the result is the same
on all hardware platforms).

Attaching the ESP devel module via Micro USB cable (several types from
different vendors tried ...) shows

dmesg:
[...]
ugen0.4:  at usbus0
uslcom0 on uhub3
uslcom0: 
on usbus0
[...]

When trying to connect to the ESP32 with the command shown below (--trace
is not always given), I get no connection:

[ohartmann]: esptool.py --trace --chip esp32 --baud 115200 --port /dev/cuaU1 flash_id
esptool.py v4.7.0
Loaded custom configuration from /pool/home/ohartmann/esptool.cfg
Serial port /dev/cuaU1
Connecting...TRACE +0.000 command op=0x08 data len=36 wait_response=1 
timeout=0.100 data=
07071220  | ... 
  | 
  | 
TRACE +0.000 Write 46 bytes: 
c824 000707122055 | ...$ UUU
  | 
 55c0 | U.
TRACE +0.102 No serial data received.
TRACE +0.052 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220  | ... 
  | 
  | 
TRACE +0.000 Write 46 bytes: 
c824 000707122055 | ...$ UUU
  | 
 55c0 | U.
TRACE +0.107 No serial data received.
TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220  | ... 
  | 
  | 
TRACE +0.000 Write 46 bytes: 
c824 000707122055 | ...$ UUU
  | 
 55c0 | U.
TRACE +0.107 No serial data received.
TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220  | ... 
  | 
  | 
TRACE +0.000 Write 46 bytes: 
c824 000707122055 | ...$ UUU
  | 
 55c0 | U.


A serial exception error occurred: device reports readiness to read but 
returned no data
(device disconnected or multiple access on port?) Note: This error originates 
from pySerial.
It is likely not a problem with esptool, but with the hardware connection or 
drivers. For
troubleshooting steps visit:
https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html
[...]
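
For readers trying to interpret the trace: the numbers in it ("data
len=36", "Write 46 bytes") are consistent with the SYNC command described
in Espressif's serial protocol documentation. The Python sketch below
builds such a frame under that assumption: an 8-byte request header
(direction, opcode, 16-bit length, 32-bit checksum field), 36 bytes of
SYNC payload, and SLIP framing with 0xC0 delimiters. The helper names are
made up for this example and are not esptool's API.

```python
import struct

ESP_SYNC = 0x08  # opcode shown in the trace as op=0x08


def slip_encode(payload: bytes) -> bytes:
    """Wrap a payload in SLIP delimiters, escaping 0xDB and 0xC0."""
    body = payload.replace(b"\xdb", b"\xdb\xdd").replace(b"\xc0", b"\xdb\xdc")
    return b"\xc0" + body + b"\xc0"


def build_command(op: int, data: bytes, checksum: int = 0) -> bytes:
    """Build a request frame: header (little-endian) + data, SLIP framed."""
    header = struct.pack("<BBHI", 0x00, op, len(data), checksum)
    return slip_encode(header + data)


# SYNC payload per the protocol documentation: 07 07 12 20, then 32 x 0x55.
sync_payload = b"\x07\x07\x12\x20" + b"\x55" * 32
frame = build_command(ESP_SYNC, sync_payload)

print(len(sync_payload), len(frame))
print(frame.hex())
```

The host repeats this frame until the chip's ROM bootloader answers. "No
serial data received" after every write therefore means that nothing came
back at all, which points at the reset/boot handshake or the serial path
rather than at the flash_id command itself.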


Whatever baud rate is used, in most cases, on all tested OS versions and
almost all hardware platforms except the Fujitsu Celsius 7XX (2015 model),
I do not get any connection! And it gets weirder: to avoid out-of-sync
software I recompiled everything via "portmaster -df comms/py-pyserial
comms/py-esptool" and after that everything was fine, the connection was
made, and I got results out of the chip. Seconds later the same problems
returned.

I exchanged cables and exchanged the ESP32 model and vendor. The invariants
are 14-STABLE and CURRENT, both compiled daily. On my private box (old Z77
based IvyBridge ASRock crap), a couple of Lenovo T560 running 14-STABLE and
several Fujitsu Esprimo Q5XX boxes there is always this weird error
message, but in very rare cases I get a connection.

Only exception: the Fujitsu Celsius 7XX workstation (14-STABLE, last
compiled today at noon). No matter what ESP32, no matter what vendor, no
matter what cable is used: the connection is established at any baud rate,
at any time. Not one single failure as shown above in any session (I
checked several dozen times)!

Now I'm out of ideas and I suspect the CP210X uslcom serial driver of
having trouble with most onboard USB chipsets.

Can anyone help me track down this issue? Is there anything I could have missed?

It drives me nuts ...

Thanks in advance,

Oliver

 
-- 
O. Hartmann



Re: TXT Kernel linking failed on -CURRENT

2024-04-24 Thread Konstantin Belousov
On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote:
> linking kernel
> ld: error: undefined symbol: ktrcapfail
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(namei)
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(namei_setup)
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(vfs_lookup)
> >>> referenced 3 more times
> *** [kernel] Error code 1

Try
https://reviews.freebsd.org/D44931



Re: Strange network/socket anomalies since about a month

2024-04-24 Thread Dag-Erling Smørgrav
Alexander Leidinger  writes:
> Gleb Smirnoff  writes:
> > I don't have any better idea than ktrace over failing application.
> > Yep, I understand that poudriere will produce a lot.
> Yes, it does. 4.4 GB just for the start of poudriere until the first
> package build fails due to a failed sccache start [...]

Using `ktrace -tcnpstuy` instead of just `ktrace` should greatly reduce
the size of the trace file.

(remind me to modify ktrace and kdump so this can be written as `-t-i`
or `-tI` instead...)
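
The string "cnpstuy" looks like the default trace-point set with the "i"
(I/O data) point removed. A small Python sketch of that derivation
follows; the default set "cinpstuy" is taken from ktrace(1) as I remember
it and should be checked against the man page on the system in question,
and the poudriere command line is only a placeholder.

```python
DEFAULT_POINTS = "cinpstuy"  # assumed default set; verify with ktrace(1)


def trace_points(exclude: str, base: str = DEFAULT_POINTS) -> str:
    """Return the trace-point string with the excluded points removed."""
    return "".join(point for point in base if point not in exclude)


def ktrace_command(program, exclude="i", outfile="ktrace.out"):
    """Build an argv list; -i here means 'inherit to children', not I/O."""
    return ["ktrace", "-i", "-f", outfile, "-t", trace_points(exclude), *program]


print(trace_points("i"))
print(" ".join(ktrace_command(["poudriere", "bulk", "-a"])))
```

Dropping the I/O point keeps the syscall names, arguments, and return
values but leaves out the data buffers, which is where most of the trace
volume comes from.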

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: Strange network/socket anomalies since about a month

2024-04-24 Thread Alexander Leidinger

On 2024-04-22 18:12, Gleb Smirnoff wrote:

> There were several preparatory commits that were not reverted and one of
> them had a bug.  The bug manifested itself as failure to send(2) zero bytes
> over unix/stream.  It was fixed with
> e6a4b57239dafc6c944473326891d46d966c0264. Can you please check you have
> this revision? Other than that there are no known bugs left.

Yes, I have this fix in my running kernel.

> A> Any ideas how to track this down more easily than running the entire
> A> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?
>
> I don't have any better idea than ktrace over failing application.  Yep, I
> understand that poudriere will produce a lot.  But first we need to
> determine

Yes, it does. 4.4 GB just for the start of poudriere until the first 
package build fails due to a failed sccache start (luckily in the first 
builder, but I had at least 2 builders automatically spin up by 
poudriere at the time when I validated the failure in the logs and 
disabled the tracing).

> what syscall fails and on what type of socket.  After that we can scope
> down to using dtrace on very particular functions.


I'm not sure I managed to find the cause of the failure... the only thing 
that remotely looks like an issue is "Resource temporarily unavailable", 
but this is from the process which waits for the server to have started:

---snip---
 58406 sccache  1713947887.504834367 RET   __sysctl 0
 58406 sccache  1713947887.505521884 CALL  
rfork(0x8000<>2147483648)
 58406 sccache  1713947887.50575 CAP   system call not allowed: 
rfork

 58406 sccache  1713947887.505774176 RET   rfork 58426/0xe43a
 58406 sccache  1713947887.507304865 CALL  
compat11.kevent(0x3,0x371d360f89e8,0x2,0x371d360f89e8,0x2,0)
 58406 sccache  1713947887.507657906 STRU  struct freebsd11_kevent[] = { 
{ ident=11, filter=EVFILT_READ, flags=0x61, 
fflags=0, data=0, udata=0x0 }
 { ident=11, filter=EVFILT_WRITE, 
flags=0x61, fflags=0, data=0, udata=0x0 } }
 58406 sccache  1713947887.507689980 STRU  struct freebsd11_kevent[] = { 
{ ident=11, filter=EVFILT_READ, flags=0x4000, fflags=0, 
data=0, udata=0x0 }
 { ident=11, filter=EVFILT_WRITE, flags=0x4000, 
fflags=0, data=0, udata=0x0 } }

 58406 sccache  1713947887.507977155 RET   compat11.kevent 2
 58406 sccache  1713947887.508015751 CALL  write(0x5,0x371515685bcc,0x1)
 58406 sccache  1713947887.508086434 GIO   fd 5 wrote 1 byte
   0x 01   |.|

 58406 sccache  1713947887.508145930 RET   write 1
 58406 sccache  1713947887.508183140 CALL  
compat11.kevent(0x7,0,0,0x5a5689ab0c40,0x400,0)
 58406 sccache  1713947887.508396614 STRU  struct freebsd11_kevent[] = { 
 }
 58406 sccache  1713947887.508156537 STRU  struct freebsd11_kevent[] = { 
{ ident=4, filter=EVFILT_READ, flags=0x60, 
fflags=0, data=0x1, udata=0x } }

 58406 sccache  1713947887.508530888 RET   compat11.kevent 1
 58406 sccache  1713947887.508563736 CALL  read(0x4,0x371d3a2887c0,0x80)
 58406 sccache  1713947887.508729102 GIO   fd 4 read 1 byte
   0x 01   |.|

 58406 sccache  1713947887.508967661 RET   read 1
 58406 sccache  1713947887.508996753 CALL  read(0x4,0x371d3a2887c0,0x80)
 58406 sccache  1713947887.509028311 RET   read -1 errno 35 Resource 
temporarily unavailable
 58406 sccache  1713947887.509068838 CALL  
compat11.kevent(0x3,0,0,0x5a5689a97540,0x400,0x371d3a2887c8)

..
 58406 sccache  1713947897.514352552 CALL  
_umtx_op(0x5a5689a3d290,0x10,0x7fff,0,0)

 58406 sccache  1713947897.514383653 RET   _umtx_op 0
 58406 sccache  1713947897.514421273 CALL  write(0x5,0x371515685bcc,0x1)
 58406 sccache  1713947897.515050967 STRU  struct freebsd11_kevent[] = { 
{ ident=4, filter=EVFILT_READ, flags=0x60, 
fflags=0, data=0x1, udata=0x } }

 58406 sccache  1713947897.515146151 RET   compat11.kevent 1
 58406 sccache  1713947897.515178978 CALL  read(0x4,0x371d3a2887c0,0x80)
 58406 sccache  1713947897.515368070 GIO   fd 4 read 1 byte
   0x 01   |.|

 58406 sccache  1713947897.515396600 RET   read 1
 58406 sccache  1713947897.515426523 CALL  read(0x4,0x371d3a2887c0,0x80)
 58406 sccache  1713947897.515457073 RET   read -1 errno 35 Resource 
temporarily unavailable

 58406 sccache  1713947897.515004494 GIO   fd 5 wrote 1 byte
   0x 01   |.|
---snip---

https://www.leidinger.net/test/sccache.tar.bz2 contains the parts of the 
trace of the sccache processes (in case someone wants to have a look).


The poudriere run had several builders in parallel, at least 2 were 
running at that point in time. What the overlay does is to startup 
(sccache --start-server) the sccache server process (forks to return 
back on the command line) which creates a file system socket, and then 
it queries the stats (sccache --show-stats). So some of the traces in 
the tarball are the server start (those with "Timed 

TXT Kernel linking failed on -CURRENT

2024-04-24 Thread BSD USER

Sorry for HTML-trash from previous mail :)

Hi, FreeBSD Community!
I am learning FreeBSD and use -CURRENT on my test machine.
A few days ago, after
- git pull
- make buildworld
- make buildkernel
the kernel failed to link.
My /etc/src.conf and the BSDSERV kernel config are below; what can cause that error?
Thanks for help!

My /usr/src state is:

 git log -n 1
commit a0d7d68a2dd818ce84e37e1ff20c8849cda6d853 (HEAD -> main, origin/main, origin/HEAD)
Author: Cy Schubert 

The kernel build failed with these messages:
--
--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -MD -MF.depend.force-dynamic-hack.pico -MTforce-dynamic-hack.pico -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o force-dynamic-hack.pico

--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh  BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
 1098.27 real  2002.17 user   176.26 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes
BSDSERV
===
cpu HAMMER
ident   BSDSERV
device  amdtemp
options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options VIMAGE  # Subsystem virtualization, e.g. VNET

options INET    # InterNETworking
options TCP_OFFLOAD # TCP offload
options TCP_BLACKBOX    # Enhanced TCP event 

Kernel linking error on -CURRENT

2024-04-24 Thread USER BSD
Hi, FreeBSD Community!
I am learning FreeBSD and use -CURRENT on my test machine.
A few days ago, after
- git pull
- make buildworld
- make buildkernel
the kernel failed to link.
My /etc/src.conf and the BSDSERV kernel config are below; what can cause that error?
Thanks for help!

The kernel build failed with these messages:
--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -MD -MF.depend.force-dynamic-hack.pico -MTforce-dynamic-hack.pico -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o force-dynamic-hack.pico
--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh  BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe -fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wswitch -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length -mno-aes -mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
 1098.27 real  2002.17 user   176.26 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes

BSDSERV
===
cpu HAMMER
ident   BSDSERV
device  amdtemp
options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options VIMAGE  # Subsystem virtualization, e.g. VNET
options INET    # InterNETworking
options TCP_OFFLOAD # TCP offload
options TCP_BLACKBOX    # Enhanced TCP event logging
options TCP_HHOOK   # hhook(9) framework for TCP
options TCP_RFC7413 # TCP Fast Open
options KERN_TLS    # TLS transmit & receive offload
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options MD_ROOT

Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-23 Thread Philip Paeps

On 2024-04-24 02:12:41 (+0800), Mark Millard wrote:

> On Apr 19, 2024, at 07:16, Philip Paeps  wrote:
>
>> On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:
>>
>>> void  wrote on
>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>
>>>> Not sure where to post this..
>>>>
>>>> The last bulk build for arm64 appears to have happened around
>>>> mid-March on ampere2. Is it broken?
>>>
>>> main-armv7 building is broken and the last completed build
>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>> gets stuck making no progress until manually forced to stop,
>>> which leads to huge elapsed times for the incomplete builds:
>>>
>>> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>>>
>>> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>>>
>>> ampere2 alternates between trying to build main-arm64 and
>>> main-armv7, so main-armv7 being stuck blocks main-arm64 from
>>> building.
>>>
>>> One can see that all 13 job ID's show over 570 hours:
>>>
>>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>>>
>>> It is not random which packages are building when this happens.
>>> Compare:
>>>
>>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>>>
>>> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>>>
>>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>>>
>>> My guess is that FreeBSD has something that broke after bd45bbe440,
>>> was broken as of f5f08e41aa, and was still broken at 75464941dc .
>>
>> I'll kill the build on ampere2 again.  Thanks for the nudge.
>>
>> We don't really have good monitoring for this.  Also: builds should
>> time out after 36 hours.  The fact that this one does not is a bug in
>> itself.
>>
>> Philip [hat: clusteradm]
>
> I'll note that I've never managed to replicate the problem for
> building for armv7 on aarch64. But my context never has the
> likes of:
>
> QUOTE
> Host OSVERSION: 156
> Jail OSVERSION: 1500015
> . . .
> !!! Jail is newer than host. (Jail: 1500015, Host: 156) !!!
> !!! This is not supported. !!!
> !!! Host kernel must be same or newer than jail. !!!
> !!! Expect build failures. !!!
> END QUOTE
>
> but always has the two OSVERSION's the same, such as:
>
> Host OSVERSION: 1500015
> Jail OSVERSION: 1500015
>
> or, recently,
>
> Host OSVERSION: 1500018
> Jail OSVERSION: 1500018
>
> My bulk runs do go through the sequence where the hangups
> have repeated for main-armv7 on ampere2.
>
> I wonder what would happen if "Host OSVERSION" was updated
> (modernized) to match the modern "Jail OSVERSION" that would
> be used?


The package builders are due for a regular refresh to newer -CURRENT 
dogfood.  I'll do the aarch64 builders first this time.


I've set /root/stop-builds on them.  I'll upgrade them when they go 
idle.  Or I'll kill them if they take much longer to build what they're 
building.  It annoys me that they do not stop building after 36 hours, 
like they're supposed to.
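
As an aside on the missing monitoring: the elapsed-time strings quoted in
this thread are easy to check mechanically against the 36-hour limit. The
Python sketch below is only an illustration, not part of the cluster
tooling; it assumes the two formats seen here ("HHH:MM:SS" and
"ND:HH:MM:SS"), and the function names are made up.

```python
TIMEOUT_HOURS = 36  # the limit mentioned above


def elapsed_hours(elapsed: str) -> float:
    """Convert 'HHH:MM:SS' or 'ND:HH:MM:SS' into hours."""
    parts = elapsed.split(":")
    days = 0
    if parts[0].endswith("D"):
        days = int(parts[0][:-1])
        parts = parts[1:]
    hours, minutes, seconds = (int(part) for part in parts)
    return days * 24 + hours + minutes / 60 + seconds / 3600


def overdue(builds, limit=TIMEOUT_HOURS):
    """Return the names of builds whose elapsed time exceeds the limit."""
    return [name for name, elapsed in builds if elapsed_hours(elapsed) > limit]


# Build names and elapsed times as quoted in this thread.
builds = [
    ("main-armv7 pd5512ae7b8c6_s75464941dc", "651:21:56"),
    ("main-armv7 p43e3af5f5763_sf5f08e41aa", "359:42:14"),
    ("140arm64-quarterly 1b931669de11", "3D:01:08:29"),
    ("main-arm64 p1c7a816cd0ad_s1bd4f769caf", "4D:00:52:21"),
    ("140releng-armv7 2910ff97e727", "1D:09:35:28"),
]
for name in overdue(builds):
    print("over limit:", name)
```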


They're currently running:

n266879-6abee52e0d79   2023-12-09 01:06:28 jlduran strfmon: Silence 
scan-build warning


Our current clusteradm build is:

n269399-bbc6e6c5ec8c   2024-04-14 03:12:36 sigsys daemon: fix -R to 
enable supervision mode


I may do a new build while waiting for them to go idle:

-   quarterly 140arm64 1b931669de11 parallel_build 28776 15299 33 588 985 0 11871 3D:01:08:29
    https://pkg-status.freebsd.org/ampere1/build.html?mastername=140arm64-quarterly=1b931669de11
-   default main-arm64 p1c7a816cd0ad_s1bd4f769caf parallel_build 34528 19888 65 669 980 0 12926 4D:00:52:21
    https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-arm64-default=p1c7a816cd0ad_s1bd4f769caf
-   default 140releng-armv7 2910ff97e727 parallel_build 34543 14826 60 5539 1397 0 12721 1D:09:35:28
    https://pkg-status.freebsd.org/ampere3/build.html?mastername=140releng-armv7-default=2910ff97e727


Philip



Re: April 2024 stabilization week

2024-04-23 Thread Gleb Smirnoff
  Hi FreeBSD/main users & developers,

this stabilization week [likely final] status update:

* Netflix testing didn't discover any stability issues with
  main-n269602-dd03eafacba9.
* Netflix testing didn't discover any substantial performance
  degradations.  The data is still being analyzed though.
* A regression with ZFS reported in 
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278494
  has been addressed by ZFS 9f83eec03904b18e052fbe2c66542bd47254cf57.
* An old (more than a month old) regression has been identified
  with accept_filter(9).
  Fixed by a8acc2bf5699556946dda2a37589d3c3bd9762c6.

Since several commits that are neither documentation changes nor trivial
bugfixes have been pushed to FreeBSD/main during the stabilization week, I
can't guarantee that the above testing results apply to the current state of
FreeBSD/main.  That's why I created a temporary cherry-picking branch
stabweek-2024-Apr that is published at https://github.com/glebius/FreeBSD.git.

Users of FreeBSD/main are advised to choose one of the following:

- Pull FreeBSD/main up to a8acc2bf5699556946dda2a37589d3c3bd9762c6 and use it
  as your stabilization point.  There is a tiny risk from untested changes
  added recently.
- Pull stabweek-2024-Apr from https://github.com/glebius/FreeBSD.git.
- Craft stabweek-2024-Apr yourself:
  # git checkout -b stabweek-2024-Apr dd03eafacba962c9dcec929c3ed9d63e7c43da3a
  # git cherry-pick -x --strategy=subtree -Xsubtree=sys/contrib/openzfs 9f83eec03904b18e052fbe2c66542bd47254cf57
  # git cherry-pick -x a8acc2bf5699556946dda2a37589d3c3bd9762c6

I'm planning to end the advisory freeze on the main branch Wednesday morning
at 8:00 UTC, unless somebody opposes that with a valid reason, e.g. a
regression that I missed.

-- 
Gleb Smirnoff




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-23 Thread Mark Millard
On Apr 19, 2024, at 07:16, Philip Paeps  wrote:

> On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:
> 
>> void  wrote on
>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>> 
>>> Not sure where to post this..
>>> 
>>> The last bulk build for arm64 appears to have happened around
>>> mid-March on ampere2. Is it broken?
>> 
>> main-armv7 building is broken and the last completed build
>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>> gets stuck making no progress until manually forced to stop,
>> which leads to huge elapsed times for the incomplete builds:
>> 
>> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390 
>>  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>> 
>> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395 
>>  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>> 
>> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
>> main-armv7 being stuck blocks main-arm64 from building.
>> 
>> One can see that all 13 job ID's show over 570 hours:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>> 
>> It is not random which packages are building when this happens. Compare:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>> 
>> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>> 
>> My guess is that FreeBSD has something that broke after bd45bbe440,
>> was broken as of f5f08e41aa, and was still broken at 75464941dc .
> 
> I'll kill the build on ampere2 again.  Thanks for the nudge.
> 
> We don't really have good monitoring for this.  Also: builds should time out 
> after 36 hours.  The fact that this one does not is a bug in itself.
> 
> Philip [hat: clusteradm]

I'll note that I've never managed to replicate the problem for
building for armv7 on aarch64. But my context never has the
likes of:

QUOTE
Host OSVERSION: 156
Jail OSVERSION: 1500015
 . .
!!! Jail is newer than host. (Jail: 1500015, Host: 156) !!!
!!! This is not supported. !!!
!!! Host kernel must be same or newer than jail. !!!
!!! Expect build failures. !!!
END QUOTE

but always has the two OSVERSION's the same, such as:

Host OSVERSION: 1500015
Jail OSVERSION: 1500015

or, recently,

Host OSVERSION: 1500018
Jail OSVERSION: 1500018

My bulk runs do go through the sequence where the hangups
have repeated for main-armv7 on ampere2.

I wonder what would happen if "Host OSVERSION" was updated
(modernized) to match the modern "Jail OSVERSION" that would
be used?
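
The rule behind the quoted warning ("Host kernel must be same or newer
than jail") can be written down as a simple comparison. The Python sketch
below is illustrative only: it assumes the usual __FreeBSD_version layout
(major * 100000 + minor * 1000 + revision), uses the OSVERSION values
quoted above, and adds 1500000 as a made-up example of an older host. The
function names are hypothetical.

```python
def split_osversion(osversion: int):
    """Split an OSVERSION such as 1500018 into (major, minor, revision)."""
    return osversion // 100000, (osversion // 1000) % 100, osversion % 1000


def jail_supported(host: int, jail: int) -> bool:
    """The host kernel must be the same as or newer than the jail."""
    return host >= jail


for host, jail in [(1500015, 1500015), (1500018, 1500018), (1500000, 1500015)]:
    status = "ok" if jail_supported(host, jail) else "jail is newer than host"
    print(f"Host {host} / Jail {jail}: {status}")
```

Only the last pair, with the made-up older host, falls into the
unsupported case that the warning describes.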



===
Mark Millard
marklmi at yahoo.com




Re: April 2024 stabilization week

2024-04-23 Thread Gleb Smirnoff
  Hi FreeBSD/main users & developers,

On Mon, Apr 22, 2024 at 01:00:50AM -0700, Gleb Smirnoff wrote:
T> This is an automated email to inform you that the April 2024 stabilization
T> week started with FreeBSD/main at main-n269602-dd03eafacba9, which was
T> tagged as main-stabweek-2024-Apr.
T> main-stabweek-2024-Apr.
T> 
T> The tag main-stabweek-2024-Apr has been published at
T> https://github.com/glebius/FreeBSD/tags.  Those who want to participate
T> in the stabilization week are encouraged to update to the above
T> revision/tag and test their systems.

* Netflix testing didn't discover any stability issues with
  main-n269602-dd03eafacba9.  We are still running the performance test,
  but preliminary results are that everything is fine.
* My personal desktop/server experience with the tag has not shown any problems either.

Feel free to reply with your reports if you participated in the testing, too.

In bugzilla we have this submission, which looks important:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278494

I want to hear from Alexander and Martin before thawing the advisory freeze.
Don't want to declare the tag good if some ZFS systems fail to boot after
upgrade.

-- 
Gleb Smirnoff



Re: llvm and Undefined symbols: ___truncsfbf2 problem

2024-04-23 Thread Hiroo Ono
Thank you.
I updated my current to recent current and confirmed that julia
1.11.0 beta1 builds and runs with the system clang (18.1.4).


On Thu, 18 Apr 2024 00:36:28 +0200
Dimitry Andric  wrote:

> On 11 Apr 2024, at 15:07, Hiroo Ono  wrote:
> > 
> > Hello,
> > 
> > I am trying to update the lang/julia port to 1.11.0 (currently
> > still in beta 1). I seem to have run across this problem initially
> > reported on MacOS. https://github.com/JuliaLang/julia/issues/52067
> > 
> > The llvm team seems to have patched this problem only for Darwin.
> > https://github.com/llvm/llvm-project/pull/84192
> > 
> > I think the solution is also needed for FreeBSD, but should I
> > report it directly to llvm team or report here or to FreeBSD
> > bugzilla and ask toolchain maintainer of FreeBSD to report
> > upstream?  
> 
> The __bf16 type is only available on some architectures, and only
> supported by relatively recent compiler versions, in combination with
> some runtime support (i.e. compiler-rt or libgcc).
> 
> Approximately: it is available on aarch64, amd64, arm (with fp), i386
> (with sse2) and riscv. And it is supported by clang 15 and later
> (though not for riscv, which requires clang 18), and gcc 13 and later.
> 
> However, the runtime support in FreeBSD was only added with the recent
> merge of llvm 18. The necessary library functions (truncdfbf2 and
> truncsfbf2) are now in compiler-rt.
> 
> -Dimitry
> 
> 




Re: Strange network/socket anomalies since about a month

2024-04-22 Thread Gleb Smirnoff
  Alexander,

On Mon, Apr 22, 2024 at 09:26:59AM +0200, Alexander Leidinger wrote:
A> I see a higher failure rate of socket/network related stuff since a while.
A> Those failures are transient. Directly executing the same thing again may
A> or may not result in success/failure. I'm not able to reproduce this at
A> will. Sometimes they show up.
A> 
A> Examples:
A>  - poudriere runs with the sccache overlay (like ccache but also works for
A> rust) sometimes fail to create the communication socket and as such the
A> build fails. I have 3 different poudriere bulk runs after each other in my
A> build script, and when the first one fails, the second and third still run.
A> If the first fails due to the sccache issue, the second and 3rd may or may
A> not fail. Sometimes the first fails and the rest is ok. Sometimes all fail,
A> and if I then run one by hand it works (the script does the same as the
A> manual run, the script is simply a "for type in A B C; do poudriere bulk
A> -O sccache -j $type -f ${type}.pkglist; done" which I execute from the
A> same shell, and the script doesn't do env-sanitizing).
A>  - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx
A> (webmail service) -> php -> imap) sees intermittent issues sometimes.
A> Opening the same email directly again afterwards normally works. I've also
A> seen transient issues with pgp signing (webmail interface -> gnupg /
A> gpg-agent on the server), simply hitting send again after a failure works
A> fine.
A> 
A> Gleb, could this be related to the socket stuff you did 2 weeks ago? My
A> world is from 2024-04-17-112537. I have noticed this since at least then, but
A> I'm not sure if they were there before that and I simply didn't notice
A> them. They are surely "new recently"; that amount of issues I haven't seen
A> in January. The last two updates of current I did before the last one were
A> on 2024-03-31-120210 and 2024-04-08-112551.

The stuff I pushed 2 weeks ago was a large rewrite of unix/stream, but that was
reverted, as it appears to need more work wrt aio(4) and nfs/rpc, and it also
turned out that sendfile(2) over unix(4) has some non-zero use.

There were several preparatory commits that were not reverted and one of them
had a bug.  The bug manifested itself as failure to send(2) zero bytes over
unix/stream.  It was fixed with e6a4b57239dafc6c944473326891d46d966c0264. Can
you please check you have this revision? Other than that there are no known
bugs left.

A> I could also imagine that some memory related transient failure could cause
A> this, but with >3 GB free I do not expect this. Important here may be that
A> I have https://reviews.freebsd.org/D40575 in my tree, which is memory
A> related, but it's only a metric to quantify memory fragmentation.
A> 
A> Any ideas how to track this down more easily than running the entire
A> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?

I don't have any better idea than running ktrace on the failing application.
Yep, I understand that poudriere will produce a lot.  But first we need to
determine what syscall fails and on what type of socket.  After that we can
scope down to using dtrace on very particular functions.
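
A rough sketch of how the ktrace output could be narrowed down to the
interesting part. It assumes kdump(1) prints return lines in the usual
"PID command RET syscall -1 errno N ..." layout and that command names
contain no spaces. The helper name failed_socket_calls and the trace
file path are made up for this example.

```shell
# failed_socket_calls
# Hypothetical filter: reads kdump(1) text on stdin and prints only the
# RET lines of socket-related syscalls that returned -1.
failed_socket_calls() {
    awk '$3 == "RET" && $5 == "-1" &&
         $4 ~ /^(socket|bind|listen|connect|accept|accept4|send|sendto|sendmsg|recv|recvfrom|recvmsg)$/'
}

# Possible use (jail and list names taken from the loop quoted above):
#   ktrace -di -t c -f /tmp/bulk.ktrace \
#       poudriere bulk -O sccache -j A -f A.pkglist
#   kdump -f /tmp/bulk.ktrace | failed_socket_calls
```

Tracing only syscalls (-t c) should keep the dump smaller than a full
trace, and the filter output names the failing syscall and errno.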

-- 
Gleb Smirnoff



Re: Strange network/socket anomalies since about a month

2024-04-22 Thread Paul Mather
On Apr 22, 2024, at 3:26 AM, Alexander Leidinger wrote:


> Hi,
> 
> I see a higher failure rate of socket/network related stuff since a while. 
> Those failures are transient. Directly executing the same thing again may or 
> may not result in success/failure. I'm not able to reproduce this at will. 
> Sometimes they show up.
> 
> Examples:
> - poudriere runs with the sccache overlay (like ccache but also works for 
> rust) sometimes fail to create the communication socket and as such the build 
> fails. I have 3 different poudriere bulk runs after each other in my build 
> script, and when the first one fails, the second and third still run. If the 
> first fails due to the sccache issue, the second and 3rd may or may not fail. 
> Sometimes the first fails and the rest is ok. Sometimes all fail, and if I 
> then run one by hand it works (the script does the same as the manual run, 
> the script is simply a "for type in A B C; do poudriere bulk -O sccache -j
> $type -f ${type}.pkglist; done" which I execute from the same shell, and the
> script doesn't do env-sanitizing).
> - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx 
> (webmail service) -> php -> imap) sees intermittent issues sometimes. Opening 
> the same email directly again afterwards normally works. I've also seen 
> transient issues with pgp signing (webmail interface -> gnupg / gpg-agent on 
> the server), simply hitting send again after a failure works fine.
> 
> Gleb, could this be related to the socket stuff you did 2 weeks ago? My world 
> is from 2024-04-17-112537. I have noticed this since at least then, but I'm not
> sure if they were there before that and I simply didn't notice them. They
> are surely "new recently"; that amount of issues I haven't seen in January.
> The last two updates of current I did before the last one were on
> 2024-03-31-120210 and 2024-04-08-112551.
> 
> I could also imagine that some memory related transient failure could cause 
> this, but with >3 GB free I do not expect this. Important here may be that I 
> have https://reviews.freebsd.org/D40575 in my tree, which is memory related, 
> but it's only a metric to quantify memory fragmentation.
> 
> Any ideas how to track this down more easily than running the entire 
> poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?


No answers, I'm afraid, just a "me too."

I have the same problem as you describe when using ports-mgmt/sccache-overlay 
when building packages with Poudriere.  In my case, I'm using FreeBSD 14-STABLE 
(stable/14-13952fbca).

I actually stopped using ports-mgmt/sccache-overlay because it got to the point 
where it didn't work more often than it did.  Then, a few months ago, I decided 
to start using it again on a whim and it worked reliably for me.  Then, 
starting a few weeks ago, it has reverted to the behaviour you describe above.  
It is not as bad right now as it got when I quit using it.  Now, sometimes it 
will fail, but it will succeed when re-running a "poudriere bulk" run.

I'd love it to go back to when it was working 100% of the time.

Cheers,

Paul.




April 2024 stabilization week

2024-04-22 Thread Gleb Smirnoff
  Hi FreeBSD/main users & developers:

This is an automated email to inform you that the April 2024 stabilization week
started with FreeBSD/main at main-n269602-dd03eafacba9, which was tagged as
main-stabweek-2024-Apr.

The tag main-stabweek-2024-Apr has been published at
https://github.com/glebius/FreeBSD/tags.  Those who want to participate
in the stabilization week are encouraged to update to the above
revision/tag and test their systems.

Developers are encouraged to avoid pushing new features to FreeBSD/main,
but focus on bugfixes instead.  The stabilization week runs up to
Friday 18:00 UTC, but if there is consensus that any regressions
discovered by participants have been fixed, it will end early.

Once that happens, the advisory freeze of FreeBSD/main branch is thawed.

--
Gleb Smirnoff



Strange network/socket anomalies since about a month

2024-04-22 Thread Alexander Leidinger

Hi,

I have been seeing a higher failure rate of socket/network related stuff
for a while. Those failures are transient. Directly executing the same thing
again may or may not result in success/failure. I'm not able to 
reproduce this at will. Sometimes they show up.


Examples:
 - poudriere runs with the sccache overlay (like ccache but also works 
for rust) sometimes fail to create the communication socket and as such 
the build fails. I have 3 different poudriere bulk runs after each other 
in my build script, and when the first one fails, the second and third 
still run. If the first fails due to the sccache issue, the second and 
3rd may or may not fail. Sometimes the first fails and the rest is ok. 
Sometimes all fail, and if I then run one by hand it works (the script 
does the same as the manual run, the script is simply a "for type in A B
C; do poudriere bulk -O sccache -j $type -f ${type}.pkglist; done"
which I execute from the same shell, and the script doesn't do
env-sanitizing).
 - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx 
(webmail service) -> php -> imap) sees intermittent issues sometimes. 
Opening the same email directly again afterwards normally works. I've 
also seen transient issues with pgp signing (webmail interface -> gnupg 
/ gpg-agent on the server), simply hitting send again after a failure 
works fine.
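
Since re-running often succeeds, a stop-gap is to retry each bulk run a
few times. This is only a minimal sketch: it hides the transient
failures rather than explaining them, and the function name retry and
the RETRY_DELAY variable are made up for this example.

```shell
# retry MAX CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or MAX attempts have
# been made, sleeping RETRY_DELAY seconds (default 5) between attempts.
retry() {
    max=$1
    shift
    attempt=1
    while :; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# The build loop with up to three attempts per package list:
#   for type in A B C; do
#       retry 3 poudriere bulk -O sccache -j "$type" -f "${type}.pkglist"
#   done
```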


Gleb, could this be related to the socket stuff you did 2 weeks ago? My 
world is from 2024-04-17-112537. I have noticed this since at least then,
but I'm not sure if they were there before that and I simply didn't
notice them. They are surely "new recently"; that amount of issues I
haven't seen in January. The last two updates of current I did before
the last one were on 2024-03-31-120210 and 2024-04-08-112551.


I could also imagine that some memory related transient failure could 
cause this, but with >3 GB free I do not expect this. Important here may 
be that I have https://reviews.freebsd.org/D40575 in my tree, which is 
memory related, but it's only a metric to quantify memory fragmentation.


Any ideas how to track this down more easily than running the entire 
poudriere in ktrace (e.g. a hint/script which dtrace probes to use)?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Unfamiliar console message: in prompt_tty(): caught signal 2

2024-04-21 Thread bob prohaska
On Sun, Apr 21, 2024 at 10:16:55PM +0200, Dag-Erling Smørgrav wrote:
> bob prohaska  writes:
> > Apr 20 22:14:37 www su[30398]: in prompt_tty(): caught signal 2
> 
> This means someone ran `su` and pressed Ctrl-C instead of entering a
> password when prompted.

Ahh, that would have been me. Thank you!

bob prohaska




etc/rc.d/nuageinit sometimes missing

2024-04-21 Thread Mark Millard
I've not managed to track down how yet (ever?), but in two of my upgraded 
directory trees the etc/rc.d/nuageinit ended up being missing.

By contrast, the places that were filled in via "installworld distrib-dirs 
distribution DB_FROM_SRC=1" usage did have the file. (Installed from the same 
build.)

Note, I noticed because of a message from an etcupdate run: I do not use the 
file for anything.

===
Mark Millard
marklmi at yahoo.com




libclang_rt.asan_static-aarch64.a and libclang_rt.fuzzer_interceptors-aarch64.a in .../tmp/lib/clang/17/lib/freebsd/ not cleaned out

2024-04-21 Thread Mark Millard
In my recent FreeBSD update activity, the following files blocked the "delete
old things" activity from deleting the related old 17/ subdirectory tree
for clang:

-r--r--r--  1 root wheel - 8378 Mar  2 19:39:47 2024 /usr/obj/BUILDs/main-CA76-nodbg-clang/usr/main-src/arm64.aarch64/tmp/usr/lib/clang/17/lib/freebsd/libclang_rt.asan_static-aarch64.a
-r--r--r--  1 root wheel - 998 Mar  2 19:39:47 2024 /usr/obj/BUILDs/main-CA76-nodbg-clang/usr/main-src/arm64.aarch64/tmp/usr/lib/clang/17/lib/freebsd/libclang_rt.fuzzer_interceptors-aarch64.a


===
Mark Millard
marklmi at yahoo.com




Re: Unfamiliar console message: in prompt_tty(): caught signal 2

2024-04-21 Thread Dag-Erling Smørgrav
bob prohaska  writes:
> Apr 20 22:14:37 www su[30398]: in prompt_tty(): caught signal 2

This means someone ran `su` and pressed Ctrl-C instead of entering a
password when prompted.
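
Signal 2 is SIGINT, which is what Ctrl-C sends. A small sketch for
looking up a signal number from a log line, using the shell's kill
builtin. The wrapper name signal_name is made up for this example.

```shell
# signal_name NUMBER
# Hypothetical wrapper: prints the symbolic name of a signal number.
signal_name() {
    kill -l "$1"
}

signal_name 2    # the "2" from the su message
```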

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Unfamiliar console message: in prompt_tty(): caught signal 2

2024-04-21 Thread bob prohaska
On the serial console on a Pi3 v1.1 (so armv7) I just noticed an
unfamiliar message:

Apr 20 22:14:37 www su[30398]: in prompt_tty(): caught signal 2

Several login failures were reported shortly afterward, so
the message seems to have been a console message, not from
the tip session used to connect.

I've never seen it before and wondered if it has any special 
importance. The machine was running buildworld on -current,
updated a day or so ago.

By the next morning the machine had locked up hard, no
response to the enter-tilde-control-B debugger escape.

After power-cycling it came back up after fsck and buildworld
was resumed where it left off.

Thanks for reading, 

bob prohaska






Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-19 Thread Philip Paeps

On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:


> void  wrote on
> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>
>> Not sure where to post this..
>>
>> The last bulk build for arm64 appears to have happened around
>> mid-March on ampere2. Is it broken?
>
> main-armv7 building is broken and the last completed build
> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
> gets stuck making no progress until manually forced to stop,
> which leads to huge elapsed times for the incomplete builds:
>
> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390
> (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>
> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395
> (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>
> ampere2 alternates between trying to build main-arm64 and main-armv7,
> so main-armv7 being stuck blocks main-arm64 from building.
>
> One can see that all 13 job ID's show over 570 hours:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>
> It is not random which packages are building when this happens.
> Compare:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>
> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>
> My guess is that FreeBSD has something that broke after bd45bbe440
> that was broken as of f5f08e41aa and was still broken at 75464941dc .


I'll kill the build on ampere2 again.  Thanks for the nudge.

We don't really have good monitoring for this.  Also: builds should time 
out after 36 hours.  The fact that this one does not is a bug in itself.
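
Until the built-in limit works, an outer deadline is one conceivable
stop-gap. The sketch below wraps a command in timeout(1). It is not how
the cluster is configured, and the function name run_with_deadline and
the jail name in the usage comment are made up for this example.

```shell
# run_with_deadline LIMIT CMD [ARGS...]
# Hypothetical wrapper: runs CMD under timeout(1) and reports on stderr
# when the limit was hit (timeout exits with status 124 in that case).
run_with_deadline() {
    limit=$1
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "deadline of $limit exceeded: $*" >&2
    fi
    return "$status"
}

# Possible use with the 36 hour limit mentioned above:
#   run_with_deadline 36h poudriere bulk -j main-armv7 -a
```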


Philip [hat: clusteradm]



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread void

On Thu, Apr 18, 2024 at 08:02:30AM -0700, Mark Millard wrote:

> void  wrote on
> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>
>> Not sure where to post this..
>>
>> The last bulk build for arm64 appears to have happened around
>> mid-March on ampere2. Is it broken?
>
> main-armv7 building is broken and the last completed build
> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
> gets stuck making no progress until manually forced to stop,
> which leads to huge elapsed times for the incomplete builds:


Should I report it in bugzilla?

--



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread Mark Millard



On Apr 18, 2024, at 08:02, Mark Millard  wrote:

> void  wrote on
> Date: Thu, 18 Apr 2024 14:08:36 UTC :
> 
>> Not sure where to post this..
>> 
>> The last bulk build for arm64 appears to have happened around
>> mid-March on ampere2. Is it broken?
> 
> main-armv7 building is broken and the last completed build
> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
> gets stuck making no progress until manually forced to stop,
> which leads to huge elapsed times for the incomplete builds:
> 
> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  
> (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
> 
> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  
> (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
> 
> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
> main-armv7 being stuck blocks main-arm64 from building.
> 
> One can see that all 13 job ID's show over 570 hours:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
> 
> It is not random which packages are building when this happens. Compare:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
> 
> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
> 
> My guess is that FreeBSD has something that broke after bd45bbe440
> that was broken as of f5f08e41aa and was still broken at 75464941dc .
> 

One thing of possible note:

Failing . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014

and, more recently,

Host OSVERSION: 156
Jail OSVERSION: 1500015

But the most recent working had . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014

So, if it is a FreeBSD problem, it seems to have started during 1500014 .


===
Mark Millard
marklmi at yahoo.com




pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread Mark Millard
void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :

> Not sure where to post this..
> 
> The last bulk build for arm64 appears to have happened around
> mid-March on ampere2. Is it broken?

main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  
(+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56

p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  
(+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2

ampere2 alternates between trying to build main-arm64 and main-armv7, so 
main-armv7 being stuck blocks main-arm64 from building.

One can see that all 13 job ID's show over 570 hours:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc

It is not random which packages are building when this happens. Compare:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa

By contrast, the 19 Feb 2024 from-scratch (full) build worked:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440

My guess is that FreeBSD has something that broke after bd45bbe440
that was broken as of f5f08e41aa and was still broken at 75464941dc .


===
Mark Millard
marklmi at yahoo.com



Re: llvm and Undefined symbols: ___truncsfbf2 problem

2024-04-17 Thread Dimitry Andric
On 11 Apr 2024, at 15:07, Hiroo Ono  wrote:
> 
> Hello,
> 
> I am trying to update the lang/julia port to 1.11.0 (currently still in
> beta 1).
> I seem to have run across this problem initially reported on MacOS.
> https://github.com/JuliaLang/julia/issues/52067
> 
> The llvm team seems to have patched this problem only for Darwin.
> https://github.com/llvm/llvm-project/pull/84192
> 
> I think the solution is also needed for FreeBSD, but should I report it
> directly to llvm team or report here or to FreeBSD bugzilla and ask
> toolchain maintainer of FreeBSD to report upstream?

The __bf16 type is only available on some architectures, and only
supported by relatively recent compiler versions, in combination with
some runtime support (i.e. compiler-rt or libgcc).

Approximately: it is available on aarch64, amd64, arm (with fp), i386
(with sse2) and riscv. And it is supported by clang 15 and later (though
not for riscv, which requires clang 18), and gcc 13 and later.

However, the runtime support in FreeBSD was only added with the recent
merge of llvm 18. The necessary library functions (truncdfbf2 and
truncsfbf2) are now in compiler-rt.
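
To give an idea of what truncsfbf2 has to do: bf16 keeps the upper 16
bits of an IEEE-754 single, rounded to nearest with ties to even. The
sketch below does that on raw bit patterns with shell arithmetic. It is
an illustration only, not the compiler-rt code, it ignores NaN handling,
and the function name f32_to_bf16_bits is made up for this example.

```shell
# f32_to_bf16_bits HEXBITS
# Hypothetical helper: takes the bit pattern of a binary32 value (for
# example 0x3f800000 for 1.0) and prints the bf16 bit pattern obtained by
# rounding to nearest, ties to even. NaN inputs are not handled.
f32_to_bf16_bits() {
    bits=$(( $1 ))
    lsb=$(( (bits >> 16) & 1 ))    # lowest bit that survives truncation
    printf '%04x\n' $(( (bits + 0x7fff + lsb) >> 16 ))
}

f32_to_bf16_bits 0x3f800000    # 1.0
```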

-Dimitry




llvm and Undefined symbols: ___truncsfbf2 problem

2024-04-11 Thread Hiroo Ono
Hello,

I am trying to update the lang/julia port to 1.11.0 (currently still in beta 1).
I seem to have run across this problem initially reported on MacOS.
https://github.com/JuliaLang/julia/issues/52067

The llvm team seems to have patched this problem only for Darwin.
https://github.com/llvm/llvm-project/pull/84192

I think the solution is also needed for FreeBSD, but should I report it directly
to llvm team or report here or to FreeBSD bugzilla and ask toolchain maintainer
of FreeBSD to report upstream?

Hiroo Ono 



Re: Request for Testing: TCP RACK

2024-04-10 Thread Nuno Teixeira
(...)

Backup server is https://www.rsync.net/ (free 500GB for FreeBSD
developers).

Nuno Teixeira wrote (Wednesday, 10/04/2024 at 13:39):

> With base stack I can complete restic check successfully
> downloading/reading/checking all files from a "big" remote compressed
> backup.
> Changing it to RACK stack, it fails.
>
> I run this command often because in the past, compression corruption
> occurred and this is the equivalent of restoring backup to check its
> integrity.
>
> Maybe someone could do a restic test to check if this is reproducible.
>
> Thanks,
>
>
>
>  wrote (Wednesday, 10/04/2024 at 13:12):
>
>>
>>
>> > On 10. Apr 2024, at 13:40, Nuno Teixeira  wrote:
>> >
>> > Hello all,
>> >
>> > @ current 1500018 and fetching torrents with net-p2p/qbittorrent
>> finished ~2GB download and connection UP until the end:
>> >
>> > ---
>> > Apr 10 11:26:46 leg kernel: re0: watchdog timeout
>> > Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
>> > Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
>> > Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0):
>> 255.255.255.0
>> > Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0):
>> 192.168.1.255
>> > Apr 10 11:26:49 leg kernel: re0: link state changed to UP
>> > Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
>> > ---
>> >
>> > In the past tests, I've got more watchdog timeouts, connection goes
>> down and a reboot needed to put it back (`service netif restart` didn't
>> work).
>> >
>> > Other way to reproduce this is using sysutils/restic (backup program)
>> to read/check all files from a remote server via sftp:
>> >
>> > `restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB
>> compressed backup.
>> >
>> > ---
>> > watchdog timeout x3 as above
>> > ---
>> >
>> > restic check fail log @ 15% progress:
>> > ---
>> > 
>> > Load(, 17310001, 0) returned error, retrying after
>> 1.7670599s: connection lost
>> > Load(, 17456892, 0) returned error, retrying after
>> 4.619104908s: connection lost
>> > Load(, 17310001, 0) returned error, retrying after
>> 5.477648517s: connection lost
>> > List(lock) returned error, retrying after 293.057766ms: connection lost
>> > List(lock) returned error, retrying after 385.206693ms: connection lost
>> > List(lock) returned error, retrying after 1.577594281s: connection lost
>> > 
>> >
>> > Connection continues UP.
>> Hi,
>>
>> I'm not sure what the issue is you are reporting. Could you state
>> what behavior you are experiencing with the base stack and with
>> the RACK stack. In particular, what the difference is?
>>
>> Best regards
>> Michael
>> >
>> > Cheers,
>> >
>> >  wrote (Thursday, 28/03/2024 at 15:53):
>> >> On 28. Mar 2024, at 15:00, Nuno Teixeira  wrote:
>> >>
>> >> Hello all!
>> >>
>> >> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop
>> (amd64)!
>> >>
>> >> Thanks all!
>> > Thanks for the feedback!
>> >
>> > Best regards
>> > Michael
>> >>
>> >> Drew Gallatin  wrote (Thursday, 21/03/2024 at 12:58):
>> >> The entire point is to *NOT* go through the overhead of scheduling
>> something asynchronously, but to take advantage of the fact that a
>> user/kernel transition is going to trash the cache anyway.
>> >>
>> >> In the common case of a system which has less than the threshold
>> number of connections , we access the tcp_hpts_softclock function pointer,
>> make one function call, and access hpts_that_need_softclock, and then
>> return.  So that's 2 variables and a function call.
>> >>
>> >> I think it would be preferable to avoid that call, and to move the
>> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they
>> are in the same cacheline.  Then we'd be hitting just a single line in the
>> common case.  (I've made comments on the review to that effect).
>> >>
>> >> Also, I wonder if the threshold could get higher by default, so that
>> hpts is never called in this context unless we're to the point where we're
>> scheduling thousands of runs of the hpts thread (and taking all those clock
>> interrupts).
>> >>
>> >> Drew
>> >>
>> >> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
>> >>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
>> >>>> Ok I have created
>> >>>>
>> >>>> https://reviews.freebsd.org/D44420
>> >>>>
>> >>>> To address the issue. I also attach a short version of the patch that
>> >>>> Nuno can try and validate it works. Drew you may want to try this and
>> >>>> validate the optimization does kick in since I can only now test that
>> >>>> it does not on my local box :)
>> >>> The patch still causes access to all cpu's cachelines on each userret.
>> >>> It would be much better to inc/check the threshold and only schedule the
>> >>> call when exceeded.  Then the call can occur in some dedicated context,
>> >>> like per-CPU thread, instead of userret.
>> >>>
>> >>>> R

Re: Request for Testing: TCP RACK

2024-04-10 Thread Nuno Teixeira
With base stack I can complete restic check successfully
downloading/reading/checking all files from a "big" remote compressed
backup.
Changing it to RACK stack, it fails.

I run this command often because in the past, compression corruption
occurred and this is the equivalent of restoring backup to check its
integrity.

Maybe someone could do a restic test to check if this is reproducible.
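
For anyone willing to try, a sketch of the two passes to compare. It
assumes the tcp_rack module is available on the test machine and that
"freebsd" is the name of the base stack. The function only prints the
commands so they can be reviewed first, and its name stack_test_plan is
made up for this example.

```shell
# stack_test_plan STACK REPO
# Hypothetical helper: prints the commands for one test pass with the
# given default TCP stack ("freebsd" or "rack") against a restic repo.
stack_test_plan() {
    stack=$1
    repo=$2
    if [ "$stack" = rack ]; then
        echo "kldload -n tcp_rack"
    fi
    echo "sysctl net.inet.tcp.functions_default=$stack"
    echo "restic -r $repo check --read-data"
}

# Print both passes; pipe a pass to sh as root to actually run it:
#   stack_test_plan freebsd sftp:user@remote:restic-repo
#   stack_test_plan rack sftp:user@remote:restic-repo
```

Only connections opened after the sysctl change use the new default, so
restic should be started fresh for each pass.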

Thanks,



 wrote (Wednesday, 10/04/2024 at 13:12):

>
>
> > On 10. Apr 2024, at 13:40, Nuno Teixeira  wrote:
> >
> > Hello all,
> >
> > @ current 1500018 and fetching torrents with net-p2p/qbittorrent
> finished ~2GB download and connection UP until the end:
> >
> > ---
> > Apr 10 11:26:46 leg kernel: re0: watchdog timeout
> > Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
> > Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
> > Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
> > Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0):
> 192.168.1.255
> > Apr 10 11:26:49 leg kernel: re0: link state changed to UP
> > Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
> > ---
> >
> > In the past tests, I've got more watchdog timeouts, connection goes down
> and a reboot needed to put it back (`service netif restart` didn't work).
> >
> > Other way to reproduce this is using sysutils/restic (backup program) to
> read/check all files from a remote server via sftp:
> >
> > `restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB
> compressed backup.
> >
> > ---
> > watchdog timeout x3 as above
> > ---
> >
> > restic check fail log @ 15% progress:
> > ---
> > 
> > Load(, 17310001, 0) returned error, retrying after
> 1.7670599s: connection lost
> > Load(, 17456892, 0) returned error, retrying after
> 4.619104908s: connection lost
> > Load(, 17310001, 0) returned error, retrying after
> 5.477648517s: connection lost
> > List(lock) returned error, retrying after 293.057766ms: connection lost
> > List(lock) returned error, retrying after 385.206693ms: connection lost
> > List(lock) returned error, retrying after 1.577594281s: connection lost
> > 
> >
> > Connection continues UP.
> Hi,
>
> I'm not sure what the issue is you are reporting. Could you state
> what behavior you are experiencing with the base stack and with
> the RACK stack. In particular, what the difference is?
>
> Best regards
> Michael
> >
> > Cheers,
> >
> >  wrote (Thursday, 28/03/2024 at 15:53):
> >> On 28. Mar 2024, at 15:00, Nuno Teixeira  wrote:
> >>
> >> Hello all!
> >>
> >> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop
> (amd64)!
> >>
> >> Thanks all!
> > Thanks for the feedback!
> >
> > Best regards
> > Michael
> >>
> >> Drew Gallatin  wrote (Thursday, 21/03/2024 at 12:58):
> >> The entire point is to *NOT* go through the overhead of scheduling
> something asynchronously, but to take advantage of the fact that a
> user/kernel transition is going to trash the cache anyway.
> >>
> >> In the common case of a system which has less than the threshold
> number of connections , we access the tcp_hpts_softclock function pointer,
> make one function call, and access hpts_that_need_softclock, and then
> return.  So that's 2 variables and a function call.
> >>
> >> I think it would be preferable to avoid that call, and to move the
> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they
> are in the same cacheline.  Then we'd be hitting just a single line in the
> common case.  (I've made comments on the review to that effect).
> >>
> >> Also, I wonder if the threshold could get higher by default, so that
> hpts is never called in this context unless we're to the point where we're
> scheduling thousands of runs of the hpts thread (and taking all those clock
> interrupts).
> >>
> >> Drew
> >>
> >> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>>> Ok I have created
> >>>>
> >>>> https://reviews.freebsd.org/D44420
> >>>>
> >>>> To address the issue. I also attach a short version of the patch that
> >>>> Nuno can try and validate it works. Drew you may want to try this and
> >>>> validate the optimization does kick in since I can only now test that
> >>>> it does not on my local box :)
> >>> The patch still causes access to all cpu's cachelines on each userret.
> >>> It would be much better to inc/check the threshold and only schedule the
> >>> call when exceeded.  Then the call can occur in some dedicated context,
> >>> like per-CPU thread, instead of userret.
> >>>
> >>>> R
> >>>>
> >>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>>> No.  The goal is to run on every return to userspace for every thread.
> >>>>>
> >>>>> Drew
> >>>>>
> >>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:

Re: Request for Testing: TCP RACK

2024-04-10 Thread tuexen



> On 10. Apr 2024, at 13:40, Nuno Teixeira  wrote:
> 
> Hello all,
> 
> @ current 1500018 and fetching torrents with net-p2p/qbittorrent finished 
> ~2GB download and connection UP until the end: 
> 
> ---
> Apr 10 11:26:46 leg kernel: re0: watchdog timeout
> Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
> Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
> Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
> Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 
> 192.168.1.255
> Apr 10 11:26:49 leg kernel: re0: link state changed to UP
> Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
> ---
> 
> In the past tests, I've got more watchdog timeouts, connection goes down and 
> a reboot needed to put it back (`service netif restart` didn't work).
> 
> Other way to reproduce this is using sysutils/restic (backup program) to 
> read/check all files from a remote server via sftp:
> 
> `restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB 
> compressed backup.
> 
> ---
> watchdog timeout x3 as above
> ---
> 
> restic check fail log @ 15% progress:
> ---
> 
> Load(, 17310001, 0) returned error, retrying after 
> 1.7670599s: connection lost
> Load(, 17456892, 0) returned error, retrying after 
> 4.619104908s: connection lost
> Load(, 17310001, 0) returned error, retrying after 
> 5.477648517s: connection lost
> List(lock) returned error, retrying after 293.057766ms: connection lost
> List(lock) returned error, retrying after 385.206693ms: connection lost
> List(lock) returned error, retrying after 1.577594281s: connection lost
> 
> 
> Connection continues UP.
Hi,

I'm not sure what issue you are reporting. Could you describe the
behavior you are seeing with the base stack and with the RACK stack,
and in particular what the difference is?

Best regards
Michael
> 
> Cheers,
> 
> wrote (Thursday, 28/03/2024 at 15:53):
>> On 28. Mar 2024, at 15:00, Nuno Teixeira  wrote:
>> 
>> Hello all!
>> 
>> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
>> 
>> Thanks all!
> Thanks for the feedback!
> 
> Best regards
> Michael
>> 
>> Drew Gallatin wrote (Thursday, 21/03/2024 at 12:58):
>> The entire point is to *NOT* go through the overhead of scheduling something 
>> asynchronously, but to take advantage of the fact that a user/kernel 
>> transition is going to trash the cache anyway.
>> 
>> In the common case of a system which has less than the threshold  number of 
>> connections , we access the tcp_hpts_softclock function pointer, make one 
>> function call, and access hpts_that_need_softclock, and then return.  So 
>> that's 2 variables and a function call.
>> 
>> I think it would be preferable to avoid that call, and to move the 
>> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they 
>> are in the same cacheline.  Then we'd be hitting just a single line in the 
>> common case.  (I've made comments on the review to that effect).
>> 
>> Also, I wonder if the threshold could get higher by default, so that hpts is 
>> never called in this context unless we're to the point where we're 
>> scheduling thousands of runs of the hpts thread (and taking all those clock 
>> interrupts).
>> 
>> Drew
>> 
>> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
>>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
>>>> Ok I have created
>>>>
>>>> https://reviews.freebsd.org/D44420
>>>>
>>>> To address the issue. I also attach a short version of the patch that
>>>> Nuno can try and validate it works. Drew you may want to try this and
>>>> validate the optimization does kick in since I can only now test that
>>>> it does not on my local box :)
>>> The patch still causes access to all cpu's cachelines on each userret.
>>> It would be much better to inc/check the threshold and only schedule the
>>> call when exceeded.  Then the call can occur in some dedicated context,
>>> like per-CPU thread, instead of userret.
>>>
>>>> R
>>>>
>>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
>>>>> No.  The goal is to run on every return to userspace for every thread.
>>>>>
>>>>> Drew
>>>>>
>>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
>>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
>>>>>>> I got the idea from
>>>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
>>>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
>>>>>>> rather than run it out of a clock interrupt, its more efficient to run
>>>>>>> it out of a system call context at just the point where we return to
>>>>>>> userspace and the cache is trashed anyway. The current implementation
>>>>>>> is fine for our workload, but probably not idea for a generic system.
>>>>>>> Especially one where something is banging on system calls.
>>>>>>>
>>>>>>> Ast's could be the right

Re: Request for Testing: TCP RACK

2024-04-10 Thread Nuno Teixeira
Hello all,

On current 1500018, fetching torrents with net-p2p/qbittorrent finished a
~2GB download, and the connection stayed UP until the end:

---
Apr 10 11:26:46 leg kernel: re0: watchdog timeout
Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0):
192.168.1.255
Apr 10 11:26:49 leg kernel: re0: link state changed to UP
Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
---

In past tests I got more watchdog timeouts; the connection went down and a
reboot was needed to bring it back (`service netif restart` didn't work).

Another way to reproduce this is to use sysutils/restic (backup program) to
read/check all files from a remote server via sftp:

`restic -r sftp:user@remote:restic-repo check --read-data` on a 60GB
compressed backup.

---
watchdog timeout x3 as above
---

restic check fail log @ 15% progress:
---

Load(, 17310001, 0) returned error, retrying after
1.7670599s: connection lost
Load(, 17456892, 0) returned error, retrying after
4.619104908s: connection lost
Load(, 17310001, 0) returned error, retrying after
5.477648517s: connection lost
List(lock) returned error, retrying after 293.057766ms: connection lost
List(lock) returned error, retrying after 385.206693ms: connection lost
List(lock) returned error, retrying after 1.577594281s: connection lost


The connection stays UP.

Cheers,

wrote (Thursday, 28/03/2024 at 15:53):

> > On 28. Mar 2024, at 15:00, Nuno Teixeira  wrote:
> >
> > Hello all!
> >
> > Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop
> (amd64)!
> >
> > Thanks all!
> Thanks for the feedback!
>
> Best regards
> Michael
> >
> > Drew Gallatin wrote (Thursday, 21/03/2024 at 12:58):
> > The entire point is to *NOT* go through the overhead of scheduling
> something asynchronously, but to take advantage of the fact that a
> user/kernel transition is going to trash the cache anyway.
> >
> > In the common case of a system which has less than the threshold  number
> of connections , we access the tcp_hpts_softclock function pointer, make
> one function call, and access hpts_that_need_softclock, and then return.
> So that's 2 variables and a function call.
> >
> > I think it would be preferable to avoid that call, and to move the
> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they
> are in the same cacheline.  Then we'd be hitting just a single line in the
> common case.  (I've made comments on the review to that effect).
> >
> > Also, I wonder if the threshold could get higher by default, so that
> hpts is never called in this context unless we're to the point where we're
> scheduling thousands of runs of the hpts thread (and taking all those clock
> interrupts).
> >
> > Drew
> >
> > On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>> Ok I have created
> >>>
> >>> https://reviews.freebsd.org/D44420
> >>>
> >>>
> >>> To address the issue. I also attach a short version of the patch that
> Nuno
> >>> can try and validate
> >>>
> >>> it works. Drew you may want to try this and validate the optimization
> does
> >>> kick in since I can
> >>>
> >>> only now test that it does not on my local box :)
> >> The patch still causes access to all cpu's cachelines on each userret.
> >> It would be much better to inc/check the threshold and only schedule the
> >> call when exceeded.  Then the call can occur in some dedicated context,
> >> like per-CPU thread, instead of userret.
> >>
> >>>
> >>>
> >>> R
> >>>
> >>>
> >>>
> >>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>> No.  The goal is to run on every return to userspace for every thread.
> >>>>
> >>>> Drew
> >>>>
> >>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>> I got the idea from
> >>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>> rather than run it out of a clock interrupt, its more efficient to run
> >>>>>> it out of a system call context at just the point where we return to
> >>>>>> userspace and the cache is trashed anyway. The current implementation
> >>>>>> is fine for our workload, but probably not idea for a generic system.
> >>>>>> Especially one where something is banging on system calls.
> >>>>>>
> >>>>>> Ast's could be the right tool for this, but I'm super unfamiliar with
> >>>>>> them, and I can't find any docs on them.
> >>>>>>
> >>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>> what's happening here?
> >>>>> This call would need some AST number added, and then it registers the
> >>>>> ast to run on next return to userspace, for
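
The cacheline suggestion quoted above (keeping tcp_hpts_softclock and
hpts_that_need_softclock together so the common case touches one line) can
be sketched in userland C roughly as follows. This is only an illustration
under stated assumptions, not the FreeBSD code: the struct `hpts_hot`, the
`threshold` field, `fake_softclock()`, `userret_hook()` and the 64-byte line
size are all hypothetical; only the two variable names come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* Hypothetical grouping: the hook pointer and the counter share one line. */
struct hpts_hot {
	void (*tcp_hpts_softclock)(void);	/* NULL when nothing is registered */
	uint32_t hpts_that_need_softclock;	/* pending-work counter */
	uint32_t threshold;			/* hypothetical tunable */
};

static_assert(sizeof(struct hpts_hot) <= CACHE_LINE, "must fit in one line");

static _Alignas(CACHE_LINE) struct hpts_hot hot;
static int softclock_calls;	/* instrumentation for this example only */

/* Stand-in for the real softclock routine: count the call, clear the work. */
static void
fake_softclock(void)
{
	softclock_calls++;
	hot.hpts_that_need_softclock = 0;
}

/* Called on each simulated return to userspace; returns 1 if the hook ran. */
static int
userret_hook(void)
{
	/* Common case: both loads hit the same line and no call is made. */
	if (hot.tcp_hpts_softclock == NULL ||
	    hot.hpts_that_need_softclock <= hot.threshold)
		return (0);
	hot.tcp_hpts_softclock();
	return (1);
}
```

With `hot.tcp_hpts_softclock = fake_softclock` and `hot.threshold = 2`, a
counter value of 1 returns without a call, while a value of 3 runs the hook
once. Whether the check belongs in userret at all is exactly what the thread
is debating.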

Resolved: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-10 Thread David Wolfskill
After the update to main-n269261-1e6db7be6921, head built & booted OK.

FreeBSD 15.0-CURRENT #45 main-n269261-1e6db7be6921: Wed Apr 10 11:11:50 UTC 
2024 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500018 1500018

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Alexey Navalny was a courageous man; Putin has made him a martyr.

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread Baptiste Daroussin
On 9 April 2024 18:06:12 GMT+02:00, FreeBSD User wrote:
>On Tue, 9 Apr 2024 17:10:52 +0200
>Rainer Hurling wrote:
>
>> On 09.04.24 at 09:20, Baptiste Daroussin wrote:
>> > On Sat 06 Apr 09:23, Rainer Hurling wrote:
>> >> On 06.04.24 at 09:05, FreeBSD User wrote:
>> >>> Hello,
>> >>>
>> >>> after updating (portmaster and make) ports-mgmt/ports from 1.20.9_1 -> 
>> >>> 1.21.0 on CURRENT
>> >>> and 14-STABLE, I can't update several ports:
>> >>>
>> >>> www/apache24
>> >>> databases/redis
>> >>>
>> >>> pkg core dumps while performing installation. apache24 and redis are 
>> >>> ports I realized
>> >>> this misbehaviour on ALL 14-STABLE and CURRENT boxes (both OS variants 
>> >>> latest builds,
>> >>> i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 
>> >>> 20:30:39 CEST 2024
>> >>> amd64).
>> >>>
>> >>> After some updates on a poudriere builder (CURRENT base host, 
>> >>> 14.0-RELENG jail with
>> >>> poudriere) building packages for 14.0-RELENG, I observed the same 
>> >>> behaviour when
>> >>> updating packages on target hosts where pkg is first updated, on those 
>> >>> hosts,
>> >>> nextcloud-server and icinga2 host utilizing also databases/redis and 
>> >>> www/apache24, pkg
>> >>> fails the same way.
>> >>>
>> >>> I do not dare to update our poudriere hosts since the problem seems to 
>> >>> pop up when pkg
>> >>> 1.21.0 is installed, no matter whether I use poudriere built ports (from 
>> >>> our own builder
>> >>> hosts) or recent source tree with portmaster/make build process.
>> >>>
>> >>> Looks like a serious bug to me and not a site/user specific problem. 
>> >>> Hopefully others do
>> >>> realize the same ...
>> >>>
>> >>> Thanks in advance,
>> >>>
>> >>> oh  
>> >>
>> >>
>> >> Hmm, I just tried to reproduce that. Both ports mentioned, databases/redis
>> >> and www/apache24, can be built and installed with Portmaster. The box is a
>> >> 15.0-CURRENT with pkg-1.21.0.
>> >>
>> >> Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir'
>> >> show some inconsistencies?
>> >>
>> >> Best wishes,
>> >> Rainer
>> >>
>> >>  
>> > using portmaster or not are strictly unlikely to be helpful here.
>> > 
>> > The right way to test if to report running with pkg - and also to 
>> > recommand
>> > testing with default options in pkg.conf.
>> > 
>> > Best regards,
>> > Bapt  
>> 
>> This is correct and certainly better. I was not aware of this.
>> 
>> Fortunately, my less optimal suggestions helped O. Hartmann in this case 
>> to find the missing and outdated dependencies.
>> 
>> In any case, many thanks for this helpfull advice.
>> 
>> Regards,
>> Rainer
>> 
>> 
>
>Hello,
>
>@Babptist : it should be pkg -d, shouldn't it? Or do I miss again something 
>here?

Each additional d gives a more verbose level of debug output.

Bapt




Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Cy Schubert
Cy Schubert writes:
> In message , Gleb Smirnoff writes:
> > On Tue, Apr 09, 2024 at 07:02:11PM +0200, FreeBSD User wrote:
> > F> The crash is still present on the most recent checked out sources as of
> > F> minutes ago.
> > F> I just checked out on HEAD the latest commits (see below, just for the
> > F> record and to prevent being wrong here).
> > F> 
> > F> [...]
> > F> commit 841cf52595b6a6b98e266b63e54a7cf6fb6ca73e (HEAD -> main,
> > F> origin/main, origin/HEAD)
> >
> > Is the crash same or different? Can you please share backtrace?
>
> The new panic is:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address   = 0x28
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80729d8d
> stack pointer   = 0x28:0xfe00b59c0a70
> frame pointer   = 0x28:0xfe00b59c0aa0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 2697 (rpcbind)
> rdi: f80004fcd720 rsi:  rdx: fe00b59c0b68
> rcx:   r8: 0001  r9: 3b9ac9e0
> rax: 3b9aca00 rbx: fe00b59c0b68 rbp: fe00b59c0aa0
> r10: 0020 r11:  r12: 
> r13: 0020 r14: 0020 r15: f80004fcd720
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1712682162
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> 0xfe00b59c0760
> vpanic() at vpanic+0x135/frame 0xfe00b59c0890
> panic() at panic+0x43/frame 0xfe00b59c08f0
> trap_fatal() at trap_fatal+0x40b/frame 0xfe00b59c0950
> trap_pfault() at trap_pfault+0x46/frame 0xfe00b59c09a0
> calltrap() at calltrap+0x8/frame 0xfe00b59c09a0
> --- trap 0xc, rip = 0x80729d8d, rsp = 0xfe00b59c0a70, rbp = 
> 0xfe00b59c0aa0 ---
> uiomove_faultflag() at uiomove_faultflag+0x9d/frame 0xfe00b59c0aa0
> uipc_soreceive_stream_or_seqpacket() at uipc_soreceive_stream_or_seqpacket+0x38c/frame 0xfe00b59c0b30
> soreceive() at soreceive+0x2f/frame 0xfe00b59c0b50
> clnt_vc_soupcall() at clnt_vc_soupcall+0x139/frame 0xfe00b59c0c00
> sorwakeup_locked() at sorwakeup_locked+0x98/frame 0xfe00b59c0c20
> uipc_sosend_stream_or_seqpacket() at uipc_sosend_stream_or_seqpacket+0x58e/frame 0xfe00b59c0ce0
> sousrsend() at sousrsend+0x5f/frame 0xfe00b59c0d40
> dofilewrite() at dofilewrite+0x7f/frame 0xfe00b59c0d90
> sys_write() at sys_write+0xb3/frame 0xfe00b59c0e00
> amd64_syscall() at amd64_syscall+0x115/frame 0xfe00b59c0f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00b59c0f30
> --- syscall (4, FreeBSD ELF64, write), rip = 0x1d82f79281a, rsp = 
> 0x1d82c63be78, rbp = 0x1d82c63bee0 ---
> Uptime: 39s
> Dumping 515 out of 7969 MB:..4%..13%..22%..32%..41%..53%..63%..72%..81%..91%
>
> (kgdb) bt
> #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=textdump@entry=1) at /opt/src/git-src/sys/kern/kern_shutdown.c:404
> #2  0x806bd7d9 in kern_reboot (howto=260) at 
> /opt/src/git-src/sys/kern/kern_shutdown.c:524
> #3  0x806bdcf2 in vpanic (fmt=0x80ae0f0d "%s", 
> ap=ap@entry=0xfe00b59c08d0) at /opt/src/git-src/sys/kern/kern_shutdown.c:976
> #4  0x806bdb43 in panic (fmt=) at 
> /opt/src/git-src/sys/kern/kern_shutdown.c:892
> #5  0x80a597fb in trap_fatal (frame=0xfe00b59c09b0, eva=40) at 
> /opt/src/git-src/sys/amd64/amd64/trap.c:950
> #6  0x80a59846 in trap_pfault (frame=, usermode=false, 
> signo=, ucode=) at /opt/src/git-src/sys/amd64/amd64/trap.c:758
> #7  
> #8  uiomove_faultflag (cp=0xf80004fcd720, n=32, 
> uio=uio@entry=0xfe00b59c0b68, nofault=nofault@entry=0) at 
> /opt/src/git-src/sys/kern/subr_uio.c:240
> #9  0x80729ce9 in uiomove (cp=0xf80004fcd720, n=0, 
> uio=uio@entry=0xfe00b59c0b68) at /opt/src/git-src/sys/kern/subr_uio.c:193
> #10 0x80774f1c in uipc_soreceive_stream_or_seqpacket 
> (so=0xf800361f4000, psa=, uio=0xfe00b59c0b68, 
> mp0=, controlp=0xfe00b59c0bc0, flagsp=0xfe00b59c0ba8)
>  at /opt/src/git-src/sys/kern/uipc_usrreq.c:1420
> #11 0x8076d4ff in soreceive (so=0xf80004fcd720, 
> so@entry=0xf800361f4000, psa=psa@entry=0x0, uio=uio@entry=0xfe00b59c0b68,
> mp0=0x0, mp0@entry=0xfe00b59c0bb8, controlp=0x1, 
> controlp@entry=0xfe00b59c0bc0, flagsp=0x3b9ac9e0,
> flagsp@entry=0xfe00b59c0ba8) at /opt/src/git-src/sys/kern/uipc_socket.c:2965
> #12 0x80917719 in clnt_vc_soupcall (so=0xf800361f4000, 
> arg=0xf80036191c00, waitflag=) at 
> /opt/src/git-src/sys/rpc/clnt_vc.c:991
> #13 0x80765338 in sowakeup (so=0xf800361f4000, which=SO_RCV) at 
> 

Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Gleb Smirnoff
On Tue, Apr 09, 2024 at 07:02:11PM +0200, FreeBSD User wrote:
F> The crash is still present on the most recent checked out sources as of
F> minutes ago.
F> I just checked out on HEAD the latest commits (see below, just for the
F> record and to prevent being wrong here).
F> 
F> [...]
F> commit 841cf52595b6a6b98e266b63e54a7cf6fb6ca73e (HEAD -> main, origin/main,
F> origin/HEAD)

Is the crash same or different? Can you please share backtrace?

-- 
Gleb Smirnoff



Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread FreeBSD User
On Tue, 9 Apr 2024 09:18:49 -0700
Gleb Smirnoff wrote:

> On Tue, Apr 09, 2024 at 04:47:07AM -0700, David Wolfskill wrote:
> D> --- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 0xfe048c204960 ---
> D> __mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
> D> clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
> D> local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
> D> rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
> D> svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
> D> sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
> D> amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
> D> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
> D> --- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 0x3f0096f7168, rbp = 0x3f0096f7230 ---
> D> KDB: enter: panic
> D> [ thread pid 1208 tid 101107 ]
> D> Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)  
> D> db>   
> 
> This should be fixed by just pushed e205fd318a296ffdb7392486cdcec7f660fcffcf.
> 
> Sorry for that!
> 

Hello all.

The crash is still present with the most recent sources, checked out minutes
ago.

I just checked out the latest commits on HEAD (listed below for the record,
and to avoid being wrong here).

[...]
commit 841cf52595b6a6b98e266b63e54a7cf6fb6ca73e (HEAD -> main, origin/main, origin/HEAD)
Author: Alan Cox
Date:   Mon Apr 8 00:05:27 2024 -0500

    arm64 pmap: Add ATTR_CONTIGUOUS support [Part 2]

    Create ATTR_CONTIGUOUS mappings in pmap_enter_object().  As a result,
    when the base page size is 4 KB, the read-only data and text sections
    of large (2 MB+) executables, e.g., clang, can be mapped using 64 KB
    pages.  Similarly, when the base page size is 16 KB, the read-only
    data section of large executables can be mapped using 2 MB pages.

    Rename pmap_enter_2mpage().  Given that we have grown support for 16 KB
    base pages, we should no longer include page sizes that may vary, e.g.,
    2mpage, in pmap function names.  Requested by: andrew

    Co-authored-by: Eliot Solomon
    Differential Revision:  https://reviews.freebsd.org/D44575

commit e205fd318a296ffdb7392486cdcec7f660fcffcf
Author: Gleb Smirnoff
Date:   Tue Apr 9 09:16:52 2024 -0700

    rpc: use new macros to lock socket buffers

    Fixes:  d80a97def9a1db6f07f5d2e68f7ad62b27918947

commit cb20a74ca06381e96c41cb4495d633710cc6cb79
Author: Stephen J. Kiernan 
Date:   Wed Apr 3 17:04:57 2024 -0400


-- 
O. Hartmann



Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread David Wolfskill
On Tue, Apr 09, 2024 at 09:18:49AM -0700, Gleb Smirnoff wrote:
> ...
> D> db> 
> 
> This should be fixed by just pushed e205fd318a296ffdb7392486cdcec7f660fcffcf.

Thanks! :-)

> Sorry for that!
> 

Glad it's identified & addressed.

[Sorry for delay; commute this morning was a bit more turbulent than
usual.]

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Alexey Navalny was a courageous man; Putin has made him a martyr.

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Gleb Smirnoff
On Tue, Apr 09, 2024 at 04:47:07AM -0700, David Wolfskill wrote:
D> --- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 0xfe048c204960 ---
D> __mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
D> clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
D> local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
D> rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
D> svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
D> sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
D> amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
D> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
D> --- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 0x3f0096f7168, rbp = 0x3f0096f7230 ---
D> KDB: enter: panic
D> [ thread pid 1208 tid 101107 ]
D> Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)
D> db> 

This should be fixed by just pushed e205fd318a296ffdb7392486cdcec7f660fcffcf.

Sorry for that!

-- 
Gleb Smirnoff



Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread FreeBSD User
On Tue, 9 Apr 2024 17:10:52 +0200
Rainer Hurling wrote:

> On 09.04.24 at 09:20, Baptiste Daroussin wrote:
> > On Sat 06 Apr 09:23, Rainer Hurling wrote:
> >> On 06.04.24 at 09:05, FreeBSD User wrote:
> >>> Hello,
> >>>
> >>> after updating (portmaster and make) ports-mgmt/ports from 1.20.9_1 -> 
> >>> 1.21.0 on CURRENT
> >>> and 14-STABLE, I can't update several ports:
> >>>
> >>> www/apache24
> >>> databases/redis
> >>>
> >>> pkg core dumps while performing installation. apache24 and redis are 
> >>> ports I realized
> >>> this misbehaviour on ALL 14-STABLE and CURRENT boxes (both OS variants 
> >>> latest builds,
> >>> i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 
> >>> 20:30:39 CEST 2024
> >>> amd64).
> >>>
> >>> After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG 
> >>> jail with
> >>> poudriere) building packages for 14.0-RELENG, I observed the same 
> >>> behaviour when
> >>> updating packages on target hosts where pkg is first updated, on those 
> >>> hosts,
> >>> nextcloud-server and icinga2 host utilizing also databases/redis and 
> >>> www/apache24, pkg
> >>> fails the same way.
> >>>
> >>> I do not dare to update our poudriere hosts since the problem seems to 
> >>> pop up when pkg
> >>> 1.21.0 is installed, no matter whether I use poudriere built ports (from 
> >>> our own builder
> >>> hosts) or recent source tree with portmaster/make build process.
> >>>
> >>> Looks like a serious bug to me and not a site/user specific problem. 
> >>> Hopefully others do
> >>> realize the same ...
> >>>
> >>> Thanks in advance,
> >>>
> >>> oh  
> >>
> >>
> >> Hmm, I just tried to reproduce that. Both ports mentioned, databases/redis
> >> and www/apache24, can be built and installed with Portmaster. The box is a
> >> 15.0-CURRENT with pkg-1.21.0.
> >>
> >> Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir'
> >> show some inconsistencies?
> >>
> >> Best wishes,
> >> Rainer
> >>
> >>  
> > using portmaster or not are strictly unlikely to be helpful here.
> > 
> > The right way to test if to report running with pkg - and also to 
> > recommand
> > testing with default options in pkg.conf.
> > 
> > Best regards,
> > Bapt  
> 
> This is correct and certainly better. I was not aware of this.
> 
> Fortunately, my less optimal suggestions helped O. Hartmann in this case 
> to find the missing and outdated dependencies.
> 
> In any case, many thanks for this helpfull advice.
> 
> Regards,
> Rainer
> 
> 

Hello,

@Baptiste: it should be pkg -d, shouldn't it? Or am I missing something again
here?

With today's update to pkg 1.21.1 the problem has vanished.

@R. Hurling: Thanks for the tip about running the checks. I had missed that,
and it revealed some problems here that I have hopefully fixed by now.

Kind regards and thanks,

oh
-- 
O. Hartmann



Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Rick Macklem
On Tue, Apr 9, 2024 at 8:04 AM Rick Macklem  wrote:
>
> On Tue, Apr 9, 2024 at 7:46 AM Rick Macklem  wrote:
> >
> > On Tue, Apr 9, 2024 at 4:47 AM David Wolfskill  wrote:
> > >
> > > Machine had been running:
> > >
> > > FreeBSD 15.0-CURRENT #43 main-n269202-4e7aa03b7076: Mon Apr  8 11:19:58 
> > > UTC 2024 
> > > r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC
> > >  amd64 1500018 1500018
> > >
> > > This was an in-place source update, after updating sources to
> > > main-n269230-f6f67f58c19d.  On reboot (after "make installworld"
> > > completed, I see this on the serial console (copy/pasted):
> > >
> > > ...
> > > Starting lockd.
> > I'd guess this is caused by some recent change to AF_UNIX socket
> > creation. The crash appears to be either the SOCK_LOCK() or
> > SOCKBUF_LOCK(&so->so_rcv) not being initialized.
> > If you can find out what source line# corresponds to
> > clnt_vc_create+0x4f4 you can probably tell which one it is.
> >
> > All local_rpcb() does is a
> >   error = socreate(AF_LOCAL, &so, SOCK_STREAM, 0, curthread->td_ucred,
> > curthread);
> >   and then calls clnt_vc_create(..so..) with the socket.
> >
> > I think that socreate() is not initializing one of those two mutexes
> > for some reason.
> Looks to me like this was caused by commit 681711b. I've added tuexen@
> to the post, since he committed it.
Oops, my bad, got this wrong.

The commit is d80a97d, which added PR_SOCKBUF to the pr_flags
for AF_UNIX/SOCK_STREAM.
I've added glebius@ to the email.

rick
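
A low fault address such as the 0x18 in the quoted panic is what a field
access through a NULL structure pointer typically looks like: the faulting
address is simply the field's offset. A minimal C sketch of that arithmetic
follows. The structs `fake_lock_header` and `fake_lock` and the helper
`fault_address()` are made up for illustration and are not FreeBSD's actual
struct mtx or struct socket layout; this does not establish which pointer was
NULL in the panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte header preceding the lock word (illustration only). */
struct fake_lock_header {
	uint64_t name;
	uint32_t flags;
	uint32_t data;
	uint64_t witness;
};

struct fake_lock {
	struct fake_lock_header hdr;
	uint64_t lock_word;	/* what a lock routine would read first */
};

static_assert(offsetof(struct fake_lock, lock_word) == 0x18,
    "lock word sits at offset 0x18 in this made-up layout");

/* Address a load of p->lock_word would touch, without dereferencing p. */
static uintptr_t
fault_address(uintptr_t base)
{
	return (base + offsetof(struct fake_lock, lock_word));
}
```

With a NULL base, `fault_address(0)` gives 0x18. Mapping clnt_vc_create+0x4f4
to a source line, as suggested in the quoted text, is still what identifies
which lock is involved.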

>
> rick
>
> >
> > rick
> >
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 9; apic id = 09
> > > fault virtual address   = 0x18
> > > fault code  = supervisor read data, page not present
> > > instruction pointer = 0x20:0x80b208c5
> > > stack pointer   = 0x28:0xfe048c204920
> > > frame pointer   = 0x28:0xfe048c204960
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 1208 (rpc.Starting automountd.
> > > lockd)
> > > rdi:  rsi: f801078b0740 rdx: 
> > > rcx: 010a  r8: 818d30f0  r9: 
> > > rax:  rbx: Starting powerd.0018 rbp: 
> > > fe048c204960
> > > r10: 0001 r11: 0001 r12: f80274e32c18
> > > r13: 010a r14: f80274e32c00 r15: 812ae38a
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 9
> > > time = 1712662362
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > > 0xfe048c2045f0
> > > vpanic() at vpanic+0x135/frame 0xfe048c204720
> > > panic() at panic+0x43/frame 0xfe048c204780
> > > trap_fatal() at trap_fatal+0x40b/frame 0xfe048c2047e0
> > > trap_pfault() at trap_pfault+0xa0/frame 0xfe048c204850
> > > calltrap() at calltrap+0x8/frame 0xfe048c204850
> > > --- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 0xfe048c204960 ---
> > > __mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
> > > clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
> > > local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
> > > rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
> > > svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
> > > sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
> > > amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
> > > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
> > > --- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 0x3f0096f7168, rbp = 0x3f0096f7230 ---
> > > KDB: enter: panic
> > > [ thread pid 1208 tid 101107 ]
> > > Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)
> > > db>
> > >
> > >
> > > Given suitable clues, I can poke at it a bit -- this is my "build
> > > machine," so it doesn't have critical work to do at the moment.  (I
> > > would normally have powered it down for the day: there's no need for
> > > it to be wasting energy.)
> > >
> > > Laptops are still building ports under stable/14 -- something seems
> > > to want the llvm17 port, and they have firefox to build, so they
> > > won't be testing CURRENT/head for a while, yet.
> > >
> > > Peace,
> > > david
> > > --
> > > David H. Wolfskill  da...@catwhisker.org
> > > Alexey Navalny was a courageous man; Putin has made him a martyr.
> > >
> > > See https://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread Rainer Hurling

Am 09.04.24 um 09:20 schrieb Baptiste Daroussin:

On Sat 06 Apr 09:23, Rainer Hurling wrote:

Am 06.04.24 um 09:05 schrieb FreeBSD User:

Hello,

after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
14-STABLE, I can't update several ports:

www/apache24
databases/redis

pkg core dumps while performing the installation. apache24 and redis are the ports on which I
noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
amd64).

After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
host that also use databases/redis and www/apache24, pkg fails the same way.

I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
recent source tree with the portmaster/make build process.

Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
see the same ...

Thanks in advance,

oh



Hmm, I just tried to reproduce that. Both ports mentioned, databases/redis
and www/apache24, can be built and installed with Portmaster. The box is a
15.0-CURRENT with pkg-1.21.0.

Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir'
show some inconsistencies?

Best wishes,
Rainer



Using portmaster or not is unlikely to be helpful here.

The right way to test is to report running with pkg directly, and also to recommend
testing with default options in pkg.conf.

Best regards,
Bapt


This is correct and certainly better. I was not aware of this.

Fortunately, my less optimal suggestions helped O. Hartmann in this case 
to find the missing and outdated dependencies.


In any case, many thanks for this helpful advice.

Regards,
Rainer




Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread Rainer Hurling

On 06.04.24 at 09:56, FreeBSD User wrote:

On Sat, 6 Apr 2024 09:23:30 +0200
Rainer Hurling  wrote:


On 06.04.24 at 09:05, FreeBSD User wrote:

Hello,

after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
14-STABLE, I can't update several ports:

www/apache24
databases/redis

pkg core dumps while performing the installation. apache24 and redis are the ports on which I
noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
amd64).

After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
host that also use databases/redis and www/apache24, pkg fails the same way.

I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
recent source tree with the portmaster/make build process.

Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
see the same ...

Thanks in advance,

oh



Hmm, I just tried to reproduce that. Both ports mentioned,
databases/redis and www/apache24, can be built and installed with
Portmaster. The box is a 15.0-CURRENT with pkg-1.21.0.

Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir'
show some inconsistencies?

Best wishes,
Rainer



Hello,

thanks for the quick response.

I checked the CURRENT systems I have at hand and must confess: it is a mess!
pkg check -Bn reported a lot of missing shared objects from autotools, and guile2
is missing as well :-(

Thank you very much,
oh



You're really welcome. I myself have failed several times precisely 
because some dependencies were not in order. And that's not always 
obvious :)


Best wishes,
Rainer




Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Rick Macklem
On Tue, Apr 9, 2024 at 7:46 AM Rick Macklem  wrote:
>
> On Tue, Apr 9, 2024 at 4:47 AM David Wolfskill  wrote:
> >
> > Machine had been running:
> >
> > FreeBSD 15.0-CURRENT #43 main-n269202-4e7aa03b7076: Mon Apr  8 11:19:58 UTC 
> > 2024 
> > r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC
> >  amd64 1500018 1500018
> >
> > This was an in-place source update, after updating sources to
> > main-n269230-f6f67f58c19d.  On reboot (after "make installworld"
> > completed), I see this on the serial console (copy/pasted):
> >
> > ...
> > Starting lockd.
> I'd guess this is caused by some recent change to AF_UNIX socket
> creation. The crash appears to be either the SOCK_LOCK() or
> SOCKBUF_LOCK(&so->so_rcv) not being initialized.
> If you can find out what source line# corresponds to
> clnt_vc_create+0x4f4 you can probably tell which one it is.
>
> All local_rpcb() does is a
>   error = socreate(AF_LOCAL, &so, SOCK_STREAM, 0, curthread->td_ucred,
>       curthread);
>   and then calls clnt_vc_create(..so..) with the socket.
>
> I think that socreate() is not initializing one of those two mutexes
> for some reason.
Looks to me like this was caused by commit 681711b. I've added tuexen@
to the post, since he committed it.

rick

>
> rick
>
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 9; apic id = 09
> > fault virtual address   = 0x18
> > fault code  = supervisor read data, page not present
> > instruction pointer = 0x20:0x80b208c5
> > stack pointer   = 0x28:0xfe048c204920
> > frame pointer   = 0x28:0xfe048c204960
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 1208 (rpc.Starting automountd.
> > lockd)
> > rdi:  rsi: f801078b0740 rdx: 
> > rcx: 010a  r8: 818d30f0  r9: 
> > rax:  rbx: Starting powerd.0018 rbp: 
> > fe048c204960
> > r10: 0001 r11: 0001 r12: f80274e32c18
> > r13: 010a r14: f80274e32c00 r15: 812ae38a
> > trap number = 12
> > panic: page fault
> > cpuid = 9
> > time = 1712662362
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe048c2045f0
> > vpanic() at vpanic+0x135/frame 0xfe048c204720
> > panic() at panic+0x43/frame 0xfe048c204780
> > trap_fatal() at trap_fatal+0x40b/frame 0xfe048c2047e0
> > trap_pfault() at trap_pfault+0xa0/frame 0xfe048c204850
> > calltrap() at calltrap+0x8/frame 0xfe048c204850
> > --- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 
> > 0xfe
> > 048c204960 ---
> > __mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
> > clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
> > local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
> > rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
> > svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
> > sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
> > amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
> > fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
> > --- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 
> > 0x3f00
> > 96f7168, rbp = 0x3f0096f7230 ---
> > KDB: enter: panic
> > [ thread pid 1208 tid 101107 ]
> > Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)
> > db>
> >
> >
> > Given suitable clues, I can poke at it a bit -- this is my "build
> > machine," so it doesn't have critical work to do at the moment.  (I
> > would normally have powered it down for the day: there's no need for
> > it to be wasting energy.)
> >
> > Laptops are still building ports under stable/14 -- something seems
> > to want the llvm17 port, and they have firefox to build, so they
> > won't be testing CURRENT/head for a while, yet.
> >
> > Peace,
> > david
> > --
> > David H. Wolfskill  da...@catwhisker.org
> > Alexey Navalny was a courageous man; Putin has made him a martyr.
> >
> > See https://www.catwhisker.org/~david/publickey.gpg for my public key.



Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Rick Macklem
On Tue, Apr 9, 2024 at 4:47 AM David Wolfskill  wrote:
>
> Machine had been running:
>
> FreeBSD 15.0-CURRENT #43 main-n269202-4e7aa03b7076: Mon Apr  8 11:19:58 UTC 
> 2024 
> r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
> amd64 1500018 1500018
>
> This was an in-place source update, after updating sources to
> main-n269230-f6f67f58c19d.  On reboot (after "make installworld"
> completed), I see this on the serial console (copy/pasted):
>
> ...
> Starting lockd.
I'd guess this is caused by some recent change to AF_UNIX socket
creation. The crash appears to be either the SOCK_LOCK() or
SOCKBUF_LOCK(&so->so_rcv) not being initialized.
If you can find out what source line# corresponds to
clnt_vc_create+0x4f4 you can probably tell which one it is.

All local_rpcb() does is a
  error = socreate(AF_LOCAL, &so, SOCK_STREAM, 0, curthread->td_ucred,
      curthread);
  and then calls clnt_vc_create(..so..) with the socket.

I think that socreate() is not initializing one of those two mutexes
for some reason.

rick

>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 9; apic id = 09
> fault virtual address   = 0x18
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80b208c5
> stack pointer   = 0x28:0xfe048c204920
> frame pointer   = 0x28:0xfe048c204960
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 1208 (rpc.Starting automountd.
> lockd)
> rdi:  rsi: f801078b0740 rdx: 
> rcx: 010a  r8: 818d30f0  r9: 
> rax:  rbx: Starting powerd.0018 rbp: 
> fe048c204960
> r10: 0001 r11: 0001 r12: f80274e32c18
> r13: 010a r14: f80274e32c00 r15: 812ae38a
> trap number = 12
> panic: page fault
> cpuid = 9
> time = 1712662362
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe048c2045f0
> vpanic() at vpanic+0x135/frame 0xfe048c204720
> panic() at panic+0x43/frame 0xfe048c204780
> trap_fatal() at trap_fatal+0x40b/frame 0xfe048c2047e0
> trap_pfault() at trap_pfault+0xa0/frame 0xfe048c204850
> calltrap() at calltrap+0x8/frame 0xfe048c204850
> --- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 
> 0xfe
> 048c204960 ---
> __mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
> clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
> local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
> rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
> svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
> sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
> amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
> --- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 
> 0x3f00
> 96f7168, rbp = 0x3f0096f7230 ---
> KDB: enter: panic
> [ thread pid 1208 tid 101107 ]
> Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)
> db>
>
>
> Given suitable clues, I can poke at it a bit -- this is my "build
> machine," so it doesn't have critical work to do at the moment.  (I
> would normally have powered it down for the day: there's no need for
> it to be wasting energy.)
>
> Laptops are still building ports under stable/14 -- something seems
> to want the llvm17 port, and they have firefox to build, so they
> won't be testing CURRENT/head for a while, yet.
>
> Peace,
> david
> --
> David H. Wolfskill  da...@catwhisker.org
> Alexey Navalny was a courageous man; Putin has made him a martyr.
>
> See https://www.catwhisker.org/~david/publickey.gpg for my public key.
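
Rick's hint about finding the source line behind clnt_vc_create+0x4f4 could be
followed roughly as below. This is a minimal sketch, not something posted in the
thread: it assumes gdb from ports (devel/gdb) is installed, that the debug
symbols matching the panicking kernel are in
/usr/lib/debug/boot/kernel/kernel.debug, and that the symbol lives in the kernel
image rather than in a module. The helper name frame_to_source is made up for
illustration.

```shell
#!/bin/sh
# Resolve a backtrace frame such as "clnt_vc_create+0x4f4" to a source line.
# Assumption: the debug file given (or the default below) matches the kernel
# that produced the backtrace; otherwise the listing will be misleading.
frame_to_source() {
    frame=$1
    debug_file=${2:-/usr/lib/debug/boot/kernel/kernel.debug}
    # "list *(symbol+offset)" asks gdb for the source around that address;
    # -batch makes gdb exit after running the -ex command.
    gdb -batch -ex "list *($frame)" "$debug_file"
}

# Example invocation (not run here):
#   frame_to_source 'clnt_vc_create+0x4f4'
```

If the listed line turns out to be the SOCK_LOCK() call or the
SOCKBUF_LOCK(&so->so_rcv) call, that would indicate which of the two mutexes
was not initialized.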



Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread David Wolfskill
Machine had been running:

FreeBSD 15.0-CURRENT #43 main-n269202-4e7aa03b7076: Mon Apr  8 11:19:58 UTC 
2024 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500018 1500018

This was an in-place source update, after updating sources to
main-n269230-f6f67f58c19d.  On reboot (after "make installworld"
completed), I see this on the serial console (copy/pasted):

...
Starting lockd.


Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address   = 0x18
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80b208c5
stack pointer   = 0x28:0xfe048c204920
frame pointer   = 0x28:0xfe048c204960
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 1208 (rpc.Starting automountd.
lockd)
rdi:  rsi: f801078b0740 rdx: 
rcx: 010a  r8: 818d30f0  r9: 
rax:  rbx: Starting powerd.0018 rbp: 
fe048c204960
r10: 0001 r11: 0001 r12: f80274e32c18
r13: 010a r14: f80274e32c00 r15: 812ae38a
trap number = 12
panic: page fault
cpuid = 9
time = 1712662362
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe048c2045f0
vpanic() at vpanic+0x135/frame 0xfe048c204720
panic() at panic+0x43/frame 0xfe048c204780
trap_fatal() at trap_fatal+0x40b/frame 0xfe048c2047e0
trap_pfault() at trap_pfault+0xa0/frame 0xfe048c204850
calltrap() at calltrap+0x8/frame 0xfe048c204850
--- trap 0xc, rip = 0x80b208c5, rsp = 0xfe048c204920, rbp = 0xfe
048c204960 ---
__mtx_lock_flags() at __mtx_lock_flags+0x45/frame 0xfe048c204960
clnt_vc_create() at clnt_vc_create+0x4f4/frame 0xfe048c204ab0
local_rpcb() at local_rpcb+0x11b/frame 0xfe048c204b50
rpcb_unset() at rpcb_unset+0x24/frame 0xfe048c204bb0
svc_tp_create() at svc_tp_create+0xee/frame 0xfe048c204c90
sys_nlm_syscall() at sys_nlm_syscall+0x3d0/frame 0xfe048c204e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfe048c204f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe048c204f30
--- syscall (154, FreeBSD ELF64, nlm_syscall), rip = 0x3f00a2dfd2a, rsp = 0x3f00
96f7168, rbp = 0x3f0096f7230 ---
KDB: enter: panic
[ thread pid 1208 tid 101107 ]
Stopped at  kdb_enter+0x33: movq$0,0x104eb92(%rip)
db> 


Given suitable clues, I can poke at it a bit -- this is my "build
machine," so it doesn't have critical work to do at the moment.  (I
would normally have powered it down for the day: there's no need for
it to be wasting energy.)

Laptops are still building ports under stable/14 -- something seems
to want the llvm17 port, and they have firefox to build, so they
won't be testing CURRENT/head for a while, yet.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Alexey Navalny was a courageous man; Putin has made him a martyr.

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread Baptiste Daroussin
On Sat 06 Apr 09:23, Rainer Hurling wrote:
> On 06.04.24 at 09:05, FreeBSD User wrote:
> > Hello,
> > 
> > after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
> > 14-STABLE, I can't update several ports:
> > 
> > www/apache24
> > databases/redis
> > 
> > pkg core dumps while performing the installation. apache24 and redis are the ports on which I
> > noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
> > builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
> > amd64).
> > 
> > After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
> > building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
> > target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
> > host that also use databases/redis and www/apache24, pkg fails the same way.
> > 
> > I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
> > is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
> > recent source tree with the portmaster/make build process.
> > 
> > Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
> > see the same ...
> > 
> > Thanks in advance,
> > 
> > oh
> 
> 
> Hmm, I just tried to reproduce that. Both ports mentioned, databases/redis
> and www/apache24, can be built and installed with Portmaster. The box is a
> 15.0-CURRENT with pkg-1.21.0.
> 
> Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir'
> show some inconsistencies?
> 
> Best wishes,
> Rainer
> 
> 
Using portmaster or not is unlikely to be helpful here.

The right way to test is to report running with pkg directly, and also to recommend
testing with default options in pkg.conf.

Best regards,
Bapt



Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-09 Thread Baptiste Daroussin
On Sat 06 Apr 09:05, FreeBSD User wrote:
> Hello,
> 
> after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
> 14-STABLE, I can't update several ports:
> 
> www/apache24
> databases/redis
> 
> pkg core dumps while performing the installation. apache24 and redis are the ports on which I
> noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
> builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
> amd64).
> 
> After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
> building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
> target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
> host that also use databases/redis and www/apache24, pkg fails the same way.
> 
> I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
> is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
> recent source tree with the portmaster/make build process.
> 
> Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
> see the same ...
> 
> Thanks in advance,
> 
> oh 
> 
https://github.com/freebsd/pkg/issues/2270

Set HANDLE_RC_SCRIPTS=false in your pkg.conf.

A fix was made last Friday. Given that this is a non-default option, I waited for
the weekend to pass to see if there were other regressions, but there were no more
reports, so I will issue pkg 1.21.1 now.

Best regards,
Bapt
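
Until pkg 1.21.1 is installed, one way to see whether a given pkg.conf turns the
option on is a small text check like the one below. It is only a sketch under
assumptions: the helper name rc_scripts_enabled is hypothetical, the pattern is
a rough heuristic for "KEY = value;" and "KEY: value" lines rather than a real
UCL parser, and the demonstration runs on a scratch file instead of
/usr/local/etc/pkg.conf.

```shell
#!/bin/sh
# Return success if a pkg.conf-style file enables HANDLE_RC_SCRIPTS.
# Heuristic only: looks for an uncommented "HANDLE_RC_SCRIPTS = true;"
# (or yes/on, with "=" or ":" as separator); it does not parse UCL.
rc_scripts_enabled() {
    conf=$1
    grep -Eiq '^[[:space:]]*HANDLE_RC_SCRIPTS[[:space:]]*[=:][[:space:]]*"?(true|yes|on)([^[:alnum:]_]|$)' "$conf"
}

# Demonstration on a temporary file rather than the live configuration
tmp=$(mktemp)
printf 'HANDLE_RC_SCRIPTS = true;\n' > "$tmp"
if rc_scripts_enabled "$tmp"; then
    echo "HANDLE_RC_SCRIPTS is enabled; consider setting it to false for now"
fi
rm -f "$tmp"
```

Running "pkg -vv" is expected to list the effective configuration as well, which
avoids guessing at the file syntax.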



RFC: Does anyone use the -public/-webnfs NFS exports?

2024-04-07 Thread Rick Macklem
Hi,

I have a hunch that no one uses the WebNFS stuff, which is done via exports(5)
using the -public or -webnfs exports options.
I would like to deprecate these exports options, but thought I'd ask in case
anyone uses them?

rick



Re: 15.0 on RPi4, USB broken: uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT

2024-04-07 Thread Lexi Winter
Mark Millard:
> On Mar 30, 2024, at 12:44, Lexi Winter  wrote:
> > when the problem happens, with USB_DEBUG enabled, the kernel logs:
> > 
> > uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT
> > uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2
> 
> Here is my config.txt material related to such issues:
> 
> #
> # Local addition that avoids USB3 SSD boot failures that look like:
> #   uhub_reattach_port: port ? reset failed, error=USB_ERR_TIMEOUT
> #   uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port ?
> # WARNING, not sufficient for "boot -s": that needs the full force_turbo=1
> initial_turbo=60
 
thanks -- after setting this in config.txt, the problem seems to be
fixed.

hopefully this won't cause any overheating issues; none of my RPis have
fans, but they have fairly decent passive cooling, and are running
powerd on boot.

regards, lexi.
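
For anyone applying the same workaround, the setting can be added to config.txt
without creating duplicate entries. The sketch below is an illustration only:
the helper name ensure_config_line is made up, and the location of config.txt
depends on how the boot partition is mounted, so the path is left as a
parameter and the demonstration uses a scratch file.

```shell
#!/bin/sh
# Append a setting to a config.txt-style file only if that exact line is
# not already present.
ensure_config_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the setting as a literal string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file instead of the real boot partition
scratch=$(mktemp)
ensure_config_line "$scratch" 'initial_turbo=60'
ensure_config_line "$scratch" 'initial_turbo=60'   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```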




Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg-static core dumps on some ports

2024-04-07 Thread FreeBSD User
On Sat, 6 Apr 2024 09:05:00 +0200
FreeBSD User  wrote:

> Hello,
> 
> after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
> 14-STABLE, I can't update several ports:
> 
> www/apache24
> databases/redis
> 
> pkg core dumps while performing the installation. apache24 and redis are the ports on which I
> noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
> builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
> amd64).
> 
> After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
> building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
> target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
> host that also use databases/redis and www/apache24, pkg fails the same way.
> 
> I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
> is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
> recent source tree with the portmaster/make build process.
> 
> Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
> see the same ...
> 
> Thanks in advance,
> 
> oh 
> 
> 

Hello,

after following a recommendation to check port dependencies via pkg check -Bn, and recompiling
pkg via "portmaster -df ports-mgmt/pkg" along with all ports found by the check command, as
well as sqlite (as a precaution), the pkg-static binary still dumps core on some ports.

Phenomenon: When updating existing ports, like

www/apache24
databases/redis
net/openldap26-server
misc/e2fsprogs-libuuid

building the port runs smoothly, but pkg-static dies when attempting to delete the
old/to-be-reinstalled port. The problem arises when using portmaster as well as when
performing "make install/make deinstall" in the specific target port.

The last port I hit is misc/e2fsprogs-libuuid.

My skills at debugging core dumps are rather limited, and our boxes have debugging disabled.
The problem spreads across several hosts running CURRENT (same IcyBridge CPU generation, but
one host runs the most recent CURRENT with LLVM18, the other one a CURRENT compiled 4 days
ago), and as of last week it also arose on 14-STABLE on a box in the lab when performing
the transition from pkg 1.20.9_1 -> 1.21.0, so I'd exclude a hardware/memory issue.

Using (a freshly recompiled) gdb 14 from ports does not reveal much:

[...]
root@thor:/usr/ports # gdb /usr/local/sbin/pkg-static /packages/portmaster-backup/pkg-static.core
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd15.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/pkg-static...
(No debugging symbols found in /usr/local/sbin/pkg-static)
[New LWP 101269]
Core was generated by `/usr/local/sbin/pkg-static delete -yf e2fsprogs-libuuid-1.47.0'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0  0x00b3de2b in strlen_baseline ()



Kind regards,
oh



-- 
O. Hartmann
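
With only frame #0 visible in the session above, a non-interactive gdb run may
give a little more to attach to a bug report. This is a minimal sketch under
assumptions: it uses the same gdb from ports, core_backtrace is a hypothetical
helper name, and without debugging symbols in pkg-static the backtrace may
still be sparse.

```shell
#!/bin/sh
# Print a backtrace, the registers and the faulting instruction from a core
# file without starting an interactive gdb session.
core_backtrace() {
    binary=$1
    core=$2
    # Each -ex command runs in order; -batch exits once they are done.
    gdb -batch -ex 'bt' -ex 'info registers' -ex 'x/i $pc' "$binary" "$core"
}

# Example invocation with the paths from the session above (not run here):
#   core_backtrace /usr/local/sbin/pkg-static /packages/portmaster-backup/pkg-static.core
```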



Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-06 Thread FreeBSD User
On Thu, 4 Apr 2024 01:14:52 -0500
Kyle Evans  wrote:

> On 4/4/24 00:49, FreeBSD User wrote:
> > Hello,
> > 
> > I just stumbled over this CVE regarding xz 5.6.0 and 5.6.1:
> > 
> > https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-3094
> > 
> > FreeBSD starting with 14-STABLE seems to use xz 5.6.0, but my limited skills do
> > not allow me to judge whether the described exploit mechanism also works on FreeBSD.
> > RedHat already sent out a warning, the workaround is to move back towards an
> > older variant.
> > 
> > I have to report to my superiors (we're using 14-STABLE and CURRENT and I do so
> > in private), so I would like to welcome any comment on that.
> > 
> > Thanks in advance,
> > 
> > O. Hartmann
> > 
> >   
> 
> See so@'s answer from a couple days ago:
> 
> https://lists.freebsd.org/archives/freebsd-security/2024-March/000248.html
> 
> TL;DR no
> 
> Thanks,
> 
> Kyle Evans

Thank you very much.

Kind regards,

oh

-- 
O. Hartmann



Re: pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-06 Thread Rainer Hurling

On 06.04.24 at 09:05, FreeBSD User wrote:

Hello,

after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
14-STABLE, I can't update several ports:

www/apache24
databases/redis

pkg core dumps while performing the installation. apache24 and redis are the ports on which I
noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
amd64).

After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
host that also use databases/redis and www/apache24, pkg fails the same way.

I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
recent source tree with the portmaster/make build process.

Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
see the same ...

Thanks in advance,

oh



Hmm, I just tried to reproduce that. Both ports mentioned, 
databases/redis and www/apache24, can be built and installed with 
Portmaster. The box is a 15.0-CURRENT with pkg-1.21.0.


Maybe 'pkg check -Bn' or 'portmaster --check-depends --check-port-dbdir' 
show some inconsistencies?


Best wishes,
Rainer




pkg-1.21.0: after upgrade 1.20.9_1 -> 1.21.0: pkg core dumps on specific ports

2024-04-06 Thread FreeBSD User
Hello,

after updating (portmaster and make) ports-mgmt/pkg from 1.20.9_1 -> 1.21.0 on CURRENT and
14-STABLE, I can't update several ports:

www/apache24
databases/redis

pkg core dumps while performing the installation. apache24 and redis are the ports on which I
noticed this misbehaviour, on ALL 14-STABLE and CURRENT boxes (both OS variants at their latest
builds, i.e. FreeBSD 15.0-CURRENT #32 main-n269135-da2b732288c7: Fri Apr  5 20:30:39 CEST 2024
amd64).

After some updates on a poudriere builder (CURRENT base host, 14.0-RELENG jail with poudriere)
building packages for 14.0-RELENG, I observed the same behaviour when updating packages on
target hosts where pkg is updated first. On those hosts, a nextcloud-server host and an icinga2
host that also use databases/redis and www/apache24, pkg fails the same way.

I do not dare to update our poudriere hosts, since the problem seems to pop up when pkg 1.21.0
is installed, no matter whether I use poudriere-built ports (from our own builder hosts) or a
recent source tree with the portmaster/make build process.

Looks like a serious bug to me and not a site/user-specific problem. Hopefully others
see the same ...

Thanks in advance,

oh 


-- 
O. Hartmann



Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-04 Thread Ben C. O. Grimm

On April 4, 2024 07:50:55 FreeBSD User  wrote:


Hello,

I just stumbled over this CVE regarding xz 5.6.0 and 5.6.1:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-3094

FreeBSD starting with 14-STABLE seems to use xz 5.6.0, but my limited skills do
not allow me to judge whether the described exploit mechanism also works on FreeBSD.
RedHat already sent out a warning, the workaround is to move back towards an
older variant.

I have to report to my superiors (we're using 14-STABLE and CURRENT and I do so
in private), so I would like to welcome any comment on that.

Thanks in advance,

O. Hartmann


--
O. Hartmann


As noted on freebsd-security last Friday:

FreeBSD is not affected by the recently announced backdoor included in the 
5.6.0 and 5.6.1 xz releases.

All supported FreeBSD releases include versions of xz that predate the 
affected releases.

The main, stable/14, and stable/13 branches do include the affected version 
(5.6.0), but the backdoor components were excluded from the vendor import. 
Additionally, FreeBSD does not use the upstream's build tooling, which was 
a required part of the attack. Lastly, the attack specifically targeted 
x86_64 Linux systems using glibc.


Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-04 Thread Kyle Evans

On 4/4/24 00:49, FreeBSD User wrote:

Hello,

I just stumbled over this CVE regarding xz 5.6.0 and 5.6.1:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-3094

FreeBSD starting with 14-STABLE seems to use xz 5.6.0, but my limited skills do
not allow me to judge whether the described exploit mechanism also works on FreeBSD.
RedHat already sent out a warning, the workaround is to move back towards an
older variant.

I have to report to my superiors (we're using 14-STABLE and CURRENT and I do so
in private), so I would like to welcome any comment on that.

Thanks in advance,

O. Hartmann




See so@'s answer from a couple days ago:

https://lists.freebsd.org/archives/freebsd-security/2024-March/000248.html

TL;DR no

Thanks,

Kyle Evans



Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-04 Thread FreeBSD User
On Thu, 04 Apr 2024 08:06:26 +0200 (CEST)
sth...@nethelp.no wrote:

> >> I have to report to my superiors (we're using 14-STABLE and CURRENT
> >> and I do so in private),
> >> so I would like to welcome any comment on that.  
> > 
> > No it does not affect FreeBSD.
> > 
> > The autoconf script checks that it is running in a RedHat or Debian
> > package build environment before trying to proceed. There are also
> > checks for GCC and binutils ld.bfd. And I'm not sure that the payload
> > (a precompiled Linux object file) would work with FreeBSD and
> > /lib/libelf.so.2.
> > 
> > See
> > 
> > https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27  
> 
> See also the following message from the FreeBSD security officer:
> 
> https://lists.freebsd.org/archives/freebsd-security/2024-March/000248.html
> 
> Steinar Haug, Nethelp consulting, sth...@nethelp.no
> 

Thank you very much for the quick answer.

Kind regards
oh

-- 
O. Hartmann



Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-04 Thread sthaug
>> I have to report to my superiors (we're using 14-STABLE and CURRENT
>> and I do so in private),
>> so I would like to welcome any comment on that.
> 
> No it does not affect FreeBSD.
> 
> The autoconf script checks that it is running in a RedHat or Debian
> package build environment before trying to proceed. There are also
> checks for GCC and binutils ld.bfd. And I'm not sure that the payload
> (a precompiled Linux object file) would work with FreeBSD and
> /lib/libelf.so.2.
> 
> See
> 
> https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27

See also the following message from the FreeBSD security officer:

https://lists.freebsd.org/archives/freebsd-security/2024-March/000248.html

Steinar Haug, Nethelp consulting, sth...@nethelp.no



Re: CVE-2024-3094: malicious code in xz 5.6.0 and xz 5.6.1

2024-04-04 Thread Paul Floyd




On 04-04-24 05:49, FreeBSD User wrote:

Hello,

I just stumbled over this CVE regarding xz 5.6.0 and 5.6.1:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-3094

FreeBSD starting with 14-STABLE seems to use xz 5.6.0, but my limited skills do
not allow me to judge whether the described exploit mechanism also works on FreeBSD.
RedHat already sent out a warning, the workaround is to move back towards an
older variant.

I have to report to my superiors (we're using 14-STABLE and CURRENT and I do so
in private), so I would like to welcome any comment on that.


No it does not affect FreeBSD.

The autoconf script checks that it is running in a RedHat or Debian 
package build environment before trying to proceed. There are also 
checks for GCC and binutils ld.bfd. And I'm not sure that the payload (a 
precompiled Linux object file) would work with FreeBSD and /lib/libelf.so.2.


See

https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27

A+
Paul
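
The reasoning above can be written down as a small check. This is only a
sketch of the conditions as they are described in this thread (affected
release, RedHat or Debian package build, GCC, binutils ld.bfd). It is not the
logic of the real build script. The names BuildEnvironment and
preconditions_met and both example environments are made up for illustration.

```python
from dataclasses import dataclass

# The two releases named in the CVE subject line.
AFFECTED_XZ_RELEASES = {(5, 6, 0), (5, 6, 1)}


@dataclass(frozen=True)
class BuildEnvironment:
    """Hypothetical description of where xz gets built."""
    xz_version: tuple   # (major, minor, patch)
    packaging: str      # "rpm", "deb" or "other"
    compiler: str       # e.g. "gcc" or "clang"
    linker: str         # e.g. "ld.bfd" or "ld.lld"


def preconditions_met(env):
    """True only if every condition mentioned in the thread holds."""
    return (
        env.xz_version in AFFECTED_XZ_RELEASES
        and env.packaging in {"rpm", "deb"}
        and env.compiler == "gcc"
        and env.linker == "ld.bfd"
    )


# Illustrative environments (assumed values, not taken from the thread).
freebsd_like = BuildEnvironment((5, 6, 0), "other", "clang", "ld.lld")
debian_like = BuildEnvironment((5, 6, 1), "deb", "gcc", "ld.bfd")

print(preconditions_met(freebsd_like))
print(preconditions_met(debian_like))
```

The point of the sketch is that the version number alone does not decide the
outcome: an environment with an affected release still fails the check when
any of the other conditions is missing.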



Re: LOR so_snd_sx / nfs

2024-04-03 Thread Rick Macklem
Shouldn't be a problem. The socket used for lookup is
AF_UNIX (uses unp_connectat) and the NFS socket
will always be UDP or TCP.

Different sockets imply different socket locks.

At least that's my interpretation, rick
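
A toy model of why such a report can show up even though the sockets differ.
It assumes the checker remembers lock orders by lock name (the report below
prints names such as so_snd_sx and nfs) rather than by lock instance. The
helper make_checker and the instance labels are hypothetical and are not
kernel code.

```python
def make_checker(key):
    """Return an acquire(held, wanted) function that remembers lock orders.

    key maps a lock to the identity the checker tracks (hypothetical design).
    """
    established = set()

    def acquire(held, wanted):
        first, second = key(held), key(wanted)
        # A reversal is reported when the opposite order was seen before.
        reversal = (second, first) in established
        established.add((first, second))
        return reversal

    return acquire


# Locks are modelled as (name, instance) pairs; the instance labels are made up.
nfs_vnode = ("nfs", "vnode")
tcp_snd = ("so_snd_sx", "tcp-socket-to-nfs-server")
unix_snd = ("so_snd_sx", "af-unix-socket-of-wpa_cli")

by_name = make_checker(lambda lock: lock[0])
by_instance = make_checker(lambda lock: lock)

# Mount path: nfs lock held, then the TCP socket's send lock is taken.
name_first = by_name(nfs_vnode, tcp_snd)
inst_first = by_instance(nfs_vnode, tcp_snd)
# sendto path: AF_UNIX send lock held, then the nfs lock is taken in lookup.
name_second = by_name(unix_snd, nfs_vnode)
inst_second = by_instance(unix_snd, nfs_vnode)

print(name_first, name_second)   # name-keyed checker
print(inst_first, inst_second)   # instance-keyed checker
```

Under that assumption the name-keyed checker flags the second path, while the
instance-keyed one does not, which matches the argument that different
sockets imply different socket locks.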

On Wed, Apr 3, 2024 at 11:33 AM Bjoern A. Zeeb
 wrote:
>
>
> NFS root boot of a Lab machine;  calling wpa_cli:
>
> lock order reversal:
>   1st 0x0001d4e1c800 so_snd_sx (so_snd_sx, sx) @ /usr/src/sys/kern/uipc_socket.c:4020
>   2nd 0xa020cb20e930 nfs (nfs, lockmgr) @ /usr/src/sys/kern/vfs_lookup.c:1083
> lock order nfs -> so_snd_sx established at:
> #0 0x00529588 at witness_checkorder+0x328
> #1 0x004bdf48 at _sx_xlock+0x70
> #2 0x005687e0 at soiolock+0x5c
> #3 0x00567ff0 at sosend_generic+0x104
> #4 0x005688b8 at sosend+0x48
> #5 0x0076b6a0 at clnt_vc_call+0x570
> #6 0x00769914 at clnt_reconnect_call+0x1c4
> #7 0x003552ec at newnfs_request+0x7e4
> #8 0x0037abf0 at nfsrpc_getattrnovp+0xfc
> #9 0x0039823c at mountnfs+0x6ec
> #10 0x00395c64 at nfs_mount+0xe78
> #11 0x0059b59c at vfs_mount_sigdefer+0x30
> #12 0x005a44c8 at vfs_domount_first+0x254
> #13 0x005a0884 at vfs_domount+0x2d4
> #14 0x0059f1ec at vfs_donmount+0x824
> #15 0x005a3438 at kernel_mount+0x64
> #16 0x005a72b0 at parse_mount+0x494
> #17 0x005a59ac at vfs_mountroot+0x5b8
> lock order so_snd_sx -> nfs attempted at:
> #0 0x00529cd8 at witness_checkorder+0xa78
> #1 0x0047edb8 at lockmgr_lock_flags+0x78
> #2 0x00390044 at nfs_lock+0x34
> #3 0x0059793c at vop_sigdefer+0x38
> #4 0x005c3734 at _vn_lock+0x58
> #5 0x0059cb78 at vfs_lookup+0x12c
> #6 0x0059c01c at namei+0x280
> #7 0x005751f8 at unp_connectat+0x244
> #8 0x00576adc at uipc_sosend_dgram+0x3c0
> #9 0x005689c8 at sousrsend+0x80
> #10 0x0056ec90 at kern_sendit+0x1e4
> #11 0x0056ef78 at sendit+0x1b0
> #12 0x0056edb4 at sys_sendto+0x4c
> #13 0x0086ad40 at do_el0_sync+0x59c
> #14 0x0084391c at handle_el0_sync+0x48
>
>
> --
> Bjoern A. Zeeb r15:7
>



LOR so_snd_sx / nfs

2024-04-03 Thread Bjoern A. Zeeb



NFS root boot of a Lab machine;  calling wpa_cli:

lock order reversal:
 1st 0x0001d4e1c800 so_snd_sx (so_snd_sx, sx) @ /usr/src/sys/kern/uipc_socket.c:4020
 2nd 0xa020cb20e930 nfs (nfs, lockmgr) @ /usr/src/sys/kern/vfs_lookup.c:1083
lock order nfs -> so_snd_sx established at:
#0 0x00529588 at witness_checkorder+0x328
#1 0x004bdf48 at _sx_xlock+0x70
#2 0x005687e0 at soiolock+0x5c
#3 0x00567ff0 at sosend_generic+0x104
#4 0x005688b8 at sosend+0x48
#5 0x0076b6a0 at clnt_vc_call+0x570
#6 0x00769914 at clnt_reconnect_call+0x1c4
#7 0x003552ec at newnfs_request+0x7e4
#8 0x0037abf0 at nfsrpc_getattrnovp+0xfc
#9 0x0039823c at mountnfs+0x6ec
#10 0x00395c64 at nfs_mount+0xe78
#11 0x0059b59c at vfs_mount_sigdefer+0x30
#12 0x005a44c8 at vfs_domount_first+0x254
#13 0x005a0884 at vfs_domount+0x2d4
#14 0x0059f1ec at vfs_donmount+0x824
#15 0x005a3438 at kernel_mount+0x64
#16 0x005a72b0 at parse_mount+0x494
#17 0x005a59ac at vfs_mountroot+0x5b8
lock order so_snd_sx -> nfs attempted at:
#0 0x00529cd8 at witness_checkorder+0xa78
#1 0x0047edb8 at lockmgr_lock_flags+0x78
#2 0x00390044 at nfs_lock+0x34
#3 0x0059793c at vop_sigdefer+0x38
#4 0x005c3734 at _vn_lock+0x58
#5 0x0059cb78 at vfs_lookup+0x12c
#6 0x0059c01c at namei+0x280
#7 0x005751f8 at unp_connectat+0x244
#8 0x00576adc at uipc_sosend_dgram+0x3c0
#9 0x005689c8 at sousrsend+0x80
#10 0x0056ec90 at kern_sendit+0x1e4
#11 0x0056ef78 at sendit+0x1b0
#12 0x0056edb4 at sys_sendto+0x4c
#13 0x0086ad40 at do_el0_sync+0x59c
#14 0x0084391c at handle_el0_sync+0x48


--
Bjoern A. Zeeb r15:7



Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread Kevin Oberman
On Tue, Apr 2, 2024 at 11:53 AM Tomoaki AOKI 
wrote:

> On Tue, 02 Apr 2024 08:53:15 -0700
> Chris  wrote:
>
> > On 2024-04-02 04:32, Tomoaki AOKI wrote:
> > > On Tue, 02 Apr 2024 00:42:23 -0700
> > > Chris  wrote:
> > >
> > >> On 2024-04-01 22:51, Kevin Oberman wrote:
> > >> > On Mon, Apr 1, 2024 at 3:05 PM Chris wrote:
> > >> >
> > >> >> I experience challenges running FreeBSD on my Alder Lake laptop.
> > >> >> With some help on the list and Bugzilla, I was able to get Graphics
> > >> >> WiFi at least working. But still wasn't as stable as running on
> > >> >> more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
> > >> >> of last week, in hopes of getting a more stable experience. I wiped
> > >> >> the partition (UFS) and unpacked the version available on the
> FreeBSD
> > >> >> ftp servers at that time. I quickly discovered that multi-cons
> (Ctrl+
> > >> >> Alt+Fn || Alt+Fn) was no longer available. I posted this discovery
> to
> > >> >> the list. But no solution was discovered. I've since attempted to
> use
> > >> >> 2 more different newer versions. Both of them were also w/o
> multi-con(s)
> > >> >> support. What must I do to fix, or uncover the cause of this?
> > >> >> I only load the associated GPU module in rc.conf(5) (no keyboard
> settings).
> > >> >> I'm also unable to get multi-cons booting from any of the boot
> media
> > >> >> produced within the last week.
> > >> >>
> > >> >> Following are some specifics:
> > >> >>
> > >> >> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
> > >> >>
> > >> >> IdeaPad 3 17IAU7
> > >> >>
>
[Lots of details elided]

> > >> >
> > >> > I have a T16 and ran into that issue. It may be that BIOS changes have
> > >> > broken things, but I found that, by default, the F keys control volume,
> > >> > screen brightness, and many other things. I can use Fn+F[1-12] to perform
> > >> > traditional function key functions. I found that BIOS has an option to make
> > >> > the traditional functions the default which is how I am running today and
> > >> > have since shortly after I purchased the computer. Once I set that BIOS
> > >> > option, everything worked "properly". I now use Fn+F[1-12] to adjust volume
> > >> > and screen brightness. I hope to get mute to work, but I need to figure out
> > >> > which event is set when Fn+F1 is pressed to write trivial devd support for
> > >> > it.
> > >> Well, I can't explain it. I set everything up in the BIOS to work
> > >> "traditionally" and everything worked fine up until the upgrade. Where
> > >> everything went "south" in the Fn department. But since you mentioned it.
> > >> I thought I'd review the settings and sure enough, the Function key
> > >> settings had changed. I have no explanation. I haven't been to the BIOS
> > >> settings since initial setup. But only that setting was changed. I can't
> > >> thank you enough for mentioning this, Kevin. I *really* appreciate your
> > >> taking the time to reply!
> > >
> > > So I was correct. ;-)
> > > I don't know how in the  I missed your suggestion. Thank you for figuring
> > > it out, (even if I somehow missed it)! How embarrassing.
>
> Maybe just because my previous post was sent to this ML only, not
> including your email directly as a recipient. ;-)
>

FWIW, when I sent the message to "All",  bsdforge had blocked it (550 5.0.0
REJECT, too much abuse from your host). If this is an indication that my
system is misbehaving, please let me know.
-- 
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkober...@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683


Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread Tomoaki AOKI
On Tue, 02 Apr 2024 08:53:15 -0700
Chris  wrote:

> On 2024-04-02 04:32, Tomoaki AOKI wrote:
> > On Tue, 02 Apr 2024 00:42:23 -0700
> > Chris  wrote:
> > 
> >> On 2024-04-01 22:51, Kevin Oberman wrote:
> >> > On Mon, Apr 1, 2024 at 3:05 PM Chris  wrote:
> >> >
> >> >> I experience challenges running FreeBSD on my Alder Lake laptop.
> >> >> With some help on the list and Bugzilla, I was able to get Graphics
> >> >> WiFi at least working. But still wasn't as stable as running on
> >> >> more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
> >> >> of last week, in hopes of getting a more stable experience. I wiped
> >> >> the partition (UFS) and unpacked the version available on the FreeBSD
> >> >> ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
> >> >> Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
> >> >> the list. But no solution was discovered. I've since attempted to use
> >> >> 2 more different newer versions. Both of them were also w/o multi-con(s)
> >> >> support. What must I do to fix, or uncover the cause of this?
> >> >> I only load the associated GPU module in rc.conf(5) (no keyboard 
> >> >> settings).
> >> >> I'm also unable to get multi-cons booting from any of the boot media
> >> >> produced within the last week.
> >> >>
> >> >> Following are some specifics:
> >> >>
> >> >> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
> >> >>
> >> >> IdeaPad 3 17IAU7
> >> >>
> >> >> WORKS:
> >> >> FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
> >> >> Thu Jan 18 04:04:32 UTC 2024
> >> >>
> >> >> DOESN'T WORK:
> >> >> FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
> >> >> Fri Mar 29 10:19:43 UTC 2024
> >> >> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> >> >> amd64
> >> >>
> >> >> FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
> >> >> Thu Mar 14 02:58:39 UTC 2024
> >> >> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> >> >> amd64
> >> >>
> >> >> hostb0@pci0:0:0:0:  class=0x06 rev=0x04 hdr=0x00 vendor=0x8086
> >> >> device=0x4609 subvendor=0x17aa subdevice=0x3803
> >> >>  vendor = 'Intel Corporation'
> >> >>  class  = bridge
> >> >>  subclass   = HOST-PCI
> >> >> vgapci0@pci0:0:2:0: class=0x03 rev=0x0c hdr=0x00 vendor=0x8086
> >> >> device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake-UP3 GT1 [UHD Graphics]'
> >> >>  class  = display
> >> >>  subclass   = VGA
> >> >> none0@pci0:0:4:0:   class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086
> >> >> device=0x461d subvendor=0x17aa subdevice=0x380c
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake Innovation Platform Framework Processor
> >> >> Participant'
> >> >>  class  = dasp
> >> >> pcib1@pci0:0:6:0:   class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086
> >> >> device=0x464d subvendor=0x17aa subdevice=0x380e
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = '12th Gen Core Processor PCI Express x4 Controller'
> >> >>  class  = bridge
> >> >>  subclass   = PCI-PCI
> >> >> none1@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086
> >> >> device=0x467d subvendor=0x17aa subdevice=0x3813
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Platform Monitoring Technology'
> >> >>  class  = dasp
> >> >> xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086
> >> >> device=0x461e subvendor=0x17aa subdevice=0x3824
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake-P Thunderbolt 4 USB Controller'
> >> >>  class  = serial bus
> >> >>  subclass   = USB
> >> >> xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086
> >> >> device=0x51ed subvendor=0x17aa subdevice=0x3820
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
> >> >>  class  = serial bus
> >> >>  subclass   = USB
> >> >> none2@pci0:0:20:2:  class=0x05 rev=0x01 hdr=0x00 vendor=0x8086
> >> >> device=0x51ef subvendor=0x17aa subdevice=0x381e
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake PCH Shared SRAM'
> >> >>  class  = memory
> >> >>  subclass   = RAM
> >> >> iwlwifi0@pci0:0:20:3:   class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086
> >> >> device=0x51f0 subvendor=0x8086 subdevice=0x0074
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake-P PCH CNVi WiFi'
> >> >>  class  = network
> >> >> ig4iic0@pci0:0:21:0:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
> >> >> device=0x51e8 subvendor=0x17aa subdevice=0x3812
> >> >>  vendor = 'Intel Corporation'
> >> >>  device = 'Alder Lake PCH Serial IO I2C Controller'
> >> >>  class  = serial bus
> >> >> ig4iic1@pci0:0:21:1:

Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread Chris

On 2024-04-02 04:32, Tomoaki AOKI wrote:

On Tue, 02 Apr 2024 00:42:23 -0700
Chris  wrote:


On 2024-04-01 22:51, Kevin Oberman wrote:
> On Mon, Apr 1, 2024 at 3:05 PM Chris  wrote:
>
>> I experience challenges running FreeBSD on my Alder Lake laptop.
>> With some help on the list and Bugzilla, I was able to get Graphics
>> WiFi at least working. But still wasn't as stable as running on
>> more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
>> of last week, in hopes of getting a more stable experience. I wiped
>> the partition (UFS) and unpacked the version available on the FreeBSD
>> ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
>> Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
>> the list. But no solution was discovered. I've since attempted to use
>> 2 more different newer versions. Both of them were also w/o multi-con(s)
>> support. What must I do to fix, or uncover the cause of this?
>> I only load the associated GPU module in rc.conf(5) (no keyboard settings).
>> I'm also unable to get multi-cons booting from any of the boot media
>> produced within the last week.
>>
>> Following are some specifics:
>>
>> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
>>
>> IdeaPad 3 17IAU7
>>
>> WORKS:
>> FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
>> Thu Jan 18 04:04:32 UTC 2024
>>
>> DOESN'T WORK:
>> FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
>> Fri Mar 29 10:19:43 UTC 2024
>> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>> amd64
>>
>> FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
>> Thu Mar 14 02:58:39 UTC 2024
>> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>> amd64
>>
>> hostb0@pci0:0:0:0:  class=0x06 rev=0x04 hdr=0x00 vendor=0x8086
>> device=0x4609 subvendor=0x17aa subdevice=0x3803
>>  vendor = 'Intel Corporation'
>>  class  = bridge
>>  subclass   = HOST-PCI
>> vgapci0@pci0:0:2:0: class=0x03 rev=0x0c hdr=0x00 vendor=0x8086
>> device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake-UP3 GT1 [UHD Graphics]'
>>  class  = display
>>  subclass   = VGA
>> none0@pci0:0:4:0:   class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086
>> device=0x461d subvendor=0x17aa subdevice=0x380c
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake Innovation Platform Framework Processor
>> Participant'
>>  class  = dasp
>> pcib1@pci0:0:6:0:   class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086
>> device=0x464d subvendor=0x17aa subdevice=0x380e
>>  vendor = 'Intel Corporation'
>>  device = '12th Gen Core Processor PCI Express x4 Controller'
>>  class  = bridge
>>  subclass   = PCI-PCI
>> none1@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x467d subvendor=0x17aa subdevice=0x3813
>>  vendor = 'Intel Corporation'
>>  device = 'Platform Monitoring Technology'
>>  class  = dasp
>> xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086
>> device=0x461e subvendor=0x17aa subdevice=0x3824
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake-P Thunderbolt 4 USB Controller'
>>  class  = serial bus
>>  subclass   = USB
>> xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51ed subvendor=0x17aa subdevice=0x3820
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
>>  class  = serial bus
>>  subclass   = USB
>> none2@pci0:0:20:2:  class=0x05 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51ef subvendor=0x17aa subdevice=0x381e
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake PCH Shared SRAM'
>>  class  = memory
>>  subclass   = RAM
>> iwlwifi0@pci0:0:20:3:   class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51f0 subvendor=0x8086 subdevice=0x0074
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake-P PCH CNVi WiFi'
>>  class  = network
>> ig4iic0@pci0:0:21:0:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51e8 subvendor=0x17aa subdevice=0x3812
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake PCH Serial IO I2C Controller'
>>  class  = serial bus
>> ig4iic1@pci0:0:21:1:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51e9 subvendor=0x17aa subdevice=0x3814
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake PCH Serial IO I2C Controller'
>>  class  = serial bus
>> none3@pci0:0:22:0:  class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086
>> device=0x51e0 subvendor=0x17aa subdevice=0x3815
>>  vendor = 'Intel Corporation'
>>  device = 'Alder Lake PCH HECI Controller'
>>  class  = simple comms
>> ahci0@pci0:0:23:0:  class=0x010601 rev=0x01 hdr=0x00 

Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread Tomoaki AOKI
On Tue, 02 Apr 2024 00:42:23 -0700
Chris  wrote:

> On 2024-04-01 22:51, Kevin Oberman wrote:
> > On Mon, Apr 1, 2024 at 3:05 PM Chris  wrote:
> > 
> >> I experience challenges running FreeBSD on my Alder Lake laptop.
> >> With some help on the list and Bugzilla, I was able to get Graphics
> >> WiFi at least working. But still wasn't as stable as running on
> >> more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
> >> of last week, in hopes of getting a more stable experience. I wiped
> >> the partition (UFS) and unpacked the version available on the FreeBSD
> >> ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
> >> Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
> >> the list. But no solution was discovered. I've since attempted to use
> >> 2 more different newer versions. Both of them were also w/o multi-con(s)
> >> support. What must I do to fix, or uncover the cause of this?
> >> I only load the associated GPU module in rc.conf(5) (no keyboard settings).
> >> I'm also unable to get multi-cons booting from any of the boot media
> >> produced within the last week.
> >> 
> >> Following are some specifics:
> >> 
> >> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
> >> 
> >> IdeaPad 3 17IAU7
> >> 
> >> WORKS:
> >> FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
> >> Thu Jan 18 04:04:32 UTC 2024
> >> 
> >> DOESN'T WORK:
> >> FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
> >> Fri Mar 29 10:19:43 UTC 2024
> >> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> >> amd64
> >> 
> >> FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
> >> Thu Mar 14 02:58:39 UTC 2024
> >> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> >> amd64
> >> 
> >> hostb0@pci0:0:0:0:  class=0x06 rev=0x04 hdr=0x00 vendor=0x8086
> >> device=0x4609 subvendor=0x17aa subdevice=0x3803
> >>  vendor = 'Intel Corporation'
> >>  class  = bridge
> >>  subclass   = HOST-PCI
> >> vgapci0@pci0:0:2:0: class=0x03 rev=0x0c hdr=0x00 vendor=0x8086
> >> device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake-UP3 GT1 [UHD Graphics]'
> >>  class  = display
> >>  subclass   = VGA
> >> none0@pci0:0:4:0:   class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086
> >> device=0x461d subvendor=0x17aa subdevice=0x380c
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake Innovation Platform Framework Processor
> >> Participant'
> >>  class  = dasp
> >> pcib1@pci0:0:6:0:   class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086
> >> device=0x464d subvendor=0x17aa subdevice=0x380e
> >>  vendor = 'Intel Corporation'
> >>  device = '12th Gen Core Processor PCI Express x4 Controller'
> >>  class  = bridge
> >>  subclass   = PCI-PCI
> >> none1@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x467d subvendor=0x17aa subdevice=0x3813
> >>  vendor = 'Intel Corporation'
> >>  device = 'Platform Monitoring Technology'
> >>  class  = dasp
> >> xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086
> >> device=0x461e subvendor=0x17aa subdevice=0x3824
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake-P Thunderbolt 4 USB Controller'
> >>  class  = serial bus
> >>  subclass   = USB
> >> xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51ed subvendor=0x17aa subdevice=0x3820
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
> >>  class  = serial bus
> >>  subclass   = USB
> >> none2@pci0:0:20:2:  class=0x05 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51ef subvendor=0x17aa subdevice=0x381e
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake PCH Shared SRAM'
> >>  class  = memory
> >>  subclass   = RAM
> >> iwlwifi0@pci0:0:20:3:   class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51f0 subvendor=0x8086 subdevice=0x0074
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake-P PCH CNVi WiFi'
> >>  class  = network
> >> ig4iic0@pci0:0:21:0:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51e8 subvendor=0x17aa subdevice=0x3812
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake PCH Serial IO I2C Controller'
> >>  class  = serial bus
> >> ig4iic1@pci0:0:21:1:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51e9 subvendor=0x17aa subdevice=0x3814
> >>  vendor = 'Intel Corporation'
> >>  device = 'Alder Lake PCH Serial IO I2C Controller'
> >>  class  = serial bus
> >> none3@pci0:0:22:0:  class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086
> >> device=0x51e0 subvendor=0x17aa subdevice=0x3815
> >>  vendor = 

Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread David Wolfskill
On Mon, Apr 01, 2024 at 10:51:10PM -0700, Kevin Oberman wrote:
> 
> I have a T16 and ran into that issue. It may be that BIOS changes have
> broken things, but I found that, by default, the F keys control volume,
> screen brightness, and many other things. I can use Fn+F[1-12] to perform
> traditional function key functions. I found that BIOS has an option to make
> the traditional functions the default which is how I am running today and
> have since shortly after I purchased the computer. Once I set that BIOS
> option, everything worked "properly". I now use Fn+F[1-12] to adjust volume
> and screen brightness. I hope to get mute to work, but I need to figure out
> which event is set when Fn+F1 is pressed to write trivial devd support for
> it.

Another approach (for making use of quasi-random keys scattered
across a keyboard) is to utilize x11/xbindkeys (at least, within
an X11 environment).

> 

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Alexey Navalny was a courageous man; Putin has made him a martyr.

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-02 Thread Chris

On 2024-04-01 22:51, Kevin Oberman wrote:

On Mon, Apr 1, 2024 at 3:05 PM Chris  wrote:


I experience challenges running FreeBSD on my Alder Lake laptop.
With some help on the list and Bugzilla, I was able to get Graphics
WiFi at least working. But still wasn't as stable as running on
more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
of last week, in hopes of getting a more stable experience. I wiped
the partition (UFS) and unpacked the version available on the FreeBSD
ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
the list. But no solution was discovered. I've since attempted to use
2 more different newer versions. Both of them were also w/o multi-con(s)
support. What must I do to fix, or uncover the cause of this?
I only load the associated GPU module in rc.conf(5) (no keyboard settings).
I'm also unable to get multi-cons booting from any of the boot media
produced within the last week.

Following are some specifics:

CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)

IdeaPad 3 17IAU7

WORKS:
FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
Thu Jan 18 04:04:32 UTC 2024

DOESN'T WORK:
FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
Fri Mar 29 10:19:43 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64

FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
Thu Mar 14 02:58:39 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64

hostb0@pci0:0:0:0:  class=0x06 rev=0x04 hdr=0x00 vendor=0x8086
device=0x4609 subvendor=0x17aa subdevice=0x3803
 vendor = 'Intel Corporation'
 class  = bridge
 subclass   = HOST-PCI
vgapci0@pci0:0:2:0: class=0x03 rev=0x0c hdr=0x00 vendor=0x8086
device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
 vendor = 'Intel Corporation'
 device = 'Alder Lake-UP3 GT1 [UHD Graphics]'
 class  = display
 subclass   = VGA
none0@pci0:0:4:0:   class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086
device=0x461d subvendor=0x17aa subdevice=0x380c
 vendor = 'Intel Corporation'
 device = 'Alder Lake Innovation Platform Framework Processor
Participant'
 class  = dasp
pcib1@pci0:0:6:0:   class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086
device=0x464d subvendor=0x17aa subdevice=0x380e
 vendor = 'Intel Corporation'
 device = '12th Gen Core Processor PCI Express x4 Controller'
 class  = bridge
 subclass   = PCI-PCI
none1@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086
device=0x467d subvendor=0x17aa subdevice=0x3813
 vendor = 'Intel Corporation'
 device = 'Platform Monitoring Technology'
 class  = dasp
xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086
device=0x461e subvendor=0x17aa subdevice=0x3824
 vendor = 'Intel Corporation'
 device = 'Alder Lake-P Thunderbolt 4 USB Controller'
 class  = serial bus
 subclass   = USB
xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51ed subvendor=0x17aa subdevice=0x3820
 vendor = 'Intel Corporation'
 device = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
 class  = serial bus
 subclass   = USB
none2@pci0:0:20:2:  class=0x05 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51ef subvendor=0x17aa subdevice=0x381e
 vendor = 'Intel Corporation'
 device = 'Alder Lake PCH Shared SRAM'
 class  = memory
 subclass   = RAM
iwlwifi0@pci0:0:20:3:   class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51f0 subvendor=0x8086 subdevice=0x0074
 vendor = 'Intel Corporation'
 device = 'Alder Lake-P PCH CNVi WiFi'
 class  = network
ig4iic0@pci0:0:21:0:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51e8 subvendor=0x17aa subdevice=0x3812
 vendor = 'Intel Corporation'
 device = 'Alder Lake PCH Serial IO I2C Controller'
 class  = serial bus
ig4iic1@pci0:0:21:1:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51e9 subvendor=0x17aa subdevice=0x3814
 vendor = 'Intel Corporation'
 device = 'Alder Lake PCH Serial IO I2C Controller'
 class  = serial bus
none3@pci0:0:22:0:  class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51e0 subvendor=0x17aa subdevice=0x3815
 vendor = 'Intel Corporation'
 device = 'Alder Lake PCH HECI Controller'
 class  = simple comms
ahci0@pci0:0:23:0:  class=0x010601 rev=0x01 hdr=0x00 vendor=0x8086
device=0x51d3 subvendor=0x8086 subdevice=0x7270
 vendor = 'Intel Corporation'
 device = 'Alder Lake-P SATA AHCI Controller'
 class  = mass storage
 subclass   = SATA
pcib2@pci0:0:29:0:  class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086
device=0x51b1 subvendor=0x17aa subdevice=0x381f
 vendor = 'Intel Corporation'
 device = 'Alder Lake 
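
The WORKS and DOESN'T WORK build identifiers quoted in this thread have the
form branch-nCOUNT-hash, where COUNT is a revision count on the branch. As a
rough sketch, the counts bound the range in which a change would have to sit
and give an estimate of how many bisection steps that range would take. The
function names revision_count and bisect_estimate are made up for this
example, and the estimate assumes every step halves the range. The thread
itself later points at a BIOS setting, so this only illustrates how to read
the identifiers.

```python
import math
import re

# Matches identifiers such as "main-n267640-7a4d1d1df0b2".
BUILD_ID = re.compile(r"(?P<branch>[\w/.]+)-n(?P<count>\d+)-(?P<hash>[0-9a-f]+)")


def revision_count(build_id):
    """Return the revision count embedded in a build identifier."""
    match = BUILD_ID.fullmatch(build_id)
    if match is None:
        raise ValueError(f"unrecognised build identifier: {build_id!r}")
    return int(match["count"])


def bisect_estimate(good, bad):
    """Return (revisions between good and bad, estimated bisection steps)."""
    span = revision_count(bad) - revision_count(good)
    if span <= 0:
        raise ValueError("the bad build must be newer than the good build")
    return span, math.ceil(math.log2(span))


# Last working build and the older of the two non-working builds.
span, steps = bisect_estimate("main-n267640-7a4d1d1df0b2",
                              "main-n268793-220ee18f1964")
print(span, steps)
```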

Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-01 Thread Kevin Oberman
On Mon, Apr 1, 2024 at 3:05 PM Chris  wrote:

> I experience challenges running FreeBSD on my Alder Lake laptop.
> With some help on the list and Bugzilla, I was able to get Graphics
> WiFi at least working. But still wasn't as stable as running on
> more dated CPU's. As it is; I'm only able to use CURRENT. Beginning
> of last week, in hopes of getting a more stable experience. I wiped
> the partition (UFS) and unpacked the version available on the FreeBSD
> ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
> Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
> the list. But no solution was discovered. I've since attempted to use
> 2 more different newer versions. Both of them were also w/o multi-con(s)
> support. What must I do to fix, or uncover the cause of this?
> I only load the associated GPU module in rc.conf(5) (no keyboard settings).
> I'm also unable to get multi-cons booting from any of the boot media
> produced within the last week.
>
> Following are some specifics:
>
> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
>
> IdeaPad 3 17IAU7
>
> WORKS:
> FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
> Thu Jan 18 04:04:32 UTC 2024
>
> DOESN'T WORK:
> FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
> Fri Mar 29 10:19:43 UTC 2024
> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
>
> FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
> Thu Mar 14 02:58:39 UTC 2024
> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
>
> hostb0@pci0:0:0:0:  class=0x06 rev=0x04 hdr=0x00 vendor=0x8086
> device=0x4609 subvendor=0x17aa subdevice=0x3803
>  vendor = 'Intel Corporation'
>  class  = bridge
>  subclass   = HOST-PCI
> vgapci0@pci0:0:2:0: class=0x03 rev=0x0c hdr=0x00 vendor=0x8086
> device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake-UP3 GT1 [UHD Graphics]'
>  class  = display
>  subclass   = VGA
> none0@pci0:0:4:0:   class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086
> device=0x461d subvendor=0x17aa subdevice=0x380c
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake Innovation Platform Framework Processor
> Participant'
>  class  = dasp
> pcib1@pci0:0:6:0:   class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086
> device=0x464d subvendor=0x17aa subdevice=0x380e
>  vendor = 'Intel Corporation'
>  device = '12th Gen Core Processor PCI Express x4 Controller'
>  class  = bridge
>  subclass   = PCI-PCI
> none1@pci0:0:10:0:  class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x467d subvendor=0x17aa subdevice=0x3813
>  vendor = 'Intel Corporation'
>  device = 'Platform Monitoring Technology'
>  class  = dasp
> xhci0@pci0:0:13:0:  class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086
> device=0x461e subvendor=0x17aa subdevice=0x3824
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake-P Thunderbolt 4 USB Controller'
>  class  = serial bus
>  subclass   = USB
> xhci1@pci0:0:20:0:  class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51ed subvendor=0x17aa subdevice=0x3820
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
>  class  = serial bus
>  subclass   = USB
> none2@pci0:0:20:2:  class=0x05 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51ef subvendor=0x17aa subdevice=0x381e
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake PCH Shared SRAM'
>  class  = memory
>  subclass   = RAM
> iwlwifi0@pci0:0:20:3:   class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51f0 subvendor=0x8086 subdevice=0x0074
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake-P PCH CNVi WiFi'
>  class  = network
> ig4iic0@pci0:0:21:0:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51e8 subvendor=0x17aa subdevice=0x3812
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake PCH Serial IO I2C Controller'
>  class  = serial bus
> ig4iic1@pci0:0:21:1:class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51e9 subvendor=0x17aa subdevice=0x3814
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake PCH Serial IO I2C Controller'
>  class  = serial bus
> none3@pci0:0:22:0:  class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51e0 subvendor=0x17aa subdevice=0x3815
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake PCH HECI Controller'
>  class  = simple comms
> ahci0@pci0:0:23:0:  class=0x010601 rev=0x01 hdr=0x00 vendor=0x8086
> device=0x51d3 subvendor=0x8086 subdevice=0x7270
>  vendor = 'Intel Corporation'
>  device = 'Alder Lake-P SATA AHCI Controller'
>  class  = mass storage
>  subclass   = SATA
> pcib2@pci0:0:29:0:  

Re: Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-01 Thread Chris

On 2024-04-01 14:57, Michael Gmelin wrote:
You never responded to my (and T’s) message - so I assume the suggestions
made no difference?

Right, and thank you for the reply. I (indirectly) answered it here
when I indicated that I had no references to "keyboard settings in
my rc.conf(5)". :) Any other thoughts?
Thanks again, Michael!

--Chris


-m


On 1. Apr 2024, at 22:48, Chris  wrote:

I experience challenges running FreeBSD on my Alder Lake laptop.
With some help on the list and Bugzilla, I was able to get graphics and
WiFi at least working, but it still wasn't as stable as running on
more dated CPUs. As it is, I'm only able to use CURRENT. At the beginning
of last week, in hopes of getting a more stable experience, I wiped
the partition (UFS) and unpacked the version available on the FreeBSD
ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
the list, but no solution was found. I've since tried
2 more newer versions. Both of them were also w/o multi-con(s)
support. What must I do to fix, or uncover the cause of, this?
I only load the associated GPU module in rc.conf(5) (no keyboard settings).
I'm also unable to get multi-cons when booting from any of the boot media
produced within the last week.

Following are some specifics:

CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)

IdeaPad 3 17IAU7

WORKS:
FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
Thu Jan 18 04:04:32 UTC 2024

DOESN'T WORK:
FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
Fri Mar 29 10:19:43 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
Thu Mar 14 02:58:39 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

hostb0@pci0:0:0:0:	class=0x06 rev=0x04 hdr=0x00 vendor=0x8086 device=0x4609 subvendor=0x17aa subdevice=0x3803
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x03 rev=0x0c hdr=0x00 vendor=0x8086 device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-UP3 GT1 [UHD Graphics]'
    class      = display
    subclass   = VGA
none0@pci0:0:4:0:	class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x461d subvendor=0x17aa subdevice=0x380c
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake Innovation Platform Framework Processor Participant'
    class      = dasp
pcib1@pci0:0:6:0:	class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086 device=0x464d subvendor=0x17aa subdevice=0x380e
    vendor     = 'Intel Corporation'
    device     = '12th Gen Core Processor PCI Express x4 Controller'
    class      = bridge
    subclass   = PCI-PCI
none1@pci0:0:10:0:	class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x467d subvendor=0x17aa subdevice=0x3813
    vendor     = 'Intel Corporation'
    device     = 'Platform Monitoring Technology'
    class      = dasp
xhci0@pci0:0:13:0:	class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 device=0x461e subvendor=0x17aa subdevice=0x3824
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P Thunderbolt 4 USB Controller'
    class      = serial bus
    subclass   = USB
xhci1@pci0:0:20:0:	class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51ed subvendor=0x17aa subdevice=0x3820
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
    class      = serial bus
    subclass   = USB
none2@pci0:0:20:2:	class=0x05 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51ef subvendor=0x17aa subdevice=0x381e
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Shared SRAM'
    class      = memory
    subclass   = RAM
iwlwifi0@pci0:0:20:3:	class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51f0 subvendor=0x8086 subdevice=0x0074
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P PCH CNVi WiFi'
    class      = network
ig4iic0@pci0:0:21:0:	class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e8 subvendor=0x17aa subdevice=0x3812
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Serial IO I2C Controller'
    class      = serial bus
ig4iic1@pci0:0:21:1:	class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e9 subvendor=0x17aa subdevice=0x3814
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Serial IO I2C Controller'
    class      = serial bus
none3@pci0:0:22:0:	class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e0 subvendor=0x17aa subdevice=0x3815
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH HECI Controller'
    class      = simple comms
ahci0@pci0:0:23:0:	class=0x010601 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51d3 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P SATA AHCI Controller'
    class      =

Multi cons support has disappeared (on Alder Lake) was: Alt+Fn isn't functional. Has this been removed?

2024-04-01 Thread Chris

I experience challenges running FreeBSD on my Alder Lake laptop.
With some help on the list and Bugzilla, I was able to get graphics and
WiFi at least working, but it still wasn't as stable as running on
more dated CPUs. As it is, I'm only able to use CURRENT. At the beginning
of last week, in hopes of getting a more stable experience, I wiped
the partition (UFS) and unpacked the version available on the FreeBSD
ftp servers at that time. I quickly discovered that multi-cons (Ctrl+
Alt+Fn || Alt+Fn) was no longer available. I posted this discovery to
the list, but no solution was found. I've since tried
2 more newer versions. Both of them were also w/o multi-con(s)
support. What must I do to fix, or uncover the cause of, this?
I only load the associated GPU module in rc.conf(5) (no keyboard settings).
I'm also unable to get multi-cons when booting from any of the boot media
produced within the last week.

Following are some specifics:

CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)

IdeaPad 3 17IAU7

WORKS:
FreeBSD 15.0-CURRENT #0 main-n267640-7a4d1d1df0b2:
Thu Jan 18 04:04:32 UTC 2024

DOESN'T WORK:
FreeBSD 15.0-CURRENT #0 main-n269036-6baddb6b1176:
Fri Mar 29 10:19:43 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

FreeBSD 15.0-CURRENT #0 main-n268793-220ee18f1964:
Thu Mar 14 02:58:39 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

hostb0@pci0:0:0:0:	class=0x06 rev=0x04 hdr=0x00 vendor=0x8086 device=0x4609 subvendor=0x17aa subdevice=0x3803
    vendor     = 'Intel Corporation'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x03 rev=0x0c hdr=0x00 vendor=0x8086 device=0x46b3 subvendor=0x17aa subdevice=0x3b3a
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-UP3 GT1 [UHD Graphics]'
    class      = display
    subclass   = VGA
none0@pci0:0:4:0:	class=0x118000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x461d subvendor=0x17aa subdevice=0x380c
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake Innovation Platform Framework Processor Participant'
    class      = dasp
pcib1@pci0:0:6:0:	class=0x060400 rev=0x04 hdr=0x01 vendor=0x8086 device=0x464d subvendor=0x17aa subdevice=0x380e
    vendor     = 'Intel Corporation'
    device     = '12th Gen Core Processor PCI Express x4 Controller'
    class      = bridge
    subclass   = PCI-PCI
none1@pci0:0:10:0:	class=0x118000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x467d subvendor=0x17aa subdevice=0x3813
    vendor     = 'Intel Corporation'
    device     = 'Platform Monitoring Technology'
    class      = dasp
xhci0@pci0:0:13:0:	class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 device=0x461e subvendor=0x17aa subdevice=0x3824
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P Thunderbolt 4 USB Controller'
    class      = serial bus
    subclass   = USB
xhci1@pci0:0:20:0:	class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51ed subvendor=0x17aa subdevice=0x3820
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH USB 3.2 xHCI Host Controller'
    class      = serial bus
    subclass   = USB
none2@pci0:0:20:2:	class=0x05 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51ef subvendor=0x17aa subdevice=0x381e
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Shared SRAM'
    class      = memory
    subclass   = RAM
iwlwifi0@pci0:0:20:3:	class=0x028000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51f0 subvendor=0x8086 subdevice=0x0074
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P PCH CNVi WiFi'
    class      = network
ig4iic0@pci0:0:21:0:	class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e8 subvendor=0x17aa subdevice=0x3812
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Serial IO I2C Controller'
    class      = serial bus
ig4iic1@pci0:0:21:1:	class=0x0c8000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e9 subvendor=0x17aa subdevice=0x3814
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH Serial IO I2C Controller'
    class      = serial bus
none3@pci0:0:22:0:	class=0x078000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51e0 subvendor=0x17aa subdevice=0x3815
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCH HECI Controller'
    class      = simple comms
ahci0@pci0:0:23:0:	class=0x010601 rev=0x01 hdr=0x00 vendor=0x8086 device=0x51d3 subvendor=0x8086 subdevice=0x7270
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake-P SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
pcib2@pci0:0:29:0:	class=0x060400 rev=0x01 hdr=0x01 vendor=0x8086 device=0x51b1 subvendor=0x17aa subdevice=0x381f
    vendor     = 'Intel Corporation'
    device     = 'Alder Lake PCI Express x1 Root Port'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0:	class=0x060100 rev=0x01 hdr=0x00 vendor=0x8086 device=0x5182 subvendor=0x17aa

Re: 15.0 on RPi4, USB broken: uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT

2024-03-31 Thread Mark Millard
On Mar 30, 2024, at 12:44, Lexi Winter  wrote:

> i'm using 15.0 (f66a994d59) on a 4GB RPi4 with a USB<>SATA adapter for
> the root disk:
> 
> usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device 
> ASM1153USB3.0TOSATA ASM1153USB3.0TOSATA (0x174c:0x55aa)
> ugen0.3:  at usbus0
> umass0 on uhub1
> umass0:  addr 2> on usbus0
> umass0:  SCSI over Bulk-Only; quirks = 0x0100
> umass0:1:0: Attached to scbus1
> da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
> da0:  Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number 123456789019
> da0: 40.000MB/s transfers
> da0: 228936MB (468862128 512 byte sectors)
> da0: quirks=0x2
> 
> when connected via USB 2, this works fine.  when connected via USB 3.0,
> the device sometimes fails to attach on boot, causing mountroot to fail.
> i can reproduce this reliably with both GENERIC-NODEBUG and a custom
> modular kernel, and sometimes (but not every boot) with GENERIC.
> 
> when the problem happens, with USB_DEBUG enabled, the kernel logs:
> 
> uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2
> 
> however, if i boot with "boot -v", the device is reliably detected
> correctly.  since -v shouldn't cause any functional changes, i suspect
> this may be some kind of timing issue.
> 
> i've tried increasing some of the USB timings (hw.usb.timings.*) but
> this didn't seem to have any effect.  is there anything else i could try
> that might affect this, or is this perhaps a known issue?
> 

Here is my config.txt material related to such issues:

#
# Local addition that avoids USB3 SSD boot failures that look like:
#   uhub_reattach_port: port ? reset failed, error=USB_ERR_TIMEOUT
#   uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port ?
# WARNING, not sufficient for "boot -s": that needs the full force_turbo=1
initial_turbo=60

As far as I can tell, without one of the turbo settings, the
more modern RPi firmware varies the clock speed during the early
boot time frame, and FreeBSD works in a way that requires a more
uniform clock there. (Maybe the delays are based on just loop counting?)
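
To confirm which of the turbo settings a given config.txt actually
carries, the file can simply be scanned. What follows is only a minimal
Python sketch, not anything taken from the RPi firmware or FreeBSD: the
function name turbo_settings and the sample text are made up for
illustration, and config.txt section filters such as [pi4] are ignored.

```python
def turbo_settings(config_text):
    """Collect initial_turbo / force_turbo assignments from config.txt text."""
    found = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("initial_turbo", "force_turbo"):
            # In this dict a later line simply replaces an earlier one.
            found[key] = value
    return found


# Hypothetical file contents, modeled on the fragment quoted above.
SAMPLE = (
    "# Local addition that avoids USB3 SSD boot failures\n"
    "initial_turbo=60\n"
)

if __name__ == "__main__":
    print(turbo_settings(SAMPLE))
```

An empty result would mean neither setting is present, which is the
case where the timeouts described above would be expected to show up.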


===
Mark Millard
marklmi at yahoo.com




Re: 15.0 on RPi4, USB broken: uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT

2024-03-31 Thread Nuno Teixeira
(...)

initial_turbo
https://www.raspberrypi.com/documentation/computers/config_txt.html#overclocking

Nuno Teixeira wrote (Sunday, 31/03/2024 at 19:59):

> Hello,
>
> If you have a fan in your rpi4 box, you could try to overclock it.
> If not, there is functionality in config.txt to overclock it just for a
> few seconds at boot time.
>
> I can't remember the function but I'm looking at:
> https://www.raspberrypi.com/documentation/computers/config_txt.html
>
> Cheers,
>
> Lexi Winter wrote (Saturday, 30/03/2024 at 19:45):
>
>> hello,
>>
>> i'm using 15.0 (f66a994d59) on a 4GB RPi4 with a USB<>SATA adapter for
>> the root disk:
>>
>> usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device
>> ASM1153USB3.0TOSATA ASM1153USB3.0TOSATA (0x174c:0x55aa)
>> ugen0.3:  at usbus0
>> umass0 on uhub1
>> umass0: > 2.10/1.00, addr 2> on usbus0
>> umass0:  SCSI over Bulk-Only; quirks = 0x0100
>> umass0:1:0: Attached to scbus1
>> da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
>> da0:  Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number 123456789019
>> da0: 40.000MB/s transfers
>> da0: 228936MB (468862128 512 byte sectors)
>> da0: quirks=0x2
>>
>> when connected via USB 2, this works fine.  when connected via USB 3.0,
>> the device sometimes fails to attach on boot, causing mountroot to fail.
>> i can reproduce this reliably with both GENERIC-NODEBUG and a custom
>> modular kernel, and sometimes (but not every boot) with GENERIC.
>>
>> when the problem happens, with USB_DEBUG enabled, the kernel logs:
>>
>> uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT
>> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2
>>
>> however, if i boot with "boot -v", the device is reliably detected
>> correctly.  since -v shouldn't cause any functional changes, i suspect
>> this may be some kind of timing issue.
>>
>> i've tried increasing some of the USB timings (hw.usb.timings.*) but
>> this didn't seem to have any effect.  is there anything else i could try
>> that might affect this, or is this perhaps a known issue?
>>
>> thanks, lexi.
>>
>
>
> --
> Nuno Teixeira
> FreeBSD Committer (ports)
>


-- 
Nuno Teixeira
FreeBSD Committer (ports)


Re: 15.0 on RPi4, USB broken: uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT

2024-03-31 Thread Nuno Teixeira
Hello,

If you have a fan in your rpi4 box, you could try to overclock it.
If not, there is functionality in config.txt to overclock it just for a
few seconds at boot time.

I can't remember the function but I'm looking at:
https://www.raspberrypi.com/documentation/computers/config_txt.html

Cheers,

Lexi Winter wrote (Saturday, 30/03/2024 at 19:45):

> hello,
>
> i'm using 15.0 (f66a994d59) on a 4GB RPi4 with a USB<>SATA adapter for
> the root disk:
>
> usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device
> ASM1153USB3.0TOSATA ASM1153USB3.0TOSATA (0x174c:0x55aa)
> ugen0.3:  at usbus0
> umass0 on uhub1
> umass0:  2.10/1.00, addr 2> on usbus0
> umass0:  SCSI over Bulk-Only; quirks = 0x0100
> umass0:1:0: Attached to scbus1
> da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
> da0:  Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number 123456789019
> da0: 40.000MB/s transfers
> da0: 228936MB (468862128 512 byte sectors)
> da0: quirks=0x2
>
> when connected via USB 2, this works fine.  when connected via USB 3.0,
> the device sometimes fails to attach on boot, causing mountroot to fail.
> i can reproduce this reliably with both GENERIC-NODEBUG and a custom
> modular kernel, and sometimes (but not every boot) with GENERIC.
>
> when the problem happens, with USB_DEBUG enabled, the kernel logs:
>
> uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT
> uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2
>
> however, if i boot with "boot -v", the device is reliably detected
> correctly.  since -v shouldn't cause any functional changes, i suspect
> this may be some kind of timing issue.
>
> i've tried increasing some of the USB timings (hw.usb.timings.*) but
> this didn't seem to have any effect.  is there anything else i could try
> that might affect this, or is this perhaps a known issue?
>
> thanks, lexi.
>


-- 
Nuno Teixeira
FreeBSD Committer (ports)


Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-31 Thread Alexander Leidinger

Am 2024-03-29 18:21, schrieb Alexander Leidinger:

Am 2024-03-29 18:13, schrieb Mark Johnston:

On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work
(see below for the issue). As the monthly stabilisation pass didn't find
obvious issues, it is something related to my setup:
 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it starts
init, any module fails to load (e.g. via autodetection of hardware or rc.conf
kld_list) with the message that the kernel and module versions are out of
sync, and the module refuses to load.


What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and
today's src before the sound stuff still fails).


Retpoline wasn't the cause, next test is the CTF stuff in the kernel...


A rather obscure problem was causing this. The "last" BE had canmount
set to "on" instead of "noauto". No idea how this happened, but it
resulted in the "last" BE being mounted by "zfs mount -a" on top of the
current BE. This means that every module load after the zfs rc script
had run picked up old kernel modules, so the error message about a kernel
version mismatch was correct. I found the issue while bisecting the tree:
suddenly the error message went away, but a new issue of missing
dev entries popped up (/dev was mounted correctly on the booting
dataset, but the last BE was mounted on top of it and /dev went
empty...).


It looks to me like bectl was doing this (from "zpool history")...
2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.
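
For anyone wanting to check for the same situation, the idea is to look
for boot environments other than the active one that still have
canmount=on. A minimal Python sketch follows. It assumes the
tab-separated text comes from something like
"zfs get -H -r -o name,value canmount rpool/ROOT"; the function name and
the sample values (in particular which old BE is "on") are invented for
illustration and are not taken from the affected system.

```python
def stray_automount_bes(zfs_get_output, be_root, active_be):
    """Return BEs under be_root, other than active_be, with canmount=on."""
    stray = []
    for line in zfs_get_output.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # ignore blank or malformed lines
        name, canmount = fields
        is_be = name.startswith(be_root + "/")
        if is_be and name != active_be and canmount == "on":
            stray.append(name)
    return stray


# Hypothetical "name<TAB>value" lines; the "on" for the old BE is made up.
SAMPLE = (
    "rpool/ROOT\toff\n"
    "rpool/ROOT/2024-02-24-140211\ton\n"
    "rpool/ROOT/2024-03-11-094351\ton\n"
)

if __name__ == "__main__":
    active = "rpool/ROOT/2024-03-11-094351"
    for name in stray_automount_bes(SAMPLE, "rpool/ROOT", active):
        print("would also be mounted by 'zfs mount -a':", name)
```

Any dataset this reports is a candidate for being mounted over the
running BE in the way described above.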

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




15.0 on RPi4, USB broken: uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT

2024-03-30 Thread Lexi Winter
hello,

i'm using 15.0 (f66a994d59) on a 4GB RPi4 with a USB<>SATA adapter for
the root disk:

usb_msc_auto_quirk: UQ_MSC_NO_GETMAXLUN set for USB mass storage device 
ASM1153USB3.0TOSATA ASM1153USB3.0TOSATA (0x174c:0x55aa)
ugen0.3:  at usbus0
umass0 on uhub1
umass0:  on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:1:0: Attached to scbus1
da0 at umass-sim0 bus 0 scbus1 target 0 lun 0
da0:  Fixed Direct Access SPC-4 SCSI device
da0: Serial Number 123456789019
da0: 40.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2

when connected via USB 2, this works fine.  when connected via USB 3.0,
the device sometimes fails to attach on boot, causing mountroot to fail.
i can reproduce this reliably with both GENERIC-NODEBUG and a custom
modular kernel, and sometimes (but not every boot) with GENERIC.

when the problem happens, with USB_DEBUG enabled, the kernel logs:

uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2

however, if i boot with "boot -v", the device is reliably detected
correctly.  since -v shouldn't cause any functional changes, i suspect
this may be some kind of timing issue.
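
to compare boots with and without -v, the timeout messages quoted above
can be tallied from a saved copy of the kernel messages. this is just a
minimal python sketch under that assumption; the function name and the
sample text are illustrative, and the regular expression only matches
the exact "reset failed" wording shown above.

```python
import re
from collections import Counter

# Matches e.g. "uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT"
RESET_FAILED = re.compile(
    r"uhub_reattach_port: port (\d+) reset failed, error=(\w+)"
)


def count_reset_failures(log_text):
    """Count 'reset failed' messages per (port, error) pair."""
    counts = Counter()
    for match in RESET_FAILED.finditer(log_text):
        port, error = int(match.group(1)), match.group(2)
        counts[(port, error)] += 1
    return counts


# Sample text copied from the two log lines quoted in this message.
SAMPLE = (
    "uhub_reattach_port: port 2 reset failed, error=USB_ERR_TIMEOUT\n"
    "uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 2\n"
)

if __name__ == "__main__":
    for (port, error), n in count_reset_failures(SAMPLE).items():
        print(f"port {port}: {n} x {error}")
```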

i've tried increasing some of the USB timings (hw.usb.timings.*) but
this didn't seem to have any effect.  is there anything else i could try
that might affect this, or is this perhaps a known issue?

thanks, lexi.




Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-30 Thread Matthias Apitz



(For the non-working Wifi chip, I am using a USB-Wifi dongle at the moment,
a Realtek RTL8191S WLAN Adapter, which works fine).

I also can't get Xorg plus twm up; it says in /var/log/Xorg.0.log at the end:

..
REDWOOD, ATI Mobility Radeon Graphics, CEDAR, ATI FirePro 2270,
ATI Radeon HD 5450, CAYMAN, AMD Radeon HD 6900 Series,
AMD Radeon HD 6900M Series, Mobility Radeon HD 6000 Series, BARTS,
AMD Radeon HD 6800 Series, AMD Radeon HD 6700 Series, TURKS, CAICOS,
ARUBA, TAHITI, PITCAIRN, VERDE, OLAND, HAINAN, BONAIRE, KABINI,
MULLINS, KAVERI, HAWAII
[   248.442] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[   248.442] (II) scfb: driver for wsdisplay framebuffer: scfb
[   248.442] (II) VESA: driver for VESA chipsets: vesa
[   248.442] (--) Using syscons driver with X support (version 2.0)
[   248.442] (--) using VT number 9

[   248.447] (EE) open /dev/dri/card0: No such file or directory
[   248.447] (WW) Falling back to old probe method for modesetting
[   248.447] (EE) open /dev/dri/card0: No such file or directory
[   248.447] (WW) Falling back to old probe method for scfb
[   248.447] scfb trace: probe start
..

The kernel modules loaded are:

Id Refs Address       Size Name
 1  139 0x8020     1d4f010 kernel
 2    1 0x81f5        36c0 coretemp.ko
 3    1 0x81f55000    9c48 if_cdce.ko
 4    2 0x81f5f000    6138 uether.ko
 5    1 0x81f66000    a698 cuse.ko
 6    1 0x81f71000   f7f38 ipl.ko
 7    1 0x83c0      462be0 zfs.ko
 8    1 0x84063000  1510b8 radeonkms.ko
 9    2 0x841b5000   73da0 drm.ko
10    1 0x83bd7000    22a8 iic.ko
11    3 0x83bda000    1100 linuxkpi_gplv2.ko
12    4 0x83bdc000    6320 dmabuf.ko
13    4 0x83be3000    3080 linuxkpi_hdmi.ko
14    1 0x83be7000    c7b0 ttm.ko
15    1 0x83bf4000    3370 acpi_wmi.ko
16    1 0x83bf8000    5ee0 ig4.ko
17    1 0x84229000    3210 intpm.ko
18    1 0x8422d000    2178 smbus.ko
19    1 0x8423       30ad8 linux.ko
20    4 0x84261000    be30 linux_common.ko
21    1 0x8426d000   2ccf8 linux64.ko
22    1 0x8429a000    2270 pty.ko
23    1 0x8429d000    3540 fdescfs.ko
24    1 0x842a1000    73c0 linprocfs.ko
25    1 0x842a9000    43e4 linsysfs.ko
26    1 0x842ae000    4d00 ng_ubt.ko
27    6 0x842b3000    bb28 netgraph.ko
28    2 0x842bf000    a238 ng_hci.ko
29    4 0x842ca000    2668 ng_bluetooth.ko
30    1 0x842cd000    a7e0 if_rsu.ko
31    1 0x842d8000    3218 iichid.ko
32    5 0x842dc000    32a8 hidbus.ko
33    1 0x842e        f250 ng_l2cap.ko
34    1 0x842f       19f08 ng_btsocket.ko
35    1 0x8430a000    38b8 ng_socket.ko
37    1 0x8432e000    21e0 hms.ko
38    1 0x84331000    40a8 hidmap.ko
39    1 0x84336000    334d hmt.ko
40    1 0x8433a000    22c4 hconf.ko

The complete Xorg.0.log is here: http://www.unixarea.de/Xorg.0.log.txt

Thanks in advance for ideas.

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: Alt+Fn isn't functional. Has this been removed?

2024-03-30 Thread Tomoaki AOKI
On Fri, 29 Mar 2024 19:02:37 -0700
Chris  wrote:

> I just poured the dist files onto an earlier 15 (after removing
> the earlier version). After booting into the new install, I no longer
> had any ttys other than ttyv0. Alt+Fn has no effect; I'm only
> getting ttyv0. getty(8) is running, and a ps waux | grep getty shows
> they're all up. Only things I saved from the older install were the
> user/group databases, rc.conf,pf.conf,jail.conf, and wpa_supplicant.conf.
> 
> What do I need to do to further isolate this problem?
> 
> Thanks.
> 
> System info:
> 
> FreeBSD fbsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #0 
> main-n268793-220ee18f1964:
> Thu Mar 14 02:58:39 UTC 2024
> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> 
> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
> 
> IdeaPad 3 17IAU7
> 
> Id Refs AddressSize Name
>   1   95 0x8020  1d527c0 kernel
>   21 0x81f54000287e8 fusefs.ko
>   31 0x82d8f000   1e3228 i915kms.ko
>   42 0x82f7300085090 drm.ko
>   51 0x82ff9000 22b8 iic.ko
>   62 0x82ffc000 40e9 linuxkpi_video.ko
>   73 0x83001000 7358 dmabuf.ko
>   83 0x83009000 3378 lindebugfs.ko
>   91 0x8300d000 c338 ttm.ko
> 101 0x8301a000 5760 cuse.ko
> 111 0x8302 3390 acpi_wmi.ko
> 121 0x83024000 4250 ichsmb.ko
> 131 0x83029000 2178 smbus.ko
> 141 0x8302c00091260 if_iwlwifi.ko
> 151 0x830be000 5f90 ig4.ko
> 161 0x830c4000 4d20 ng_ubt.ko
> 173 0x830c9000 bbb8 netgraph.ko
> 182 0x830d5000 a250 ng_hci.ko
> 192 0x830e 2670 ng_bluetooth.ko
> 201 0x830e3000 3218 iichid.ko
> 215 0x830e7000 3380 hidbus.ko
> 221 0x830eb000 21e8 hms.ko
> 231 0x830ee000 40a8 hidmap.ko
> 241 0x830f3000 3355 hmt.ko
> 251 0x830f7000 22cc hconf.ko
> 261 0x830fa000 2260 pflog.ko
> 271 0x830fd00056540 pf.ko
> 281 0x83154000 3560 fdescfs.ko

Are you sure your function keys are actually function keys?
I'm not sure whether your IdeaPad does this, but some Lenovo notebooks
configure the function keys as special (mute, radio, ...) keys by default
and need a setting changed in the UEFI firmware to switch them back to
function keys. If that's the case, Fn+Alt+F2 would switch to vty1.

-- 
Tomoaki AOKI



Re: Alt+Fn isn't functional. Has this been removed?

2024-03-30 Thread Michael Gmelin



> On 30. Mar 2024, at 07:30, Chris  wrote:
> 
> On 2024-03-29 23:06, Michael Schuster wrote:
>> Two ideas:
>> - does CTL-ALT-Fn work?
> Thanks. But no, I tried that.
> 
>> - perhaps the number of predefined ttys was overwritten/set to 0 somewhere?
> I'm only aware of /etc/ttys, and they're all available (uncommented) and
> ps(1) indicates getty(8) is running on all the normally assigned ttyv(n)'s.
> 
> Thanks for the reply!
> 
> --Chris

In case you have a keymap defined in rc.conf, try commenting that out, reboot,
and see if it makes a difference (as a debugging measure).

Cheers
Michael


>> HTH
>> Michael
>>> On Sat, Mar 30, 2024, 03:03 Chris  wrote:
>>> I just poured the dist files onto an earlier 15 (after removing
>>> the earlier version). After booting into the new install, I no longer
>>> had any ttys other than ttyv0. Alt+Fn has no effect; I'm only
>>> getting ttyv0. getty(8) is running, and a ps waux | grep getty shows
>>> they're all up. Only things I saved from the older install were the
>>> user/group databases, rc.conf,pf.conf,jail.conf, and wpa_supplicant.conf.
>>> What do I need to do to further isolate this problem?
>>> Thanks.
>>> System info:
>>> FreeBSD fbsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #0
>>> main-n268793-220ee18f1964:
>>> Thu Mar 14 02:58:39 UTC 2024
>>> r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>>> amd64
>>> CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)
>>> IdeaPad 3 17IAU7
>>> Id Refs AddressSize Name
>>>  1   95 0x8020  1d527c0 kernel
>>>  21 0x81f54000287e8 fusefs.ko
>>>  31 0x82d8f000   1e3228 i915kms.ko
>>>  42 0x82f7300085090 drm.ko
>>>  51 0x82ff9000 22b8 iic.ko
>>>  62 0x82ffc000 40e9 linuxkpi_video.ko
>>>  73 0x83001000 7358 dmabuf.ko
>>>  83 0x83009000 3378 lindebugfs.ko
>>>  91 0x8300d000 c338 ttm.ko
>>> 101 0x8301a000 5760 cuse.ko
>>> 111 0x8302 3390 acpi_wmi.ko
>>> 121 0x83024000 4250 ichsmb.ko
>>> 131 0x83029000 2178 smbus.ko
>>> 141 0x8302c00091260 if_iwlwifi.ko
>>> 151 0x830be000 5f90 ig4.ko
>>> 161 0x830c4000 4d20 ng_ubt.ko
>>> 173 0x830c9000 bbb8 netgraph.ko
>>> 182 0x830d5000 a250 ng_hci.ko
>>> 192 0x830e 2670 ng_bluetooth.ko
>>> 201 0x830e3000 3218 iichid.ko
>>> 215 0x830e7000 3380 hidbus.ko
>>> 221 0x830eb000 21e8 hms.ko
>>> 231 0x830ee000 40a8 hidmap.ko
>>> 241 0x830f3000 3355 hmt.ko
>>> 251 0x830f7000 22cc hconf.ko
>>> 261 0x830fa000 2260 pflog.ko
>>> 271 0x830fd00056540 pf.ko
>>> 281 0x83154000 3560 fdescfs.ko
> 




Re: Alt+Fn isn't functional. Has this been removed?

2024-03-30 Thread Chris

On 2024-03-29 23:06, Michael Schuster wrote:

Two ideas:
- does CTL-ALT-Fn work?

Thanks. But no, I tried that.


- perhaps the number of predefined ttys was overwritten/set to 0 somewhere?

I'm only aware of /etc/ttys, and they're all available (uncommented) and
ps(1) indicates getty(8) is running on all the normally assigned ttyv(n)'s.
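
That check of /etc/ttys can also be done mechanically. A minimal Python
sketch follows, assuming the usual ttys(5) layout of name, quoted getty
command, terminal type, then status flags; the function name and the
sample lines are made up for illustration and are not copied from my
system.

```python
import shlex

# Status words that cause init(8) to start the command on that line.
ENABLED = {"on", "onifexists", "onifconsole"}


def enabled_vtys(ttys_text):
    """Return the ttyv* names whose status field enables a getty."""
    names = []
    for line in ttys_text.splitlines():
        # shlex drops '#' comments and keeps the quoted getty command whole.
        fields = shlex.split(line, comments=True)
        if len(fields) < 4 or not fields[0].startswith("ttyv"):
            continue
        if ENABLED.intersection(fields[3:]):
            names.append(fields[0])
    return names


# Hypothetical /etc/ttys excerpt.
SAMPLE = (
    'ttyv0\t"/usr/libexec/getty Pc"\t\txterm\tonifexists secure\n'
    'ttyv1\t"/usr/libexec/getty Pc"\t\txterm\tonifexists secure\n'
    '#ttyv2\t"/usr/libexec/getty Pc"\t\txterm\tonifexists secure\n'
    'ttyv8\t"/usr/local/bin/xdm -nodaemon"\txterm\toff secure\n'
)

if __name__ == "__main__":
    print(enabled_vtys(SAMPLE))
```

If every expected ttyv entry is listed here and getty(8) is running on
each, the configuration side is fine and the problem lies in the
console switching itself.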

Thanks for the reply!

--Chris


HTH
Michael

On Sat, Mar 30, 2024, 03:03 Chris  wrote:


I just poured the dist files onto an earlier 15 (after removing
the earlier version). After booting into the new install, I no longer
had any ttys other than ttyv0. Alt+Fn has no effect; I'm only
getting ttyv0. getty(8) is running, and a ps waux | grep getty shows
they're all up. Only things I saved from the older install were the
user/group databases, rc.conf,pf.conf,jail.conf, and wpa_supplicant.conf.

What do I need to do to further isolate this problem?

Thanks.

System info:

FreeBSD fbsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #0
main-n268793-220ee18f1964:
Thu Mar 14 02:58:39 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64

CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)

IdeaPad 3 17IAU7

Id Refs Address        Size Name
 1   95 0x8020      1d527c0 kernel
 2    1 0x81f54000    287e8 fusefs.ko
 3    1 0x82d8f000   1e3228 i915kms.ko
 4    2 0x82f73000    85090 drm.ko
 5    1 0x82ff9000     22b8 iic.ko
 6    2 0x82ffc000     40e9 linuxkpi_video.ko
 7    3 0x83001000     7358 dmabuf.ko
 8    3 0x83009000     3378 lindebugfs.ko
 9    1 0x8300d000     c338 ttm.ko
10    1 0x8301a000     5760 cuse.ko
11    1 0x8302         3390 acpi_wmi.ko
12    1 0x83024000     4250 ichsmb.ko
13    1 0x83029000     2178 smbus.ko
14    1 0x8302c000    91260 if_iwlwifi.ko
15    1 0x830be000     5f90 ig4.ko
16    1 0x830c4000     4d20 ng_ubt.ko
17    3 0x830c9000     bbb8 netgraph.ko
18    2 0x830d5000     a250 ng_hci.ko
19    2 0x830e         2670 ng_bluetooth.ko
20    1 0x830e3000     3218 iichid.ko
21    5 0x830e7000     3380 hidbus.ko
22    1 0x830eb000     21e8 hms.ko
23    1 0x830ee000     40a8 hidmap.ko
24    1 0x830f3000     3355 hmt.ko
25    1 0x830f7000     22cc hconf.ko
26    1 0x830fa000     2260 pflog.ko
27    1 0x830fd000    56540 pf.ko
28    1 0x83154000     3560 fdescfs.ko






Alt+Fn isn't functional. Has this been removed?

2024-03-29 Thread Chris

I just poured the dist files onto an earlier 15 (after removing
the earlier version). After booting into the new install, I no longer
had any ttys other than ttyv0. Alt+Fn has no effect; I'm only
getting ttyv0. getty(8) is running, and a ps waux | grep getty shows
they're all up. The only things I saved from the older install were the
user/group databases, rc.conf, pf.conf, jail.conf, and wpa_supplicant.conf.

What do I need to do to further isolate this problem?

Thanks.

System info:

FreeBSD fbsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #0 
main-n268793-220ee18f1964:

Thu Mar 14 02:58:39 UTC 2024
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

CPU: 12th Gen Intel(R) Core(TM) i3-1215U (2496.00-MHz K8-class CPU)

IdeaPad 3 17IAU7

Id Refs Address        Size Name
 1   95 0x8020      1d527c0 kernel
 2    1 0x81f54000    287e8 fusefs.ko
 3    1 0x82d8f000   1e3228 i915kms.ko
 4    2 0x82f73000    85090 drm.ko
 5    1 0x82ff9000     22b8 iic.ko
 6    2 0x82ffc000     40e9 linuxkpi_video.ko
 7    3 0x83001000     7358 dmabuf.ko
 8    3 0x83009000     3378 lindebugfs.ko
 9    1 0x8300d000     c338 ttm.ko
10    1 0x8301a000     5760 cuse.ko
11    1 0x8302         3390 acpi_wmi.ko
12    1 0x83024000     4250 ichsmb.ko
13    1 0x83029000     2178 smbus.ko
14    1 0x8302c000    91260 if_iwlwifi.ko
15    1 0x830be000     5f90 ig4.ko
16    1 0x830c4000     4d20 ng_ubt.ko
17    3 0x830c9000     bbb8 netgraph.ko
18    2 0x830d5000     a250 ng_hci.ko
19    2 0x830e         2670 ng_bluetooth.ko
20    1 0x830e3000     3218 iichid.ko
21    5 0x830e7000     3380 hidbus.ko
22    1 0x830eb000     21e8 hms.ko
23    1 0x830ee000     40a8 hidmap.ko
24    1 0x830f3000     3355 hmt.ko
25    1 0x830f7000     22cc hconf.ko
26    1 0x830fa000     2260 pflog.ko
27    1 0x830fd000    56540 pf.ko
28    1 0x83154000     3560 fdescfs.ko



Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Bojan Novković

On 3/29/24 16:52, Alexander Leidinger wrote:

> Hi,
>
> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't
> work (see below for the issue). As the monthly stabilisation pass
> didn't find obvious issues, it is something related to my setup:
>
>  - not a generic kernel
>  - very modular kernel (as much as possible as a module)
>  - bind_now (a build without fails too, tested with clean /usr/obj)
>  - ccache (a build without fails too, tested with clean /usr/obj)
>  - kernel retpoline (build without in progress)
>  - userland retpoline (build without in progress)
>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>    retpoline)
>  - -fno-builtin
>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>  - malloc production
>  - COPTFLAGS= -O2 -pipe
>
> The issue is that kernel modules load OK from the loader, but once it
> starts init, any module fails to load (e.g. via autodetection of
> hardware or rc.conf kld_list) with the message that the kernel and
> module versions are out of sync and the module refuses to load.
>
> I tried the workaround to load the modules from the loader, which
> works, but then I can't log in remotely as ssh fails to allocate a pty.
> By loading modules via the loader, I can see messages about missing
> CTF info when the nvidia modules (from ports = not yet rebuilt = in
> /boot/modules/...ko instead of /boot/kernel/...ko) try to get
> initialised... and it looks like they are failing to get initialised
> because of this missing CTF stuff (I'm back to the previous boot env
> to be able to log in remotely and send mails, I don't have a copy of
> the failure message at hand).
>
> I assume the missing CTF stuff is due to the CTF-based pretty printing
> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
> Is this supposed to fail to load modules which are compiled without
> CTF data? Shouldn't this work gracefully (e.g. spit out a warning that
> pretty printing is not available for module X and have the module
> working)?


This is indeed how it works: those messages are emitted by the CTF loading 
routines in 'kern/kern_ctf.c' as a warning and do not affect the rest of 
the module loading process.

However, I completely agree that they are cryptic and spammy; I'll try 
to do something about that.


Bojan
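
The "warn and carry on" behaviour described above can be sketched in user-space C. This is only an illustration of the pattern, not the code in kern/kern_ctf.c: the struct, the function names and the warning text are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a loaded module; not the kernel's linker_file. */
struct module_image {
	const char *name;
	const void *ctf_data;	/* NULL when the module was built without CTF */
	size_t ctf_size;
	bool pretty_print;	/* set only when CTF data was usable */
};

/* Optional step: returns 0 on success, -1 when no CTF data is present. */
static int
load_ctf(struct module_image *m)
{
	if (m->ctf_data == NULL || m->ctf_size == 0)
		return (-1);
	m->pretty_print = true;
	return (0);
}

/* Load a module; missing CTF data produces a warning, not a failure. */
int
load_module(struct module_image *m)
{
	m->pretty_print = false;
	if (load_ctf(m) != 0)
		fprintf(stderr,
		    "warning: %s: no CTF data, pretty printing unavailable\n",
		    m->name);
	/* Mandatory steps (relocation, dependency checks) would go here. */
	return (0);
}
```

The point of the design is that the optional step's result only decides whether a feature flag gets set; the return value of the loader itself does not depend on it.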




Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

On 2024-03-29 18:13, Mark Johnston wrote:

> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>> Hi,
>>
>> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work
>> (see below for the issue). As the monthly stabilisation pass didn't find
>> obvious issues, it is something related to my setup:
>>  - not a generic kernel
>>  - very modular kernel (as much as possible as a module)
>>  - bind_now (a build without fails too, tested with clean /usr/obj)
>>  - ccache (a build without fails too, tested with clean /usr/obj)
>>  - kernel retpoline (build without in progress)
>>  - userland retpoline (build without in progress)
>>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>>    retpoline)
>>  - -fno-builtin
>>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>>  - malloc production
>>  - COPTFLAGS= -O2 -pipe
>>
>> The issue is that kernel modules load OK from the loader, but once it starts
>> init, any module fails to load (e.g. via autodetection of hardware or rc.conf
>> kld_list) with the message that the kernel and module versions are out of
>> sync and the module refuses to load.
>
> What is the exact revision you're running?  There were some unrelated
> changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and 
today's src before the sound stuff still fails).


Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

>> I tried the workaround to load the modules from the loader, which works, but
>> then I can't log in remotely as ssh fails to allocate a pty. By loading
>> modules via the loader, I can see messages about missing CTF info when the
>> nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko
>> instead of /boot/kernel/...ko) try to get initialised... and it looks like
>> they are failing to get initialised because of this missing CTF stuff (I'm
>> back to the previous boot env to be able to log in remotely and send mails, I
>> don't have a copy of the failure message at hand).
>>
>> I assume the missing CTF stuff is due to the CTF-based pretty printing
>> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
>> Is this supposed to fail to load modules which are compiled without CTF
>> data? Shouldn't this work gracefully (e.g. spit out a warning that pretty
>> printing is not available for module X and have the module working)?
>
> From my reading of linker_ctf_load_file(), this is exactly how it
> already works.


Great that it works this way. I still suggest printing a message that 
explains what the warning about the missing stuff means.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Mark Johnston
On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
> Hi,
> 
> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work
> (see below for the issue). As the monthly stabilisation pass didn't find
> obvious issues, it is something related to my setup:
>  - not a generic kernel
>  - very modular kernel (as much as possible as a module)
>  - bind_now (a build without fails too, tested with clean /usr/obj)
>  - ccache (a build without fails too, tested with clean /usr/obj)
>  - kernel retpoline (build without in progress)
>  - userland retpoline (build without in progress)
>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
> retpoline)
>  - -fno-builtin
>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>  - malloc production
>  - COPTFLAGS= -O2 -pipe
> 
> The issue is that kernel modules load OK from the loader, but once it starts
> init, any module fails to load (e.g. via autodetection of hardware or rc.conf
> kld_list) with the message that the kernel and module versions are out of
> sync and the module refuses to load.

What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.

> I tried the workaround to load the modules from the loader, which works, but
> then I can't log in remotely as ssh fails to allocate a pty. By loading
> modules via the loader, I can see messages about missing CTF info when the
> nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko
> instead of /boot/kernel/...ko) try to get initialised... and it looks like
> they are failing to get initialised because of this missing CTF stuff (I'm
> back to the previous boot env to be able to log in remotely and send mails, I
> don't have a copy of the failure message at hand).
> 
> I assume the missing CTF stuff is due to the CTF based pretty printing 
> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
> Is this supposed to fail to load modules which are compiled without CTF
> data? Shouldn't this work gracefully (e.g. spit out a warning that pretty
> printing is not available for module X and have the module working)?

From my reading of linker_ctf_load_file(), this is exactly how it
already works.

> Next steps:
>  - try a world without retpoline (bind_now and ccache active)
>  - try a kernel without CTF (bind_now, ccache, retpoline active)
>  - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS
> 
> If anyone has an idea how to debug this in some other way...
> 
> Bye,
> Alexander.
> 
> -- 
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org   netch...@freebsd.org    : PGP 0x8F31830F9F2772BF





Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't 
work (see below for the issue). As the monthly stabilisation pass didn't 
find obvious issues, it is something related to my setup:

 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't 
   retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it 
starts init, any module fails to load (e.g. via autodetection of hardware 
or rc.conf kld_list) with the message that the kernel and module 
versions are out of sync and the module refuses to load.
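
For readers unfamiliar with this kind of refusal, one way an "out of sync" check can work is a version-range test: the module records the range of kernel versions it accepts, and the loader compares the running kernel's version against it. The sketch below is an assumption about the general shape of such a check, not a diagnosis of the failure described here; the struct, the function name and the version numbers are illustrative.

```c
#include <stdbool.h>

/* Hypothetical min/preferred/max triple recorded by a module at build time. */
struct ver_range {
	int minimum;
	int preferred;
	int maximum;
};

/* True when the version the kernel provides lies inside the accepted range. */
bool
version_acceptable(const struct ver_range *want, int provided)
{
	return (provided >= want->minimum && provided <= want->maximum);
}
```

With a range of 1500014 to 1599999, a kernel reporting 1500018 would be accepted, while one reporting 1500012 would be refused even though the module file itself is intact.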


I tried the workaround to load the modules from the loader, which works, 
but then I can't log in remotely as ssh fails to allocate a pty. By 
loading modules via the loader, I can see messages about missing CTF 
info when the nvidia modules (from ports = not yet rebuilt = in 
/boot/modules/...ko instead of /boot/kernel/...ko) try to get 
initialised... and it looks like they are failing to get initialised 
because of this missing CTF stuff (I'm back to the previous boot env to 
be able to log in remotely and send mails, I don't have a copy of the 
failure message at hand).


I assume the missing CTF stuff is due to the CTF based pretty printing 
(https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). 
Is this supposed to fail to load modules which are compiled without CTF 
data? Shouldn't this work gracefully (e.g. spit out a warning that 
pretty printing is not available for module X and have the module 
working)?


Next steps:
 - try a world without retpoline (bind_now and ccache active)
 - try a kernel without CTF (bind_now, ccache, retpoline active)
 - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS

If anyone has an idea how to debug this in some other way...

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: Request for Testing: TCP RACK

2024-03-28 Thread tuexen
> On 28. Mar 2024, at 15:00, Nuno Teixeira  wrote:
> 
> Hello all!
> 
> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
> 
> Thanks all!
Thanks for the feedback!

Best regards
Michael
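
For context, the cheap common-case path that Drew argues for in the quoted message (both variables in one cache line, no function call unless work is pending) can be sketched as follows. The names tcp_hpts_softclock and hpts_that_need_softclock come from the thread; the struct, the 64-byte alignment, the return value and the demo callback are assumptions made for illustration, not the code under review in D44420.

```c
#include <stddef.h>

typedef void (*softclock_fn)(void);

/* Assumed layout: both fields share one 64-byte cache line. */
struct hpts_userret_state {
	_Alignas(64) softclock_fn tcp_hpts_softclock; /* NULL until hooked */
	unsigned int hpts_that_need_softclock;        /* pacers wanting a run */
};

/* Demo callback and counter, present only to make the sketch testable. */
unsigned int demo_runs;

void
demo_softclock(void)
{
	demo_runs++;
}

/*
 * Called on return to userspace. In the common case this reads two
 * fields from one cache line and returns without making a call.
 * Returns 1 when the softclock hook was run, 0 otherwise.
 */
static inline int
userret_hpts_check(const struct hpts_userret_state *s)
{
	if (s->tcp_hpts_softclock == NULL || s->hpts_that_need_softclock == 0)
		return (0);
	s->tcp_hpts_softclock();
	return (1);
}
```

Raising the threshold that feeds hpts_that_need_softclock, as suggested in the same message, would keep more systems on the early-return path.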
> 
> Drew Gallatin  wrote (Thursday, 21/03/2024 at 12:58):
> The entire point is to *NOT* go through the overhead of scheduling something 
> asynchronously, but to take advantage of the fact that a user/kernel 
> transition is going to trash the cache anyway.
> 
> In the common case of a system which has fewer than the threshold number of 
> connections, we access the tcp_hpts_softclock function pointer, make one 
> function call, and access hpts_that_need_softclock, and then return.  So 
> that's 2 variables and a function call.
> 
> I think it would be preferable to avoid that call, and to move the 
> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they 
> are in the same cacheline.  Then we'd be hitting just a single line in the 
> common case.  (I've made comments on the review to that effect).
> 
> Also, I wonder if the threshold could get higher by default, so that hpts is 
> never called in this context unless we're to the point where we're scheduling 
> thousands of runs of the hpts thread (and taking all those clock interrupts).
> 
> Drew
> 
> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
>>> Ok I have created
>>> 
>>> https://reviews.freebsd.org/D44420
>>> 
>>> 
>>> To address the issue. I also attach a short version of the patch that Nuno
>>> can try and validate
>>> 
>>> it works. Drew you may want to try this and validate the optimization does
>>> kick in since I can
>>> 
>>> only now test that it does not on my local box :)
>> The patch still causes access to all cpu's cachelines on each userret.
>> It would be much better to inc/check the threshold and only schedule the
>> call when exceeded.  Then the call can occur in some dedicated context,
>> like per-CPU thread, instead of userret.
>> 
>>> 
>>> 
>>> R
>>> 
>>> 
>>> 
>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
>>>> No.  The goal is to run on every return to userspace for every thread.
>>>>
>>>> Drew
>>>>
>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
>>>>>> I got the idea from
>>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
>>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
>>>>>> rather than run it out of a clock interrupt, it's more efficient to run
>>>>>> it out of a system call context at just the point where we return to
>>>>>> userspace and the cache is trashed anyway. The current implementation
>>>>>> is fine for our workload, but probably not ideal for a generic system.
>>>>>> Especially one where something is banging on system calls.
>>>>>>
>>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
>>>>>> them, and I can't find any docs on them.
>>>>>>
>>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
>>>>>> what's happening here?
>>>>> This call would need some AST number added, and then it registers the
>>>>> ast to run on next return to userspace, for the current thread.
>>>>>
>>>>> Is it enough?
>>>>>>
>>>>>> Drew
>>>>>
>>>>>>
>>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
>>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
>>>>>>>> On 18 Mar 2024, at 7:04, tue...@freebsd.org wrote:
>>>>>>>>
>>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira  wrote:
>>>>>>>>>>
>>>>>>>>>> Hello all!
>>>>>>>>>>
>>>>>>>>>> It works just fine!
>>>>>>>>>> System performance is OK.
>>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> net.inet.tcp.functions_available:
>>>>>>>>>> Stack    D Alias    PCB count
>>>>>>>>>> freebsd    freebsd  0
>>>>>>>>>> rack     * rack     38
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> It would be so nice that we can have a sysctl tunable for this
>>>>>>>>>> patch so we could do more tests without recompiling kernel.
>>>>>>>>> Thanks for testing!
>>>>>>>>>
>>>>>>>>> @gallatin: can you come up with a patch that is acceptable for
>>>>>>>>> Netflix and allows to mitigate the performance regression.
>>>>>>>>
>>>>>>>> Ideally, tcphpts could enable this automatically when it starts to
>>>>>>>> be used (enough?), but a sysctl could select auto/on/off.
>>>>>>> There is already a well-known mechanism to request execution of the
>>>>>>> specific function on return to userspace, namely AST.  The difference
>>>>>>> with the current hack is that the execution is requested for one
>>>>>>> callback in the context of the specific thread.
>>>>>>>
>>>>>>> Still, it might be worth a try to use it; what 

Re: Request for Testing: TCP RACK

2024-03-28 Thread Nuno Teixeira
Hello all!

Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop
(amd64)!

Thanks all!

Drew Gallatin  wrote (Thursday, 21/03/2024 at 12:58):

> The entire point is to *NOT* go through the overhead of scheduling
> something asynchronously, but to take advantage of the fact that a
> user/kernel transition is going to trash the cache anyway.
>
> In the common case of a system which has fewer than the threshold number
> of connections, we access the tcp_hpts_softclock function pointer, make
> one function call, and access hpts_that_need_softclock, and then return.
> So that's 2 variables and a function call.
>
> I think it would be preferable to avoid that call, and to move the
> declaration of tcp_hpts_softclock and hpts_that_need_softclock so that they
> are in the same cacheline.  Then we'd be hitting just a single line in the
> common case.  (I've made comments on the review to that effect).
>
> Also, I wonder if the threshold could get higher by default, so that hpts
> is never called in this context unless we're to the point where we're
> scheduling thousands of runs of the hpts thread (and taking all those clock
> interrupts).
>
> Drew
>
> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
>
> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> > Ok I have created
> >
> > https://reviews.freebsd.org/D44420
> >
> >
> > To address the issue. I also attach a short version of the patch that
> Nuno
> > can try and validate
> >
> > it works. Drew you may want to try this and validate the optimization
> does
> > kick in since I can
> >
> > only now test that it does not on my local box :)
> The patch still causes access to all cpu's cachelines on each userret.
> It would be much better to inc/check the threshold and only schedule the
> call when exceeded.  Then the call can occur in some dedicated context,
> like per-CPU thread, instead of userret.
>
> >
> >
> > R
> >
> >
> >
> > On 3/18/24 3:42 PM, Drew Gallatin wrote:
> > > No.  The goal is to run on every return to userspace for every thread.
> > >
> > > Drew
> > >
> > > On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> > > > On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> > > > > I got the idea from
> > > > >
> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> > > > > The gist is that the TCP pacing stuff needs to run frequently, and
> > > > > rather than run it out of a clock interrupt, its more efficient to
> run
> > > > > it out of a system call context at just the point where we return
> to
> > > > > userspace and the cache is trashed anyway. The current
> implementation
> > > > > is fine for our workload, but probably not idea for a generic
> system.
> > > > > Especially one where something is banging on system calls.
> > > > >
> > > > > Ast's could be the right tool for this, but I'm super unfamiliar
> with
> > > > > them, and I can't find any docs on them.
> > > > >
> > > > > Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent
> to
> > > > > what's happening here?
> > > > This call would need some AST number added, and then it registers the
> > > > ast to run on next return to userspace, for the current thread.
> > > >
> > > > Is it enough?
> > > > >
> > > > > Drew
> > > >
> > > > >
> > > > > On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> > > > > > On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> > > > > > > On 18 Mar 2024, at 7:04, tue...@freebsd.org wrote:
> > > > > > >
> > > > > > > >> On 18. Mar 2024, at 12:42, Nuno Teixeira
> > > >  wrote:
> > > > > > > >>
> > > > > > > >> Hello all!
> > > > > > > >>
> > > > > > > >> It works just fine!
> > > > > > > >> System performance is OK.
> > > > > > > >> Using patch on main-n268841-b0aaf8beb126(-dirty).
> > > > > > > >>
> > > > > > > >> ---
> > > > > > > >> net.inet.tcp.functions_available:
> > > > > > > >> Stack   D
> > > > AliasPCB count
> > > > > > > >> freebsd freebsd  0
> > > > > > > >> rack*
> > > > rack 38
> > > > > > > >> ---
> > > > > > > >>
> > > > > > > >> It would be so nice that we can have a sysctl tunnable for
> > > > this patch
> > > > > > > >> so we could do more tests without recompiling kernel.
> > > > > > > > Thanks for testing!
> > > > > > > >
> > > > > > > > @gallatin: can you come up with a patch that is acceptable
> > > > for Netflix
> > > > > > > > and allows to mitigate the performance regression.
> > > > > > >
> > > > > > > Ideally, tcphpts could enable this automatically when it
> > > > starts to be
> > > > > > > used (enough?), but a sysctl could select auto/on/off.
> > > > > > There is already a well-known mechanism to request execution of
> the
> > > > > > specific function on return to userspace, namely AST.  The
> difference
> > > > > > with the current hack is that the execution is requested for one
> > > > 
