Re: bsdinstall wifi setup is broken on CURRENT

2024-05-24 Thread Renato Botelho

On 18/05/24 11:33, Alfonso S. Siciliano wrote:

On 5/16/24 20:40, Renato Botelho wrote:
I saw some users on a .br group complaining bsdinstall was failing to 
setup wifi network on 15.0 snapshots and tried it myself.  I was able 
to reproduce the problem and also noticed another one.




Thank you for your report, the video is highly appreciated to understand 
the problem quickly and exactly.


I noticed Network Selection screen only shows one line, it's not 
beautiful to navigate through items this way.  On 14.1-BETA2 it shows 
multiple lines so it seems to be a regression.


Problem 1. Looking at wlanconfig, it seems related to the $height, $width
and $rows used for the selection menu. Could you please open a PR and add
me to it, so we can test and solve it?




The problem users reported was: after selecting desired network it 
just starts over instead of asking for password.  I made a video [1] 
showing the problem.


Problem 2. I know about this --mixedform issue; my last import, 2 days ago
(a6d8be451f62d425b71a4874f7d4e133b9fb393c), should solve it.
You could try the latest main snapshot (yesterday, 17 May); please let me
know of any problems.


I confirmed it is fixed with bsddialog 1.0.2 but I found another issue 
while testing.


Instead of the password, it was adding the SSID to the psk field of
wpa_supplicant.conf.  I've created the following review to address that:


https://reviews.freebsd.org/D45344

Thanks!
--
Renato Botelho
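
For reference, a WPA-PSK entry in wpa_supplicant.conf carries the network name in `ssid` and the passphrase in `psk`; the bug described above put the SSID in both fields. A minimal sketch of emitting a correct entry follows. The function name and the sample values are hypothetical, and this is not the code from D45344.

```shell
#!/bin/sh
# Hypothetical helper (not from bsdinstall): emit a wpa_supplicant.conf
# network block. $1 = SSID, $2 = passphrase.
# Values are not escaped; this is an illustration only.
wpa_network_block() {
    printf 'network={\n\tssid="%s"\n\tpsk="%s"\n}\n' "$1" "$2"
}

# Sample values are placeholders.
wpa_network_block "ExampleNet" "example-passphrase"
```

The point of the sketch is only that the two fields take different inputs: the first argument lands in `ssid`, the second in `psk`.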



Re: bsdinstall wifi setup is broken on CURRENT

2024-05-20 Thread Renato Botelho

On 18/05/24 11:33, Alfonso S. Siciliano wrote:

On 5/16/24 20:40, Renato Botelho wrote:
I saw some users on a .br group complaining bsdinstall was failing to 
setup wifi network on 15.0 snapshots and tried it myself.  I was able 
to reproduce the problem and also noticed another one.




Thank you for your report, the video is highly appreciated to understand 
the problem quickly and exactly.


I noticed Network Selection screen only shows one line, it's not 
beautiful to navigate through items this way.  On 14.1-BETA2 it shows 
multiple lines so it seems to be a regression.


Problem 1. Looking at wlanconfig, it seems related to the $height, $width
and $rows used for the selection menu. Could you please open a PR and add
me to it, so we can test and solve it?


I've fixed it locally and submitted the fix for review:

https://reviews.freebsd.org/D45271



The problem users reported was: after selecting desired network it 
just starts over instead of asking for password.  I made a video [1] 
showing the problem.


Problem 2. I know about this --mixedform issue; my last import, 2 days ago
(a6d8be451f62d425b71a4874f7d4e133b9fb393c), should solve it.
You could try the latest main snapshot (yesterday, 17 May); please let me
know of any problems.


The latest snapshot still contains bsddialog 1.0, so I'll wait for the
next one and give it a try.




Jessica, I've cc'd you because git shows you were the last person 
making changes in this area.  If it's not related and I made a 
mistake, just ignore me.


[1] https://youtube.com/shorts/Gmeckokw2a0


Again thanks for the video.

Best Regards,
Alfonso




--
Renato Botelho



Re: bsdinstall wifi setup is broken on CURRENT

2024-05-18 Thread Alfonso S. Siciliano

On 5/16/24 20:40, Renato Botelho wrote:
I saw some users on a .br group complaining bsdinstall was failing to 
setup wifi network on 15.0 snapshots and tried it myself.  I was able to 
reproduce the problem and also noticed another one.




Thank you for your report, the video is highly appreciated to understand 
the problem quickly and exactly.


I noticed Network Selection screen only shows one line, it's not 
beautiful to navigate through items this way.  On 14.1-BETA2 it shows 
multiple lines so it seems to be a regression.


Problem 1. Looking at wlanconfig, it seems related to the $height, $width
and $rows used for the selection menu. Could you please open a PR and add
me to it, so we can test and solve it?




The problem users reported was: after selecting desired network it just 
starts over instead of asking for password.  I made a video [1] showing 
the problem.


Problem 2. I know about this --mixedform issue; my last import, 2 days ago
(a6d8be451f62d425b71a4874f7d4e133b9fb393c), should solve it.
You could try the latest main snapshot (yesterday, 17 May); please let me
know of any problems.




Jessica, I've cc'd you because git shows you were the last person making 
changes in this area.  If it's not related and I made a mistake, just 
ignore me.


[1] https://youtube.com/shorts/Gmeckokw2a0


Again thanks for the video.

Best Regards,
Alfonso




Re: bsdinstall wifi setup is broken on CURRENT

2024-05-16 Thread Dag-Erling Smørgrav
Renato Botelho  writes:
> I'm not sure about a good way to test it on a running system instead.

Update your source tree, build and install world, run `sudo bsdconfig`,
scroll down and select “Network Management”, then select “Wireless
Networks”.

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org
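
The steps above could be scripted roughly as follows. This is a sketch, not a tested procedure: the source tree location (`/usr/src`), the `run` helper, the `DRY_RUN` switch and the function name are assumptions added for illustration, and by default the script only prints the commands it would run. A real run needs root and follows whatever world-update procedure you normally use.

```shell
#!/bin/sh
# Sketch of the test procedure described above.
# Assumptions (not from the thread): sources live in /usr/src, and the
# run() helper and DRY_RUN switch exist only for this illustration.
SRCDIR="${SRCDIR:-/usr/src}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

test_wireless_dialogs() {
    run git -C "$SRCDIR" pull --ff-only &&   # update the source tree
    run make -C "$SRCDIR" buildworld &&      # build world
    run make -C "$SRCDIR" installworld &&    # install world (needs root)
    # Then navigate: "Network Management" -> "Wireless Networks"
    run bsdconfig
}

test_wireless_dialogs
```

Set `DRY_RUN=0` to execute the commands instead of printing them.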



Re: bsdinstall wifi setup is broken on CURRENT

2024-05-16 Thread Nuno Teixeira
Hello Renato,

I will give it a try this weekend with bhyve, since I have passthru set up
for an iwlwifi card.

Cheers,

Renato Botelho  wrote (Thursday, 16/05/2024 at 19:56):

> On 16/05/24 15:47, Jessica Clarke wrote:
> > On 16 May 2024, at 19:40, Renato Botelho  wrote:
> >>
> >> I saw some users on a .br group complaining bsdinstall was failing to
> setup wifi network on 15.0 snapshots and tried it myself.  I was able to
> reproduce the problem and also noticed another one.
> >>
> >> I noticed Network Selection screen only shows one line, it's not
> beautiful to navigate through items this way.  On 14.1-BETA2 it shows
> multiple lines so it seems to be a regression.
> >>
> >> The problem users reported was: after selecting desired network it just
> starts over instead of asking for password.  I made a video [1] showing the
> problem.
> >>
> >> Jessica, I've cc'd you because git shows you were the last person
> making changes in this area.  If it's not related and I made a mistake,
> just ignore me.
> >
> > Hi Renato,
> > I touched the code that lets you select the wireless interface in the
> > first place, but not the script that then gets called to set it up and
> > is responsible for the dialogs you see. Given the behaviour, I wonder
> > if this is what today’s import of bsddialog[1] fixes? From reading the
> > script the next dialog uses --mixedform, and restarts the script on
> > error, which it looks like is what you observe.
>
> Thanks for pointing that out, Jessica.  I'll wait for the next 15
> snapshot and will check.
>
> I'm not sure about a good way to test it on a running system instead.
>
> --
> Renato Botelho
>
>

-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org


Re: bsdinstall wifi setup is broken on CURRENT

2024-05-16 Thread Renato Botelho

On 16/05/24 15:47, Jessica Clarke wrote:

On 16 May 2024, at 19:40, Renato Botelho  wrote:


I saw some users on a .br group complaining bsdinstall was failing to setup 
wifi network on 15.0 snapshots and tried it myself.  I was able to reproduce 
the problem and also noticed another one.

I noticed Network Selection screen only shows one line, it's not beautiful to 
navigate through items this way.  On 14.1-BETA2 it shows multiple lines so it 
seems to be a regression.

The problem users reported was: after selecting desired network it just starts 
over instead of asking for password.  I made a video [1] showing the problem.

Jessica, I've cc'd you because git shows you were the last person making 
changes in this area.  If it's not related and I made a mistake, just ignore me.


Hi Renato,
I touched the code that lets you select the wireless interface in the
first place, but not the script that then gets called to set it up and
is responsible for the dialogs you see. Given the behaviour, I wonder
if this is what today’s import of bsddialog[1] fixes? From reading the
script the next dialog uses --mixedform, and restarts the script on
error, which it looks like is what you observe.


Thanks for pointing that out, Jessica.  I'll wait for the next 15 
snapshot and will check.


I'm not sure about a good way to test it on a running system instead.

--
Renato Botelho



Re: bsdinstall wifi setup is broken on CURRENT

2024-05-16 Thread Jessica Clarke
On 16 May 2024, at 19:40, Renato Botelho  wrote:
> 
> I saw some users on a .br group complaining bsdinstall was failing to setup 
> wifi network on 15.0 snapshots and tried it myself.  I was able to reproduce 
> the problem and also noticed another one.
> 
> I noticed Network Selection screen only shows one line, it's not beautiful to 
> navigate through items this way.  On 14.1-BETA2 it shows multiple lines so it 
> seems to be a regression.
> 
> The problem users reported was: after selecting desired network it just 
> starts over instead of asking for password.  I made a video [1] showing the 
> problem.
> 
> Jessica, I've cc'd you because git shows you were the last person making 
> changes in this area.  If it's not related and I made a mistake, just ignore 
> me.

Hi Renato,
I touched the code that lets you select the wireless interface in the
first place, but not the script that then gets called to set it up and
is responsible for the dialogs you see. Given the behaviour, I wonder
if this is what today’s import of bsddialog[1] fixes? From reading the
script the next dialog uses --mixedform, and restarts the script on
error, which it looks like is what you observe.

Jess

[1] 
https://cgit.freebsd.org/src/commit/?id=a6d8be451f62d425b71a4874f7d4e133b9fb393c

> [1] https://youtube.com/shorts/Gmeckokw2a0
> -- 
> Renato Botelho




bsdinstall wifi setup is broken on CURRENT

2024-05-16 Thread Renato Botelho
I saw some users on a .br group complaining that bsdinstall was failing to 
set up the wifi network on 15.0 snapshots, and tried it myself.  I was able 
to reproduce the problem and also noticed another one.


I noticed the Network Selection screen only shows one line, which makes it 
awkward to navigate through the items.  On 14.1-BETA2 it shows multiple 
lines, so it seems to be a regression.


The problem users reported was: after selecting the desired network, it just 
starts over instead of asking for the password.  I made a video [1] showing 
the problem.


Jessica, I've cc'd you because git shows you were the last person making 
changes in this area.  If it's not related and I made a mistake, just 
ignore me.


[1] https://youtube.com/shorts/Gmeckokw2a0
--
Renato Botelho



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, . . .] [Update to Host OSVERSION 1500018 did not help]

2024-05-08 Thread Philip Paeps

On 2024-05-08 23:53:57 (+0800), Mark Millard wrote:


On Apr 29, 2024, at 20:16, Mark Millard  wrote:


On Apr 29, 2024, at 20:11, Mark Millard  wrote:


On Apr 29, 2024, at 19:54, Mark Millard  wrote:


On Apr 28, 2024, at 18:06, Philip Paeps  wrote:


On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
On Apr 18, 2024, at 08:02, Mark Millard  
wrote:

void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

[...]

My guess is that FreeBSD has something that broke after bd45bbe440
that was broken as of f5f08e41aa and was still broken at 75464941dc .




One thing of possible note:

Failing . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014


I have finished a package builder refresh this morning.  All our 
builder hosts (except PowerPC - I don't touch those) are now on 
main-n269671-feabaf8d5389 (OSVERSION 1500018).


ampere1 successfully finished its 140releng-armv7-quarterly build, 
so it looks like the problem with stuck builds was limited to 
ampere2 building main-armv7.  I'll keep a close eye on this one 
when it starts its next build.




I see that main-armv7 started.

It queued only 31935 instead of the prior 34528 (or more): it is doing an
incremental build instead of a full build. For example, pkg was not built
but instead the prior build is in use. Thus bad results from the prior
build might be involved in this new build.

I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
build for the purposes of the main-armv7 test.


Actually the test is not going to provide the information we are
after as things are.

giflib-5.2.2 failed to build, which leads to devel/doxygen being
skipped. devel/doxygen was the first one to hang up in the prior
2 failing attempts, if I remember right.

giflib-5.2.2 also causes graphics/graphviz to be skipped.
graphics/graphviz was installed just before the hangup in all of
the example hangups. So the context will not be replicated.

We need graphics/giflib to build to actually do the test.


Looks like:

https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f

is the fix for the graphics/giflib build failure.


Well, main-armv7 is building again and things are still
getting stuck. So much for my idea. For reference I
list the over 10-hr-so-far ones:

doxygen-1.9.6_1,2     build-depends   13:03:54
py39-pydot-2.0.0      run-depends     12:24:04
py39-pygraphviz-1.6   lib-depends     12:10:38

It would likely be appropriate to get a copy of the output of
"ps -alxdww".

"procstat -k -k" usage and the like on stuck processes
would probably be appropriate.

Does anyone with appropriate investigative background
have login access to ampere2 to take a look at what
is getting stuck?


This is unfortunate.  I'm sure I have the appropriate background, but 
I'm spread very thin!  I'll get as much information as I can about this 
machine while it's stuck, before I bounce it again.


I think it may be worth a try building those ports in isolation on 
ref14-aarch64, and see what they're trying to do.  I'll also set up a 
set of refX-armv7 jails on that machine.


Hopefully we can get to the bottom of this soon.  This is a very tedious 
failure mode.


We could also try to put an older armv7 image on the builder jail on 
ampere2.  Depending on whether we have a sufficiently old image, that 
will either be very straightforward, or a very deep rabbit hole.


Thanks again for keeping an eye on this.  We really should have better 
monitoring for stuck builds than "Mark will tell us". :-)


Philip
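
On the point about monitoring for stuck builds: a check along the following lines could flag jobs whose elapsed time exceeds a threshold. This is only a sketch. The input format (name, phase, HH:MM:SS) mirrors the list quoted above, while the function name, the 10-hour default and the `example-port-1.0` entry are assumptions made for illustration; it does not read real builder status.

```shell
#!/bin/sh
# Sketch: flag builder jobs whose elapsed time exceeds a threshold.
# Reads lines of "name phase HH:MM:SS" on stdin; $1 = limit in hours.
# The function name and the 10-hour default are assumptions.
find_stuck_jobs() {
    awk -v limit="${1:-10}" '
        NF >= 3 {
            split($NF, t, ":")    # t[1]=hours, t[2]=minutes, t[3]=seconds
            hours = t[1] + t[2] / 60 + t[3] / 3600
            if (hours > limit) print $1, $NF
        }'
}

# Sample input: three entries from the thread plus one made-up short job.
find_stuck_jobs 10 <<'EOF'
doxygen-1.9.6_1,2   build-depends 13:03:54
py39-pydot-2.0.0    run-depends   12:24:04
py39-pygraphviz-1.6 lib-depends   12:10:38
example-port-1.0    build         00:42:10
EOF
```

With the sample input, the three entries from the thread are reported and the made-up short job is not.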



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, . . .] [Update to Host OSVERSION 1500018 did not help]

2024-05-08 Thread Mark Millard
On Apr 29, 2024, at 20:16, Mark Millard  wrote:

> On Apr 29, 2024, at 20:11, Mark Millard  wrote:
> 
>> On Apr 29, 2024, at 19:54, Mark Millard  wrote:
>> 
>>> On Apr 28, 2024, at 18:06, Philip Paeps  wrote:
>>> 
>>>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>>>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>>>>> void  wrote on
>>>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>>>> 
>>>>>>> Not sure where to post this..
>>>>>>> 
>>>>>>> The last bulk build for arm64 appears to have happened around
>>>>>>> mid-March on ampere2. Is it broken?
>>>>>> 
>>>>>> main-armv7 building is broken and the last completed build
>>>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>>>> gets stuck making no progress until manually forced to stop,
>>>>>> which leads to huge elapsed times for the incomplete builds:
>>>>>> 
>>>>>> [...]
>>>>>> 
>>>>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>>>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>>>>> 
>>>>> 
>>>>> One thing of possible note:
>>>>> 
>>>>> Failing . . .
>>>>> 
>>>>> Host OSVERSION: 156
>>>>> Jail OSVERSION: 1500014
>>>> 
>>>> I have finished a package builder refresh this morning.  All our builder
>>>> hosts (except PowerPC - I don't touch those) are now on
>>>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>>>> 
>>>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it
>>>> looks like the problem with stuck builds was limited to ampere2 building
>>>> main-armv7.  I'll keep a close eye on this one when it starts its next
>>>> build.
>>>> 
>>> 
>>> I see that main-armv7 started.
>>> 
>>> It queued only 31935 instead of the prior 34528 (or more): it is doing an
>>> incremental build instead of a full build. For example, pkg was not built
>>> but instead the prior build is in use. Thus bad results from the prior
>>> build might be involved in this new build.
>>> 
>>> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
>>> build for the purposes of the main-armv7 test.
>> 
>> Actually the test is not going to provide the information we are
>> after as things are.
>> 
>> giflib-5.2.2 failed to build, which leads to devel/doxygen being
>> skipped. devel/doxygen was the first one to hang up in the prior
>> 2 failing attempts, if I remember right.
>> 
>> giflib-5.2.2 also causes graphics/graphviz to be skipped.
>> graphics/graphviz was installed just before the hangup in all of
>> the example hangups. So the context will not be replicated.
>> 
>> We need graphics/giflib to build to actually do the test.
> 
> Looks like:
> 
> https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f
> 
> is the fix for the graphics/giflib build failure.

Well, main-armv7 is building again and things are still
getting stuck. So much for my idea. For reference I
list the over 10-hr-so-far ones:

doxygen-1.9.6_1,2     build-depends   13:03:54
py39-pydot-2.0.0      run-depends     12:24:04
py39-pygraphviz-1.6   lib-depends     12:10:38

It would likely be appropriate to get a copy of the output of
"ps -alxdww".

"procstat -k -k" usage and the like on stuck processes
would probably be appropriate.

Does anyone with appropriate investigative background
have login access to ampere2 to take a look at what
is getting stuck?


===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard



On Apr 29, 2024, at 20:11, Mark Millard  wrote:

> On Apr 29, 2024, at 19:54, Mark Millard  wrote:
> 
>> On Apr 28, 2024, at 18:06, Philip Paeps  wrote:
>> 
>>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>>>> void  wrote on
>>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>>> 
>>>>>> Not sure where to post this..
>>>>>> 
>>>>>> The last bulk build for arm64 appears to have happened around
>>>>>> mid-March on ampere2. Is it broken?
>>>>> 
>>>>> main-armv7 building is broken and the last completed build
>>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>>> gets stuck making no progress until manually forced to stop,
>>>>> which leads to huge elapsed times for the incomplete builds:
>>>>> 
>>>>> [...]
>>>>> 
>>>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>>>> 
>>>> 
>>>> One thing of possible note:
>>>> 
>>>> Failing . . .
>>>> 
>>>> Host OSVERSION: 156
>>>> Jail OSVERSION: 1500014
>>> 
>>> I have finished a package builder refresh this morning.  All our builder 
>>> hosts (except PowerPC - I don't touch those) are now on 
>>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>>> 
>>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
>>> looks like the problem with stuck builds was limited to ampere2 building 
>>> main-armv7.  I'll keep a close eye on this one when it starts its next 
>>> build.
>>> 
>> 
>> I see that main-armv7 started.
>> 
>> It queued only 31935 instead of the prior 34528 (or more): it is doing an
>> incremental build instead of a full build. For example, pkg was not built
>> but instead the prior build is in use. Thus bad results from the prior
>> build might be involved in this new build.
>> 
>> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
>> build for the purposes of the main-armv7 test.
> 
> Actually the test is not going to provide the information we are
> after as things are.
> 
> giflib-5.2.2 failed to build, which leads to devel/doxygen being
> skipped. devel/doxygen was the first one to hang up in the prior
> 2 failing attempts, if I remember right.
> 
> giflib-5.2.2 also causes graphics/graphviz to be skipped.
> graphics/graphviz was installed just before the hangup in all of
> the example hangups. So the context will not be replicated.
> 
> We need graphics/giflib to build to actually do the test.

Looks like:

https://cgit.freebsd.org/ports/commit/graphics/giflib?id=5007109903fc271e3ef0ba01d78781c1fed99f3f

is the fix for the graphics/giflib build failure.

===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard
On Apr 29, 2024, at 19:54, Mark Millard  wrote:

> On Apr 28, 2024, at 18:06, Philip Paeps  wrote:
> 
>> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>>> void  wrote on
>>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>>> 
>>>>> Not sure where to post this..
>>>>> 
>>>>> The last bulk build for arm64 appears to have happened around
>>>>> mid-March on ampere2. Is it broken?
>>>> 
>>>> main-armv7 building is broken and the last completed build
>>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>>> gets stuck making no progress until manually forced to stop,
>>>> which leads to huge elapsed times for the incomplete builds:
>>>> 
>>>> [...]
>>>> 
>>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>>> 
>>> 
>>> One thing of possible note:
>>> 
>>> Failing . . .
>>> 
>>> Host OSVERSION: 156
>>> Jail OSVERSION: 1500014
>> 
>> I have finished a package builder refresh this morning.  All our builder 
>> hosts (except PowerPC - I don't touch those) are now on 
>> main-n269671-feabaf8d5389 (OSVERSION 1500018).
>> 
>> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
>> looks like the problem with stuck builds was limited to ampere2 building 
>> main-armv7.  I'll keep a close eye on this one when it starts its next build.
>> 
> 
> I see that main-armv7 started.
> 
> It queued only 31935 instead of the prior 34528 (or more): it is doing an
> incremental build instead of a full build. For example, pkg was not built
> but instead the prior build is in use. Thus bad results from the prior
> build might be involved in this new build.
> 
> I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
> build for the purposes of the main-armv7 test.

Actually the test is not going to provide the information we are
after as things are.

giflib-5.2.2 failed to build, which leads to devel/doxygen being
skipped. devel/doxygen was the first one to hang up in the prior
2 failing attempts, if I remember right.

giflib-5.2.2 also causes graphics/graphviz to be skipped.
graphics/graphviz was installed just before the hangup in all of
the example hangups. So the context will not be replicated.

We need graphics/giflib to build to actually do the test.


===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-29 Thread Mark Millard
On Apr 28, 2024, at 18:06, Philip Paeps  wrote:

> On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:
>> On Apr 18, 2024, at 08:02, Mark Millard  wrote:
>>> void  wrote on
>>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>>> 
>>>> Not sure where to post this..
>>>> 
>>>> The last bulk build for arm64 appears to have happened around
>>>> mid-March on ampere2. Is it broken?
>>> 
>>> main-armv7 building is broken and the last completed build
>>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>>> gets stuck making no progress until manually forced to stop,
>>> which leads to huge elapsed times for the incomplete builds:
>>> 
>>> [...]
>>> 
>>> My guess is that FreeBSD has something that broke after bd45bbe440
>>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
>>> 
>> 
>> One thing of possible note:
>> 
>> Failing . . .
>> 
>> Host OSVERSION: 156
>> Jail OSVERSION: 1500014
> 
> I have finished a package builder refresh this morning.  All our builder 
> hosts (except PowerPC - I don't touch those) are now on 
> main-n269671-feabaf8d5389 (OSVERSION 1500018).
> 
> ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
> looks like the problem with stuck builds was limited to ampere2 building 
> main-armv7.  I'll keep a close eye on this one when it starts its next build.
> 

I see that main-armv7 started.

It queued only 31935 instead of the prior 34528 (or more): it is doing an
incremental build instead of a full build. For example, pkg was not built
but instead the prior build is in use. Thus bad results from the prior
build might be involved in this new build.

I'd recommend forcing a full "poudriere bulk -c -a" that does a from-scratch
build for the purposes of the main-armv7 test.
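
As a sketch of the recommendation above: in `poudriere bulk`, `-c` cleans all previously built packages and `-a` builds every port in the tree, so together they force a from-scratch run. The jail and ports-tree names below are placeholders inferred from the `main-armv7-default` build name, not the builders' confirmed settings, and the `run` helper with its `DRY_RUN` switch is an assumption so that the script only prints the command by default.

```shell
#!/bin/sh
# Sketch: force a from-scratch bulk build.
# JAIL and PTREE are placeholder names, not the builders' real settings.
JAIL="${JAIL:-main-armv7}"
PTREE="${PTREE:-default}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# -c: clean all previously built packages; -a: build all ports.
full_rebuild() {
    run poudriere bulk -j "$JAIL" -p "$PTREE" -c -a
}

full_rebuild
```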

===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-28 Thread Philip Paeps

On 2024-04-18 23:14:22 (+0800), Mark Millard wrote:

On Apr 18, 2024, at 08:02, Mark Millard  wrote:

void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

[...]

My guess is that FreeBSD has something that broke after bd45bbe440
that was broken as of f5f08e41aa and was still broken at 75464941dc .



One thing of possible note:

Failing . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014


I have finished a package builder refresh this morning.  All our builder 
hosts (except PowerPC - I don't touch those) are now on 
main-n269671-feabaf8d5389 (OSVERSION 1500018).


ampere1 successfully finished its 140releng-armv7-quarterly build, so it 
looks like the problem with stuck builds was limited to ampere2 building 
main-armv7.  I'll keep a close eye on this one when it starts its next 
build.


Philip



Re: TXT Kernel linking failed on -CURRENT

2024-04-26 Thread BSD USER

Konstantin, good day!

25.04.2024 0:09, Konstantin Belousov wrote:

On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote:

linking kernel
ld: error: undefined symbol: ktrcapfail

referenced by vfs_lookup.c
    vfs_lookup.o:(namei)
referenced by vfs_lookup.c
    vfs_lookup.o:(namei_setup)
referenced by vfs_lookup.c
    vfs_lookup.o:(vfs_lookup)
referenced 3 more times

*** [kernel] Error code 1

Try
https://reviews.freebsd.org/D44931


Yes, now the system and kernel build fine.

Thanks!



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-26 Thread Mark Millard
On Apr 26, 2024, at 18:55, Philip Paeps  wrote:

> On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:
>> void  wrote on
>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>> 
>>> Not sure where to post this..
>>> 
>>> The last bulk build for arm64 appears to have happened around
>>> mid-March on ampere2. Is it broken?
>> 
>> main-armv7 building is broken and the last completed build
>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>> gets stuck making no progress until manually forced to stop,
>> which leads to huge elapsed times for the incomplete builds:
>> 
>> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390 
>>  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>> 
>> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395 
>>  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>> 
>> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
>> main-armv7 being stuck blocks main-arm64 from building.
>> 
>> One can see that all 13 job ID's show over 570 hours:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>> 
>> It is not random which packages are building when this happens. Compare:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>> 
>> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>> 
>> My guess is that FreeBSD has something that broke after bd45bbe440
>> that was broken as of f5f08e41aa and was still broken at 75464941dc .
> 
> It looks like ampere2 is going to end up in this state again:
> 
> https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca
> 
> It's got a couple of things stuck in -depends already.  I'll keep an eye on 
> it for the next hour or two.  If no progress is made, I'll kill this build 
> and force an upgrade.  The next build will start at 01:01 UTC Sunday.  So we 
> won't have long to wait before it tries again.
> 
> ampere1 is chewing away at llvm, and doesn't look stuck.
> 
> ampere3 has been upgraded.

Output from the likes of:

# ps -axldww

could be interesting. As might be output from:

# procstat -k -k PIDs_OF_STUCK_PROCESSES

(kernel stack backtraces).
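
The collection could be wrapped in a small script like the one below. It is a sketch under assumptions: the function name, the `run` helper and the `DRY_RUN` switch are made up for illustration, and by default the commands are only printed. It uses procstat(1), whose `-k` option prints kernel stacks, for the per-process backtraces.

```shell
#!/bin/sh
# Sketch: gather process state for a stuck build.
# Helper names are assumptions; DRY_RUN=1 (default) only prints commands.

# Print the command when DRY_RUN=1; execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

collect_stuck_info() {
    # Full process listing, wide output, with parent/child layout.
    run ps -axldww
    # Kernel stack backtraces for each PID passed as an argument.
    for pid in "$@"; do
        run procstat -k -k "$pid"
    done
}

# PIDs of the stuck processes are given on the command line.
collect_stuck_info "$@"
```

Redirecting the output to a file keeps a copy that can be attached to a report.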


===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-26 Thread Philip Paeps

On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:

void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  
(+2247) 1390  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 
GMT 651:21:56


p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  
(+2741) 1395  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 
GMT 359:42:14 ampere2


ampere2 alternates between trying to build main-arm64 and main-armv7, 
so main-armv7 being stuck blocks main-arm64 from building.


One can see that all 13 job ID's show over 570 hours:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc

It is not random which packages are building when this happens. 
Compare:


http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa

By contrast, the 19 Feb 2024 from-scratch (full) build worked:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440

My guess is that something in FreeBSD broke after bd45bbe440,
was broken as of f5f08e41aa, and was still broken at 75464941dc.


It looks like ampere2 is going to end up in this state again:

https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default=p1c7a816cd0ad_s1bd4f769ca

It's got a couple of things stuck in -depends already.  I'll keep an eye 
on it for the next hour or two.  If no progress is made, I'll kill this 
build and force an upgrade.  The next build will start at 01:01 UTC 
Sunday.  So we won't have long to wait before it tries again.


ampere1 is chewing away at llvm, and doesn't look stuck.

ampere3 has been upgraded.

Philip



Re: TXT Kernel linking failed on -CURRENT

2024-04-24 Thread Konstantin Belousov
On Wed, Apr 24, 2024 at 01:12:39PM +0500, BSD USER wrote:
> linking kernel
> ld: error: undefined symbol: ktrcapfail
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(namei)
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(namei_setup)
> >>> referenced by vfs_lookup.c
> >>>   vfs_lookup.o:(vfs_lookup)
> >>> referenced 3 more times
> *** [kernel] Error code 1

Try
https://reviews.freebsd.org/D44931



TXT Kernel linking failed on -CURRENT

2024-04-24 Thread BSD USER

Sorry for HTML-trash from previous mail :)

Hi, FreeBSD Community!
I am learning FreeBSD and run -CURRENT on my test machine.
A few days ago, after running
- git pull
- make buildworld
- make buildkernel
the kernel failed to link. My /etc/src.conf and the BSDSERV kernel
config are below. What could cause this error?
Thanks for the help!

My /usr/src state is:

 git log -n 1
commit a0d7d68a2dd818ce84e37e1ff20c8849cda6d853 (HEAD -> main, 
origin/main, origin/HEAD)

Author: Cy Schubert 


kernel building failed with such messages:
--
--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0
--sysroot=/usr/obj/usr/src/amd64.amd64/tmp
-B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe
-fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys
-I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h
-fno-common -MD -MF.depend.force-dynamic-hack.pico
-MTforce-dynamic-hack.pico
-fdebug-prefix-map=./machine=/usr/src/sys/amd64/include
-fdebug-prefix-map=./x86=/usr/src/sys/x86/include
-fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel
-mno-red-zone -mno-mmx -mno-sse -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual
-Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
-Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
-Wswitch -Wno-error=tautological-compare -Wno-error=empty-body
-Wno-error=parentheses-equality -Wno-error=unused-function
-Wno-error=pointer-sign -Wno-error=shift-negative-value
-Wno-address-of-packed-member -Wno-format-zero-length -mno-aes
-mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o
force-dynamic-hack.pico

--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh  BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0
--sysroot=/usr/obj/usr/src/amd64.amd64/tmp
-B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe
-fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys
-I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h
-fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include
-fdebug-prefix-map=./x86=/usr/src/sys/x86/include
-fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel
-mno-red-zone -mno-mmx -mno-sse -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual
-Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
-Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
-Wswitch -Wno-error=tautological-compare -Wno-error=empty-body
-Wno-error=parentheses-equality -Wno-error=unused-function
-Wno-error=pointer-sign -Wno-error=shift-negative-value
-Wno-address-of-packed-member -Wno-format-zero-length -mno-aes
-mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
 1098.27 real  2002.17 user   176.26 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes
BSDSERV
===
cpu HAMMER
ident   BSDSERV
device  amdtemp
options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options VIMAGE  # Subsystem virtualization, e.g. VNET

options INET    # InterNETworking
options TCP_OFFLOAD   

Kernel linking error on -CURRENT

2024-04-24 Thread USER BSD
Hi, FreeBSD Community!
I am learning FreeBSD and run -CURRENT on my test machine.
A few days ago, after running
- git pull
- make buildworld
- make buildkernel
the kernel failed to link. My /etc/src.conf and the BSDSERV kernel
config are below. What could cause this error?
Thanks for the help!

kernel building failed with such messages:
--- force-dynamic-hack.pico ---
cc -target x86_64-unknown-freebsd15.0
--sysroot=/usr/obj/usr/src/amd64.amd64/tmp
-B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -shared -O2 -pipe
-fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys
-I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h
-fno-common -MD -MF.depend.force-dynamic-hack.pico
-MTforce-dynamic-hack.pico
-fdebug-prefix-map=./machine=/usr/src/sys/amd64/include
-fdebug-prefix-map=./x86=/usr/src/sys/x86/include
-fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel
-mno-red-zone -mno-mmx -mno-sse -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual
-Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
-Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
-Wswitch -Wno-error=tautological-compare -Wno-error=empty-body
-Wno-error=parentheses-equality -Wno-error=unused-function
-Wno-error=pointer-sign -Wno-error=shift-negative-value
-Wno-address-of-packed-member -Wno-format-zero-length -mno-aes
-mno-avx -std=gnu99 -nostdlib force-dynamic-hack.c -o
force-dynamic-hack.pico
--- vers.c ---
MAKE="make" sh /usr/src/sys/conf/newvers.sh  BSDSERV
--- vers.o ---
cc -target x86_64-unknown-freebsd15.0
--sysroot=/usr/obj/usr/src/amd64.amd64/tmp
-B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -c -O2 -pipe
-fno-strict-aliasing -march=native -nostdinc -I. -I/usr/src/sys
-I/usr/src/sys/contrib/ck/include -I/usr/src/sys/contrib/libfdt
-D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h
-fno-common -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include
-fdebug-prefix-map=./x86=/usr/src/sys/x86/include
-fdebug-prefix-map=./i386=/usr/src/sys/i386/include -mcmodel=kernel
-mno-red-zone -mno-mmx -mno-sse -msoft-float
-fno-asynchronous-unwind-tables -ffreestanding -fwrapv -Wall
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual
-Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__
-Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas
-Wswitch -Wno-error=tautological-compare -Wno-error=empty-body
-Wno-error=parentheses-equality -Wno-error=unused-function
-Wno-error=pointer-sign -Wno-error=shift-negative-value
-Wno-address-of-packed-member -Wno-format-zero-length -mno-aes
-mno-avx -std=gnu99 -Werror vers.c
--- kernel ---
linking kernel
ld: error: undefined symbol: ktrcapfail
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(namei_setup)
>>> referenced by vfs_lookup.c
>>>   vfs_lookup.o:(vfs_lookup)
>>> referenced 3 more times
*** [kernel] Error code 1
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
make[2]: 1 error
make[2]: stopped in /usr/obj/usr/src/amd64.amd64/sys/BSDSERV
 1098.27 real  2002.17 user   176.26 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src

/etc/src.conf
===
WITHOUT_APM=yes
WITHOUT_ASSERT_DEBUG=yes
WITHOUT_AUTHPF=yes
WITHOUT_BHYVE=yes
WITHOUT_BLACKLIST=yes
WITHOUT_BLUETOOTH=yes
WITHOUT_CCD=yes
WITHOUT_CXGBETOOL=yes
WITHOUT_DEBUG_FILES=yes
WITHOUT_DTRACE=yes
WITHOUT_FLOPPY=yes
WITHOUT_GOOGLETEST=yes
WITHOUT_HAST=yes
WITHOUT_HTML=yes
WITHOUT_HYPERV=yes
WITHOUT_INET6=yes
WITHOUT_IPFILTER=yes
WITHOUT_ISCSI=yes
WITHOUT_KDUMP=yes
WITHOUT_KERNEL_SYMBOLS=yes
WITH_MALLOC_PRODUCTION=yes
WITHOUT_MLX5TOOL=yes
WITHOUT_NVME=yes
WITHOUT_OFED=yes
WITHOUT_PF=yes
WITHOUT_PTHREADS_ASSERTIONS=yes
WITHOUT_RADIUS_SUPPORT=yes
WITHOUT_RELRO=yes
WITHOUT_SSP=yes
WITHOUT_WARNS=yes
WITHOUT_WERROR=yes
WITHOUT_TESTS=yes
WITHOUT_WIRELESS=yes

BSDSERV
===
cpu HAMMER
ident   BSDSERV
device  amdtemp
options SCHED_ULE   # ULE scheduler
options PREEMPTION  # Enable kernel thread preemption
options VIMAGE  # Subsystem virtualization, e.g. VNET
options INET    # InterNETworking
options TCP_OFFLOAD # TCP offload
options TCP_BLACKBOX    # Enhanced TCP event logging
options TCP_HHOOK   # hhook(9) framework for TCP
options TCP_RFC7413 # TCP Fast Open
options KERN_TLS    # TLS transmit & receive offload
options FFS # Berkeley Fast Filesystem
options SOF

Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-23 Thread Philip Paeps

On 2024-04-24 02:12:41 (+0800), Mark Millard wrote:


On Apr 19, 2024, at 07:16, Philip Paeps  wrote:


On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:


void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  
(+2247) 1390  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 
GMT 651:21:56


p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  
(+2741) 1395  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 
GMT 359:42:14 ampere2


ampere2 alternates between trying to build main-arm64 and 
main-armv7, so main-armv7 being stuck blocks main-arm64 from 
building.


One can see that all 13 job ID's show over 570 hours:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc

It is not random which packages are building when this happens. 
Compare:


http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa

By contrast, the 19 Feb 2024 from-scratch (full) build worked:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440

My guess is that something in FreeBSD broke after bd45bbe440,
was broken as of f5f08e41aa, and was still broken at 75464941dc.


I'll kill the build on ampere2 again.  Thanks for the nudge.

We don't really have good monitoring for this.  Also: builds should 
time out after 36 hours.  The fact that this one does not is a bug in 
itself.


Philip [hat: clusteradm]


I'll note that I've never managed to replicate the problem for
building for armv7 on aarch64. But my context never has the
likes of:

QUOTE
Host OSVERSION: 156
Jail OSVERSION: 1500015
. . .
!!! Jail is newer than host. (Jail: 1500015, Host: 156) !!!
!!! This is not supported. !!!
!!! Host kernel must be same or newer than jail. !!!
!!! Expect build failures. !!!
END QUOTE

but always has the two OSVERSION's the same, such as:

Host OSVERSION: 1500015
Jail OSVERSION: 1500015

or, recently,

Host OSVERSION: 1500018
Jail OSVERSION: 1500018

My bulk runs do go through the sequence where the hangups
have repeated for main-armv7 on ampere2.

I wonder what would happen if "Host OSVERSION" was updated
(modernized) to match the modern "Jail OSVERSION" that would
be used?


The package builders are due for a regular refresh to newer -CURRENT 
dogfood.  I'll do the aarch64 builders first this time.


I've set /root/stop-builds on them.  I'll upgrade them when they go 
idle.  Or I'll kill them if they take much longer to build what they're 
building.  It annoys me that they do not stop building after 36 hours, 
like they're supposed to.


They're currently running:

n266879-6abee52e0d79   2023-12-09 01:06:28 jlduran strfmon: Silence 
scan-build warning


Our current clusteradm build is:

n269399-bbc6e6c5ec8c   2024-04-14 03:12:36 sigsys daemon: fix -R to 
enable supervision mode


I may do a new build while waiting for them to go idle:

-   quarterly 140arm64 1b931669de11 parallel_build 28776 15299 33 588 985 0 11871 3D:01:08:29
https://pkg-status.freebsd.org/ampere1/build.html?mastername=140arm64-quarterly=1b931669de11
-   default main-arm64 p1c7a816cd0ad_s1bd4f769caf parallel_build 34528 19888 65 669 980 0 12926 4D:00:52:21
https://pkg-status.freebsd.org/ampere2/build.html?mastername=main-arm64-default=p1c7a816cd0ad_s1bd4f769caf
-   default 140releng-armv7 2910ff97e727 parallel_build 34543 14826 60 5539 1397 0 12721 1D:09:35:28
https://pkg-status.freebsd.org/ampere3/build.html?mastername=140releng-armv7-default=2910ff97e727


Philip



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-23 Thread Mark Millard
On Apr 19, 2024, at 07:16, Philip Paeps  wrote:

> On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:
> 
>> void  wrote on
>> Date: Thu, 18 Apr 2024 14:08:36 UTC :
>> 
>>> Not sure where to post this..
>>> 
>>> The last bulk build for arm64 appears to have happened around
>>> mid-March on ampere2. Is it broken?
>> 
>> main-armv7 building is broken and the last completed build
>> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
>> gets stuck making no progress until manually forced to stop,
>> which leads to huge elapsed times for the incomplete builds:
>> 
>> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390 
>>  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
>> 
>> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395 
>>  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
>> 
>> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
>> main-armv7 being stuck blocks main-arm64 from building.
>> 
>> One can see that all 13 job ID's show over 570 hours:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
>> 
>> It is not random which packages are building when this happens. Compare:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
>> 
>> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
>> 
>> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
>> 
>> My guess is that something in FreeBSD broke after bd45bbe440,
>> was broken as of f5f08e41aa, and was still broken at 75464941dc.
> 
> I'll kill the build on ampere2 again.  Thanks for the nudge.
> 
> We don't really have good monitoring for this.  Also: builds should time out 
> after 36 hours.  The fact that this one does not is a bug in itself.
> 
> Philip [hat: clusteradm]

I'll note that I've never managed to replicate the problem for
building for armv7 on aarch64. But my context never has the
likes of:

QUOTE
Host OSVERSION: 156
Jail OSVERSION: 1500015
. . .
!!! Jail is newer than host. (Jail: 1500015, Host: 156) !!!
!!! This is not supported. !!!
!!! Host kernel must be same or newer than jail. !!!
!!! Expect build failures. !!!
END QUOTE

but always has the two OSVERSION's the same, such as:

Host OSVERSION: 1500015
Jail OSVERSION: 1500015

or, recently,

Host OSVERSION: 1500018
Jail OSVERSION: 1500018

My bulk runs do go through the sequence where the hangups
have repeated for main-armv7 on ampere2.

I wonder what would happen if "Host OSVERSION" was updated
(modernized) to match the modern "Jail OSVERSION" that would
be used?
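
The rule stated in the quoted warning (host kernel must be the same as or
newer than the jail) can be sketched as a plain numeric comparison. This is
only an illustration, not the build tooling's own code: the function name
osversion_ok is made up here, and the host/jail pairing in the second call
is a hypothetical combination of the OSVERSION values quoted above. On a
FreeBSD host the two numbers could be read with "uname -K" (kernel) and,
inside the jail, "uname -U" (userland).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the host kernel OSVERSION is
# the same as or newer than the jail userland OSVERSION, fail otherwise.
osversion_ok() {
    host=$1
    jail=$2
    [ "$host" -ge "$jail" ]
}

# Matching versions, as in the working setups described above.
if osversion_ok 1500018 1500018; then
    echo "host 1500018 / jail 1500018: supported"
fi

# Hypothetical mismatch: an older host running a newer jail.
if ! osversion_ok 1500015 1500018; then
    echo "host 1500015 / jail 1500018: jail is newer than host"
fi
```

Keeping the comparison in a small function makes it easy to run before a
bulk build starts and to refuse the build instead of only printing a warning.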



===
Mark Millard
marklmi at yahoo.com




Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-19 Thread Philip Paeps

On 2024-04-18 23:02:30 (+0800), Mark Millard wrote:


void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  
(+2247) 1390  (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 
GMT 651:21:56


p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  
(+2741) 1395  (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 
GMT 359:42:14 ampere2


ampere2 alternates between trying to build main-arm64 and main-armv7, 
so main-armv7 being stuck blocks main-arm64 from building.


One can see that all 13 job ID's show over 570 hours:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc

It is not random which packages are building when this happens. 
Compare:


http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa

By contrast, the 19 Feb 2024 from-scratch (full) build worked:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440

My guess is that something in FreeBSD broke after bd45bbe440,
was broken as of f5f08e41aa, and was still broken at 75464941dc.


I'll kill the build on ampere2 again.  Thanks for the nudge.

We don't really have good monitoring for this.  Also: builds should time 
out after 36 hours.  The fact that this one does not is a bug in itself.
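
A check for builds that have overrun the 36-hour limit could look roughly
like the following. It is a sketch only: the function names are made up,
the inputs are the H:MM:SS elapsed values shown in the build listings
above, and it does not reflect how the cluster's tooling actually enforces
the timeout.

```shell
#!/bin/sh
# Hypothetical helper: convert an elapsed time of the form H:MM:SS
# (hours may exceed 24) into whole seconds. awk is used so that fields
# such as "08" are read as decimal rather than octal.
elapsed_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: succeed when the elapsed time exceeds the limit.
# $1 is the elapsed time, $2 the limit in hours (default 36).
is_overdue() {
    limit_hours=${2:-36}
    [ "$(elapsed_to_seconds "$1")" -gt $((limit_hours * 3600)) ]
}

# The two stuck main-armv7 builds quoted in this thread.
for t in 651:21:56 359:42:14; do
    if is_overdue "$t"; then
        echo "$t: over the 36 hour limit"
    else
        echo "$t: within the limit"
    fi
done
```

A monitoring job could run such a check periodically against the status
page data and raise an alert, which would cover the missing-monitoring gap
even while the timeout bug itself is still open.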


Philip [hat: clusteradm]



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread void

On Thu, Apr 18, 2024 at 08:02:30AM -0700, Mark Millard wrote:

void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :


Not sure where to post this..

The last bulk build for arm64 appears to have happened around
mid-March on ampere2. Is it broken?


main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:


Should I report it in bugzilla?

--



Re: pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread Mark Millard



On Apr 18, 2024, at 08:02, Mark Millard  wrote:

> void  wrote on
> Date: Thu, 18 Apr 2024 14:08:36 UTC :
> 
>> Not sure where to post this..
>> 
>> The last bulk build for arm64 appears to have happened around
>> mid-March on ampere2. Is it broken?
> 
> main-armv7 building is broken and the last completed build
> was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
> gets stuck making no progress until manually forced to stop,
> which leads to huge elapsed times for the incomplete builds:
> 
> pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  
> (+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56
> 
> p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  
> (+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2
> 
> ampere2 alternates between trying to build main-arm64 and main-armv7, so 
> main-armv7 being stuck blocks main-arm64 from building.
> 
> One can see that all 13 job ID's show over 570 hours:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc
> 
> It is not random which packages are building when this happens. Compare:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa
> 
> By contrast, the 19 Feb 2024 from-scratch (full) build worked:
> 
> http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440
> 
> My guess is that something in FreeBSD broke after bd45bbe440,
> was broken as of f5f08e41aa, and was still broken at 75464941dc.
> 

One thing of possible note:

Failing . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014

and, more recently,

Host OSVERSION: 156
Jail OSVERSION: 1500015

But the most recent working had . . .

Host OSVERSION: 156
Jail OSVERSION: 1500014

So, if it is a FreeBSD problem, it seems to have started during 1500014 .


===
Mark Millard
marklmi at yahoo.com




pkg server for current/arm64 stopped ? [main-armv7 on ampere2, elapsed so far: 651:21:56]

2024-04-18 Thread Mark Millard
void  wrote on
Date: Thu, 18 Apr 2024 14:08:36 UTC :

> Not sure where to post this..
> 
> The last bulk build for arm64 appears to have happened around
> mid-March on ampere2. Is it broken?

main-armv7 building is broken and the last completed build
was the one started on Mon, 19 Feb 2024 12:32:10 GMT. It
gets stuck making no progress until manually forced to stop,
which leads to huge elapsed times for the incomplete builds:

pd5512ae7b8c6_s75464941dc 34472 12282  (+9196) 107  (+77) 4753  (+2247) 1390  
(+529) 15940 parallel_build: Fri, 22 Mar 2024 11:05:01 GMT 651:21:56

p43e3af5f5763_sf5f08e41aa 19809 5919  (+3126) 137  (+100) 5363  (+2741) 1395  
(+522) 6995 parallel_build: Wed, 28 Feb 2024 15:46:14 GMT 359:42:14 ampere2

ampere2 alternates between trying to build main-arm64 and main-armv7, so 
main-armv7 being stuck blocks main-arm64 from building.

One can see that all 13 job ID's show over 570 hours:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pd5512ae7b8c6_s75464941dc

It is not random which packages are building when this happens. Compare:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=p43e3af5f5763_sf5f08e41aa

By contrast, the 19 Feb 2024 from-scratch (full) build worked:

http://ampere2.nyi.freebsd.org/build.html?mastername=main-armv7-default=pe9c9c73181b5_sbd45bbe440

My guess is that something in FreeBSD broke after bd45bbe440,
was broken as of f5f08e41aa, and was still broken at 75464941dc.


===
Mark Millard
marklmi at yahoo.com



Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-31 Thread Alexander Leidinger

Am 2024-03-29 18:21, schrieb Alexander Leidinger:

Am 2024-03-29 18:13, schrieb Mark Johnston:

On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't 
work (see below for the issue). As the monthly stabilisation pass didn't 
find obvious issues, it is something related to my setup:
 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is, that kernel modules load OK from loader, but once it 
starts init any module fails to load (e.g. via autodetection of hardware 
or rc.conf kld_list) with the message that the kernel and module 
versions are out of sync and the module refuses to load.


What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and 
today's src before the sound stuff still fails).


Retpoline wasn't the cause, next test is the CTF stuff in the kernel...


A rather obscure problem was causing this. The "last" BE had canmount 
set to "on" instead of "noauto". No idea how this happened, but it 
resulted in the "last" BE being mounted by "zfs mount -a" on top of the 
current BE. This means that every module loaded after the zfs rc script 
had run came from the old kernel's modules, so the error message about 
the kernel version mismatch was correct. I found the issue while 
bisecting the tree: suddenly the error message went away, but a new 
issue of missing dev entries popped up (/dev was mounted correctly on 
the booting dataset, but the last BE was mounted on top of it and /dev 
went empty...).


It looks to me like bectl was doing this (from "zpool history")...
2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.
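
A quick way to spot this kind of leftover could be to list every boot
environment that still has canmount=on and is not the one currently
booted. The following is a sketch under assumptions: the function name
stray_mountable_bes is made up, the name rpool/ROOT/new-be is a
hypothetical newer BE, and the sample input merely imitates the
tab-separated output of "zfs list -H -o name,canmount -r rpool/ROOT".

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>canmount" lines on stdin and print
# every dataset other than the active one ($1) whose canmount is "on".
stray_mountable_bes() {
    awk -F '\t' -v active="$1" '$2 == "on" && $1 != active { print $1 }'
}

# Sample input imitating the situation described above: an older BE was
# left at canmount=on while a newer (hypothetical) BE is the active one.
printf '%s\t%s\n' \
    rpool/ROOT/2024-02-24-140211 noauto \
    rpool/ROOT/2024-03-11-094351 on \
    rpool/ROOT/new-be on |
    stray_mountable_bes rpool/ROOT/new-be

# On a live system the input would come from zfs itself, for example:
#   zfs list -H -o name,canmount -r rpool/ROOT | stray_mountable_bes "$active_be"
```

Any dataset this prints is a candidate for "zfs set canmount=noauto",
since "zfs mount -a" would otherwise mount it over the running root.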

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-30 Thread Matthias Apitz



(Since the Wi-Fi chip does not work, I am currently using a USB Wi-Fi
dongle, a Realtek RTL8191S WLAN adapter, which works fine.)

I also can't get Xorg with twm to start; the end of /var/log/Xorg.0.log says:

..
REDWOOD, ATI Mobility Radeon Graphics, CEDAR, ATI FirePro 2270,
ATI Radeon HD 5450, CAYMAN, AMD Radeon HD 6900 Series,
AMD Radeon HD 6900M Series, Mobility Radeon HD 6000 Series, BARTS,
AMD Radeon HD 6800 Series, AMD Radeon HD 6700 Series, TURKS, CAICOS,
ARUBA, TAHITI, PITCAIRN, VERDE, OLAND, HAINAN, BONAIRE, KABINI,
MULLINS, KAVERI, HAWAII
[   248.442] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[   248.442] (II) scfb: driver for wsdisplay framebuffer: scfb
[   248.442] (II) VESA: driver for VESA chipsets: vesa
[   248.442] (--) Using syscons driver with X support (version 2.0)
[   248.442] (--) using VT number 9

[   248.447] (EE) open /dev/dri/card0: No such file or directory
[   248.447] (WW) Falling back to old probe method for modesetting
[   248.447] (EE) open /dev/dri/card0: No such file or directory
[   248.447] (WW) Falling back to old probe method for scfb
[   248.447] scfb trace: probe start
..

The kernel modules loaded are:

Id Refs AddressSize Name
 1  139 0x8020  1d4f010 kernel
 21 0x81f5 36c0 coretemp.ko
 31 0x81f55000 9c48 if_cdce.ko
 42 0x81f5f000 6138 uether.ko
 51 0x81f66000 a698 cuse.ko
 61 0x81f71000f7f38 ipl.ko
 71 0x83c0   462be0 zfs.ko
 81 0x84063000   1510b8 radeonkms.ko
 92 0x841b500073da0 drm.ko
101 0x83bd7000 22a8 iic.ko
113 0x83bda000 1100 linuxkpi_gplv2.ko
124 0x83bdc000 6320 dmabuf.ko
134 0x83be3000 3080 linuxkpi_hdmi.ko
141 0x83be7000 c7b0 ttm.ko
151 0x83bf4000 3370 acpi_wmi.ko
161 0x83bf8000 5ee0 ig4.ko
171 0x84229000 3210 intpm.ko
181 0x8422d000 2178 smbus.ko
191 0x842330ad8 linux.ko
204 0x84261000 be30 linux_common.ko
211 0x8426d0002ccf8 linux64.ko
221 0x8429a000 2270 pty.ko
231 0x8429d000 3540 fdescfs.ko
241 0x842a1000 73c0 linprocfs.ko
251 0x842a9000 43e4 linsysfs.ko
261 0x842ae000 4d00 ng_ubt.ko
276 0x842b3000 bb28 netgraph.ko
282 0x842bf000 a238 ng_hci.ko
294 0x842ca000 2668 ng_bluetooth.ko
301 0x842cd000 a7e0 if_rsu.ko
311 0x842d8000 3218 iichid.ko
325 0x842dc000 32a8 hidbus.ko
331 0x842e f250 ng_l2cap.ko
341 0x842f19f08 ng_btsocket.ko
351 0x8430a000 38b8 ng_socket.ko
371 0x8432e000 21e0 hms.ko
381 0x84331000 40a8 hidmap.ko
391 0x84336000 334d hmt.ko
401 0x8433a000 22c4 hconf.ko

The complete Xorg.0.log is here: http://www.unixarea.de/Xorg.0.log.txt

Thanks in advance for ideas.

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Bojan Novković

On 3/29/24 16:52, Alexander Leidinger wrote:

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't 
work (see below for the issue). As the monthly stabilisation pass 
didn't find obvious issues, it is something related to my setup:

 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't 
retpoline)

 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is, that kernel modules load OK from loader, but once it 
starts init any module fails to load (e.g. via autodetection of 
hardware or rc.conf kld_list) with the message that the kernel and 
module versions are out of sync and the module refuses to load.


I tried the workaround to load the modules from the loader, which 
works, but then I can't login remotely as ssh fails to allocate a pty. 
By loading modules via the loader, I can see messages about missing 
CTF info when the nvidia modules (from ports = not yet rebuilt = in 
/boot/modules/...ko instead of /boot/kernel/...ko) try to get 
initialised... and it looks like they are failing to get initialised 
because of this missing CTF stuff (I'm back to the previous boot env 
to be able to login remotely and send mails, I don't have a copy of 
the failure message at hand).


I assume the missing CTF stuff is due to the CTF based pretty printing 
(https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). 
Is this supposed to fail to load modules which are compiled without 
CTF data? Shouldn't this work gracefully (e.g. spit out a warning that 
pretty printing is not available for module X and have the module 
working)?


This is indeed how it works: those messages are emitted by the CTF
loading routines in 'kern/kern_ctf.c' as warnings and do not affect the
rest of the module loading process.


However, I completely agree that they are cryptic and spammy. I'll try
to do something about that.


Bojan




Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

On 2024-03-29 18:13, Mark Johnston wrote:

> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>> Hi,
>>
>> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work
>> (see below for the issue). As the monthly stabilisation pass didn't find
>> obvious issues, it is something related to my setup:
>>  - not a generic kernel
>>  - very modular kernel (as much as possible as a module)
>>  - bind_now (a build without fails too, tested with clean /usr/obj)
>>  - ccache (a build without fails too, tested with clean /usr/obj)
>>  - kernel retpoline (build without in progress)
>>  - userland retpoline (build without in progress)
>>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
>> retpoline)
>>  - -fno-builtin
>>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>>  - malloc production
>>  - COPTFLAGS= -O2 -pipe
>>
>> The issue is that kernel modules load OK from loader, but once it starts
>> init any module fails to load (e.g. via autodetection of hardware or rc.conf
>> kld_list) with the message that the kernel and module versions are out of
>> sync and the module refuses to load.
>
> What is the exact revision you're running?  There were some unrelated
> changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and
today's src, from before the sound stuff, still fails).


Retpoline wasn't the cause; the next test is the CTF stuff in the kernel...

>> I tried the workaround to load the modules from the loader, which works, but
>> then I can't log in remotely as ssh fails to allocate a pty. By loading
>> modules via the loader, I can see messages about missing CTF info when the
>> nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko
>> instead of /boot/kernel/...ko) try to get initialised... and it looks like
>> they are failing to get initialised because of this missing CTF stuff (I'm
>> back to the previous boot env to be able to log in remotely and send mails; I
>> don't have a copy of the failure message at hand).
>>
>> I assume the missing CTF stuff is due to the CTF based pretty printing
>> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
>> Is this supposed to fail to load modules which are compiled without CTF
>> data? Shouldn't this work gracefully (e.g. spit out a warning that pretty
>> printing is not available for module X and have the module working)?
>
> From my reading of linker_ctf_load_file(), this is exactly how it
> already works.


Great that it works this way. I still suggest printing a message that
explains what the warning about missing stuff means.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Mark Johnston
On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
> Hi,
> 
> sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work
> (see below for the issue). As the monthly stabilisation pass didn't find
> obvious issues, it is something related to my setup:
>  - not a generic kernel
>  - very modular kernel (as much as possible as a module)
>  - bind_now (a build without fails too, tested with clean /usr/obj)
>  - ccache (a build without fails too, tested with clean /usr/obj)
>  - kernel retpoline (build without in progress)
>  - userland retpoline (build without in progress)
>  - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
> retpoline)
>  - -fno-builtin
>  - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>  - malloc production
>  - COPTFLAGS= -O2 -pipe
> 
> The issue is that kernel modules load OK from loader, but once it starts
> init any module fails to load (e.g. via autodetection of hardware or rc.conf
> kld_list) with the message that the kernel and module versions are out of
> sync and the module refuses to load.

What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.

> I tried the workaround to load the modules from the loader, which works, but
> then I can't log in remotely as ssh fails to allocate a pty. By loading
> modules via the loader, I can see messages about missing CTF info when the
> nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko
> instead of /boot/kernel/...ko) try to get initialised... and it looks like
> they are failing to get initialised because of this missing CTF stuff (I'm
> back to the previous boot env to be able to log in remotely and send mails; I
> don't have a copy of the failure message at hand).
> 
> I assume the missing CTF stuff is due to the CTF based pretty printing 
> (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
> Is this supposed to fail to load modules which are compiled without CTF
> data? Shouldn't this work gracefully (e.g. spit out a warning that pretty
> printing is not available for module X and have the module working)?

From my reading of linker_ctf_load_file(), this is exactly how it
already works.

> Next steps:
>  - try a world without retpoline (bind_now and ccache active)
>  - try a kernel without CTF (bind_now, ccache, retpoline active)
>  - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS
> 
> If anyone has an idea how to debug this in some other way...
> 
> Bye,
> Alexander.
> 
> -- 
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF





Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't 
work (see below for the issue). As the monthly stabilisation pass didn't 
find obvious issues, it is something related to my setup:

 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
   retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from loader, but once it
starts init any module fails to load (e.g. via autodetection of hardware
or rc.conf kld_list) with the message that the kernel and module
versions are out of sync and the module refuses to load.


I tried the workaround to load the modules from the loader, which works,
but then I can't log in remotely as ssh fails to allocate a pty. By
loading modules via the loader, I can see messages about missing CTF
info when the nvidia modules (from ports = not yet rebuilt = in
/boot/modules/...ko instead of /boot/kernel/...ko) try to get
initialised... and it looks like they are failing to get initialised
because of this missing CTF stuff (I'm back to the previous boot env to
be able to log in remotely and send mails; I don't have a copy of the
failure message at hand).


I assume the missing CTF stuff is due to the CTF based pretty printing 
(https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). 
Is this supposed to fail to load modules which are compiled without CTF 
data? Shouldn't this work gracefully (e.g. spit out a warning that 
pretty printing is not available for module X and have the module 
working)?


Next steps:
 - try a world without retpoline (bind_now and ccache active)
 - try a kernel without CTF (bind_now, ccache, retpoline active)
 - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS

If anyone has an idea how to debug this in some other way...

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-28 Thread Matthias Apitz
On Wednesday, March 27, 2024 at 03:51:33 PM +0100, Matthias Apitz wrote:

> The WLAN card seems to be:
> 
> none2@pci0:1:0:0:   class=0x028000 rev=0x00 hdr=0x00 vendor=0x14c3 device=0x7961 subvendor=0x1a3b subdevice=0x4680
>     vendor     = 'MEDIATEK Corp.'
>     device     = 'MT7921 802.11ax PCI Express Wireless Network Adapter'
>     class      = network
> 
> Perhaps it is still not supported as of today:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264300

While grepping through /usr/src I found the driver in 
/usr/src/sys/contrib/dev/mediatek/mt76/mt7921/ and a make && make
install in 

# cd /usr/src/sys/modules/mt76
# make
..
# make install
===> core (install)
install -T release -o root -g wheel -m 555   mt76_core.ko /boot/modules/
kldxref /boot/modules
===> mt7915 (install)
install -T release -o root -g wheel -m 555   if_mt7915.ko /boot/modules/
kldxref /boot/modules
===> mt7921 (install)
install -T release -o root -g wheel -m 555   if_mt7921.ko /boot/modules/
kldxref /boot/modules

installs it fine. There is also a manpage in /usr/src/share/man/man4/mt7921.4
(attached)

I still have to build the firmware from ports/net/wifi-firmware-mt76-kmod.

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
MT7921(4)  FreeBSD Kernel Interfaces Manual  MT7921(4)

NAME
 mt7921 – MediaTek IEEE 802.11ax wireless network driver

SYNOPSIS
 The driver will auto-load without any user interaction using devmatch(8)
 if enabled in rc.conf(5).

 Only if auto-loading is explicitly disabled, place the following lines in
 rc.conf(5) to manually load the driver as a module at boot time:

   kld_list="${kld_list} if_mt7921"

 The driver should automatically load any firmware needed for the
 particular chipset.

 It is discouraged to load the driver from loader(8).

DESCRIPTION
 The mt7921 driver is derived from MediaTek's Linux mt76 driver and
 provides support for the following chipsets:

   MediaTek MT7921E (PCIe)

 This driver requires firmware to be loaded before it will work.  The
 package wifi-firmware-mt76-kmod from the
 ports/net/wifi-firmware-mt76-kmod port needs to be installed before the
 driver is loaded.  Otherwise no wlan(4) interface can be created using
 ifconfig(8).

 The driver uses the linuxkpi_wlan and linuxkpi compat framework to bridge
 between the Linux and native FreeBSD driver code as well as to the native
 net80211(4) wireless stack.

 While mt7921 supports all 802.11 a/b/g/n/ac and ax the compatibility code
 currently only supports 802.11 a/b/g modes.  Support for 802.11 n/ac is
 to come.

BUGS
 Certainly.

SEE ALSO
 wlan(4), ifconfig(8), wpa_supplicant(8)

HISTORY
 The mt7921 driver first appeared in FreeBSD 14.0.

FreeBSD 14.0-CURRENT             April 18, 2023             FreeBSD 14.0-CURRENT


Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-27 Thread Matthias Apitz
On Wednesday, March 27, 2024 at 10:37:48 AM +0100, Matthias Apitz wrote:

> 
> Hello,
> 
> I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to
> boot FreeBSD with boot verbose messages from an USB key and I'm able to
> login. The /var/log/messages are here
> http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt

A 'gpart list nda0' is here
http://www.unixarea.de/nda0.txt

Currently, the (Windows) partitions are:

# egrep 'Name:|Mediasize:' nda0.txt
1. Name: nda0p1
   Mediasize: 272629760 (260M)
2. Name: nda0p2
   Mediasize: 16777216 (16M)
3. Name: nda0p3
   Mediasize: 510507949568 (475G)
4. Name: nda0p4
   Mediasize: 1101004800 (1.0G)
5. Name: nda0p5
   Mediasize: 209715200 (200M)
1. Name: nda0
   Mediasize: 512110190592 (477G)
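
The listing above can be condensed to one line per partition. This is a minimal sketch, assuming the saved 'gpart list' output keeps the "Name:" / "Mediasize:" layout shown above; the function name summarize_gpart is made up for illustration and the sample input is copied from the listing:

```shell
# Pair each "Name:" line with the "Mediasize:" line that follows it.
# Reads gpart-list-style text on stdin; prints "<name> <bytes> <human size>".
summarize_gpart() {
    awk '
        $2 == "Name:"      { name = $3 }           # e.g. "1. Name: nda0p1"
        $1 == "Mediasize:" { print name, $2, $3 }  # e.g. "Mediasize: 272629760 (260M)"
    '
}

# Sample input copied from the egrep output above.
summarize_gpart <<'EOF'
1. Name: nda0p1
   Mediasize: 272629760 (260M)
2. Name: nda0p2
   Mediasize: 16777216 (16M)
EOF
```

On the real file this would presumably be run as 'summarize_gpart < nda0.txt'.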

A 'pciconf -lv' is here:
http://www.unixarea.de/pciconf.txt

The WLAN card seems to be:

none2@pci0:1:0:0:   class=0x028000 rev=0x00 hdr=0x00 vendor=0x14c3 device=0x7961 subvendor=0x1a3b subdevice=0x4680
    vendor     = 'MEDIATEK Corp.'
    device     = 'MT7921 802.11ax PCI Express Wireless Network Adapter'
    class      = network

Perhaps it is still not supported as of today:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264300
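
When searching bug reports or driver sources for a card, the numeric vendor:device pair is often more useful than the marketing name. A small sketch for pulling it out of a pciconf line, assuming the line is unwrapped (one device per line, as pciconf prints it); the helper name pci_ids is hypothetical:

```shell
# Print "vendor:device" (hex, without the 0x prefix) for each pciconf -l style
# line on stdin. The subvendor/subdevice fields are deliberately ignored.
pci_ids() {
    sed -n 's/.*vendor=0x\([0-9a-f]*\) device=0x\([0-9a-f]*\).*/\1:\2/p'
}

# Sample line taken from the output above (joined onto one line).
printf '%s\n' 'none2@pci0:1:0:0: class=0x028000 rev=0x00 hdr=0x00 vendor=0x14c3 device=0x7961 subvendor=0x1a3b subdevice=0x4680' | pci_ids
# prints 14c3:7961
```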

An attached USB mouse is detected and works.

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-27 Thread Gleb Popov
On Wed, Mar 27, 2024 at 12:38 PM Matthias Apitz  wrote:
>
>
> Hello,
>
> I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to
> boot FreeBSD with boot verbose messages from an USB key and I'm able to
> login. The /var/log/messages are here
> http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt

`pciconf -lv` would be useful too.



CURRENT on laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230

2024-03-27 Thread Matthias Apitz


Hello,

I bought the laptop ASUS VivoBook Pro 14 90NB0VZ2-M01230 and managed to
boot FreeBSD with verbose boot messages from a USB key, and I'm able to
log in. The /var/log/messages are here
http://www.unixarea.de/ASUS-VivoBook-Pro-14-messages.txt

I can identify the hard disk as nda0 (correct), but don't see any mouse
or WLAN card. I will also boot an Ubuntu 22.04.4 from USB; maybe this
gives more hints about hardware details.

Thanks for any comments. I have 30 days to return the laptop to the
store.

matthias

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



CURRENT 220ee18f1964 memstick kernel panic, MacBookPro8,3

2024-03-25 Thread Graham Perrin
Originally posted to 
<https://discord.com/channels/727023752348434432/1221505362016862288>


Photograph: 
<https://media.discordapp.net/attachments/1221505362016862288/1221505364936364152/image.png?ex=6612d285=66005d85=9b188930e96072deb379c61c52cca279c7cde78c0ba125199a62f336fe2083bd&==webp=lossless=915=686>


USB flash drive written from 
FreeBSD-15.0-CURRENT-amd64-20240314-220ee18f1964-268793-memstick.img.xz


Broadcom Wi-Fi-related, maybe? 
<https://bsd-hardware.info/?probe=89647876db#pci:14e4-4331-106b-00d6>


<https://cgit.freebsd.org/src/log/?qt=range=220ee18f1964>

Reproducible in safe mode.


Re: sysutils/pam_xdg: Cancelled on -CURRENT

2024-03-19 Thread Alastair Hogge
On 2024-03-19 16:02, Emmanuel Vadot wrote:
> On Tue, 19 Mar 2024 07:55:15 +
> Alastair Hogge  wrote:
> 
>> On 2024-03-19 15:23, Emmanuel Vadot wrote:
>> > Hi,
>> 
>> Hey Emmanuel,
>> 
>> > On Tue, 19 Mar 2024 06:54:27 +
>> > Alastair Hogge  wrote:
>> > 
>> >> Hello,
>> >> 
>> >> Recently a similar module (PAM) mentioned in the subject was committed
>> >> to base[1]. The module in base masks the currently installed Port, the
>> >> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg,
>> >> however, I can now no longer build the Port. I noticed that the base
>> >> module has no WITHOUT_ option, tho, that might be extreme for one
>> >> module, but then again, the base module masks a more feature full
>> >> module. What is the practice to enable use of the Port again? At the
>> >> moment I am updating my host, and testing the following:
>> >> 
>> >> diff --git a/lib/libpam/modules/modules.inc
>> >> b/lib/libpam/modules/modules.inc
>> >> index f3ab65333f4f..ddbb326f0312 100644
>> >> --- a/lib/libpam/modules/modules.inc
>> >> +++ b/lib/libpam/modules/modules.inc
>> >> @@ -30,4 +30,3 @@ MODULES   += pam_ssh
>> >>  .endif
>> >>  MODULES+= pam_tacplus
>> >>  MODULES+= pam_unix
>> >> -MODULES+= pam_xdg
>> >> \ No newline at end of file
>> >> 
>> >> 1:
>> >> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292
>> >> 
>> >> Thanks.
>> >> 
>> > 
>> >  I don't see why you can't build the ports.
>> 
>> From sysutils/pam_xdg[2]:
>> 
>> if exists(/usr/lib/pam_xdg.so)
>> IGNORE= module name conflict with a different implementation in
>> base system
>> endif
> 
>  Ah yes, I've missed this :)
> 
>> >  Using would be a problem but why do you want to use it now that we
>> > have one in base ?
>> >  Do you have any problems with the one in base ?
>> 
>> I would like to continue using sysutils/pam_xdg because it handles all
>> ${XDG_*_HOME}, and local name spaces.
> 
>  XDG_*_HOME variables aren't needed, all applications must have a
> fallback to the base directories in the spec and sysutils/pam_xdg
> doesn't offer to use other directories so that's why I didn't implement
> those in the base one.
>  What do you mean by "local name spaces" ?

I meant all the other ${XDG_FU} excluding ${XDG_*_HOME}.

Anyway, it turns out I was incredibly mistaken. I deployed another corporate
craptop from the dumpster today, and the User's homedir was not
populated with XDG dirs. I was sure I was using sysutils/pam_xdg for
that, but will now have to find my older scripts that predate using
sysutils/pam_xdg, to achieve that. Sorry for the noise.

Thanks,
Alastair



Re: sysutils/pam_xdg: Cancelled on -CURRENT

2024-03-19 Thread Emmanuel Vadot
On Tue, 19 Mar 2024 07:55:15 +
Alastair Hogge  wrote:

> On 2024-03-19 15:23, Emmanuel Vadot wrote:
> > Hi,
> 
> Hey Emmanuel,
> 
> > On Tue, 19 Mar 2024 06:54:27 +
> > Alastair Hogge  wrote:
> > 
> >> Hello,
> >> 
> >> Recently a similar module (PAM) mentioned in the subject was committed
> >> to base[1]. The module in base masks the currently installed Port, the
> >> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg,
> >> however, I can now no longer build the Port. I noticed that the base
> >> module has no WITHOUT_ option, tho, that might be extreme for one
> >> module, but then again, the base module masks a more feature full
> >> module. What is the practice to enable use of the Port again? At the
> >> moment I am updating my host, and testing the following:
> >> 
> >> diff --git a/lib/libpam/modules/modules.inc
> >> b/lib/libpam/modules/modules.inc
> >> index f3ab65333f4f..ddbb326f0312 100644
> >> --- a/lib/libpam/modules/modules.inc
> >> +++ b/lib/libpam/modules/modules.inc
> >> @@ -30,4 +30,3 @@ MODULES   += pam_ssh
> >>  .endif
> >>  MODULES+= pam_tacplus
> >>  MODULES+= pam_unix
> >> -MODULES+= pam_xdg
> >> \ No newline at end of file
> >> 
> >> 1:
> >> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292
> >> 
> >> Thanks.
> >> 
> > 
> >  I don't see why you can't build the ports.
> 
> From sysutils/pam_xdg[2]:
> 
> if exists(/usr/lib/pam_xdg.so)
> IGNORE= module name conflict with a different implementation in
> base system
> endif

 Ah yes, I've missed this :)

> >  Using would be a problem but why do you want to use it now that we
> > have one in base ?
> >  Do you have any problems with the one in base ?
> 
> I would like to continue using sysutils/pam_xdg because it handles all
> ${XDG_*_HOME}, and local name spaces.

 XDG_*_HOME variables aren't needed, all applications must have a
fallback to the base directories in the spec and sysutils/pam_xdg
doesn't offer to use other directories so that's why I didn't implement
those in the base one.
 What do you mean by "local name spaces" ?

> 2: https://cgit.freebsd.org/ports/tree/sysutils/pam_xdg/Makefile#n16
> 
> Thanks.
> 

 Cheers,

-- 
Emmanuel Vadot  



Re: sysutils/pam_xdg: Cancelled on -CURRENT

2024-03-19 Thread Alastair Hogge
On 2024-03-19 15:23, Emmanuel Vadot wrote:
> Hi,

Hey Emmanuel,

> On Tue, 19 Mar 2024 06:54:27 +
> Alastair Hogge  wrote:
> 
>> Hello,
>> 
>> Recently a similar module (PAM) mentioned in the subject was committed
>> to base[1]. The module in base masks the currently installed Port, the
>> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg,
>> however, I can now no longer build the Port. I noticed that the base
>> module has no WITHOUT_ option, tho, that might be extreme for one
>> module, but then again, the base module masks a more feature full
>> module. What is the practice to enable use of the Port again? At the
>> moment I am updating my host, and testing the following:
>> 
>> diff --git a/lib/libpam/modules/modules.inc
>> b/lib/libpam/modules/modules.inc
>> index f3ab65333f4f..ddbb326f0312 100644
>> --- a/lib/libpam/modules/modules.inc
>> +++ b/lib/libpam/modules/modules.inc
>> @@ -30,4 +30,3 @@ MODULES   += pam_ssh
>>  .endif
>>  MODULES+= pam_tacplus
>>  MODULES+= pam_unix
>> -MODULES+= pam_xdg
>> \ No newline at end of file
>> 
>> 1:
>> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292
>> 
>> Thanks.
>> 
> 
>  I don't see why you can't build the ports.

From sysutils/pam_xdg[2]:

.if exists(/usr/lib/pam_xdg.so)
IGNORE= module name conflict with a different implementation in base system
.endif
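
The same existence test can be run by hand to see whether the port would be skipped on a given host. This is only a sketch of the check quoted above, not part of the port; the function name port_would_be_ignored and its optional path argument are hypothetical:

```shell
# Succeed if the file tested by the port Makefile exists, i.e. if the port
# would set IGNORE. The optional argument overrides the path for testing.
port_would_be_ignored() {
    module_path=${1:-/usr/lib/pam_xdg.so}
    [ -e "$module_path" ]
}

if port_would_be_ignored; then
    echo "base pam_xdg.so present: the port sets IGNORE and will not build"
else
    echo "no base pam_xdg.so: this check does not block the port"
fi
```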

>  Using would be a problem but why do you want to use it now that we
> have one in base ?
>  Do you have any problems with the one in base ?

I would like to continue using sysutils/pam_xdg because it handles all
${XDG_*_HOME}, and local name spaces.

2: https://cgit.freebsd.org/ports/tree/sysutils/pam_xdg/Makefile#n16

Thanks.



Re: sysutils/pam_xdg: Cancelled on -CURRENT

2024-03-19 Thread Emmanuel Vadot


 Hi,

On Tue, 19 Mar 2024 06:54:27 +
Alastair Hogge  wrote:

> Hello,
> 
> Recently a similar module (PAM) mentioned in the subject was committed
> to base[1]. The module in base masks the currently installed Port, the
> man page can be accessed with man -M /usr/local/share/man 8 pam_xdg,
> however, I can now no longer build the Port. I noticed that the base
> module has no WITHOUT_ option, tho, that might be extreme for one
> module, but then again, the base module masks a more feature full
> module. What is the practice to enable use of the Port again? At the
> moment I am updating my host, and testing the following:
> 
> diff --git a/lib/libpam/modules/modules.inc
> b/lib/libpam/modules/modules.inc
> index f3ab65333f4f..ddbb326f0312 100644
> --- a/lib/libpam/modules/modules.inc
> +++ b/lib/libpam/modules/modules.inc
> @@ -30,4 +30,3 @@ MODULES   += pam_ssh
>  .endif
>  MODULES+= pam_tacplus
>  MODULES+= pam_unix
> -MODULES+= pam_xdg
> \ No newline at end of file
> 
> 1:
> https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292
> 
> Thanks.
> 

 I don't see why you can't build the port.
 Using it would be a problem, but why do you want to use it now that we
have one in base?
 Do you have any problems with the one in base?

 Cheers,

-- 
Emmanuel Vadot  



sysutils/pam_xdg: Cancelled on -CURRENT

2024-03-19 Thread Alastair Hogge
Hello,

Recently a PAM module similar to the one mentioned in the subject was
committed to base[1]. The module in base masks the currently installed
port; the port's man page can still be accessed with
man -M /usr/local/share/man 8 pam_xdg, but I can no longer build the
port. I noticed that the base module has no WITHOUT_ option. That might
be extreme for a single module, but then again, the base module masks a
more full-featured one. What is the practice to enable use of the port
again? At the moment I am updating my host and testing the following:

diff --git a/lib/libpam/modules/modules.inc
b/lib/libpam/modules/modules.inc
index f3ab65333f4f..ddbb326f0312 100644
--- a/lib/libpam/modules/modules.inc
+++ b/lib/libpam/modules/modules.inc
@@ -30,4 +30,3 @@ MODULES   += pam_ssh
 .endif
 MODULES+= pam_tacplus
 MODULES+= pam_unix
-MODULES+= pam_xdg
\ No newline at end of file

1:
https://cgit.freebsd.org/src/commit/?id=6e69612d5df1c1d5bd86990ea4d9a170c030b292

Thanks.



Re: Unable to boot -CURRENT on Thinkpad P16s G2

2024-03-07 Thread Warner Losh
On Thu, Mar 7, 2024 at 4:50 PM Doug Ambrisko  wrote:

> On Thu, Mar 07, 2024 at 07:15:48PM +0100, Philipp Ost wrote:
> | On 2/28/24 21:10, Philipp Ost wrote:
> | [boot log stripped]
> | > Does anyone have any suggestions on how to proceed at this point? [...]
> |
> | Short follow-up: disabling uart0 and uart1 at the loader prompt allowed us
> | to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in case
> | it matters).
>
> UARTS on AMD can be a bit different.  Some BIOS implementations seem
> to set them up to work like legacy ports others do not.  On a Naples
> platform I helped add support for them since they were not setup
> in the legacy configuration.  The AMD servers I'm using now have them
> setup in legacy mode and just work like on other systems.
>
> If I remember right those UARTS were defined in ACPI.  On a laptop they
> probably don't have serial ports and the probe is getting stuck on
> something.  It might be good to instrument it to see what.
>

It might also be time to finally drop the UART fallback when ACPI is present.
I've seen spotty reports of accessing these registers (for uart, kbd and maybe
mouse) causing problems. The ACPI definition of the UARTs would be additional
uart units. The fallback stuff is needed only for extreme edge cases at this
point.

Warner


Re: Unable to boot -CURRENT on Thinkpad P16s G2

2024-03-07 Thread Doug Ambrisko
On Thu, Mar 07, 2024 at 07:15:48PM +0100, Philipp Ost wrote:
| On 2/28/24 21:10, Philipp Ost wrote:
| [boot log stripped]
| > Does anyone have any suggestions on how to proceed at this point? [...]
| 
| Short follow-up: disabling uart0 and uart1 at the loader prompt allowed us
| to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in case
| it matters).

UARTs on AMD can be a bit different.  Some BIOS implementations seem
to set them up to work like legacy ports; others do not.  On a Naples
platform I helped add support for them since they were not set up
in the legacy configuration.  The AMD servers I'm using now have them
set up in legacy mode and they just work like on other systems.

If I remember right those UARTs were defined in ACPI.  On a laptop they
probably don't have serial ports and the probe is getting stuck on
something.  It might be good to instrument it to see what.

Thanks,

Doug A.



Re: Unable to boot -CURRENT on Thinkpad P16s G2

2024-03-07 Thread Philipp Ost

On 2/28/24 21:10, Philipp Ost wrote:
> [boot log stripped]
>
> Does anyone have any suggestions on how to proceed at this point? [...]


Short follow-up: disabling uart0 and uart1 at the loader prompt allowed 
us to boot and install FreeBSD (the -CURRENT snapshot from 2024-02-29 in 
case it matters).
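
The exact commands are not given in this message. One common way to disable a device from the loader prompt is a "disabled" device hint, so the sketch below simply prints the loader commands for the two units named above. Whether these were the commands actually used is an assumption, and the helper name uart_hints is hypothetical:

```shell
# Print one loader "set" command per uart unit number given as argument.
# The hint.<driver>.<unit>.disabled form follows device.hints(5).
uart_hints() {
    for unit in "$@"; do
        printf 'set hint.uart.%s.disabled=1\n' "$unit"
    done
}

uart_hints 0 1
```

The printed lines would be typed at the loader prompt before 'boot'. To make the setting persistent, the same hints (without the leading "set") can go into /boot/device.hints or /boot/loader.conf.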


Best
Philipp



Unable to boot -CURRENT on Thinkpad P16s G2

2024-02-28 Thread Philipp Ost

Hi everyone,

we have a Lenovo Thinkpad P16s Gen2 (Type 21K9) here with an AMD Ryzen 7 
CPU which we would like to install FreeBSD on. Alas, it won't boot.


The FreeBSD 15-CURRENT amd64 snapshot from February 22 hangs after:
[...]
acpi_acad0:  on acpi0
battery0:  on acpi0

A verbose boot provides some more information, but then hangs as well:
[...]
acpi_acad0:  on acpi0
AcpiOsExecute: task queue not started
battery0:  on acpi0
AcpiOsExecute: task queue not started
ACPI: Enabled 1 GPEs in block 00 to 1F
ahc_isa_identify 0: ioport 0xc00 alloc failed
ahc_isa_identify 1: ioport 0x1c00 alloc failed
ahc_isa_identify 2: ioport 0x2c00 alloc failed
ahc_isa_identify 3: ioport 0x3c00 alloc failed
ahc_isa_identify 4: ioport 0x4c00 alloc failed
ahc_isa_identify 5: ioport 0x5c00 alloc failed
ahc_isa_identify 6: ioport 0x6c00 alloc failed
isa_probe_children: disabling PnP devices
atkbdc: atkbdc0 already exists; skipping it
atrtc: atrtc0 already exists; skipping it
attimer: attimer0 already exists; skipping it
sc: sc0 already exists; skipping it
isa_probe_children: probing non-PnP devices
sc0 failed to probe on isa0
vga0 failed to probe on isa0
fdc0 failed to probe at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
ppc0: cannot reserve I/O port range
ppc0 failed to probe at irq 7 on isa0
uart0 failed to probe at port 0x3f8 irq 4 on isa0

Booting with ACPI disabled results in a kernel panic.

An older snapshot (-CURRENT from February 15) as well as
14-STABLE/-RELEASE fail to boot in the same way.


Does anyone have any suggestions on how to proceed at this point? We 
really would like to get FreeBSD running on this machine and are willing 
to help. As neither of us is versed in low-level system programming, we 
would require some support.


Currently, the notebook is running Lubuntu 22.04 from which we gathered 
some information about the hardware. Everything we collected so far can 
be found here: https://philippost.de/TP-P16s-G2/ If there is anything 
missing, please indicate which information you require.


Thanks a lot in advance!

Best
Philipp and Klaus



Re: Missing files on -current

2024-02-24 Thread bob prohaska
On Sat, Feb 24, 2024 at 03:59:01PM +, Gary Jennejohn wrote:
> 
> The function run_rc_scripts is defined in /usr/src/libexec/rc/rc.subr and
> is called in /usr/src/libexec/rc/rc.  /etc/rc includes /etc/rc.subr.
> 
> So, maybe one of these files is not up to date under /etc?
> 

My fault, etcupdate reported a conflict and I didn't
notice it. Sorry for the noise!

bob prohaska
 



Re: Missing files on -current

2024-02-24 Thread Gary Jennejohn
On Sat, 24 Feb 2024 06:53:37 -0800
bob prohaska  wrote:

> A Pi4 running -current completed a build/install cycle for world and kernel
> without obvious errors but failed to reboot, reporting:
> ...
> Warning: no time-of-day clock registered, system time will not be set 
> accurately
> Dual Console: Serial Primary, Video Secondary
> /etc/rc: run_rc_scripts: not found
> /etc/rc: run_rc_scripts: not found
> /etc/rc: have: not found
>
> Sat Feb 24 13:42:09 UTC 2024
> 2024-02-24T13:42:10.007616+00:00 - init 31 - - can't exec getty 
> '/usr/libexec/getty' for port /dev/ttyv1: No such file or directory
> ...
>
> Uname -a reports:
> FreeBSD  15.0-CURRENT FreeBSD 15.0-CURRENT #121 main-n268499-b9870ba93ea9: 
> Fri Feb 23 23:14:59 PST 2024 
> b...@nemesis.zefox.com:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC 
> arm64distribution.
>
> Power cycling allowed boot to single-user, running fsck -fy reports a clean
> root file system.
>
> /etc/fstab contains
> /dev/da0s2a   /   ufs rw  1   1
> /dev/da0s1 /boot/msdos msdosfs rw,noatime 0 0
> #tmpfs /tmp tmpfs rw,mode=1777,size=50m 0 0
> /dev/da0s2d /usr ufs rw  2   2
> /dev/da0s2b none swap sw
>
> There does not seem to be a file named run_rc_scripts present
> in the filesystem.
>
> Any suggestions on how to back myself out of this corner
> would be much appreciated!
>
> Thanks for reading,
>

The function run_rc_scripts is defined in /usr/src/libexec/rc/rc.subr and
is called in /usr/src/libexec/rc/rc.  /etc/rc includes /etc/rc.subr.

So, maybe one of these files is not up to date under /etc?
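
A quick way to test that theory is to check whether the installed file still defines the function. A minimal sketch, assuming the definition starts at the beginning of a line in the usual "name()" form; the helper name defines_function is made up for illustration:

```shell
# Succeed if FILE contains a shell function definition for NAME at the
# start of a line, e.g. "run_rc_scripts()".
defines_function() {
    file=$1 name=$2
    grep -q "^${name}[[:space:]]*()" "$file"
}

# Compare the installed copy with the one in the source tree.
for f in /etc/rc.subr /usr/src/libexec/rc/rc.subr; do
    if [ ! -r "$f" ]; then
        echo "$f: not readable"
    elif defines_function "$f" run_rc_scripts; then
        echo "$f: defines run_rc_scripts"
    else
        echo "$f: run_rc_scripts NOT defined (stale or unmerged file?)"
    fi
done
```

If the copy under /etc lacks the definition while the one under /usr/src has it, the /etc file was probably not updated by the last merge.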

--
Gary Jennejohn



Re: Missing files on -current

2024-02-24 Thread bob prohaska
On Sat, Feb 24, 2024 at 07:02:19AM -0800, David Wolfskill wrote:
> 
> This is from an amd64 system at main-n268514-61b88a230bac, but
> run_rc_scripts is a shell function defined in /etc/rc.subr.
> 
> So the whine about not finding run_rc_scripts would indicate that at
> least one of the following is true:
> 
> * The script that should have sourced /etc/rc.subr failed to do so.
> 
> * /etc/rc.csubr is corrupted, and fails to define run_rc_scripts().
>


Indeed, it seems to be absent:
root@:~ # more /etc/rc.csubr
/etc/rc.csubr: No such file or directory
root@:~ #

However, the same is true of a Pi3 running 14-release p5.
It boots reliably once it reaches loader.

I wouldn't expect this part of the boot process to be
platform dependent. Maybe -current and -release do
things differently?
 
> * /etc/rc.subr is missing.
Present and accounted for:
root@:~ # ls -l /etc/rc.subr
-rw-r--r--  1 root wheel 51911 Nov 18 21:46 /etc/rc.subr
root@:~ # 

Thanks for writing!

bob prohaska




Missing files on -current

2024-02-24 Thread bob prohaska
A Pi4 running -current completed a build/install cycle for world and kernel
without obvious errors but failed to reboot, reporting:
...
Warning: no time-of-day clock registered, system time will not be set accurately
Dual Console: Serial Primary, Video Secondary
/etc/rc: run_rc_scripts: not found
/etc/rc: run_rc_scripts: not found
/etc/rc: have: not found

Sat Feb 24 13:42:09 UTC 2024
2024-02-24T13:42:10.007616+00:00 - init 31 - - can't exec getty 
'/usr/libexec/getty' for port /dev/ttyv1: No such file or directory 
...

Uname -a reports:
FreeBSD  15.0-CURRENT FreeBSD 15.0-CURRENT #121 main-n268499-b9870ba93ea9: Fri 
Feb 23 23:14:59 PST 2024 
b...@nemesis.zefox.com:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC 
arm64distribution.

Power cycling allowed boot to single-user, running fsck -fy reports a clean
root file system.

/etc/fstab contains
/dev/da0s2a   /   ufs rw  1   1
/dev/da0s1 /boot/msdos msdosfs rw,noatime 0 0
#tmpfs /tmp tmpfs rw,mode=1777,size=50m 0 0
/dev/da0s2d /usr ufs rw  2   2
/dev/da0s2b none swap sw

There does not seem to be a file named run_rc_scripts present
in the filesystem.

Any suggestions on how to back myself out of this corner
would be much appreciated!

Thanks for reading,

bob prohaska




Re: FreeBSD CURRENT stabilization cycle

2024-02-24 Thread Franco Fichtner
Hi,

And whom do you want to „stab“ with this? ;)

Why not do the same thing that ports does and call this „monthly“ which is 
pretty much what it is and easy to understand and you can have one build at the 
end of that week?


Cheers,
Franco

> On 24. Feb 2024, at 12:51, Kirill Ponomarev  wrote:
> 
> On 02/23, Mark Millard wrote:
>> Gleb Smirnoff  wrote on
>> Date: Sat, 24 Feb 2024 04:32:52 UTC :
>> 
>>> More seriously speaking, I
>>> actually hope that in some future snapshots.FreeBSD.org will start using
>>> these points for snapshot generation.
>> 
>> How about also the likes of:
>> 
>> https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/
>> 
>> for pkgbase (various "aarch64" replacements too)?
> 
> yes, great idea, base_stabweek or similar is something I'd vote for.



Re: FreeBSD CURRENT stabilization cycle

2024-02-24 Thread Kirill Ponomarev
On 02/23, Mark Millard wrote:
> Gleb Smirnoff  wrote on
> Date: Sat, 24 Feb 2024 04:32:52 UTC :
> 
> > More seriously speaking, I
> > actually hope that in some future snapshots.FreeBSD.org will start using
> > these points for snapshot generation.
> 
> How about also the likes of:
> 
> https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/
> 
> for pkgbase (various "aarch64" replacements too)?

yes, great idea, base_stabweek or similar is something I'd vote for.




RE: FreeBSD CURRENT stabilization cycle

2024-02-23 Thread Mark Millard
Gleb Smirnoff  wrote on
Date: Sat, 24 Feb 2024 04:32:52 UTC :

> More seriously speaking, I
> actually hope that in some future snapshots.FreeBSD.org will start using these
> points for snapshot generation.

How about also the likes of:

https://pkg.freebsd.org/FreeBSD:15:aarch64/stabweek/

for pkgbase (various "aarch64" replacements too)?

===
Mark Millard
marklmi at yahoo.com




FreeBSD CURRENT stabilization cycle

2024-02-23 Thread Gleb Smirnoff
  Hi FreeBSD CURRENT users,

back in November I came up with a proposal of providing some stabilization
cadence to development of the main branch, also known as FreeBSD CURRENT.  Here
is a video with the initial proposal and following discussion at VendorBSD
Conference (18 minutes):

https://www.youtube.com/live/k-AzShVdAHo?si=hPAhCd_-RuoTRqcW=2511

And here goes an up to date version of the plan!

In the last decade the quality of FreeBSD CURRENT has improved so much that not
only brave developers run it on their laptops, but large companies use it too.
Time to bring it to a new level.  Every individual or business that uses CURRENT
has their own protocol for staying up to date and avoiding disasters.  An
individual will first update their desktop and only after that will update
server(s).  A company would run their internal regression test suite or some
other validation protocol.  Right now we all do that independently of each
other, with little coordination and little help to each other.  We also
do not broadcast to the world that FreeBSD CURRENT is usable. I've seen a lot
of people who stay away from CURRENT based on their 20-year-old experience with
it.

Here is how we are going to improve:

* Last week of a month is declared a stabilization week

Src committers are encouraged to avoid pushing risky changes to FreeBSD/main
during this week.  This is advice, not a policy!  If a committer breaks
something during the week they get 3x public shame, but no administrative
penalties or fines.  Committers are encouraged to push bug fixes, improve unit
tests, clean up comments and improve documentation.  It is also a good time
to merge past work to stable branches. Developers of course will
continue their work on bigger projects in their private branches.

Sidenote: there is no agreement in the world on what "the last week of a
month" is.  For our purposes we will use the week that contains the last Friday
of the month, because we want the monthly snapshot to be called by the name of
the month (not the next month) and thus we want the last day of the
stabilization cycle always to be in that month.
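
The rule can be sketched in a few lines of Python. This is only an illustration, assuming the stabweek runs from the Monday to the Friday of the week containing the last Friday of the month; the function names are made up here.

```python
import calendar
from datetime import date, timedelta


def last_friday(year: int, month: int) -> date:
    """Return the date of the last Friday in the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # Step back from the last day of the month to the nearest Friday.
    offset = (last_day.weekday() - calendar.FRIDAY) % 7
    return last_day - timedelta(days=offset)


def stabweek(year: int, month: int) -> tuple[date, date]:
    """Return (Monday, Friday) of the week containing the last Friday."""
    friday = last_friday(year, month)
    monday = friday - timedelta(days=4)
    return monday, friday
```

For February 2024 this gives Monday the 19th through Friday the 23rd. Since the last Friday never falls before the 22nd, the Monday is always in the same month as well.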

* Monday of the stabweek is the day to update your CURRENT and test it

Monday 8:00 GMT a tag is created and published.  Right now it is published
at my personal https://github.com/glebius/FreeBSD/tags.  Note that the tag
points at a hash in the official repo, so there is no trust involved here.

At Netflix I will be working on merging the tagged revision into our tree and I
will hand off the resulting branch to our excellent testing team (dhw@ +
olivier@) usually by the end of Monday (PST time).  Other companies and parties
are encouraged to start testing the tagged revision.  Peter Holm may switch his
stress2 to run that revision.  You are encouraged to update your desktop or
laptop that of course runs FreeBSD CURRENT.

* A short lived stabilization branch may be created

In case we discover regressions compared to the previous month's stabweek, bug
fixes for them will be committed to a short lived branch.  This branch may
contain direct cherry-picks from main, as well as work-in-progress bugfixes
that have not yet been committed to main, reverts of commits and even stop gaps
that disable certain functionality for the sake of stability.  This branch may
be rebased and force pushed if a temporary bugfix turns out to differ from the
final one in main.  The branch may observe commits immediately Monday morning
in case we already know about a certain regression.  The branch will not observe
commits for long-standing bugs that were fixed in main during the stabweek,
unless somebody explicitly asks to include one.  And finally, the branch may
not even be created in case testing confirms everything is alright with the
Monday tag.

The branch will be published at https://github.com/glebius/FreeBSD.  There is
a certain level of trust required to use it. That may change to a more official
publishing point in the future.

* The stabweek quiet period ends no later than Friday 18:00 GMT

No matter whether we were able to identify and fix any or all bugs, the quiet
period ends.  The public shame level for src committers breaking FreeBSD CURRENT
goes back to its normal level.  In case we are not able to address all issues by
the end of Friday, the stabweek branch will remain active past the end of the
stabweek, as we want to collect all regression fixes in the branch.  But this is
the worst case scenario!

A more appreciated scenario is that the stabilization period ends earlier in
the week. If all testing parties report their satisfaction with the state of
main as is or of the stabweek branch, and if I don't see any fresh bug reports
in bugzilla or submissions via other channels, there is no reason to hold
committers back from pushing their work.

At the end of the stabilization period, be it Friday or earlier, I will write
an email to current@ reporting the results:

- were there any regressions identified with the Monday tag
- what has been a

Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Patrick M. Hausen
Hi all,

> On 13.02.2024 at 20:56, Pete Wright wrote:
> 1. M.2 nvme really does need proper cooling, much more so than traditional 
> SATA/SAS/SCSI drives.

I recently found a tool named "Scrutiny" that presents a nice dashboard
of all your disk devices and their SMART data including crucial points
like temperature.

Pros:

Open source
Nice web UI
Uses smartmontools to gather the data, not reinventing the wheel
Agents that can be called from cron jobs for many OSes including FreeBSD
Alerting via a variety of communication channels

Cons:

Central hub best run on Linux plus docker compose
No authentication whatsoever, so strictly internal use
No grouping or any organisation of systems so does not scale beyond tens of 
servers

I found a couple of problematic HDDs and SSDs right after deploying it
which regular SMART tests overlooked.

https://github.com/AnalogJ/scrutiny

Look for the Hub/Spoke deployment if you are willing to use e.g.
a Linux VM to run the tool, then point your FreeBSD systems at that.

It probably can be deployed strictly on FreeBSD, too, using the manual
installation instructions.

HTH, kind regards,
Patrick


Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Craig Leres
I had issues with a nvme drive in an intel nuc. When I asked 
freebsd-hackers, overheating was the first guess:



https://lists.freebsd.org/pipermail/freebsd-hackers/2018-May/052783.html

I blew the dust out of the fan assembly and changed the bios fan 
settings to be more aggressive and the system has been rock solid since.


Craig



Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Pete Wright

There's a tiny chance that this could be something more exotic,
but my money is on hardware gone bad after 2 years of service. I don't think
this is 'wear out' of the NAND (it's only 15TB written, but it could be if this
drive is really really crappy nand: first generation QLC maybe, but it seems
too new). It might also be a connector problem that's developed over time.
There might be a few other things too, but I don't think this is a U.2 drive
with funky cables.

The system was probably idle the majority of those two years of power on
time.

It's one of these:
https://www.techpowerup.com/ssd-specs/intel-660p-512-gb.d437
I've seen comments that these generally don't need cooling.

I just ordered a heatsink with some nice big fins, but it will take a
week or more to arrive.



just wanted to add another data-point to this discussion.  i had a 
crucial NVME drive on my workstation that recently was showing similar 
problems.  after much debugging i came to the same conclusion that it 
was getting too hot.  i went ahead and purchased a Sabrent NVME drive 
that came with a heat sink.  i've also started making much more use of 
my workstation (and the disk subsystem) and have had zero issues.


so lessons learnt:

1. M.2 nvme really does need proper cooling, much more so than 
traditional SATA/SAS/SCSI drives.


2. not all vendors do a great job reporting the health of devices

-pete

--
Pete Wright
p...@nomadlogic.org




Re: nvme controller reset failures on recent -CURRENT

2024-02-13 Thread Don Lewis
On 12 Feb, Warner Losh wrote:
> On Mon, Feb 12, 2024 at 9:15 PM Don Lewis  wrote:
> 
>> On 12 Feb, Maxim Sobolev wrote:
>> > Might be an overheating. Today's nvme drives are notoriously flaky if you
>> > run them without proper heat sink attached to it.
>>
>> I don't think it is a thermal problem.  According to the drive health
>> page, the device temperature has never reached Temperature 2, whatever
>> that is.  The room temperature is around 65F.  The system was stable
>> last summer when the room temperature spent a lot of time in the 80-85F
>> range.  The device temperature depends a lot on the I/O rate, and the
>> last panic happened when the I/O rate had been below 40tps for quite a
>> while.
>>
> 
> It did reach temperature 1, though. That's the 'Warning this drive is too
> hot' temperature. It has spent 41213 minutes of your 19297 hours of up
> time, or an average of 2 minutes per hour. That's too much. Temperature
> 2 is critical error: we are about to shut down completely due to it
> being too hot. It's only a couple degrees below hardware power off
> due to temperature in many drives. Some really cheap ones don't really
> implement it at all. On my card with the bad heat sink, Warning temp is
> 70C while critical is 75C while IIRC thermal shutdown is 78C or 80C.
> 
> I don't think we report these values in nvmecontrol identify. But you can
> do a raw dump with -x and look at bytes 266:267 for warning and 268:269
> for critical.
> 
> In contrast, of the few dozen drives that I have, all of which have been
> abused in various ways, only one has any heat issues,
> and that one is an engineering special / sample with what I think is
> a damaged heat sink. If your card has no heat sink, this could well
> be what's going on.
> 
> This panic means "the nvme card lost its mind and stopped talking
> to the host". Its status registers read 0xff's, which means that the card
> isn't decoding bus signals. Usually this means that the firmware on the
> card has faulted and rebooted. If the card is overheating, then this could
> well be what's happening.
> 
> There's a tiny chance that this could be something more exotic,
> but my money is on hardware gone bad after 2 years of service. I don't think
> this is 'wear out' of the NAND (it's only 15TB written, but it could be if
> this drive is really really crappy nand: first generation QLC maybe, but it
> seems too new). It might also be a connector problem that's developed over
> time. There might be a few other things too, but I don't think this is a U.2
> drive with funky cables.

The system was probably idle the majority of those two years of power on
time.

It's one of these:
https://www.techpowerup.com/ssd-specs/intel-660p-512-gb.d437
I've seen comments that these generally don't need cooling.

I just ordered a heatsink with some nice big fins, but it will take a
week or more to arrive.

> 
>> > On Mon, Feb 12, 2024, 4:28 PM Don Lewis  wrote:
>> >
>> >> I just upgraded my package build machine to:
>> >>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
>> >> from:
>> >>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
>> >> and I've had two nvme-triggered panics in the last day.
>> >>
>> >> nvme is being used for swap and L2ARC.  I'm not able to get a crash
>> >> dump, probably because the nvme device has gone away and I get an error
>> >> about not having a dump device.  It looks like a low-memory panic
>> >> because free memory is low and zfs is calling malloc().
>> >>
>> >> This shows up in the log leading up to the panic:
>> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
>> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
>> >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
>> >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
>> >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
>> >>
>> >> The device looks healthy to me:
>> >> SMART/Health Information Log
>> >> ===

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Warner Losh
On Mon, Feb 12, 2024 at 9:15 PM Don Lewis  wrote:

> On 12 Feb, Maxim Sobolev wrote:
> > Might be an overheating. Today's nvme drives are notoriously flaky if you
> > run them without proper heat sink attached to it.
>
> I don't think it is a thermal problem.  According to the drive health
> page, the device temperature has never reached Temperature 2, whatever
> that is.  The room temperature is around 65F.  The system was stable
> last summer when the room temperature spent a lot of time in the 80-85F
> range.  The device temperature depends a lot on the I/O rate, and the
> last panic happened when the I/O rate had been below 40tps for quite a
> while.
>

It did reach temperature 1, though. That's the 'Warning this drive is too
hot' temperature. It has spent 41213 minutes of your 19297 hours of up
time, or an average of 2 minutes per hour. That's too much. Temperature
2 is critical error: we are about to shut down completely due to it
being too hot. It's only a couple degrees below hardware power off
due to temperature in many drives. Some really cheap ones don't really
implement it at all. On my card with the bad heat sink, Warning temp is
70C while critical is 75C while IIRC thermal shutdown is 78C or 80C.
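
As a rough check of that arithmetic, here is a small sketch. It takes the "Total Time For Temperature 1" counter as minutes, which is the reading used above and an assumption here; if the counter is in another unit the result scales accordingly. The function name is made up for illustration.

```python
def temp1_share(minutes_at_temp1: int, power_on_hours: int) -> tuple[float, float]:
    """Return (minutes per power-on hour, fraction of power-on time)."""
    minutes_per_hour = minutes_at_temp1 / power_on_hours
    fraction = minutes_at_temp1 / (power_on_hours * 60)
    return minutes_per_hour, fraction


# Counters taken from the SMART/Health log quoted in this thread.
per_hour, fraction = temp1_share(41213, 19297)
print(f"{per_hour:.2f} min/hour, {fraction:.1%} of power-on time")
```

Under that assumption the drive spent a little over 2 minutes of every powered-on hour, roughly 3.6% of its life, at or above Temperature 1.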

I don't think we report these values in nvmecontrol identify. But you can
do a raw dump with -x and look at bytes 266:267 for warning and 268:269
for critical.
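
Once the raw identify data is available as bytes, decoding those two fields could look like the sketch below. It assumes both are little-endian 16-bit values in Kelvin, and it works on an in-memory buffer rather than parsing the text of the -x dump, whose exact layout is not shown here. The function name is illustrative.

```python
import struct


def thermal_thresholds(identify: bytes) -> dict[str, float]:
    """Decode warning/critical composite temperature thresholds in Celsius."""
    if len(identify) < 270:
        raise ValueError("identify data too short to hold bytes 266..269")
    # Bytes 266:267 hold the warning threshold, 268:269 the critical one,
    # both assumed to be little-endian unsigned 16-bit Kelvin values.
    warning_k, critical_k = struct.unpack_from("<HH", identify, 266)
    return {
        "warning_c": warning_k - 273.15,
        "critical_c": critical_k - 273.15,
    }
```

A value of 0 in either field would decode to -273.15 C, which most likely means the drive does not report that threshold.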

In contrast, of the few dozen drives that I have, all of which have been
abused in various ways, only one has any heat issues,
and that one is an engineering special / sample with what I think is
a damaged heat sink. If your card has no heat sink, this could well
be what's going on.

This panic means "the nvme card lost its mind and stopped talking
to the host". Its status registers read 0xff's, which means that the card
isn't decoding bus signals. Usually this means that the firmware on the
card has faulted and rebooted. If the card is overheating, then this could
well be what's happening.

There's a tiny chance that this could be something more exotic,
but my money is on hardware gone bad after 2 years of service. I don't think
this is 'wear out' of the NAND (it's only 15TB written, but it could be if this
drive is really really crappy nand: first generation QLC maybe, but it seems
too new). It might also be a connector problem that's developed over time.
There might be a few other things too, but I don't think this is a U.2 drive
with funky cables.
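
The 15TB figure can be reproduced from the SMART counters quoted below, using the 512,000-byte data unit size that the log itself prints. A small sketch, with an illustrative function name and decimal terabytes assumed:

```python
DATA_UNIT_BYTES = 512_000  # unit size as labelled in the SMART/Health log


def data_units_to_tb(units: int) -> float:
    """Convert NVMe 'data units' to decimal terabytes (10**12 bytes)."""
    return units * DATA_UNIT_BYTES / 1e12


# Counters taken from the SMART/Health log quoted in this thread.
print(f"written: {data_units_to_tb(29911502):.2f} TB")
print(f"read:    {data_units_to_tb(5761183):.2f} TB")
```

That works out to about 15.3 TB written and just under 3 TB read.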

Warner


> > On Mon, Feb 12, 2024, 4:28 PM Don Lewis  wrote:
> >
> >> I just upgraded my package build machine to:
> >>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
> >> from:
> >>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
> >> and I've had two nvme-triggered panics in the last day.
> >>
> >> nvme is being used for swap and L2ARC.  I'm not able to get a crash
> >> dump, probably because the nvme device has gone away and I get an error
> >> about not having a dump device.  It looks like a low-memory panic
> >> because free memory is low and zfs is calling malloc().
> >>
> >> This shows up in the log leading up to the panic:
> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> >> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
> >> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> >> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> >> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
> >> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
> >> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
> >> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
> >>
> >> The device looks healthy to me:
> >> SMART/Health Information Log
> >> 
> >> Critical Warning State: 0x00
> >>  Available spare:   0
> >>  Temperature:   0
> >>  Device reliability:0
> >>  Read only: 0
> >>  Volatile memory backup:0
> >> Temperature:312 K, 38.85 C, 101.93 F
> >> Available spare:100
> >> Available spare threshold:  10
> >> Percentage used:3
> >> Data units (512,000 byte) read: 5761183
> >> Data units written: 29911502
> >> Host read commands: 471921188
>

Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
On 12 Feb, Maxim Sobolev wrote:
> Might be an overheating. Today's nvme drives are notoriously flaky if you
> run them without proper heat sink attached to it.

I don't think it is a thermal problem.  According to the drive health
page, the device temperature has never reached Temperature 2, whatever
that is.  The room temperature is around 65F.  The system was stable
last summer when the room temperature spent a lot of time in the 80-85F
range.  The device temperature depends a lot on the I/O rate, and the
last panic happened when the I/O rate had been below 40tps for quite a
while.

> On Mon, Feb 12, 2024, 4:28 PM Don Lewis  wrote:
> 
>> I just upgraded my package build machine to:
>>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
>> from:
>>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
>> and I've had two nvme-triggered panics in the last day.
>>
>> nvme is being used for swap and L2ARC.  I'm not able to get a crash
>> dump, probably because the nvme device has gone away and I get an error
>> about not having a dump device.  It looks like a low-memory panic
>> because free memory is low and zfs is calling malloc().
>>
>> This shows up in the log leading up to the panic:
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
>> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
>> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
>> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
>>
>> The device looks healthy to me:
>> SMART/Health Information Log
>> 
>> Critical Warning State: 0x00
>>  Available spare:   0
>>  Temperature:   0
>>  Device reliability:0
>>  Read only: 0
>>  Volatile memory backup:0
>> Temperature:312 K, 38.85 C, 101.93 F
>> Available spare:100
>> Available spare threshold:  10
>> Percentage used:3
>> Data units (512,000 byte) read: 5761183
>> Data units written: 29911502
>> Host read commands: 471921188
>> Host write commands:605394753
>> Controller busy time (minutes): 32359
>> Power cycles:   110
>> Power on hours: 19297
>> Unsafe shutdowns:   14
>> Media errors:   0
>> No. error info log entries: 0
>> Warning Temp Composite Time:0
>> Error Temp Composite Time:  0
>> Temperature 1 Transition Count: 5231
>> Temperature 2 Transition Count: 0
>> Total Time For Temperature 1:   41213
>> Total Time For Temperature 2:   0
>>
>>
>>




Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
On 12 Feb, Mark Johnston wrote:
> On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote:
>> I just upgraded my package build machine to:
>>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
>> from:
>>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
>> and I've had two nvme-triggered panics in the last day.
>> 
>> nvme is being used for swap and L2ARC.  I'm not able to get a crash
>> dump, probably because the nvme device has gone away and I get an error
>> about not having a dump device.  It looks like a low-memory panic
>> because free memory is low and zfs is calling malloc().
>> 
>> This shows up in the log leading up to the panic:
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
>> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
>> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
>> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
>> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
>> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
>> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
> 
> Are you by chance using the drive mentioned here? 
> https://github.com/openzfs/zfs/discussions/14793
> 
> I was bitten by that and ended up replacing the drive with a different
> model.  The crash manifested exactly as you describe, though I didn't
> have L2ARC or swap enabled on it.

Nope:
nda0 at nvme0 bus 0 scbus9 target 0 lun 1
nda0: 
nda0: Serial Number BTNH940617WE512A
nda0: nvme version 1.3
nda0: 488386MB (1000215216 512 byte sectors)

I'm not seeing super high I/O rates.  I happened to have iostat running
when the machine panicked:
   0   584 88.4    31  2.68 65.8   112  7.18 68.2   107  7.13  80  0 20  0  0
   0   565 99.1    32  3.06 27.9    74  2.01 30.5    70  2.08  80  0 20  0  0
   0   612 92.8    31  2.77 18.9   148  2.74 18.9   148  2.73  86  0 14  0  0
   0   618 88.6    13  1.17 25.0    59  1.44 24.2    61  1.44  89  0 11  0  0
   0   586 45.4     5  0.22 31.4    55  1.70 30.8    57  1.70  84  0 16  0  0
   0   598 12.7     3  0.03 38.1    64  2.40 37.1    66  2.40  84  0 16  0  0
   0   675 36.1     6  0.21 23.7   156  3.62 22.7   164  3.63  88  0 12  0  0
   0   641  6.9     6  0.04 25.7   243  6.10 25.3   246  6.08  71  0 29  0  0
   0   737 20.1     9  0.18 36.4   148  5.24 37.2   144  5.24  78  0 22  0  0
   0   578 44.7    23  1.03 25.1   164  4.01 25.5   161  3.99  86  0 14  0  0
   0   608 70.3    15  1.06 51.1    64  3.19 51.3    64  3.19  89  0 11  0  0
   0   624 38.6     9  0.35 32.3   121  3.80 32.2   121  3.79  90  0 10  0  0
   0   577 80.6    16  1.28 37.8    66  2.44 36.5    69  2.46  90  0 10  0  0
       tty            nda0             ada0             ada1             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0   566 87.7    16  1.39 27.2    60  1.60 25.3    66  1.62  87  0 13  0  0
   0   599 77.2    11  0.83 17.4   391  6.66 17.3   395  6.66  74  0 26  0  0
   0   660 45.0     7  0.31 18.7   575 10.51 18.6   578 10.49  76  0 24  0  0
   0   615 37.7     8  0.31 24.0   303  7.11 24.0   303  7.11  58  0 42  0  0
Fssh_packet_write_wait: ... port 22: Broken pipe
ada* are old and slow spinning rust.


That report does mention something else that could also be a cause.  I
upgraded the motherboard BIOS around the same time.  When I get a
chance, I'll drop back to the older FreeBSD version and see if the
problem goes away.




Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Mark Johnston
On Mon, Feb 12, 2024 at 04:28:10PM -0800, Don Lewis wrote:
> I just upgraded my package build machine to:
>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
> from:
>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
> and I've had two nvme-triggered panics in the last day.
> 
> nvme is being used for swap and L2ARC.  I'm not able to get a crash
> dump, probably because the nvme device has gone away and I get an error
> about not having a dump device.  It looks like a low-memory panic
> because free memory is low and zfs is calling malloc().
> 
> This shows up in the log leading up to the panic:
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

Are you by chance using the drive mentioned here? 
https://github.com/openzfs/zfs/discussions/14793

I was bitten by that and ended up replacing the drive with a different
model.  The crash manifested exactly as you describe, though I didn't
have L2ARC or swap enabled on it.

> The device looks healthy to me:
> SMART/Health Information Log
> 
> Critical Warning State: 0x00
>  Available spare:   0
>  Temperature:   0
>  Device reliability:0
>  Read only: 0
>  Volatile memory backup:0
> Temperature:312 K, 38.85 C, 101.93 F
> Available spare:100
> Available spare threshold:  10
> Percentage used:3
> Data units (512,000 byte) read: 5761183
> Data units written: 29911502
> Host read commands: 471921188
> Host write commands:605394753
> Controller busy time (minutes): 32359
> Power cycles:   110
> Power on hours: 19297
> Unsafe shutdowns:   14
> Media errors:   0
> No. error info log entries: 0
> Warning Temp Composite Time:0
> Error Temp Composite Time:  0
> Temperature 1 Transition Count: 5231
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1:   41213
> Total Time For Temperature 2:   0
> 
> 



Re: nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Maxim Sobolev
Might be an overheating. Today's nvme drives are notoriously flaky if you
run them without proper heat sink attached to it.

-Max



On Mon, Feb 12, 2024, 4:28 PM Don Lewis  wrote:

> I just upgraded my package build machine to:
>   FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
> from:
>   FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
> and I've had two nvme-triggered panics in the last day.
>
> nvme is being used for swap and L2ARC.  I'm not able to get a crash
> dump, probably because the nvme device has gone away and I get an error
> about not having a dump device.  It looks like a low-memory panic
> because free memory is low and zfs is calling malloc().
>
> This shows up in the log leading up to the panic:
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
> Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
> Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
> Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
> Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
> Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
> Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.
>
> The device looks healthy to me:
> SMART/Health Information Log
>
> Critical Warning State:         0x00
>  Available spare:               0
>  Temperature:                   0
>  Device reliability:            0
>  Read only:                     0
>  Volatile memory backup:        0
> Temperature:                    312 K, 38.85 C, 101.93 F
> Available spare:                100
> Available spare threshold:      10
> Percentage used:                3
> Data units (512,000 byte) read: 5761183
> Data units written:             29911502
> Host read commands:             471921188
> Host write commands:            605394753
> Controller busy time (minutes): 32359
> Power cycles:                   110
> Power on hours:                 19297
> Unsafe shutdowns:               14
> Media errors:                   0
> No. error info log entries:     0
> Warning Temp Composite Time:    0
> Error Temp Composite Time:      0
> Temperature 1 Transition Count: 5231
> Temperature 2 Transition Count: 0
> Total Time For Temperature 1:   41213
> Total Time For Temperature 2:   0
>
>
>


nvme controller reset failures on recent -CURRENT

2024-02-12 Thread Don Lewis
I just upgraded my package build machine to:
  FreeBSD 15.0-CURRENT #110 main-n268161-4015c064200e
from:
  FreeBSD 15.0-CURRENT #106 main-n265953-a5ed6a815e38
and I've had two nvme-triggered panics in the last day.

nvme is being used for swap and L2ARC.  I'm not able to get a crash
dump, probably because the nvme device has gone away and I get an error
about not having a dump device.  It looks like a low-memory panic
because free memory is low and zfs is calling malloc().

This shows up in the log leading up to the panic:
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: resetting controller
Feb 12 10:07:41 zipper kernel: nvme0: Resetting controller due to a timeout and possible hot unplug.
Feb 12 10:07:41 zipper syslogd: last message repeated 1 times
Feb 12 10:07:41 zipper kernel: nvme0: Waiting for reset to complete
Feb 12 10:07:41 zipper syslogd: last message repeated 2 times
Feb 12 10:07:41 zipper kernel: nvme0: failing queued i/o
Feb 12 10:07:41 zipper kernel: nvme0: Failed controller, stopping watchdog timeout.

The device looks healthy to me:
SMART/Health Information Log

Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    312 K, 38.85 C, 101.93 F
Available spare:                100
Available spare threshold:      10
Percentage used:                3
Data units (512,000 byte) read: 5761183
Data units written:             29911502
Host read commands:             471921188
Host write commands:            605394753
Controller busy time (minutes): 32359
Power cycles:                   110
Power on hours:                 19297
Unsafe shutdowns:               14
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature 1 Transition Count: 5231
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   41213
Total Time For Temperature 2:   0
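
The "looks healthy" reading of the log above can be spelled out. What follows is only a minimal sketch: `struct smart_summary`, `smart_looks_healthy()` and `data_units_to_bytes()` are hypothetical names, and the criteria (no critical warning bits, spare above its threshold, percentage used below 100, no media errors) are illustrative assumptions rather than an official NVMe health definition. The byte conversion uses the 512,000-byte unit size printed in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for a few SMART/Health log fields. */
struct smart_summary {
	uint8_t  critical_warning;	/* "Critical Warning State" */
	uint8_t  available_spare;	/* percent */
	uint8_t  spare_threshold;	/* percent */
	uint8_t  percentage_used;	/* percent, may exceed 100 */
	uint64_t media_errors;		/* "Media errors" */
};

/* Illustrative criteria only; these thresholds are assumptions. */
static bool
smart_looks_healthy(const struct smart_summary *s)
{
	assert(s != NULL);
	return (s->critical_warning == 0 &&
	    s->available_spare > s->spare_threshold &&
	    s->percentage_used < 100 &&
	    s->media_errors == 0);
}

/* One data unit is 512,000 bytes, as stated in the log. */
static uint64_t
data_units_to_bytes(uint64_t units)
{
	return (units * UINT64_C(512000));
}
```

With the values from the log (critical warning 0x00, spare 100 against a threshold of 10, 3 percent used, 0 media errors) this check passes, and the data unit counters correspond to roughly 2.95 TB read and 15.3 TB written.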




Re: make buildworld failure on arm64 on -current n267777

2024-01-28 Thread void
On Fri, 26 Jan 2024, at 00:14, void wrote:
> In /usr/src # git rev-list --count --first-parent HEAD
> 26

In /usr/src, a 'git reset --hard' followed by 'git pull' and then
'git checkout main' fixed this.

For some reason, 'git pull --ff-only' didn't pull
/usr/src/sys/contrib/dev/acpica/include/platform!

-- 



make buildworld failure on arm64 on -current n267777

2024-01-25 Thread void
In /usr/src # git rev-list --count --first-parent HEAD
26

include/machine -> /usr/src/sys/arm64/include 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/vers.c 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/8x16.c 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/autoload.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/bootinfo.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/conf.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/copy.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/efi_main.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/framebuffer.o 
Building /usr/obj/usr/src/arm64.aarch64/stand/efi/loader_4th/main.o 
/usr/src/stand/efi/loader_4th/../loader/main.c:63:10: fatal error: 'platform/acfreebsd.h' file not found
   63 | #include "platform/acfreebsd.h"
      |          ^~
1 error generated.

make[2]: stopped in /usr/src
make[2]: stopped in /usr/src
make[4]: stopped in /usr/src/secure/lib
make[3]: stopped in /usr/src/secure
make[2]: stopped in /usr/src
make[3]: stopped in /usr/src/lib
make[2]: stopped in /usr/src

93.16 real 26.94 user 8.34 sys 



Re: NFSv4 crash of CURRENT

2024-01-15 Thread Rick Macklem
On Mon, Jan 15, 2024 at 11:03 AM FreeBSD User  wrote:
>
> On Mon, 15 Jan 2024 11:53:31 +0100
> Peter Blok wrote:
>
> > Hi,
> >
> > Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
> > automounted NFS
> > is:
> >
> > commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> > Author: Konstantin Belousov 
> > Date:   Tue Jan 2 00:22:44 2024 +0200
> >
> > nfsclient: limit situations when we do unlocked read-ahead by nfsiod
> >
> > (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> >
> > When I remove the fix, the problem is gone. Add it back and the crash 
> > happens.
> >
> > Peter
> >
> > > On 15 Jan 2024, at 09:31, Peter Blok  wrote:
> > >
> > > Hi,
> > >
> > > I do have a crash on a NFS client with stable of today
> > > > (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related.
> > > > Maybe it is the same problem.
> > >
> > > I have ports automounted on /am/ports. When I do cd /am/ports/sys and 
> > > type tab to
> > > autocomplete it crashes with the below stack trace. If I plainly mount 
> > > ports on /usr/ports
> > > and do the same everything works. I am using NFSv3
> > >
> > > Peter
> > >
> > >
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 2; apic id = 04
> > > fault virtual address   = 0x89
> > > fault code  = supervisor read data, page not present
> > > instruction pointer = 0x20:0xffff809645d4
> > > stack pointer   = 0x28:0xfe00acadb830
> > > frame pointer   = 0x28:0xfe00acadb830
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 6869 (csh)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 2
> > > time = 1705306940
> > > KDB: stack backtrace:
> > > #0 0x806232f5 at kdb_backtrace+0x65
> > > #1 0x805d7a02 at vpanic+0x152
> > > #2 0x805d78a3 at panic+0x43
> > > #3 0x809d58ad at trap_fatal+0x38d
> > > #4 0x809d58ff at trap_pfault+0x4f
> > > #5 0x809af048 at calltrap+0x8
> > > #6 0x804c7a7e at ncl_bioread+0xb7e
> > > #7 0x804b9d90 at nfs_readdir+0x1f0
> > > #8 0x8069c61a at vop_sigdefer+0x2a
> > > #9 0x809f8ae0 at VOP_READDIR_APV+0x20
> > > #10 0x81ce75de at autofs_readdir+0x2ce
> > > #11 0x809f8ae0 at VOP_READDIR_APV+0x20
> > > #12 0x806c3002 at kern_getdirentries+0x222
> > > #13 0x806c33a9 at sys_getdirentries+0x29
> > > #14 0x809d6180 at amd64_syscall+0x110
> > > #15 0x809af95b at fast_syscall_common+0xf8
> > >
> > >
> > >
> > >> On 15 Jan 2024, at 06:46, FreeBSD User <free...@walstatt-de.de> wrote:
> > >>
> > >> On Sun, 14 Jan 2024 20:34:12 -0800
> > >> Cy Schubert <cy.schub...@cschubert.com> wrote:
> > >>
> > >>> In message
> > >>> <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
> > >>> Rick Macklem writes:
> > >>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-li...@klop.ws> wrote:
> > >>>>>
> > >>>>>
> > >>>>> From: FreeBSD User <free...@walstatt-de.de>
> > >>>>> Date: 13 January 2024 19:34
> > >>>>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> > >>>>> Subject: NFSv4 crash of CURRENT
> > >>>>>
> > >>>>> Hello,
> > >>>>>
> > >>>>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> > >>>>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> > >>>>> as the mentioned cli

Re: NFSv4 crash of CURRENT

2024-01-15 Thread FreeBSD User
On Mon, 15 Jan 2024 16:59:07 +0100
Peter Blok wrote:

> Rick,
> 
> I can confirm Kostik’s fix works on 13-stable.
> 
> Peter

Me, too.
The patch fixed the reported problem.

Thank you very much.

oh

> 
> > On 15 Jan 2024, at 16:13, Peter Blok  wrote:
> > 
> > I can give it a shot on one of my clients.
> >   
> >> On 15 Jan 2024, at 16:04, Rick Macklem <rick.mack...@gmail.com> wrote:
> >> 
> >> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pb...@bsd4all.org> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Forgot to mention I’m on 13-stable. The fix that is causing the crash 
> >>> with automounted
> >>> NFS is:
> >>> 
> >>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> >>> Author: Konstantin Belousov <k...@freebsd.org>
> >>> Date:   Tue Jan 2 00:22:44 2024 +0200
> >>> 
> >>>nfsclient: limit situations when we do unlocked read-ahead by nfsiod
> >>> 
> >>>(cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> >>> 
> >>> When I remove the fix, the problem is gone. Add it back and the crash 
> >>> happens.  
> >> Kostik has already come up with a probable fix. If you want it right
> >> away, here it is,
> >> but he'll probably commit it soon anyhow:
> >> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
> >> index c027d7d7c3fd..1cf45bb0c924 100644
> >> --- a/sys/fs/nfsclient/nfs_clbio.c
> >> +++ b/sys/fs/nfsclient/nfs_clbio.c
> >> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct
> >> thread *td, struct ucred *cred)
> >>return (error);
> >> }
> >> 
> >> +static bool
> >> +ncl_bioread_dora(struct vnode *vp)
> >> +{
> >> +   vm_object_t obj;
> >> +
> >> +   obj = vp->v_object;
> >> +   if (obj == NULL)
> >> +   return (true);
> >> +   return (!vm_object_mightbedirty(vp->v_object) &&
> >> +   vp->v_object->un_pager.vnp.writemappings == 0);
> >> +}
> >> +
> >> /*
> >>  * Vnode op for read using bio
> >>  */
> >> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
> >> ioflag, struct ucred *cred)
> >> * unlocked read by nfsiod could obliterate changes
> >> * done by userspace.
> >> */
> >> -   if (nmp->nm_readahead > 0 &&
> >> -   !vm_object_mightbedirty(vp->v_object) &&
> >> -   vp->v_object->un_pager.vnp.writemappings == 0) {
> >> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
> >>for (nra = 0; nra < nmp->nm_readahead && nra < seqcount 
> >> &&
> >>(off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
> >>rabn = lbn + 1 + nra;
> >> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
> >> ioflag, struct ucred *cred)
> >> *  directory offset cookie of the next block.)
> >> */
> >>NFSLOCKNODE(np);
> >> -   if (nmp->nm_readahead > 0 &&
> >> -   !vm_object_mightbedirty(vp->v_object) &&
> >> -   vp->v_object->un_pager.vnp.writemappings == 0 &&
> >> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
> >>(bp->b_flags & B_INVAL) == 0 &&
> >>(np->n_direofoffset == 0 ||
> >>(lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
> >> 
> >> rick
> >> ps: It appears that autofs causes the directory to be read before it
> >> is open'd for
> >>  some reason. I've never looked at autofs.
> >>   
> >>> 
> >>> Peter
> >>> 
> >>> On 15 Jan 2024, at 09:31, Peter Blok <pb...@bsd4all.org> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> I do have a crash on a NFS client with stable of today
> >>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related 

Re: NFSv4 crash of CURRENT

2024-01-15 Thread FreeBSD User
On Mon, 15 Jan 2024 11:53:31 +0100
Peter Blok wrote:

> Hi,
> 
> Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
> automounted NFS
> is:
> 
> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> Author: Konstantin Belousov 
> Date:   Tue Jan 2 00:22:44 2024 +0200
> 
> nfsclient: limit situations when we do unlocked read-ahead by nfsiod
> 
> (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
> 
> When I remove the fix, the problem is gone. Add it back and the crash happens.
> 
> Peter
> 
> > On 15 Jan 2024, at 09:31, Peter Blok  wrote:
> > 
> > Hi,
> > 
> > I do have a crash on a NFS client with stable of today
> > (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. 
> > Maybe it is the
> > same problem.
> > 
> > I have ports automounted on /am/ports. When I do cd /am/ports/sys and type 
> > tab to
> > autocomplete it crashes with the below stack trace. If I plainly mount 
> > ports on /usr/ports
> > and do the same everything works. I am using NFSv3
> > 
> > Peter
> > 
> > 
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 2; apic id = 04
> > fault virtual address   = 0x89
> > fault code  = supervisor read data, page not present
> > instruction pointer = 0x20:0x809645d4
> > stack pointer   = 0x28:0xfe00acadb830
> > frame pointer   = 0x28:0xfe00acadb830
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 6869 (csh)
> > trap number = 12
> > panic: page fault
> > cpuid = 2
> > time = 1705306940
> > KDB: stack backtrace:
> > #0 0x806232f5 at kdb_backtrace+0x65
> > #1 0x805d7a02 at vpanic+0x152
> > #2 0x805d78a3 at panic+0x43
> > #3 0x809d58ad at trap_fatal+0x38d
> > #4 0x809d58ff at trap_pfault+0x4f
> > #5 0x809af048 at calltrap+0x8
> > #6 0x804c7a7e at ncl_bioread+0xb7e
> > #7 0x804b9d90 at nfs_readdir+0x1f0
> > #8 0x8069c61a at vop_sigdefer+0x2a
> > #9 0x809f8ae0 at VOP_READDIR_APV+0x20
> > #10 0x81ce75de at autofs_readdir+0x2ce
> > #11 0x809f8ae0 at VOP_READDIR_APV+0x20
> > #12 0x806c3002 at kern_getdirentries+0x222
> > #13 0x806c33a9 at sys_getdirentries+0x29
> > #14 0x809d6180 at amd64_syscall+0x110
> > #15 0x809af95b at fast_syscall_common+0xf8
> > 
> > 
> >   
> >> On 15 Jan 2024, at 06:46, FreeBSD User <free...@walstatt-de.de> wrote:
> >> 
> >> On Sun, 14 Jan 2024 20:34:12 -0800
> >> Cy Schubert <cy.schub...@cschubert.com> wrote:
> >> 
> >>> In message
> >>> <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
> >>> Rick Macklem writes:
> >>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-li...@klop.ws> wrote:
> >>>>> 
> >>>>> 
> >>>>> From: FreeBSD User <free...@walstatt-de.de>
> >>>>> Date: 13 January 2024 19:34
> >>>>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
> >>>>> Subject: NFSv4 crash of CURRENT
> >>>>> 
> >>>>> Hello,
> >>>>> 
> >>>>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> >>>>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> >>>>> as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
> >>>>> NFSv4 filesystems, non-kerberized.
> >>>>> 
> >>>>> I can crash the client reproducibly by accessing the one or other NFSv4
> >>>>> FS (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters).
> >>>>> I do not have physical access to the client host, luckily the box recovers.
> >>>> Did you rebuild both the nfscommon and nfscl modules from the same 
> >>>> sources?
> >>>> I did a commit to main that changes the interface between these two
> >>>> modules and did bump the
> >>>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> >>>> (If you have "options NFSCL" in your kernel config, both should have
> >>>> been rebuilt as a part of
> >>>> the kernel build.)
> >>>>   
> >>> 
> >>> Is anyone by chance seeing autofs in the backtrace too?
> >>> 
> >>>   
> >> 
> >> Hello Cy Schubert,
> >> 
> >> I forgot to mention that those crashes occur with autofs mounted
> >> filesystems. Good question, by the way, I will check whether crashes also
> >> happen when mounting the traditional way.
> >> 
> >> Kind regards,
> >> 
> >> oh
> >> 
> >> -- 
> >> O. Hartmann  
> >   
> 

good catch!

-- 
O. Hartmann



Re: NFSv4 crash of CURRENT

2024-01-15 Thread Peter Blok
Rick,

I can confirm Kostik’s fix works on 13-stable.

Peter

> On 15 Jan 2024, at 16:13, Peter Blok  wrote:
> 
> I can give it a shot on one of my clients.
> 
>> On 15 Jan 2024, at 16:04, Rick Macklem <rick.mack...@gmail.com> wrote:
>> 
>> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pb...@bsd4all.org> wrote:
>>> 
>>> Hi,
>>> 
>>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
>>> automounted NFS is:
>>> 
>>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
>>> Author: Konstantin Belousov <k...@freebsd.org>
>>> Date:   Tue Jan 2 00:22:44 2024 +0200
>>> 
>>>nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>>> 
>>>(cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>>> 
>>> When I remove the fix, the problem is gone. Add it back and the crash 
>>> happens.
>> Kostik has already come up with a probable fix. If you want it right
>> away, here it is,
>> but he'll probably commit it soon anyhow:
>> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
>> index c027d7d7c3fd..1cf45bb0c924 100644
>> --- a/sys/fs/nfsclient/nfs_clbio.c
>> +++ b/sys/fs/nfsclient/nfs_clbio.c
>> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct
>> thread *td, struct ucred *cred)
>>return (error);
>> }
>> 
>> +static bool
>> +ncl_bioread_dora(struct vnode *vp)
>> +{
>> +   vm_object_t obj;
>> +
>> +   obj = vp->v_object;
>> +   if (obj == NULL)
>> +   return (true);
>> +   return (!vm_object_mightbedirty(vp->v_object) &&
>> +   vp->v_object->un_pager.vnp.writemappings == 0);
>> +}
>> +
>> /*
>>  * Vnode op for read using bio
>>  */
>> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
>> ioflag, struct ucred *cred)
>> * unlocked read by nfsiod could obliterate changes
>> * done by userspace.
>> */
>> -   if (nmp->nm_readahead > 0 &&
>> -   !vm_object_mightbedirty(vp->v_object) &&
>> -   vp->v_object->un_pager.vnp.writemappings == 0) {
>> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
>>for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
>>(off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
>>rabn = lbn + 1 + nra;
>> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
>> ioflag, struct ucred *cred)
>> *  directory offset cookie of the next block.)
>> */
>>NFSLOCKNODE(np);
>> -   if (nmp->nm_readahead > 0 &&
>> -   !vm_object_mightbedirty(vp->v_object) &&
>> -   vp->v_object->un_pager.vnp.writemappings == 0 &&
>> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
>>(bp->b_flags & B_INVAL) == 0 &&
>>(np->n_direofoffset == 0 ||
>>(lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
>> 
>> rick
>> ps: It appears that autofs causes the directory to be read before it
>> is open'd for
>>  some reason. I've never looked at autofs.
>> 
>>> 
>>> Peter
>>> 
>>> On 15 Jan 2024, at 09:31, Peter Blok <pb...@bsd4all.org> wrote:
>>> 
>>> Hi,
>>> 
>>> I do have a crash on a NFS client with stable of today 
>>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. 
>>> Maybe it is the same problem.
>>> 
>>> I have ports automounted on /am/ports. When I do cd /am/ports/sys and type 
>>> tab to autocomplete it crashes with the below stack trace. If I plainly 
>>> mount ports on /usr/ports and do the same everything works. I am using NFSv3
>>> 
>>> Peter
>>> 
>>> 
>>> 
>>> 
>>> Fatal trap 12: page fault while in kernel mode
>>> cpuid = 2; apic id = 04
>>> fault virtual address = 0x89
>>> fault code = supervisor read data, page not present
>>> instruction pointer = 0x20:0x809645d4

Re: NFSv4 crash of CURRENT

2024-01-15 Thread Peter Blok
I can give it a shot on one of my clients.

> On 15 Jan 2024, at 16:04, Rick Macklem  wrote:
> 
> On Mon, Jan 15, 2024 at 2:53 AM Peter Blok <pb...@bsd4all.org> wrote:
>> 
>> Hi,
>> 
>> Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
>> automounted NFS is:
>> 
>> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
>> Author: Konstantin Belousov 
>> Date:   Tue Jan 2 00:22:44 2024 +0200
>> 
>>nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>> 
>>(cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>> 
>> When I remove the fix, the problem is gone. Add it back and the crash 
>> happens.
> Kostik has already come up with a probable fix. If you want it right
> away, here it is,
> but he'll probably commit it soon anyhow:
> diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
> index c027d7d7c3fd..1cf45bb0c924 100644
> --- a/sys/fs/nfsclient/nfs_clbio.c
> +++ b/sys/fs/nfsclient/nfs_clbio.c
> @@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct
> thread *td, struct ucred *cred)
>return (error);
> }
> 
> +static bool
> +ncl_bioread_dora(struct vnode *vp)
> +{
> +   vm_object_t obj;
> +
> +   obj = vp->v_object;
> +   if (obj == NULL)
> +   return (true);
> +   return (!vm_object_mightbedirty(vp->v_object) &&
> +   vp->v_object->un_pager.vnp.writemappings == 0);
> +}
> +
> /*
>  * Vnode op for read using bio
>  */
> @@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
> ioflag, struct ucred *cred)
> * unlocked read by nfsiod could obliterate changes
> * done by userspace.
> */
> -   if (nmp->nm_readahead > 0 &&
> -   !vm_object_mightbedirty(vp->v_object) &&
> -   vp->v_object->un_pager.vnp.writemappings == 0) {
> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
>for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
>(off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
>rabn = lbn + 1 + nra;
> @@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int
> ioflag, struct ucred *cred)
> *  directory offset cookie of the next block.)
> */
>NFSLOCKNODE(np);
> -   if (nmp->nm_readahead > 0 &&
> -   !vm_object_mightbedirty(vp->v_object) &&
> -   vp->v_object->un_pager.vnp.writemappings == 0 &&
> +   if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
>(bp->b_flags & B_INVAL) == 0 &&
>(np->n_direofoffset == 0 ||
>(lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&
> 
> rick
> ps: It appears that autofs causes the directory to be read before it
> is open'd for
>  some reason. I've never looked at autofs.
> 
>> 
>> Peter
>> 
>> On 15 Jan 2024, at 09:31, Peter Blok  wrote:
>> 
>> Hi,
>> 
>> I do have a crash on a NFS client with stable of today 
>> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe 
>> it is the same problem.
>> 
>> I have ports automounted on /am/ports. When I do cd /am/ports/sys and type 
>> tab to autocomplete it crashes with the below stack trace. If I plainly 
>> mount ports on /usr/ports and do the same everything works. I am using NFSv3
>> 
>> Peter
>> 
>> 
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address = 0x89
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0x809645d4
>> stack pointer= 0x28:0xfe00acadb830
>> frame pointer= 0x28:0xfe00acadb830
>> code segment = base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 6869 (csh)
>> trap number = 12
>> panic: page fault
>> cpuid = 2
>> time = 1705306940
>> KDB: stack backtrace:
>> #0 0x806232f5 at kdb_backtrace+0x65
>> #1 0x805d7a02 at vpanic+0x152
>> #2 0x805d78a3 at panic+0x43
>> #3 0x809d58ad at trap_fatal+0x38d
>> #4 0x

Re: NFSv4 crash of CURRENT

2024-01-15 Thread Rick Macklem
On Mon, Jan 15, 2024 at 2:53 AM Peter Blok  wrote:
>
> Hi,
>
> Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
> automounted NFS is:
>
> commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
> Author: Konstantin Belousov 
> Date:   Tue Jan 2 00:22:44 2024 +0200
>
> nfsclient: limit situations when we do unlocked read-ahead by nfsiod
>
> (cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)
>
> When I remove the fix, the problem is gone. Add it back and the crash happens.
Kostik has already come up with a probable fix. If you want it right
away, here it is, but he'll probably commit it soon anyhow:
diff --git a/sys/fs/nfsclient/nfs_clbio.c b/sys/fs/nfsclient/nfs_clbio.c
index c027d7d7c3fd..1cf45bb0c924 100644
--- a/sys/fs/nfsclient/nfs_clbio.c
+++ b/sys/fs/nfsclient/nfs_clbio.c
@@ -414,6 +414,18 @@ nfs_bioread_check_cons(struct vnode *vp, struct thread *td, struct ucred *cred)
 	return (error);
 }
 
+static bool
+ncl_bioread_dora(struct vnode *vp)
+{
+	vm_object_t obj;
+
+	obj = vp->v_object;
+	if (obj == NULL)
+		return (true);
+	return (!vm_object_mightbedirty(vp->v_object) &&
+	    vp->v_object->un_pager.vnp.writemappings == 0);
+}
+
 /*
  * Vnode op for read using bio
  */
@@ -486,9 +498,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
 		 * unlocked read by nfsiod could obliterate changes
 		 * done by userspace.
 		 */
-		if (nmp->nm_readahead > 0 &&
-		    !vm_object_mightbedirty(vp->v_object) &&
-		    vp->v_object->un_pager.vnp.writemappings == 0) {
+		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp)) {
 			for (nra = 0; nra < nmp->nm_readahead && nra < seqcount &&
 			    (off_t)(lbn + 1 + nra) * biosize < nsize; nra++) {
 				rabn = lbn + 1 + nra;
@@ -675,9 +685,7 @@ ncl_bioread(struct vnode *vp, struct uio *uio, int ioflag, struct ucred *cred)
 		 *  directory offset cookie of the next block.)
 		 */
 		NFSLOCKNODE(np);
-		if (nmp->nm_readahead > 0 &&
-		    !vm_object_mightbedirty(vp->v_object) &&
-		    vp->v_object->un_pager.vnp.writemappings == 0 &&
+		if (nmp->nm_readahead > 0 && ncl_bioread_dora(vp) &&
 		    (bp->b_flags & B_INVAL) == 0 &&
 		    (np->n_direofoffset == 0 ||
 		    (lbn + 1) * NFS_DIRBLKSIZ < np->n_direofoffset) &&

rick
ps: It appears that autofs causes the directory to be read before it
    is open'd for some reason. I've never looked at autofs.
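
For readers following along, the effect of the new helper can be modelled outside the kernel. The sketch below is not the kernel code: `struct model_vnode`, `struct model_object` and `model_bioread_dora()` are hypothetical stand-ins for `struct vnode`, `vm_object_t` and `ncl_bioread_dora()`, and the `might_be_dirty` flag stands in for the result of `vm_object_mightbedirty()`. It only illustrates the ordering that matters in the patch: `v_object` is tested for NULL before anything is read through it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM object hanging off a vnode. */
struct model_object {
	bool might_be_dirty;	/* models vm_object_mightbedirty() */
	int writemappings;	/* models un_pager.vnp.writemappings */
};

/* Hypothetical stand-in for struct vnode; only v_object is modelled. */
struct model_vnode {
	struct model_object *v_object;	/* may be NULL */
};

/*
 * Mirrors the shape of ncl_bioread_dora() in the patch: returns true
 * when read-ahead is allowed. A vnode without a VM object is treated
 * as safe for read-ahead, so a NULL object is never dereferenced.
 */
static bool
model_bioread_dora(const struct model_vnode *vp)
{
	const struct model_object *obj;

	assert(vp != NULL);
	obj = vp->v_object;
	if (obj == NULL)
		return (true);
	return (!obj->might_be_dirty && obj->writemappings == 0);
}
```

In this model a vnode with no object yields true, while a dirty object or one with write mappings yields false, which matches the two conditions the original inline checks tested.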

>
> Peter
>
> On 15 Jan 2024, at 09:31, Peter Blok  wrote:
>
> Hi,
>
> I do have a crash on a NFS client with stable of today 
> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe 
> it is the same problem.
>
> I have ports automounted on /am/ports. When I do cd /am/ports/sys and type 
> tab to autocomplete it crashes with the below stack trace. If I plainly mount 
> ports on /usr/ports and do the same everything works. I am using NFSv3
>
> Peter
>
>
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address = 0x89
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0x809645d4
> stack pointer= 0x28:0xfe00acadb830
> frame pointer= 0x28:0xfe00acadb830
> code segment = base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 6869 (csh)
> trap number = 12
> panic: page fault
> cpuid = 2
> time = 1705306940
> KDB: stack backtrace:
> #0 0x806232f5 at kdb_backtrace+0x65
> #1 0x805d7a02 at vpanic+0x152
> #2 0x805d78a3 at panic+0x43
> #3 0x809d58ad at trap_fatal+0x38d
> #4 0x809d58ff at trap_pfault+0x4f
> #5 0x809af048 at calltrap+0x8
> #6 0x804c7a7e at ncl_bioread+0xb7e
> #7 0x804b9d90 at nfs_readdir+0x1f0
> #8 0x8069c61a at vop_sigdefer+0x2a
> #9 0x809f8ae0 at VOP_READDIR_APV+0x20
> #10 0x81ce75de at autofs_readdir+0x2ce
> #11 0x809f8ae0 at VOP_READDIR_APV+0x20
> #12 0x806c3002 at kern_getdirentries+0x222
> #13 0x806c33a9 at sys_getdirentries+0x29
> #14 0x809d6180 at amd64_syscall+0x110
> #15 0xffff809af95b at fast_syscall_common+0xf8
>
>
>
> On 15 Jan 2024, at 06:46, FreeBSD User  wrote:
>
> 

Re: NFSv4 crash of CURRENT

2024-01-15 Thread Peter Blok
Hi,

Forgot to mention I’m on 13-stable. The fix that is causing the crash with 
automounted NFS is:

commit cc5cda1dbaa907ce52074f47264cc45b5a7d6c8b
Author: Konstantin Belousov 
Date:   Tue Jan 2 00:22:44 2024 +0200

nfsclient: limit situations when we do unlocked read-ahead by nfsiod

(cherry picked from commit 70dc6b2ce314a0f32755005ad02802fca7ed186e)

When I remove the fix, the problem is gone. Add it back and the crash happens.

Peter

> On 15 Jan 2024, at 09:31, Peter Blok  wrote:
> 
> Hi,
> 
> I do have a crash on a NFS client with stable of today 
> (4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe 
> it is the same problem.
> 
> I have ports automounted on /am/ports. When I do cd /am/ports/sys and type 
> tab to autocomplete it crashes with the below stack trace. If I plainly mount 
> ports on /usr/ports and do the same everything works. I am using NFSv3
> 
> Peter
> 
> 
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address = 0x89
> fault code= supervisor read data, page not present
> instruction pointer   = 0x20:0x809645d4
> stack pointer = 0x28:0xfe00acadb830
> frame pointer = 0x28:0xfe00acadb830
> code segment  = base 0x0, limit 0xf, type 0x1b
>   = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags  = interrupt enabled, resume, IOPL = 0
> current process   = 6869 (csh)
> trap number   = 12
> panic: page fault
> cpuid = 2
> time = 1705306940
> KDB: stack backtrace:
> #0 0x806232f5 at kdb_backtrace+0x65
> #1 0x805d7a02 at vpanic+0x152
> #2 0x805d78a3 at panic+0x43
> #3 0x809d58ad at trap_fatal+0x38d
> #4 0x809d58ff at trap_pfault+0x4f
> #5 0x809af048 at calltrap+0x8
> #6 0x804c7a7e at ncl_bioread+0xb7e
> #7 0x804b9d90 at nfs_readdir+0x1f0
> #8 0x8069c61a at vop_sigdefer+0x2a
> #9 0x809f8ae0 at VOP_READDIR_APV+0x20
> #10 0x81ce75de at autofs_readdir+0x2ce
> #11 0x809f8ae0 at VOP_READDIR_APV+0x20
> #12 0x806c3002 at kern_getdirentries+0x222
> #13 0x806c33a9 at sys_getdirentries+0x29
> #14 0x809d6180 at amd64_syscall+0x110
> #15 0x809af95b at fast_syscall_common+0xf8
> 
> 
> 
>> On 15 Jan 2024, at 06:46, FreeBSD User <free...@walstatt-de.de> wrote:
>> 
>> On Sun, 14 Jan 2024 20:34:12 -0800
>> Cy Schubert <cy.schub...@cschubert.com> wrote:
>> 
>>> In message
>>> <CAM5tNy5aat8vUn2fsX9jV=D9yGZdnO20Q0Ea7qtszx+zSES2bw@mail.gmail.com>,
>>> Rick Macklem writes:
>>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop <ronald-li...@klop.ws> wrote:
>>>>> 
>>>>> 
>>>>> From: FreeBSD User <free...@walstatt-de.de>
>>>>> Date: 13 January 2024 19:34
>>>>> To: FreeBSD CURRENT <freebsd-current@freebsd.org>
>>>>> Subject: NFSv4 crash of CURRENT
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
>>>>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
>>>>> as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
>>>>> NFSv4 filesystems, non-kerberized.
>>>>> 
>>>>> I can crash the client reproducibly by accessing the one or other NFSv4
>>>>> FS (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters).
>>>>> I do not have physical access to the client host, luckily the box recovers.
>>>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
>>>> I did a commit to main that changes the interface between these two
>>>> modules and did bump the
>>>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
>>>> (If you have "options NFSCL" in your kernel config, both should have
>>>> been rebuilt as a part of
>>>> the kernel build.)
>>>> 
>>> 
>>> Is anyone by chance seeing autofs in the backtrace too?
>>> 
>>> 
>> 
>> Hello Cy Schubert,
>> 
>> I forgot to mention that those crashes occur with autofs mounted
>> filesystems. Good question, by the way, I will check whether crashes also
>> happen when mounting the traditional way.
>> 
>> Kind regards,
>> 
>> oh
>> 
>> -- 
>> O. Hartmann
> 



Re: NFSv4 crash of CURRENT

2024-01-15 Thread Peter Blok
Hi,

I do have a crash on a NFS client with stable of today 
(4c4633fdffbe8e4b6d328c2bc9bb3edacc9ab50a). It is also autofs related. Maybe it 
is the same problem.

I have ports automounted on /am/ports. When I do cd /am/ports/sys and type tab 
to autocomplete it crashes with the below stack trace. If I plainly mount ports 
on /usr/ports and do the same, everything works. I am using NFSv3.

Peter




Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x89
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x809645d4
stack pointer           = 0x28:0xfe00acadb830
frame pointer           = 0x28:0xfe00acadb830
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6869 (csh)
trap number             = 12
panic: page fault
cpuid = 2
time = 1705306940
KDB: stack backtrace:
#0 0x806232f5 at kdb_backtrace+0x65
#1 0x805d7a02 at vpanic+0x152
#2 0x805d78a3 at panic+0x43
#3 0x809d58ad at trap_fatal+0x38d
#4 0x809d58ff at trap_pfault+0x4f
#5 0x809af048 at calltrap+0x8
#6 0x804c7a7e at ncl_bioread+0xb7e
#7 0x804b9d90 at nfs_readdir+0x1f0
#8 0x8069c61a at vop_sigdefer+0x2a
#9 0x809f8ae0 at VOP_READDIR_APV+0x20
#10 0x81ce75de at autofs_readdir+0x2ce
#11 0x809f8ae0 at VOP_READDIR_APV+0x20
#12 0x806c3002 at kern_getdirentries+0x222
#13 0x806c33a9 at sys_getdirentries+0x29
#14 0x809d6180 at amd64_syscall+0x110
#15 0x809af95b at fast_syscall_common+0xf8



> On 15 Jan 2024, at 06:46, FreeBSD User  wrote:
> 
> On Sun, 14 Jan 2024 20:34:12 -0800
> Cy Schubert <cy.schub...@cschubert.com> wrote:
> 
>> Rick Macklem writes:
>>> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
>>>> 
>>>> 
>>>> From: FreeBSD User
>>>> Date: 13 January 2024 19:34
>>>> To: FreeBSD CURRENT
>>>> Subject: NFSv4 crash of CURRENT
>>>>
>>>> Hello,
>>>>
>>>> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
>>>> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
>>>> as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
>>>> NFSv4 filesystems, non-kerberized.
>>>>
>>>> I can crash the client reproducibly by accessing the one or other NFSv4 FS
>>>> (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do
>>>> not have physical access to the client host, luckily the box recovers.
>>> Did you rebuild both the nfscommon and nfscl modules from the same sources?
>>> I did a commit to main that changes the interface between these two
>>> modules and did bump the
>>> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
>>> (If you have "options NFSCL" in your kernel config, both should have
>>> been rebuilt as a part of
>>> the kernel build.)
>>> 
>> 
>> Is anyone by chance seeing autofs in the backtrace too?
>> 
>> 
> 
> Hello Cy Schubert,
> 
> I forgot to mention that those crashes occur with autofs-mounted filesystems.
> Good question, by the way; I will check whether crashes also happen when
> mounting the traditional way.
> 
> Kind regards,
> 
> oh
> 
> -- 
> O. Hartmann



Re: NFSv4 crash of CURRENT

2024-01-14 Thread FreeBSD User
On Sun, 14 Jan 2024 20:34:12 -0800
Cy Schubert wrote:

> Rick Macklem writes:
> > On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
> > >
> > >
> > > From: FreeBSD User
> > > Date: 13 January 2024 19:34
> > > To: FreeBSD CURRENT
> > > Subject: NFSv4 crash of CURRENT
> > >
> > > Hello,
> > >
> > > running CURRENT client (FreeBSD 15.0-CURRENT #4
> > > main-n267556-69748e62e82a: Sat Jan 13 18:08:32
> > > CET 2024 amd64). One NFSv4 server is same OS revision as the mentioned
> > > client, other is FreeBSD
> > > 13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.
> > >
> > > I can crash the client reproducibly by accessing the one or other NFSv4
> > > FS (a simple ls -la).
> > > The NFSv4 FS is backed by ZFS (if this matters). I do not have physical
> > > access to the client
> > > host, luckily the box recovers.
> > Did you rebuild both the nfscommon and nfscl modules from the same sources?
> > I did a commit to main that changes the interface between these two
> > modules and did bump the
> > __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> > (If you have "options NFSCL" in your kernel config, both should have
> > been rebuilt as a part of
> > the kernel build.)
> >  
> 
> Is anyone by chance seeing autofs in the backtrace too?
> 
> 

Hello Cy Schubert,

I forgot to mention that those crashes occur with autofs-mounted filesystems.
Good question, by the way; I will check whether crashes also happen when
mounting the traditional way.

Kind regards,

oh

-- 
O. Hartmann



Re: NFSv4 crash of CURRENT

2024-01-14 Thread Cy Schubert
Rick Macklem writes:
> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
> >
> >
> > From: FreeBSD User
> > Date: 13 January 2024 19:34
> > To: FreeBSD CURRENT
> > Subject: NFSv4 crash of CURRENT
> >
> > Hello,
> >
> > running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> > Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> > as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
> > NFSv4 filesystems, non-kerberized.
> >
> > I can crash the client reproducibly by accessing the one or other NFSv4 FS
> > (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do
> > not have physical access to the client host, luckily the box recovers.
> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> I did a commit to main that changes the interface between these two
> modules and did bump the
> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> (If you have "options NFSCL" in your kernel config, both should have
> been rebuilt as a part of
> the kernel build.)
>

Is anyone by chance seeing autofs in the backtrace too?


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0





Re: NFSv4 crash of CURRENT

2024-01-14 Thread FreeBSD User
On Sat, 13 Jan 2024 19:41:30 -0800
Rick Macklem wrote:

> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop  wrote:
> >
> >
> > From: FreeBSD User
> > Date: 13 January 2024 19:34
> > To: FreeBSD CURRENT
> > Subject: NFSv4 crash of CURRENT
> >
> > Hello,
> >
> > running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> > Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> > as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
> > NFSv4 filesystems, non-kerberized.
> >
> > I can crash the client reproducibly by accessing the one or other NFSv4 FS
> > (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do
> > not have physical access to the client host, luckily the box recovers.
> Did you rebuild both the nfscommon and nfscl modules from the same sources?

Yes, as requested, as soon as the commit occurred. I recompiled the whole OS
from a "make -j4 cleanworld cleandir".

But I have a custom kernel with several custom options statically compiled in.

> I did a commit to main that changes the interface between these two
> modules and did bump the
> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> (If you have "options NFSCL" in your kernel config, both should have
> been rebuilt as a part of
> the kernel build.)

On Monday I will try to compile in several debug options when I get my hands
on the machine again, and on Tuesday I can test on several other boxes running
CURRENT (after update) how they interact with each other (CURRENT) and with
others (FBSD14, FBSD13) via NFSv4.

> 
> rick
> >
> > I have no idea what causes this problem ...
> >
> > Kind regards,
> >
> > O. Hartmann
> >
> >
> > --
> > O. Hartmann
> >
> > 
> >
> >
> >
> > Do you have something like a panic message, stack trace or core dump?
> >
> > Regards
> > Ronald  
> 



-- 
O. Hartmann



Re: NFSv4 crash of CURRENT

2024-01-13 Thread Rick Macklem
On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop  wrote:
>
>
> From: FreeBSD User
> Date: 13 January 2024 19:34
> To: FreeBSD CURRENT
> Subject: NFSv4 crash of CURRENT
>
> Hello,
>
> running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a:
> Sat Jan 13 18:08:32 CET 2024 amd64). One NFSv4 server is same OS revision
> as the mentioned client, other is FreeBSD 13.2-RELEASE-p8. Both offer
> NFSv4 filesystems, non-kerberized.
>
> I can crash the client reproducibly by accessing the one or other NFSv4 FS
> (a simple ls -la). The NFSv4 FS is backed by ZFS (if this matters). I do
> not have physical access to the client host, luckily the box recovers.
Did you rebuild both the nfscommon and nfscl modules from the same sources?
I did a commit to main that changes the interface between these two
modules and did bump the
__FreeBSD_version to 1500010, which should cause both to be rebuilt.
(If you have "options NFSCL" in your kernel config, both should have
been rebuilt as a part of
the kernel build.)
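
A quick sanity check along these lines, as a minimal sketch only: it compares
the __FreeBSD_version of the running kernel (uname -K) and of the installed
userland (uname -U) with the 1500010 bump mentioned above. It does not prove
that nfscommon and nfscl were built from the same tree, and the helper name
at_least is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is numerically >= $2.
at_least() {
    [ "$1" -ge "$2" ] 2>/dev/null
}

BUMP=1500010    # __FreeBSD_version bump named in the message above

# uname -K / -U print the kernel / userland __FreeBSD_version on FreeBSD;
# fall back to 0 on other systems so the script still runs.
kernel=$(uname -K 2>/dev/null || echo 0)
world=$(uname -U 2>/dev/null || echo 0)

for pair in "kernel:$kernel" "world:$world"; do
    name=${pair%%:*}
    ver=${pair#*:}
    if at_least "$ver" "$BUMP"; then
        echo "$name is at or past $BUMP ($ver)"
    else
        echo "$name predates $BUMP ($ver); rebuild before testing NFS"
    fi
done
```

If either line reports a version below the bump, the modules on that box were
most likely not rebuilt against the new interface.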

rick
>
> I have no idea what causes this problem ...
>
> Kind regards,
>
> O. Hartmann
>
>
> --
> O. Hartmann
>
> 
>
>
>
> Do you have something like a panic message, stack trace or core dump?
>
> Regards
> Ronald



Re: NFSv4 crash of CURRENT

2024-01-13 Thread Ronald Klop

From: FreeBSD User
Date: 13 January 2024 19:34
To: FreeBSD CURRENT
Subject: NFSv4 crash of CURRENT




Hello,

running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a: Sat 
Jan 13 18:08:32
CET 2024 amd64). One NFSv4 server is same OS revision as the mentioned client, 
other is FreeBSD
13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.

I can crash the client reproducibly by accessing the one or other NFSv4 FS (a
simple ls -la).
The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access
to the client
host, luckily the box recovers.

I have no idea what causes this problem ...

Kind regards,

O. Hartmann


--
O. Hartmann








Do you have something like a panic message, stack trace or core dump?

Regards
Ronald

Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-10 Thread Zhenlei Huang



> On Jan 9, 2024, at 6:24 PM, void  wrote:
> 
> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>> 
>> Was the kernel/utility built with IPv6? If not, that’s a general bug which 
>> should be filed (which can be easily checked/avoided using the FEATURES(9) 
>> subsystem)…
>> Cheers!
>> -Enji
> 
> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
> 
> I made the problem go away with removing WITHOUT_INET6= and rebuilding.
> The system was installed by taking 
> FreeBSD-15.0-CURRENT-arm64-aarch64-RPI-20240104-8bf0882e186e-267378.img
> and dd-ing it to a usb3-connected hd.
> 
> Where can I read about features?

Features can be retrieved by `sysctl kern.features`.

As for INET6 it should be `kern.features.inet6` .
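
For scripting this, a minimal sketch: sysctl -n prints only the value, and a
failed lookup is treated as "no IPv6" (assumption: the OID is simply absent
when the kernel was built without INET6). The function name kernel_has_inet6
is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the running kernel reports the
# inet6 feature through kern.features.inet6.
kernel_has_inet6() {
    # A missing OID makes sysctl fail; treat that as "feature not present".
    val=$(sysctl -n kern.features.inet6 2>/dev/null) || return 1
    [ "$val" = "1" ]
}

if kernel_has_inet6; then
    echo "kernel reports INET6 support"
else
    echo "kernel reports no INET6 support"
fi
```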

> 
> % man features
> No manual entry for "features"
> 
> it's not in apropos
> thanks,
> -- 
> 

Best regards,
Zhenlei




Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-10 Thread Enji Cooper

> On Jan 9, 2024, at 7:17 AM, void  wrote:
> 
> On Tue, Jan 09, 2024 at 12:24:40PM +, void wrote:
>> On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:
>>> On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:
>>>> 
>>>> Was the kernel/utility built with IPv6? If not, that’s a general bug which 
>>>> should be filed (which can be easily checked/avoided using the FEATURES(9) 
>>>> subsystem)…
>>>> Cheers!
>>>> -Enji
>>> 
>>> world/kernel was built with WITHOUT_INET6= in /etc/src.conf
>>> 
>>> I made the problem go away with removing WITHOUT_INET6= and rebuilding.
>> 
>> I'll re-add this to try and replicate the problem with the same sources
>> (main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it
> 
> I forgot about this line:
> 
> options INET6   # IPv6 communications protocols
> 
> which, on current/arm64 lives in std.arm64 which gets included by
> GENERIC which is included by GENERIC-MMCCAM which is included by
> GENERIC-MMCCAM-NODEBUG
> 
> commenting it out and having WITHOUT_INET6= in /etc/src.conf and rebuilding
> fixes the problem. Sorry for the noise.

It’s not noise; what you found is a valid issue.
Please file an issue for this, noting that the kernel was built without 
INET6 support (that’s the key bit of info for reproing the issue).
Thank you!
-Enji




Re: kernel: fatal trap 12 on CURRENT, when using WireGuard

2024-01-09 Thread Rainer Hurling

On 09.01.24 at 21:40, Gleb Smirnoff wrote:

   Rainer,

On Tue, Jan 09, 2024 at 09:23:54PM +0100, Rainer Hurling wrote:
R> I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very
R> recent commit. The build and install went fine. After booting with new
R> base, I got a page fault with the following error:

Sorry for that, my fault. Can you please test this patch?



Hi Gleb,

Thanks for the very fast response.

I tried your patch and it seems to work as expected. I have a running 
system, with WireGuard on, at commit main-n267469-0013741108bc-dirty.


Many thanks again and best wishes,
Rainer




Re: kernel: fatal trap 12 on CURRENT, when using WireGuard

2024-01-09 Thread Gleb Smirnoff
  Rainer,

On Tue, Jan 09, 2024 at 09:23:54PM +0100, Rainer Hurling wrote:
R> I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a very
R> recent commit. The build and install went fine. After booting with new
R> base, I got a page fault with the following error:

Sorry for that, my fault. Can you please test this patch?

-- 
Gleb Smirnoff
diff --git a/sys/netlink/netlink_domain.c b/sys/netlink/netlink_domain.c
index 7660dcada103..4790845d1d31 100644
--- a/sys/netlink/netlink_domain.c
+++ b/sys/netlink/netlink_domain.c
@@ -233,7 +233,7 @@ nl_send_group(struct nl_writer *nw)
 copy = nl_buf_copy(nb);
 if (copy != NULL) {
 	nw->buf = copy;
-	(void)nl_send_one(nw);
+	(void)nl_send(nw, nlp_last);
 } else {
 	NLP_LOCK(nlp_last);
 	if (nlp_last->nl_socket != NULL)
@@ -246,7 +246,7 @@ nl_send_group(struct nl_writer *nw)
 	}
 	if (nlp_last != NULL) {
 		nw->buf = nb;
-		(void)nl_send_one(nw);
+		(void)nl_send(nw, nlp_last);
 	} else
 		nl_buf_free(nb);
 
diff --git a/sys/netlink/netlink_io.c b/sys/netlink/netlink_io.c
index fb8e0a46e8dd..5f50c40f71d8 100644
--- a/sys/netlink/netlink_io.c
+++ b/sys/netlink/netlink_io.c
@@ -194,9 +194,8 @@ nl_taskqueue_handler(void *_arg, int pending)
  * If no queue overrunes happened, wakes up socket owner.
  */
 bool
-nl_send_one(struct nl_writer *nw)
+nl_send(struct nl_writer *nw, struct nlpcb *nlp)
 {
-	struct nlpcb *nlp = nw->nlp;
 	struct socket *so = nlp->nl_socket;
 	struct sockbuf *sb = &so->so_rcv;
 	struct nl_buf *nb;
diff --git a/sys/netlink/netlink_message_writer.c b/sys/netlink/netlink_message_writer.c
index 0b85378b41b6..50305e3d9d80 100644
--- a/sys/netlink/netlink_message_writer.c
+++ b/sys/netlink/netlink_message_writer.c
@@ -65,6 +65,13 @@ nlmsg_get_buf(struct nl_writer *nw, u_int len, bool waitok)
 	return (true);
 }
 
+static bool
+nl_send_one(struct nl_writer *nw)
+{
+
+	return (nl_send(nw, nw->nlp));
+}
+
 bool
 _nlmsg_get_unicast_writer(struct nl_writer *nw, int size, struct nlpcb *nlp)
 {
diff --git a/sys/netlink/netlink_var.h b/sys/netlink/netlink_var.h
index c8f0d02a0dab..ddf30b373446 100644
--- a/sys/netlink/netlink_var.h
+++ b/sys/netlink/netlink_var.h
@@ -130,9 +130,7 @@ void nl_osd_unregister(void);
 void nl_set_thread_nlp(struct thread *td, struct nlpcb *nlp);
 
 /* netlink_io.c */
-#define	NL_IOF_UNTRANSLATED	0x01
-#define	NL_IOF_IGNORE_LIMIT	0x02
-bool nl_send_one(struct nl_writer *);
+bool nl_send(struct nl_writer *, struct nlpcb *);
 void nlmsg_ack(struct nlpcb *nlp, int error, struct nlmsghdr *nlmsg,
 struct nl_pstate *npt);
 void nl_on_transmit(struct nlpcb *nlp);


kernel: fatal trap 12 on CURRENT, when using WireGuard

2024-01-09 Thread Rainer Hurling
I tried to update my 15.0-CURRENT box from n267335-499e84e16f56 to a 
very recent commit. The build and install went fine. After booting with 
new base, I got a page fault with the following error:



Kernel page fault with the following non-sleepable locks held:
shared rm netlink lock (netlink lock) r = 0 (0xf8005fc8ca20) locked 
@ /usr/src/sys/netlink/netlink_domain.c:241
exclusive rw lle (lle) r = 0 (0xf801951dce90) locked @ 
/usr/src/sys/netinet/in.c:1716

stack backtrace:
#0 0x80bc6c45 at witness_debugger+0x65
#1 0x80bc7d89 at witness_warn+0x3e9
#2 0x81056b18 at trap_pfault+0x88
#3 0x81028708 at calltrap+0x8
#4 0x80dbd6a2 at nl_send_group+0x1d2
#5 0x80dc0e27 at _nlmsg_flush+0x37
#6 0x80dc4fdc at rtnl_lle_event+0x10c
#7 0x80d15e32 at arp_mark_lle_reachable+0xd2
#8 0x80d15b43 at arp_check_update_lle+0x293
#9 0x80d151c5 at arpintr+0xa65
#10 0x80caaaed at netisr_dispatch_src+0xad
#11 0x80c8d57a at ether_demux+0x17a
#12 0x80c8ec53 at ether_nh_input+0x403
#13 0x80caaaed at netisr_dispatch_src+0xad
#14 0x80c8d9c9 at ether_input+0xd9
#15 0x80ca66ac at iflib_rxeof+0xe4c
#16 0x80ca0b5a at _task_fn_rx+0x7a
#17 0x80ba0118 at gtaskqueue_run_locked+0xa8

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80dc0a10
stack pointer   = 0x28:0xfe006a3a8760
frame pointer   = 0x28:0xfe006a3a8790
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (if_io_tqg_0)
rdi: fe006a3a8850 rsi: fe006a3a86f0 rdx: fe006a3a87b0
rcx: f80001f88740  r8: 83210090  r9: 
rax:  rbx: 0003 rbp: fe006a3a8790
r10: 0001 r11:  r12: f8005fc8ca00
r13: f8005fc8ca20 r14: fe006a3a8850 r15: 
trap number = 12
panic: page fault
cpuid = 0
time = 1704824328
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe006a3a8430

vpanic() at vpanic+0x131/frame 0xfe006a3a8560
panic() at panic+0x43/frame 0xfe006a3a85c0
trap_fatal() at trap_fatal+0x40f/frame 0xfe006a3a8620
trap_pfault() at trap_pfault+0xae/frame 0xfe006a3a8690
calltrap() at calltrap+0x8/frame 0xfe006a3a8690
--- trap 0xc, rip = 0x80dc0a10, rsp = 0xfe006a3a8760, rbp = 
0xfe006a3a8790 ---

nl_send_one() at nl_send_one+0x20/frame 0xfe006a3a8790
nl_send_group() at nl_send_group+0x1d2/frame 0xfe006a3a8820
_nlmsg_flush() at _nlmsg_flush+0x37/frame 0xfe006a3a8840
rtnl_lle_event() at rtnl_lle_event+0x10c/frame 0xfe006a3a88e0
arp_mark_lle_reachable() at arp_mark_lle_reachable+0xd2/frame 
0xfe006a3a8930
arp_check_update_lle() at arp_check_update_lle+0x293/frame 
0xfe006a3a8a00

arpintr() at arpintr+0xa65/frame 0xfe006a3a8b60
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfe006a3a8bc8
ether_demux() at ether_demux+0x17a/frame 0xfe006a4a8bf0
ether_nh_input() at ether_nh_input+0x403/frame 0xfe006a3a8c40
netisr_dispatch_src() at netisr_dispatch_src+0xad/frame 0xfe006a3a8ca0
ether_input() at ether_input+0xd9/frame 0xfe006a3a8d00
iflib_rxeof() at iflib_rxeof+0xe4c/frame 0xfe006a3a8e00
_task_fn_rx() at _task_fn_rx+0x7a/frame 0xfe006a3a8e40
gtaskqueue_run_locked() at gtaskqueue_run_locked+0xa8/frame 
0xfe006a3a8ec0
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd3/frame 
0xfe006a3a8ef0

fork_exit() at fork_exit+0x82/frame 0xfe006a3a8f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe006a3a8f30
--- trap 0xf2b9f109, rip = 0x7afef8a176bef8a5, rsp = 0xddc963edd18963e9, 
rbp = 0x61f64fc36db64fc7

KDB: enter: panic
[ thread pid 0 tid 100067 ]
Stopped at  kdb_enter+0x33: movq$0,0xe3a582(%rip)
db>


Since the current process 'if_io_tqg_0' and problems with netlink are 
mentioned, I searched in the area of my network connections. I 
discovered that this page fault only occurs when a connection is 
established with WireGuard (wg-quick up wg0). Without using WireGuard, 
this error does not occur.


I was able to find out at which commit this behavior occurs with my box:
- Up to commit main-n267347-660bd40a598a everything is fine.
- The two following commits n267348-67d9023f07a4 and 
n267349-0ad011ececb9 do not build on my box (module/netlink broken ...).
- From commit n267349-0ad011ececb9 (netlink) onwards this page fault 
occurs when WireGuard is started.


Any help is greatly appreciated.
CC'ed Gleb Smirnoff due to the affected commits.

Regards,
Rainer Hurling



Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-09 Thread void

On Tue, Jan 09, 2024 at 12:24:40PM +, void wrote:

On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:

On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:


Was the kernel/utility built with IPv6? If not, that’s a general 
bug which should be filed (which can be easily checked/avoided 
using the FEATURES(9) subsystem)…

Cheers!
-Enji


world/kernel was built with WITHOUT_INET6= in /etc/src.conf

I made the problem go away with removing WITHOUT_INET6= and rebuilding.


I'll re-add this to try and replicate the problem with the same sources
(main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it


I forgot about this line:

options INET6   # IPv6 communications protocols

which, on current/arm64 lives in std.arm64 which gets included by
GENERIC which is included by GENERIC-MMCCAM which is included by
GENERIC-MMCCAM-NODEBUG

commenting it out and having WITHOUT_INET6= in /etc/src.conf and rebuilding
fixes the problem. Sorry for the noise.
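
The mismatch described here (userland built with WITHOUT_INET6 while the
kernel still carries options INET6) could be spotted with a minimal sketch like
the one below. Assumptions: the helper name src_conf_disables_inet6 is made up,
the check looks at the running kernel rather than the kernel config file, and
kern.features.inet6 is taken to be absent when the kernel has no INET6.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given src.conf sets WITHOUT_INET6
# on an uncommented line.
src_conf_disables_inet6() {
    grep -Eq '^[[:space:]]*WITHOUT_INET6[[:space:]]*[?:+]?=' "$1" 2>/dev/null
}

conf=/etc/src.conf

if src_conf_disables_inet6 "$conf"; then
    # Userland was (or will be) built without IPv6; see what the kernel says.
    if sysctl -n kern.features.inet6 >/dev/null 2>&1; then
        echo "mismatch: $conf sets WITHOUT_INET6 but the kernel has INET6"
    else
        echo "consistent: no INET6 in $conf or in the running kernel"
    fi
else
    echo "$conf does not set WITHOUT_INET6"
fi
```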
--



Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-09 Thread void

On Tue, Jan 09, 2024 at 10:24:53AM +, void wrote:

On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:


Was the kernel/utility built with IPv6? If not, that’s a general bug 
which should be filed (which can be easily checked/avoided using the 
FEATURES(9) subsystem)…

Cheers!
-Enji


world/kernel was built with WITHOUT_INET6= in /etc/src.conf

I made the problem go away with removing WITHOUT_INET6= and rebuilding.


I'll re-add this to try and replicate the problem with the same sources
(main-n267425-aa1223ac3afc) and if it happens again I'll make a PR for it
--



Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-09 Thread void

On Mon, Jan 08, 2024 at 01:07:30PM -0800, Enji Cooper wrote:


Was the kernel/utility built with IPv6? If not, that’s a general 
bug which should be filed (which can be easily checked/avoided 
using the FEATURES(9) subsystem)…

Cheers!
-Enji


world/kernel was built with WITHOUT_INET6= in /etc/src.conf

I made the problem go away with removing WITHOUT_INET6= and rebuilding.
The system was installed by taking 
FreeBSD-15.0-CURRENT-arm64-aarch64-RPI-20240104-8bf0882e186e-267378.img

and dd-ing it to a usb3-connected hd.

Where can I read about features?

% man features
No manual entry for "features"

it's not in apropos
thanks,
--



[Bug 197921] scheduler: Allow non-migratable threads to bind to their current CPU

2024-01-09 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197921

Zhenlei Huang  changed:

           What    |Removed                     |Added

                 CC|                            |z...@freebsd.org

--- Comment #3 from Zhenlei Huang  ---
It seems we have no code that binds a thread to its local CPU; otherwise
`KASSERT(THREAD_CAN_MIGRATE(td), ("%p must be migratable", td))` would complain
(when the kernel is built with option INVARIANTS).

(In reply to Ed Maste from comment #1)
> but, what about just moving the KASSERT after the `if (PCPU_GET(cpuid) == 
> cpu)` test?
I think that is much simpler.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Re: route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-08 Thread Enji Cooper

> On Jan 7, 2024, at 6:29 AM, void  wrote:
> 
> Hi,
> 
> on a rpi4/8GB, my rc.conf looks like so. It's an ipv4-only system on a LAN 
> not directly connected to the internet
> 
> hostname="generic.home.arpa"
> ifconfig_genet0="inet 192.168.1.199 netmask 255.255.255.0"
> defaultrouter="192.168.1.1"
> sshd_enable="YES"
> sendmail_enable="NONE"
> sendmail_submit_enable="NO"
> sendmail_outbound_enable="NO"
> sendmail_msp_queue_enable="NO"
> growfs_enable="YES"
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="AUTO"
> ntpd_enable="YES"
> ntpdate_enable="YES"
> 
> when it boots, the following appears in the serial console
> 
> ###
> 
> Starting devd.
> Autoloading module: uhid
> Autoloading module: usbhid
> Autoloading module: wmt
> route: message indicates error: File exists
> add host 127.0.0.1: gateway lo0 fib 0: route already in table
> add net default: gateway 192.168.1.1
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> route: bad keyword: inet6
> route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
> Updating motd:.
> Creating and/or trimming log files.

Was the kernel/utility built with IPv6? If not, that’s a general bug which 
should be filed (which can be easily checked/avoided using the FEATURES(9) 
subsystem)…
Cheers!
-Enji




route ipv6 errors on bootup in -current main-n267425-aa1223ac3afc on arm64

2024-01-07 Thread void

Hi,

on a rpi4/8GB, my rc.conf looks like so. It's an ipv4-only system 
on a LAN not directly connected to the internet


hostname="generic.home.arpa"
ifconfig_genet0="inet 192.168.1.199 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
sshd_enable="YES"
sendmail_enable="NONE"
sendmail_submit_enable="NO"
sendmail_outbound_enable="NO"
sendmail_msp_queue_enable="NO"
growfs_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
ntpd_enable="YES"
ntpdate_enable="YES"

when it boots, the following appears in the serial console

###

Starting devd.
Autoloading module: uhid
Autoloading module: usbhid
Autoloading module: wmt
route: message indicates error: File exists
add host 127.0.0.1: gateway lo0 fib 0: route already in table
add net default: gateway 192.168.1.1
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
route: bad keyword: inet6
route: usage: route [-j jail] [-46dnqtv] command [[modifiers] args]
Updating motd:.
Creating and/or trimming log files.

###

Why is it erroring for ipv6 when there's no ipv6 in rc.conf?
I've not tried an amd64 -current system yet.

--



[Bug 197921] scheduler: Allow non-migratable threads to bind to their current CPU

2024-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197921

Mark Linimon  changed:

           What    |Removed                     |Added

              Flags|mfc-stable12?,              |
                   |mfc-stable11?               |

--- Comment #2 from Mark Linimon  ---
^Triage: remove OBE flags.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Re: Checksum Error on installer (3 iso images in CURRENT)

2023-12-29 Thread Alfonso S. Siciliano

On 29/12/2023 05:46, Christopher Davidson wrote:

Hi FreeBSD mailing list,

I have recently started to look at the CURRENT isos, for installation in 
a virtualbox, and while trying to install these images I have received 
verification issues with the checksums.


Problem: FreeBSD installer will error out of the installer upon 
verification of checksums, they do not align with the checksum files in 
the directory: 
https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/


Steps to replicate:

 1. Create a virtualbox profile with freebsd
 2. Attach one of the below iso images
 3. Run the installation
 4. Select keymap
 5. Select partition setup (UFS)
 6. Select packages to install
 7. Program starts verifying the base package and comes back with the
error message

I have confirmed this with some people on the libera.chat IRC server, 
under #freebsd and this is not an isolated event.


The 3 iso images in question are:

  * FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso
  * FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso
  * FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso

Here are the respective checksums for each of these files:

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973

SHA256 
(CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973) = 827182ccbfbce984c969790e7aac43828dffc4a21d43e855c91bac03f29dc74e


SHA256 
(FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso) = 
fdd8870549474f38d35665c330d209df7733aa8608630845471685b291c06746


CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058

SHA256 
(CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058) = 60f01c27aa02acb47cab7dec58119f34e7215c3656b8486854bc64217cdfe3bb


SHA256 
(FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso) = 
abdd81c253c651bbc10e3db1b97b8b111f73b3f657f729e37cdbe975de0dc056


CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242

SHA256 
(CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242) = 83698ee594d56108b29e40d635671c7a2de6ada2af636ef5254eafbd35e95e96


SHA256 
(FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso) = 
2deb850673f148cf1ab269175ddf40448e6a96b331b4ca0027f8abe16b3edfa0


If any further information/clarification is required, please do let me know.

Kind Regards,

Chris



Hi Chris,

I had the same problem some time ago because I used to test
iso/bsdinstall every day. Fortunately, the re@ team explained the
cause and the solution to me.

The new development snapshot builds are propagated to the
mirror.  It is non-atomic, so there is a bit of a possible race 
condition. (This happens once a week, unless there is some sort of build 
failure.)


You could subscribe to the freebsd-snapshots@ mailing list if you want 
to be notified when the propagation is complete.
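
To rule out a half-propagated download on your side, a minimal sketch that
compares an image with its entry in the matching CHECKSUM.SHA256 file. It
assumes the "SHA256 (name) = hash" line format shown in the report, uses
sha256(1) on FreeBSD or sha256sum elsewhere, and the function names are made
up for illustration. It checks the image file only, not the distribution sets
the installer verifies.

```shell
#!/bin/sh
# Print the SHA-256 digest of file $1 using whichever tool is available.
sha256_of() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"
    elif command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        return 127
    fi
}

# Hypothetical helper: verify_image CHECKSUM_FILE IMAGE
# Succeeds when the digest listed for IMAGE matches the computed one.
verify_image() {
    name=$(basename "$2")
    # Lines look like: SHA256 (file.iso) = <hex digest>
    expected=$(awk -v f="($name)" \
        '$1 == "SHA256" && $2 == f { print $4 }' "$1")
    if [ -z "$expected" ]; then
        echo "no entry for $name in $1" >&2
        return 2
    fi
    actual=$(sha256_of "$2") || return 3
    [ "$expected" = "$actual" ]
}

# Usage: sh verify.sh CHECKSUM.SHA256-... FreeBSD-...-disc1.iso
if [ "$#" -eq 2 ]; then
    if verify_image "$1" "$2"; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH or missing entry"
    fi
fi
```

If the image matches but the installer still fails, the mismatch is more
likely between the image and the sets on the mirror, which fits the
propagation race described above.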


Alfonso




Checksum Error on installer (3 iso images in CURRENT)

2023-12-28 Thread Christopher Davidson
Hi FreeBSD mailing list,

I have recently started to look at the CURRENT isos, for installation in a 
virtualbox, and while trying to install these images I have received 
verification issues with the checksums.

Problem: FreeBSD installer will error out of the installer upon verification of 
checksums, they do not align with the checksum files in the directory: 
https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/

Steps to replicate:

  1.  Create a virtualbox profile with freebsd
  2.  Attach one of the below iso images
  3.  Run the installation
  4.  Select keymap
  5.  Select partition setup (UFS)
  6.  Select packages to install
  7.  Program starts verifying the base package and comes back with the error 
message

I have confirmed this with some people on the libera.chat IRC server, under 
#freebsd and this is not an isolated event.

The 3 iso images in question are:

  *   FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso
  *   FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso
  *   FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso

Here are the respective checksums for each of these files:
CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973

SHA256 
(CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973) = 
827182ccbfbce984c969790e7aac43828dffc4a21d43e855c91bac03f29dc74e
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231216-ca39f23347e1-266973-bootonly.iso) = 
fdd8870549474f38d35665c330d209df7733aa8608630845471685b291c06746

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058

SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058) = 60f01c27aa02acb47cab7dec58119f34e7215c3656b8486854bc64217cdfe3bb
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231223-dac33a65b965-267058-bootonly.iso) = abdd81c253c651bbc10e3db1b97b8b111f73b3f657f729e37cdbe975de0dc056

CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242

SHA256 (CHECKSUM.SHA256-FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242) = 83698ee594d56108b29e40d635671c7a2de6ada2af636ef5254eafbd35e95e96
SHA256 (FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso) = 2deb850673f148cf1ab269175ddf40448e6a96b331b4ca0027f8abe16b3edfa0
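
To rule out a damaged download, a checksum entry of the form 
"SHA256 (file) = digest" can be compared with a digest computed locally. 
Below is a minimal sketch of that comparison, assuming each entry sits on a 
single line; the names ChecksumEntry, parse_checksum_line and digest_matches 
are made up for illustration and are not part of any FreeBSD tool. It only 
covers the ISO file itself; it says nothing about what the installer checks 
at step 7.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// One parsed entry of a BSD-style checksum file: "SHA256 (name) = hexdigest".
// (Hypothetical helper type, not taken from any FreeBSD source.)
struct ChecksumEntry {
    std::string algorithm;
    std::string filename;
    std::string digest;
};

// Remove leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parse "ALGO (file) = digest" into out. Returns false on malformed input.
bool parse_checksum_line(const std::string& line, ChecksumEntry& out) {
    const std::string::size_type open = line.find(" (");
    const std::string::size_type close = line.rfind(") = ");
    if (open == std::string::npos || close == std::string::npos ||
        close < open + 2)
        return false;
    ChecksumEntry e;
    e.algorithm = trim(line.substr(0, open));
    e.filename = line.substr(open + 2, close - (open + 2));
    e.digest = trim(line.substr(close + 4));
    if (e.algorithm.empty() || e.filename.empty() || e.digest.empty())
        return false;
    out = e;
    return true;
}

// Compare the recorded digest with a locally computed one, ignoring case
// and surrounding whitespace.
bool digest_matches(const ChecksumEntry& e, const std::string& computed_hex) {
    auto lower = [](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return s;
    };
    return lower(e.digest) == lower(trim(computed_hex));
}
```

The computed digest would come from hashing the downloaded ISO with a 
SHA-256 tool of your choice; the sketch deliberately leaves the hashing out 
and only compares the two hex strings.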

If any further information/clarification is required, please do let me know.

Kind Regards,
Chris



Re: Problem building world on current

2023-12-28 Thread Santiago Martinez

It seems that it was related to PR273661.

I followed this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273661#c5

and it's now building.

Thanks

Santi


On 12/28/23 20:32, Santiago Martinez wrote:

Hi David,

- I'm running 14.0R-P3

- Last commit:
    5f71f9636efa25f6de1a832202bae7c78ad013aa (HEAD -> main, 
origin/main, origin/HEAD)

    Author: rilysh 
    Date:   Thu Dec 28 02:34:32 2023 -0500

- Just a clean build, no options on command line or src.conf/make

- Kernel builds without a problem (just in case).

Thanks.

Santi

On 12/28/23 16:23, David Wolfskill wrote:

On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:

Hi Everyone, I'm having issues building world from current (just now).

Same header missing on multiple parts.

Best regards.

Santiago


It might be useful to know:
* What you are running at the time;
* what the most recent commit in your source tree is;
* whether this is a clean build, you are using make's META_MODE, or you
   are just setting -DNO_CLEAN.

I have not seen the issue you cite; my most recent builds of head:

FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 
12:14:41 UTC 2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 
15:52:04 UTC 2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 
12:18:34 UTC 2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008

(in my case, using make's META_MODE).

More details at https://www.catwhisker.org/~david/FreeBSD/history/.

Peace,
david






Re: Problem building world on current

2023-12-28 Thread Santiago Martinez

Hi David,

- I'm running 14.0R-P3

- Last commit:
    5f71f9636efa25f6de1a832202bae7c78ad013aa (HEAD -> main, 
origin/main, origin/HEAD)

    Author: rilysh 
    Date:   Thu Dec 28 02:34:32 2023 -0500

- Just a clean build, no options on command line or src.conf/make

- Kernel builds without a problem (just in case).

Thanks.

Santi

On 12/28/23 16:23, David Wolfskill wrote:

On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:

Hi Everyone, I'm having issues building world from current (just now).

Same header missing on multiple parts.

Best regards.

Santiago


It might be useful to know:
* What you are running at the time;
* what the most recent commit in your source tree is;
* whether this is a clean build, you are using make's META_MODE, or you
   are just setting -DNO_CLEAN.

I have not seen the issue you cite; my most recent builds of head:

FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 12:14:41 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 15:52:04 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 12:18:34 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
(in my case, using make's META_MODE).

More details at https://www.catwhisker.org/~david/FreeBSD/history/.

Peace,
david




Re: Problem building world on current

2023-12-28 Thread David Wolfskill
On Thu, Dec 28, 2023 at 04:05:49PM +0100, Santiago Martinez wrote:
> Hi Everyone, I'm having issues building world from current (just now).
> 
> Same header missing on multiple parts.
> 
> Best regards.
> 
> Santiago
> 

It might be useful to know:
* What you are running at the time;
* what the most recent commit in your source tree is;
* whether this is a clean build, you are using make's META_MODE, or you
  are just setting -DNO_CLEAN.

I have not seen the issue you cite; my most recent builds of head:

FreeBSD 15.0-CURRENT #22 main-n267169-5bc10feacc9d: Tue Dec 26 12:14:41 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #23 main-n267215-3334a537ed38: Wed Dec 27 15:52:04 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
FreeBSD 15.0-CURRENT #24 main-n267279-789480702e49: Thu Dec 28 12:18:34 UTC 
2023 
r...@freebeast.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/GENERIC 
amd64 1500008 1500008
(in my case, using make's META_MODE).

More details at https://www.catwhisker.org/~david/FreeBSD/history/.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Do these ends really justify those means?

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Problem building world on current

2023-12-28 Thread Santiago Martinez

Hi Everyone, I'm having issues building world from current (just now).

Same header missing on multiple parts.

Best regards.

Santiago

"""
In file included from 
/usr/src/contrib/llvm-project/llvm/lib/Demangle/ItaniumDemangle.cpp:13:
In file included from 
/usr/src/contrib/llvm-project/llvm/include/llvm/Demangle/Demangle.h:13:
/usr/include/c++/v1/string:561:10: fatal error: '__string/char_traits.h' 
file not found

#include <__string/char_traits.h>
^~~~
1 error generated.
*** Error code 1
"""



Re: compile 13.2p8 on a recent current fails: compiler issue ?

2023-12-13 Thread Dag-Erling Smørgrav
Dimitry Andric writes:
> henry vogt writes:
> > ===> usr.sbin/zic (obj,all,install)
> > Building /usr/obj/usr/src/13.2/amd64.amd64/tmp/obj-tools/usr.sbin/zic/zic.o
> > --- zic.o ---
> > /usr/src/13.2/contrib/tzcode/zic.c:464:8: error: an attribute list cannot 
> > appear here
> >   464 | static ATTRIBUTE_NORETURN void
> >   |^~
> This appears to have been fixed upstream some time ago:
> https://github.com/eggert/tz/commit/9cfe9507fcc22cd4a0c4da486ea1c7f0de6b075f

It's also fixed in 14 and 15:
https://cgit.freebsd.org/src/commit/?id=75411d157232ee3b4789b92c9205453e7d59a3d2

It was too late for 13.2, but I'll make sure it's merged before 13.3.

DES
-- 
Dag-Erling Smørgrav - d...@freebsd.org



Re: compile 13.2p8 on a recent current fails: compiler issue ?

2023-12-13 Thread Dimitry Andric
On 13 Dec 2023, at 13:08, henry vogt wrote:
> 
> attempt to compile 13.2p8 on a recent current fails: compiler issue ?
> 
> ...
> 
> ===> usr.sbin/zic (obj,all,install)
> Building /usr/obj/usr/src/13.2/amd64.amd64/tmp/obj-tools/usr.sbin/zic/zic.o
> --- zic.o ---
> /usr/src/13.2/contrib/tzcode/zic.c:464:8: error: an attribute list cannot 
> appear here
>   464 | static ATTRIBUTE_NORETURN void
>   |^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 
> 'ATTRIBUTE_NORETURN'
>   471 | #  define ATTRIBUTE_NORETURN [[noreturn]]
>   |  ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:471:8: error: an attribute list cannot 
> appear here
>   471 | static ATTRIBUTE_NORETURN void
>   |^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 
> 'ATTRIBUTE_NORETURN'
>   471 | #  define ATTRIBUTE_NORETURN [[noreturn]]
>   |  ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:669:8: error: an attribute list cannot 
> appear here
>   669 | static ATTRIBUTE_NORETURN void
>   |^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 
> 'ATTRIBUTE_NORETURN'
>   471 | #  define ATTRIBUTE_NORETURN [[noreturn]]
>   |  ^~~~
> /usr/src/13.2/contrib/tzcode/zic.c:3778:8: error: an attribute list cannot 
> appear here
>  3778 | static ATTRIBUTE_NORETURN void
>   |^~
> /usr/src/13.2/contrib/tzcode/private.h:471:30: note: expanded from macro 
> 'ATTRIBUTE_NORETURN'
>   471 | #  define ATTRIBUTE_NORETURN [[noreturn]]
>   |  ^~~~
> 4 errors generated.
> *** [zic.o] Error code 1
> 
> make[3]: stopped in /usr/src/13.2/usr.sbin/zic
> 
> # cc -v
> FreeBSD clang version 17.0.6 (https://github.com/llvm/llvm-project.git 
> llvmorg-17.0.6-0-g6009708b4367)
> Target: x86_64-unknown-freebsd15.0
> Thread model: posix
> InstalledDir: /usr/bin

This appears to have been fixed upstream some time ago:
https://github.com/eggert/tz/commit/9cfe9507fcc22cd4a0c4da486ea1c7f0de6b075f

but clang 17 has become more strict about invalid attribute placement, possibly 
to be more like gcc (which should already have given this as a warning or 
error).

So I guess for 13.2-p8 you will have to apply that fix manually, if you want to 
build it on a 15-CURRENT box. Otherwise, I would advise a jail.
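
The placement rule can be sketched as follows. This is an illustration only, 
written in C++ because the [[noreturn]] spelling behaves the same way there; 
it assumes the fix amounts to moving the attribute ahead of "static" (the 
upstream commit is not quoted in this thread), and the functions fail() and 
require_positive() are made-up names, not code from zic.c.

```cpp
#include <stdexcept>
#include <string>

// Rejected form, mirroring the zic.c declarations in the log above: the
// attribute list sits between `static` and the return type.
//
//     static [[noreturn]] void fail(const std::string& msg);
//
// Accepted form: the attribute list leads the whole declaration.
[[noreturn]] static void fail(const std::string& msg) {
    // A [[noreturn]] function may leave by throwing; it must never
    // return normally to its caller.
    throw std::runtime_error(msg);
}

// Hypothetical caller: returns the value unchanged when it is positive,
// otherwise calls fail(), which does not come back.
static int require_positive(int value) {
    if (value <= 0)
        fail("value must be positive");
    return value;
}
```

With a macro such as ATTRIBUTE_NORETURN that expands to [[noreturn]], the 
same reordering applies: the macro goes first, then "static", then the 
return type.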

-Dimitry




