Re: huge amount laundry memory not being cleaned up

2023-07-10 Thread Pete Wright




On 7/10/23 4:26 PM, Pete Wright wrote:




i'm doing a build world now since i'll need to reboot this box anyway, 
just to get everything up to date.  interestingly enough i'm still 
pegged at 14G of laundry memory, and my 2G swap is 100% utilized.  once 
the build world completes i'll do a double check and see if i can find 
any large consumers of resident memory which may lead me in the right 
direction.


so...build world completed, and after quitting my kde5 session all of the 
laundry got freed up right away.  i suspect either firefox/chrome 
didn't exit cleanly, or there was something funky going on with kde.


sorry for the noise folks, this is almost certainly an issue with my 
local env.  this is helpful for me, as i'll have a better idea as to 
where i should focus my efforts trying to find this memory hog next time.


-p

--
Pete Wright
p...@nomadlogic.org



Re: huge amount laundry memory not being cleaned up

2023-07-10 Thread Pete Wright




On 7/10/23 4:01 PM, Mark Millard wrote:

Pete Wright  wrote on
Date: Mon, 10 Jul 2023 19:35:26 UTC :


hi there,
i've got a workstation running CURRENT that recently ran out of swap
space. i killed the usual suspects (chrome, firefox and thunderbird)
and noticed some odd behavior. while some memory did get freed up -
after leaving the system idle for 4 hours i still have 14G of memory in
the Laundry according to top. I also have noticed that very little data
has paged out of swap (100MB out of 2G).


i was wondering if there was a good way to determine what is in the
laundry,


I do not know how to get a breakdown of the laundry's usage.
But I'd expect, say, for example, top's resident memory
figures would count what is in the laundry as resident.
If correct, given the large laundry usage, maybe some
resident figures would be suggestive?


thanks Mark, so yea i poked around and didn't see any large consumers of 
resident memory.





or get diagnostic info on why it's not cleaning itself up?


As I understand, it would take one of the following
to change the status of the pages in the laundry in normal
operation:

A) access to a page by a program, turning the page into
 being in the active category. (It might go through
 inactive to get there?)

B) memory pressure leading to sending the page to the swap
 in order to provide a page for a different use.

Time alone does not contribute much as I understand. More
on-demand driven. Laundry is sort of "inactive but known
to be dirty" as I understand, in some respects just a
subset of inactive optimized for being closer to ready to
page out to swap space if needed.



i'm doing a build world now since i'll need to reboot this box anyway, 
just to get everything up to date.  interestingly enough i'm still 
pegged at 14G of laundry memory, and my 2G swap is 100% utilized.  once 
the build world completes i'll do a double check and see if i can find 
any large consumers of resident memory which may lead me in the right 
direction.


cheers!
-pete


--
Pete Wright
p...@nomadlogic.org



RE: huge amount laundry memory not being cleaned up

2023-07-10 Thread Mark Millard
Pete Wright  wrote on
Date: Mon, 10 Jul 2023 19:35:26 UTC :

> hi there,
> i've got a workstation running CURRENT that recently ran out of swap 
> space. i killed the usual suspects (chrome, firefox and thunderbird) 
> and noticed some odd behavior. while some memory did get freed up - 
> after leaving the system idle for 4 hours i still have 14G of memory in 
> the Laundry according to top. I also have noticed that very little data 
> has paged out of swap (100MB out of 2G).
> 
> 
> i was wondering if there was a good way to determine what is in the 
> laundry,

I do not know how to get a breakdown of the laundry's usage.
But I'd expect, say, for example, top's resident memory
figures would count what is in the laundry as resident.
If correct, given the large laundry usage, maybe some
resident figures would be suggestive?
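As a rough sketch of that idea (not a tested diagnostic), the Python below ranks processes by resident set size from text shaped like the output of `ps -axo pid,rss,comm`. The function name `top_resident` and the sample rows are made up for illustration; the numbers are not from the machine in this thread.

```python
def top_resident(ps_text, count=5):
    """Return (rss_kib, pid, command) tuples, largest RSS first.

    Expects text shaped like `ps -axo pid,rss,comm` output: one header
    line, then one process per line.
    """
    rows = []
    for line in ps_text.splitlines()[1:]:      # skip the header line
        parts = line.strip().split(None, 2)    # pid, rss, rest is the command
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[1]), int(parts[0]), parts[2]))
    rows.sort(reverse=True)
    return rows[:count]


# Hypothetical sample data, for illustration only.
sample = """  PID    RSS COMMAND
  874   2876 sh
 1201 812340 firefox
 1350 104220 kwin_x11
"""

for rss, pid, comm in top_resident(sample, 2):
    print(pid, rss, comm)
```

If the resident totals across all processes come out far below the laundry figure, that by itself would suggest the pages are not attributed to any ordinary process.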

> or get diagnostic info on why it's not cleaning itself up? 

As I understand, it would take one of the following
to change the status of the pages in the laundry in normal
operation:

A) access to a page by a program, turning the page into
being in the active category. (It might go through
inactive to get there?)

B) memory pressure leading to sending the page to the swap
in order to provide a page for a different use.

Time alone does not contribute much as I understand. More
on-demand driven. Laundry is sort of "inactive but known
to be dirty" as I understand, in some respects just a
subset of inactive optimized for being closer to ready to
page out to swap space if needed.
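To put a number on the laundry queue without reading it off top, one possible sketch is below. It assumes the `vm.stats.vm.v_laundry_count` and `hw.pagesize` sysctl names are present on the system; the helper names and the sample values are illustrative only, and `read_sysctl` is included just to show where real data would come from.

```python
import subprocess


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl, into a dict of ints."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value.strip())
    return values


def laundry_bytes(values):
    """Laundry size in bytes: laundry page count times the page size."""
    return values["vm.stats.vm.v_laundry_count"] * values["hw.pagesize"]


def read_sysctl(names):
    """Run sysctl(8) for the given names (assumed FreeBSD; not called below)."""
    result = subprocess.run(["sysctl", *names], capture_output=True,
                            text=True, check=True)
    return parse_sysctl(result.stdout)


# Illustrative values only, chosen to be easy to check by hand.
sample = "vm.stats.vm.v_laundry_count: 3670016\nhw.pagesize: 4096\n"
print(laundry_bytes(parse_sysctl(sample)) / 2**30, "GiB")
```

Sampling this periodically would show whether the count moves at all while the system is idle, which is the on-demand behavior described above.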

> when i've seen this before only a reboot will get the system back to 
> being stable, if i re-launch my desktop apps they'll quickly start 
> trying to page out to disk again creating an OOM condition.
> 
> system has 32G of RAM and is running this checkout
> FreeBSD topanga 14.0-CURRENT FreeBSD 14.0-CURRENT #66 
> main-n263884-d2a45e9e817a: Thu Jun 29 15:50:44 PDT 2023


===
Mark Millard
marklmi at yahoo.com




Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mark Millard
On Jul 10, 2023, at 15:03, Mark Millard  wrote:

> On Jul 10, 2023, at 11:42, The Doctor  wrote:
> 
>> On Mon, Jul 10, 2023 at 08:56:22AM -0700, Mark Millard wrote:
>>> The subject line's question was prompted by
>>> . . ./hazmat/bindings/_openssl.abi3.so related notices
>>> in a kyua report:
>>> 
>>> # kyua report --verbose 
>>> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>>>  2>&1 | grep "Undefined symbol" | sort -u
>>> +ImportError: 
>>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> ImportError: 
>>> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> ImportError: 
>>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> 
>>> It is possible that this is related to some oddities of my
>>> context for this. But I figured I'd ask the general question
>>> anyway.
>>> 
>> 
>> No! The problem is that Python is calling an OpenSSL 1.X function
>> which is dropped in OpenSSL 3.X
>> 
>> Python needs to fix that issue.
> 
> Well:
> 
> # strings 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so
>  | grep -i "3\.[0-9]*\.[0-9]"
> OpenSSL 3.0.9 30 May 2023
> 3.4.8
> 
> From what I read, 3.4.8 is too old and is known to have this issue and this
> was fixed in a later version. I see references to "cryptography" needing to
> be "at least 35.0.0 for OpenSSL 3.0 support" instead of "3.4.8" as one place
> put it.
> 
> I've no clue of the details for python3.9 vs. python3.10 or python3.11 for
> containing a sufficiently modern "cryptography" already in FreeBSD ports
> (vs. not). But this may be more of a port-update issue than an up-stream
> python issue -- or possibly just a "use python 3.? or later" issue for
> some value for "?".
> 

35.0.0 of cryptography dates back to 2021-09-29.
Current for cryptography is 41.0.1 (2023-06-01).
It claims: "It supports Python 3.7+ and PyPy3
7.3.10+."

security/py-cryptography is at 3.4.8 (2021-08-24)
for py39-cryptography and is, in-part, a FreeBSD
ports issue as far as I can tell.

Looking, it seems it is at 3.4.8 for all @${PY_FLAVOR}
instances. So trying python310 or python311 might
well do no good for openssl 3.0 compatibility if they
use security/py-cryptography .
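For anyone scripting this check, a minimal sketch of the version comparison follows. The 35.0.0 threshold is the one quoted above; the helper names are hypothetical, and the parsing assumes plain dotted-integer release strings (no suffixes such as "rc1").

```python
def version_tuple(text):
    """Turn a dotted release string such as '3.4.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Threshold quoted in this thread: "at least 35.0.0 for OpenSSL 3.0 support".
MIN_FOR_OPENSSL3 = version_tuple("35.0.0")


def new_enough(installed):
    """True if the installed cryptography release meets the quoted threshold."""
    return version_tuple(installed) >= MIN_FOR_OPENSSL3


for candidate in ("3.4.8", "35.0.0", "41.0.1"):
    print(candidate, new_enough(candidate))
```

Comparing integer tuples rather than raw strings avoids surprises from character-by-character ordering once the major version has two digits.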

(Note: I build my own ports via poudriere-devel .)

===
Mark Millard
marklmi at yahoo.com




Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mark Millard
On Jul 10, 2023, at 11:42, The Doctor  wrote:

> On Mon, Jul 10, 2023 at 08:56:22AM -0700, Mark Millard wrote:
>> The subject line's question was prompted by
>> . . ./hazmat/bindings/_openssl.abi3.so related notices
>> in a kyua report:
>> 
>> # kyua report --verbose 
>> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>>  2>&1 | grep "Undefined symbol" | sort -u
>> +ImportError: 
>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> ImportError: 
>> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> ImportError: 
>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> 
>> It is possible that this is related to some oddities of my
>> context for this. But I figured I'd ask the general question
>> anyway.
>> 
> 
> No! The problem is that Python is calling an OpenSSL 1.X function
> which is dropped in OpenSSL 3.X
> 
> Python needs to fix that issue.

Well:

# strings 
/usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so
 | grep -i "3\.[0-9]*\.[0-9]"
OpenSSL 3.0.9 30 May 2023
3.4.8

From what I read, 3.4.8 is too old and is known to have this issue and this
was fixed in a later version. I see references to "cryptography" needing to
be "at least 35.0.0 for OpenSSL 3.0 support" instead of "3.4.8" as one place
put it.

I've no clue of the details for python3.9 vs. python3.10 or python3.11 for
containing a sufficiently modern "cryptography" already in FreeBSD ports
(vs. not). But this may be more of a port-update issue than an up-stream
python issue -- or possibly just a "use python 3.? or later" issue for
some value for "?".


===
Mark Millard
marklmi at yahoo.com




huge amount laundry memory not being cleaned up

2023-07-10 Thread Pete Wright

hi there,
i've got a workstation running CURRENT that recently ran out of swap 
space.  i killed the usual suspects (chrome, firefox and thunderbird) 
and noticed some odd behavior.  while some memory did get freed up - 
after leaving the system idle for 4 hours i still have 14G of memory in 
the Laundry according to top.  I also have noticed that very little data 
has paged out of swap (100MB out of 2G).



i was wondering if there was a good way to determine what is in the 
laundry, or get diagnostic info on why it's not cleaning itself up? 
when i've seen this before only a reboot will get the system back to 
being stable, if i re-launch my desktop apps they'll quickly start 
trying to page out to disk again creating an OOM condition.


system has 32G of RAM and is running this checkout
FreeBSD topanga 14.0-CURRENT FreeBSD 14.0-CURRENT #66 
main-n263884-d2a45e9e817a: Thu Jun 29 15:50:44 PDT 2023


Cheers,
-pete

--
Pete Wright
p...@nomadlogic.org



Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread The Doctor
On Mon, Jul 10, 2023 at 08:56:22AM -0700, Mark Millard wrote:
> The subject line's question was prompted by
> . . ./hazmat/bindings/_openssl.abi3.so related notices
> in a kyua report:
> 
> # kyua report --verbose 
> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>  2>&1 | grep "Undefined symbol" | sort -u
> +ImportError: 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
> ImportError: 
> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
> ImportError: 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
> 
> It is possible that this is related to some oddities of my
> context for this. But I figured I'd ask the general question
> anyway.
>

No! The problem is that Python is calling an OpenSSL 1.X function
which is dropped in OpenSSL 3.X

Python needs to fix that issue.

> ===
> Mark Millard
> marklmi at yahoo.com
> 
> 

-- 
Member - Liberal International This is doc...@nk.ca Ici doc...@nk.ca
Yahweh, King & country!Never Satan President Republic!Beware AntiChrist rising!
Look at Psalms 14 and 53 on Atheism https://www.empire.kred/ROOTNK?t=94a1f39b 
"We should do good unless it inconveniences us," is not righteous thinking. 
-unknown Beware https://mindspring.com



Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mark Millard
On Jul 10, 2023, at 10:55, Mark Millard  wrote:

> On Jul 10, 2023, at 09:45, Mike Karels  wrote:
> 
>> On 10 Jul 2023, at 10:56, Mark Millard wrote:
>> 
>>> The subject line's question was prompted by
>>> . . ./hazmat/bindings/_openssl.abi3.so related notices
>>> in a kyua report:
>>> 
>>> # kyua report --verbose 
>>> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>>>  2>&1 | grep "Undefined symbol" | sort -u
>>> +ImportError: 
>>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> ImportError: 
>>> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> ImportError: 
>>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>>  Undefined symbol "ERR_GET_FUNC"
>>> 
>>> It is possible that this is related to some oddities of my
>>> context for this. But I figured I'd ask the general question
>>> anyway.
>> 
>> I haven't seen this.  My v7 environments (chroot and /usr/lib32) have
>> only libssl.so.3, not .111, so they must be using OpenSSL 3.0.
> 
> But is the python3 used by kyua aarch64 code? armv7 code?
> 
> # file /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua: ELF 32-bit LSB executable, 
> ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter 
> /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400092), not stripped
> 
> So: armv7 for my lib32 testing activity.
> 
>> Which version of kyua is this running (32 or 64 bit)?
> 
> armv7 (so: 32-bit). This is using my way of causing more
> code to be armv7 instead of aarch64 processes for lib32
> testing than I expect your testing technique ends up
> with.
> 
> # file /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua: ELF 32-bit LSB executable, 
> ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter 
> /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400092), not stripped
> 
> For reference:
> 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/lib/libssl.so
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/lib/libssl.so.30
> 
> As for the aarch64 boot environment:
> 
> /usr/lib/libssl.so
> /usr/lib/libssl.so.30

I forgot to list:

/usr/lib32/libssl.so.30
/usr/lib32/libssl.so

Sorry for the confusion.

> There are no *.111* files on the system other than some
> old log files or other archiving of old things in 2
> separate old-stuff directory trees that are not in
> use.


===
Mark Millard
marklmi at yahoo.com




Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mark Millard
On Jul 10, 2023, at 09:45, Mike Karels  wrote:

> On 10 Jul 2023, at 10:56, Mark Millard wrote:
> 
>> The subject line's question was prompted by
>> . . ./hazmat/bindings/_openssl.abi3.so related notices
>> in a kyua report:
>> 
>> # kyua report --verbose 
>> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>>  2>&1 | grep "Undefined symbol" | sort -u
>> +ImportError: 
>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> ImportError: 
>> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> ImportError: 
>> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>>  Undefined symbol "ERR_GET_FUNC"
>> 
>> It is possible that this is related to some oddities of my
>> context for this. But I figured I'd ask the general question
>> anyway.
> 
> I haven't seen this.  My v7 environments (chroot and /usr/lib32) have
> only libssl.so.3, not .111, so they must be using OpenSSL 3.0.

But is the python3 used by kyua aarch64 code? armv7 code?

# file /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua
/usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua: ELF 32-bit LSB executable, ARM, 
EABI5 version 1 (FreeBSD), dynamically linked, interpreter 
/libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400092), not stripped

So: armv7 for my lib32 testing activity.
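The same 32-bit versus 64-bit distinction can be read straight from the ELF header when file(1) is not at hand. The sketch below is only an illustration: it handles little-endian files, knows just the two machine codes relevant here (EM_ARM = 40, EM_AARCH64 = 183), and the function names and the synthetic header are made up for the example.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}      # EI_CLASS values
ELF_MACHINES = {40: "ARM", 183: "AArch64"}    # EM_ARM, EM_AARCH64


def describe_elf(header):
    """Describe the first 20 bytes of an ELF file (little-endian only)."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if header[5] != 1:                         # EI_DATA: 1 means little-endian
        raise ValueError("only little-endian files are handled in this sketch")
    elf_class = ELF_CLASSES.get(header[4], "unknown class")
    (machine,) = struct.unpack_from("<H", header, 18)   # e_machine field
    return elf_class, ELF_MACHINES.get(machine, "machine %d" % machine)


def describe_file(path):
    """Read just enough of a file to classify it (not called below)."""
    with open(path, "rb") as handle:
        return describe_elf(handle.read(20))


# Synthetic header for illustration: 32-bit, little-endian, FreeBSD OSABI,
# e_type = 2 (executable), e_machine = 40 (ARM).
fake_armv7 = b"\x7fELF" + bytes([1, 1, 1, 9]) + bytes(8) + struct.pack("<HH", 2, 40)
print(describe_elf(fake_armv7))
```

Running `describe_file` over both the kyua binary and the python3 binary it invokes would answer the aarch64-or-armv7 question for each in turn.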

> Which version of kyua is this running (32 or 64 bit)?

armv7 (so: 32-bit). This is using my way of causing more
code to be armv7 instead of aarch64 processes for lib32
testing than I expect your testing technique ends up
with.

# file /usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua
/usr/obj/DESTDIRs/main-CA7-chroot/usr/bin/kyua: ELF 32-bit LSB executable, ARM, 
EABI5 version 1 (FreeBSD), dynamically linked, interpreter 
/libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400092), not stripped

For reference:

/usr/obj/DESTDIRs/main-CA7-chroot/usr/lib/libssl.so
/usr/obj/DESTDIRs/main-CA7-chroot/usr/lib/libssl.so.30

As for the aarch64 boot environment:

/usr/lib/libssl.so
/usr/lib/libssl.so.30

There are no *.111* files on the system other than some
old log files or other archiving of old things in 2
separate old-stuff directory trees that are not in
use.

===
Mark Millard
marklmi at yahoo.com




Re: Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mike Karels
On 10 Jul 2023, at 10:56, Mark Millard wrote:

> The subject line's question was prompted by
> . . ./hazmat/bindings/_openssl.abi3.so related notices
> in a kyua report:
>
> # kyua report --verbose 
> --results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
>  2>&1 | grep "Undefined symbol" | sort -u
> +ImportError: 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
> ImportError: 
> /usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
> ImportError: 
> /usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
>  Undefined symbol "ERR_GET_FUNC"
>
> It is possible that this is related to some oddities of my
> context for this. But I figured I'd ask the general question
> anyway.

I haven't seen this.  My v7 environments (chroot and /usr/lib32) have
only libssl.so.3, not .111, so they must be using OpenSSL 3.0.

Which version of kyua is this running (32 or 64 bit)?

Mike

> ===
> Mark Millard
> marklmi at yahoo.com



Does kyua based testing need some hazmat/bindings/_openssl.abi3.so related updating?: Undefined symbol "ERR_GET_FUNC"

2023-07-10 Thread Mark Millard
The subject line's question was prompted by
. . ./hazmat/bindings/_openssl.abi3.so related notices
in a kyua report:

# kyua report --verbose 
--results-file=usr_obj_DESTDIRs_main-CA7-chroot_usr_tests.20230710-064632-752785
 2>&1 | grep "Undefined symbol" | sort -u
+ImportError: 
/usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
 Undefined symbol "ERR_GET_FUNC"
ImportError: 
/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
 Undefined symbol "ERR_GET_FUNC"
ImportError: 
/usr/obj/DESTDIRs/main-CA7-chroot/usr/local/lib/python3.9/site-packages/cryptography/hazmat/bindings/_openssl.abi3.so:
 Undefined symbol "ERR_GET_FUNC"

It is possible that this is related to some oddities of my
context for this. But I figured I'd ask the general question
anyway.

===
Mark Millard
marklmi at yahoo.com




Re: shell hung in fork system call

2023-07-10 Thread Konstantin Belousov
On Mon, Jul 10, 2023 at 09:39:35AM +, John F Carr wrote:
> 
> 
> > On Jul 9, 2023, at 19:59, Konstantin Belousov  wrote:
> > 
> > On Sun, Jul 09, 2023 at 11:36:03PM +, John F Carr wrote:
> >> 
> >> 
> >>> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
> >>> 
> >>> On Sun, Jul 09, 2023 at 10:41:27PM +, John F Carr wrote:
> >>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
> >>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
> >>>> CPUTYPE?=cortex-a57.
> >>>> 
> >>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung 
> >>>> in the middle of fork().
> >>>> 
> >>>> From the terminal:
> >>>> 
> >>>> # git log --oneline --|more
> >>>> ^C^C^C
> >>>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
> >>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
> >>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
> >>>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
> >>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
> >>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
> >>>> 
> >>>> According to ps -d on another terminal the shell has no children:
> >>>> 
> >>>> PID TT  STAT   TIME COMMAND
> >>>> [...]
> >>>> 873 u0  IWs 0:00.00 `-- login [pam] (login)
> >>>> 874 u0  I   0:00.17   `-- -sh (sh)
> >>>> 95504 u0  I   0:00.01 `-- su -
> >>>> 95505 u0  D+  0:00.05   `-- -su (sh)
> >>>> [...]
> >>>> 
> >>>> Nothing on the (115200 bps serial) console.  No change in system 
> >>>> performance.
> >>>> 
> >>>> The system is busy copying a large amount of data from the network to a 
> >>>> ZFS pool on spinning disks.  The git|more pipeline could have taken some 
> >>>> time to get going while I/O requests worked their way through the queue. 
> >>>> It would not have touched the busy pool, only the zroot pool on an SSD.
> >>>> 
> >>>> Has anything changed recently that might cause this?
> >>> 
> >>> There was some change around fork, but your sleep seems to be not from
> >>> that change.  Can you show the wait channel for the process?  Do something
> >>> like
> >>> $ ps alxww
> >>> 
> >> 
> >> UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TT        TIME COMMAND
> >>   0 95505 95504  2  20  0 13508  2876 fork     D+   u0    0:00.13 -su (sh)
> >> 
> >> This is probably the same information displayed as [fork] in the output 
> >> from ^T.
> >> 
> >> Does it correspond to the source line
> >> 
> >> pause("fork", hz / 2);
> >> 
> >> ?
> > 
> > Yes, it is rate-limiting code.  Still it is interesting to see the whole
> > ps output.
> > 
> > Do you have 7a70f17ac4bd64dc1a5020f in your source?
> 
> No, I do not have that commit.
> 
> The comment mentions livelock.  CPU use as reported by iostat did not change 
> after the process hung.

It is livelocking, but the looping could be rate-limited, similar to
what you see with pause. You need that revision definitely.

Is the problem reproducible on your machine? If yes, you can try one
additional fix, below.

commit 840ce1801ef1c1ab9b10c4c6e7b02403d2e749ee
Author: Konstantin Belousov 
Date:   Mon Jul 10 03:29:43 2023 +0300

sigqueue_delete_set_proc(): initialize sq_proc for worklist

This should fix leaks for the p_killpg_cnt counter, because
sigqueue_flush() drops ksi's.

Sponsored by:   The FreeBSD Foundation
MFC after:  1 week

diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 18756d53e98c..ecfde7a549fc 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -683,7 +683,7 @@ sigqueue_delete_set_proc(struct proc *p, const sigset_t *set)
 
 	PROC_LOCK_ASSERT(p, MA_OWNED);
 
-	sigqueue_init(&worklist, NULL);
+	sigqueue_init(&worklist, p);
 	sigqueue_move_set(&p->p_sigqueue, &worklist, set);
 
 	FOREACH_THREAD_IN_PROC(p, td0)



Re: shell hung in fork system call

2023-07-10 Thread John F Carr



> On Jul 9, 2023, at 19:59, Konstantin Belousov  wrote:
> 
> On Sun, Jul 09, 2023 at 11:36:03PM +, John F Carr wrote:
>> 
>> 
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
>>> 
>>> On Sun, Jul 09, 2023 at 10:41:27PM +, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
>>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
>>>> CPUTYPE?=cortex-a57.
>>>> 
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in 
>>>> the middle of fork().
>>>> 
>>>> From the terminal:
>>>> 
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> 
>>>> According to ps -d on another terminal the shell has no children:
>>>> 
>>>> PID TT  STAT   TIME COMMAND
>>>> [...]
>>>> 873 u0  IWs 0:00.00 `-- login [pam] (login)
>>>> 874 u0  I   0:00.17   `-- -sh (sh)
>>>> 95504 u0  I   0:00.01 `-- su -
>>>> 95505 u0  D+  0:00.05   `-- -su (sh)
>>>> [...]
>>>> 
>>>> Nothing on the (115200 bps serial) console.  No change in system 
>>>> performance.
>>>> 
>>>> The system is busy copying a large amount of data from the network to a 
>>>> ZFS pool on spinning disks.  The git|more pipeline could have taken some 
>>>> time to get going while I/O requests worked their way through the queue.  
>>>> It would not have touched the busy pool, only the zroot pool on an SSD.
>>>> 
>>>> Has anything changed recently that might cause this?
>>> 
>>> There was some change around fork, but your sleep seems to be not from
>>> that change.  Can you show the wait channel for the process?  Do something
>>> like
>>> $ ps alxww
>>> 
>> 
>> UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TT        TIME COMMAND
>>   0 95505 95504  2  20  0 13508  2876 fork     D+   u0    0:00.13 -su (sh)
>> 
>> This is probably the same information displayed as [fork] in the output from 
>> ^T.
>> 
>> Does it correspond to the source line
>> 
>> pause("fork", hz / 2);
>> 
>> ?
> 
> Yes, it is rate-limiting code.  Still it is interesting to see the whole
> ps output.
> 
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.

The comment mentions livelock.  CPU use as reported by iostat did not change 
after the process hung.