Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-23 Thread Luke Mewburn
On 23-07-22 04:34, matthew green wrote:
  | > (gdb) info locals
  | > No symbol table info available.
  | > (gdb)
  | >
  | >   I don't understand why gdb complains about debugging symbols.
  | 
  | i think it's because our build has a bug.  i was recently
  | trying to debug something in a shared library and had the
  | same problem and i traced it down to this code:
  | 
  |share/mk/bsd.README:MKSTRIPSYM If "yes", strip all local symbols from shared libraries;
  | 
  |share/mk/bsd.lib.mk:.if ${MKSTRIPSYM:Uyes} == "yes"
  |share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-x
  |share/mk/bsd.lib.mk:.else
  |share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-X
  | 
  | and putting "MKSTRIPSYM=no" in my mk.conf fixed it.
  | 
  | i believe this is a bug and should default to "no", if
  | MKDEBUG has also been enabled.

I've changed bsd.own.mk so that if MKDEBUG != "no" (i.e. MKDEBUG=yes),
it forces MKSTRIPSYM=no.


Luke.


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-22 Thread Taylor R Campbell
> Date: Sat, 22 Jul 2023 17:14:11 +0200
> From: BERTRAND Joël 
> 
>   I have rebuilt my tree, started and stopped altqd.

Excellent, thanks!

Can you either:

(a) send me a tarball of your altqd binary together with all the
shared libraries and .debug files that gdb mentioned, and together
with a core dump of the process (you should be able to get one by
sending it SIGABRT); or

(b) print the contents of the ifinfo->cllist queue?

For (b), you will need to do:

(gdb) print *ifinfo
(gdb) print ifinfo->cllist->lh_first
(gdb) print *ifinfo->cllist->lh_first
(gdb) print ifinfo->cllist->lh_first->next.le_next
(gdb) print *ifinfo->cllist->lh_first->next.le_next
(gdb) print ifinfo->cllist->lh_first->next.le_next->next.le_next
(gdb) print *ifinfo->cllist->lh_first->next.le_next->next.le_next
...

and so on, until the next.le_next member is null, or until you get
bored of printing the next element which might indicate a cycle.  (Not
sure how many classes in this list to expect, probably depends on your
altq configuration.)
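
The manual walk in (b) can also be expressed as a small routine. What follows is only a sketch: `struct classinfo` and `struct cllist` here are hypothetical stand-ins that model nothing but the link fields named in the gdb commands (`lh_first`, `next.le_next`), not the real libaltq definitions, and `cllist_count` is an illustrative name. It follows the links until a null pointer and uses Floyd's two-pointer check to report a cycle instead of looping forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libaltq types; only the link fields
 * named in the gdb commands (lh_first, next.le_next) are modelled. */
struct classinfo {
	int id;					/* illustrative payload */
	struct { struct classinfo *le_next; } next;
};

struct cllist {
	struct classinfo *lh_first;
};

/*
 * Walk the list from lh_first following next.le_next.
 * Returns the number of elements if the list ends in NULL,
 * or -1 if a cycle is found (Floyd's tortoise and hare).
 */
static long
cllist_count(const struct cllist *head)
{
	const struct classinfo *slow, *fast;
	long n = 0;

	assert(head != NULL);
	slow = fast = head->lh_first;
	while (fast != NULL) {
		/* fast advances two links per iteration, slow one */
		fast = fast->next.le_next;
		n++;
		if (fast == NULL)
			break;
		fast = fast->next.le_next;
		n++;
		slow = slow->next.le_next;
		if (fast != NULL && fast == slow)
			return -1;	/* pointers met again: cycle */
	}
	return n;
}
```

A result of -1 would correspond to the "bored of printing the next element" case above; a non-negative result is the number of classes on the list.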


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-22 Thread BERTRAND Joël
I have rebuilt my tree, started and stopped altqd.

legendre# gdb /usr/sbin/altqd 22610
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/altqd...
Reading symbols from /usr/libdata/debug//usr/sbin/altqd.debug...
Attaching to program: /usr/sbin/altqd, process 22610
Reading symbols from /usr/lib/libutil.so.7...
Reading symbols from /usr/libdata/debug//usr/lib/libutil.so.7.24.debug...
Reading symbols from /usr/lib/libm.so.0...
Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
Reading symbols from /usr/lib/libc.so.12...
--Type <RET> for more, q to quit, c to continue without paging--
Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
Reading symbols from /usr/libexec/ld.elf_so...
Reading symbols from /usr/libdata/debug//usr/libexec/ld.elf_so.debug...
[Switching to LWP 22610 of process 22610]
qop_clear (ifinfo=0x7253a46be480)
at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:467
467         while (!LIST_EMPTY(&ifinfo->cllist)) {
(gdb) bt
#0  qop_clear (ifinfo=0x7253a46be480)
at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:467
#1  0x000152a05774 in qop_delete_if (ifinfo=0x7253a46be480)
at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:394
#2  0x000152a058c6 in qcmd_destroyall ()
at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:204
#3  0x000152a12c45 in main (argc=<optimized out>, argv=<optimized out>)
at /usr/src/netbsd-10/src/usr.sbin/altq/altqd/altqd.c:313
(gdb) info locals
root = <optimized out>
clinfo = <optimized out>
(gdb)

Best regards,

JKB
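
Frame #0 sits on a `while (!LIST_EMPTY(...))` loop, and altqd was reported at 100% CPU, which is consistent with a loop whose body stops shrinking the list. The sketch below is not the libaltq code: `struct node`, its `busy` flag and `drain` are hypothetical names, used only to illustrate the general point that a "repeat until empty" loop needs a progress check, otherwise a pass that removes nothing repeats forever.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int busy;		/* hypothetical: nonzero means "cannot delete yet" */
};

/*
 * Remove deletable nodes until the list is empty.
 * Returns 0 on success, -1 if a full pass removed nothing
 * (the unguarded equivalent would spin forever on the CPU).
 */
static int
drain(struct node **head)
{
	assert(head != NULL);
	while (*head != NULL) {
		int removed = 0;
		struct node **pp = head;

		while (*pp != NULL) {
			if ((*pp)->busy) {
				pp = &(*pp)->next;	/* skip, keep linked */
			} else {
				*pp = (*pp)->next;	/* unlink */
				removed++;
			}
		}
		if (removed == 0)
			return -1;	/* no progress: bail out */
	}
	return 0;
}
```

Whether the real loop fails to make progress, and why, is what the contents of the list should show.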


re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-21 Thread matthew green
> Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
> (No debugging symbols found in
> /usr/libdata/debug//usr/lib/libm.so.0.12.debug)
> Reading symbols from /usr/lib/libc.so.12...
> Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
> (No debugging symbols found in
> /usr/libdata/debug//usr/lib/libc.so.12.220.debug)
> Reading symbols from /usr/libexec/ld.elf_so...
> (No debugging symbols found in /usr/libexec/ld.elf_so)
> [Switching to LWP 7893 of process 7893]
> 0x00010820564b in qop_clear ()
> (gdb) bt
> #0  0x00010820564b in qop_clear ()
> #1  0x000108205774 in qop_delete_if ()
> #2  0x0001082058c6 in qcmd_destroyall ()
> #3  0x000108212c45 in main ()
> (gdb) info locals
> No symbol table info available.
> (gdb)
>
>   I don't understand why gdb complains about debugging symbols.

i think it's because our build has a bug.  i was recently
trying to debug something in a shared library and had the
same problem and i traced it down to this code:

   share/mk/bsd.README:MKSTRIPSYM If "yes", strip all local symbols from shared libraries;

   share/mk/bsd.lib.mk:.if ${MKSTRIPSYM:Uyes} == "yes"
   share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-x
   share/mk/bsd.lib.mk:.else
   share/mk/bsd.lib.mk:_LIBLDOPTS+=   -Wl,-X

and putting "MKSTRIPSYM=no" in my mk.conf fixed it.

i believe this is a bug and should default to "no", if
MKDEBUG has also been enabled.


.mrg.


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-20 Thread BERTRAND Joël
Taylor R Campbell a écrit :
>> Date: Wed, 19 Jul 2023 15:36:48 +0200
>> From: BERTRAND Joël 
>>
>>  Yesterday I made a mistake... altqd is running on one of my
>> NetBSD servers (this server routes RTP packets) and I tried to
>> reboot this machine... It was not possible: shutdown stalled on
>> "Stopping altqd" (for more than 10 minutes; I only powered up the
>> screen after a long time...). Fortunately, this server was not too
>> far away and I killed altqd manually.
>>
>>  Any news about PR 57171? If altqd is running, it has to be killed
>> manually before shutdown, or shutdown cannot stop the system...
> 
> It looks like the PR is waiting for feedback from you to get a stack
> trace from altqd with debug info?
> 
> Can you please extract the debug.tar.xz set for your userland, attach
> gdb to the stuck altqd, and ask for `bt'?

I have rebuilt my tree, done an install=/ and extracted debug.tar.xz in
/usr/libdata.

altqd is stopped.

legendre# /etc/rc.d/altqd start
Starting altqd.
legendre# /etc/rc.d/altqd stop
Stopping altqd.
Waiting for PIDS: 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893,
7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893,
7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893,
7893, 7893, 7893, 7893, 7893, 7893, 7893, 7893 ...


legendre# gdb /usr/sbin/altqd 7893
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/altqd...
(No debugging symbols found in /usr/sbin/altqd)
Attaching to program: /usr/sbin/altqd, process 7893
Reading symbols from /usr/lib/libutil.so.7...
Reading symbols from /usr/libdata/debug//usr/lib/libutil.so.7.24.debug...
(No debugging symbols found in
/usr/libdata/debug//usr/lib/libutil.so.7.24.debug)
Reading symbols from /usr/lib/libm.so.0...
--Type <RET> for more, q to quit, c to continue without paging--
Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
(No debugging symbols found in
/usr/libdata/debug//usr/lib/libm.so.0.12.debug)
Reading symbols from /usr/lib/libc.so.12...
Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
(No debugging symbols found in
/usr/libdata/debug//usr/lib/libc.so.12.220.debug)
Reading symbols from /usr/libexec/ld.elf_so...
(No debugging symbols found in /usr/libexec/ld.elf_so)
[Switching to LWP 7893 of process 7893]
0x00010820564b in qop_clear ()
(gdb) bt
#0  0x00010820564b in qop_clear ()
#1  0x000108205774 in qop_delete_if ()
#2  0x0001082058c6 in qcmd_destroyall ()
#3  0x000108212c45 in main ()
(gdb) info locals
No symbol table info available.
(gdb)

I don't understand why gdb complains about debugging symbols.

legendre# pwd
/usr/libdata/debug/usr/lib
legendre# file libutil.so.7.24.debug
libutil.so.7.24.debug: ELF 64-bit LSB shared object, x86-64, version 1
(SYSV), dynamically linked, for NetBSD 10.0, not stripped
legendre# ls -l libutil.so.7.24.debug
-r--r--r--  1 root  wheel  04 Jul 20 16:17 libutil.so.7.24.debug
legendre#

Best regards,

JB


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-19 Thread Taylor R Campbell
> Date: Wed, 19 Jul 2023 15:36:48 +0200
> From: BERTRAND Joël 
> 
>   Yesterday I made a mistake... altqd is running on one of my
> NetBSD servers (this server routes RTP packets) and I tried to
> reboot this machine... It was not possible: shutdown stalled on
> "Stopping altqd" (for more than 10 minutes; I only powered up the
> screen after a long time...). Fortunately, this server was not too
> far away and I killed altqd manually.
> 
>   Any news about PR 57171? If altqd is running, it has to be killed
> manually before shutdown, or shutdown cannot stop the system...

It looks like the PR is waiting for feedback from you to get a stack
trace from altqd with debug info?

Can you please extract the debug.tar.xz set for your userland, attach
gdb to the stuck altqd, and ask for `bt'?

# progress -zf /path/to/debug.tar.xz tar -C / -xpf -
# gdb /usr/sbin/altqd <pid>
(gdb) bt
(gdb) info locals

If the debug.tar.xz set you extracted doesn't have a file
/usr/libdata/debug/usr/sbin/altqd.debug, then something is seriously
wrong with it; can you update the PR with exactly what debug.tar.xz
set you used and how you made it or where you got it from?


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-07-19 Thread BERTRAND Joël
Hello,

Yesterday I made a mistake... altqd is running on one of my
NetBSD servers (this server routes RTP packets) and I tried to
reboot this machine... It was not possible: shutdown stalled on
"Stopping altqd" (for more than 10 minutes; I only powered up the
screen after a long time...). Fortunately, this server was not too
far away and I killed altqd manually.

Any news about PR 57171? If altqd is running, it has to be killed
manually before shutdown, or shutdown cannot stop the system...

Best regards,

JB


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread BERTRAND Joël
PR 57171


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread BERTRAND Joël
BERTRAND Joël a écrit :
> Taylor R Campbell a écrit :
>>> Date: Fri, 6 Jan 2023 15:55:37 +0100
>>> From: BERTRAND Joël 
>>>
>>> /etc/rc.d/altqd onestop doesn't stop altqd. top shows that altqd
>>> remains on CPU (and takes 100% of a CPU).
>>>
>>> gdb -p 1342 (altq) returns that altqd stalls in qop_clear() function (I
>>> don't have altqd sources on this system).
>>
>> Can you get a kernel stack trace with crash(8) on the running system
>> when altqd(8) is taking 100% CPU?  If the pid is 1342, you use the
>> command `bt/t 0t1342' to get it:
> 
>   I have tried, but without usable result.
> 
> netbsd-test1# ps auwx | grep altq
> root 1779 98.5 0.2 20356 1433 ? Os 4:39PM 1:06.26 /usr/sbin/altqd
> 
>> # crash
>> crash> bt/t 0t1342
> 
> netbsd-test1# crash
> ...
> crash> bt/t 0t1779
> crash: kvm_read(0x7f7fff7738e8, 8): kvm_read: Bad address
> trace: pid 1779 lid 1779
> crash>

New test with serial console: altqd runs with PID 470.

>> While here, can you also provide output of:
>>
>> crash> ps
>> crash> ps/w

crash> bt/t 470
trace: pid 1136 not found
crash> bt/t 0t470
crash: kvm_read(0x7f7fff7b0f68, 8): kvm_read: Bad address
trace: pid 470 lid 470
crash> ps
 PID   LID S CPU FLAGS STRUCT LWP * NAME            WAIT
1962 >1962 7   1   100 80a29340b700 crash
1967  1967 3   0   180 80a28e6531c0 sh              wait
1980  1980 3   0   180 80a293508680 top             select
 470 > 470 7   0 40100 80a29298f100 altqd
1627  1627 3   1   180 80a292f62480 sh              wait
1325  1325 3   0   180 80a2931beb80 getty           ttyraw
1318  1318 3   0   180 80a292e184c0 getty           ttyraw
1240  1240 3   0   180 80a29298f540 login           wait
1319  1319 3   1   180 80a29340b2c0 login           wait
1355  1355 3   0   180 80a292b35940 cron            nanoslp
1361  1361 3   0   180 80a292b35500 inetd           kqueue
1279  1279 3   1   180 80a28f49e580 qmgr            kqueue
1206  1206 3   1   180 80a292b350c0 pickup          kqueue
1331  1331 3   1   180 80a29298f980 master          kqueue
1138  1138 3   1   180 80a292e18900 sshd            poll
 936   936 3   1   180 80a2931be300 powerd          kqueue
 565   565 3   0   180 80a292f628c0 syslogd         kqueue
 368   368 3   1   180 80a292ed0bc0 dhcpcd          poll
 466   466 3   1   180 80a292ed0340 dhcpcd          poll
 458   458 3   0   180 80a292f62040 dhcpcd          poll
 454   454 3   0   180 80a292ed0780 dhcpcd          poll
   1     1 3   0   180 80a293566a80 init            wait
   0   162 3   1   200 80a293508ac0 physiod         physiod
   0   206 3   1   200 80a293645b00 pooldrain       pooldrain
   0   205 3   1   200 80a2936456c0 ioflush         syncer
   0   204 3   0   200 80a293645280 pgdaemon        pgdaemon
   0   201 3   0   200 80a2935fc500 swwreboot       swwreboot
   0   200 3   0   200 80a2935fc0c0 iscsi_cleanup   cleanup
   0   199 3   0   200 80a2935fc940 atapibus0       sccomp
   0   196 3   0   200 80a2935e7900 usb3            usbevt
   0   195 3   1   200 80a2935e74c0 usb2            usbevt
   0   194 3   0   200 80a2935e7080 usb1            usbevt
   0   193 3   0   200 80a2935d1100 usb0            usbevt
   0   192 3   1   200 80a293508240 npfgc0          npfgcw
   0   178 3   1   200 80a293566640 rt_free         rt_free
   0   177 3   1   200 80a293566200 unpgc           unpgc
   0    31 3   0   200 80a293551a40 key_timehandler key_timehandler
   0    63 3   1   200 80a293551600 icmp6_wqinput/1 icmp6_wqinput
   0   126 3   0   200 80a2935511c0 icmp6_wqinput/0 icmp6_wqinput
   0   125 3   1   200 80a29357ca00 nd6_timer       nd6_timer
   0   124 3   1   200 80a29357c5c0 carp6_wqinput/1 carp6_wqinput
   0   123 3   0   200 80a29357c180 carp6_wqinput/0 carp6_wqinput
   0   122 3   1   200 80a2935279c0 carp_wqinput/1  carp_wqinput
   0   121 3   0   200 80a293527580 carp_wqinput/0  carp_wqinput
   0   120 3   1   200 80a293527140 icmp_wqinput/1  icmp_wqinput
   0   119 3   0   200 80a2935d1980 icmp_wqinput/0  icmp_wqinput
   0   118 3   1   200 80a2935d1540 rt_timer        rt_timer
   0   117 3   0   200 80a25815c6c0 vmem_rehash     vmem_rehash
   0   108 3   1   200 80a25828c8c0 entbutler       entropy
   0   107 3   0   240 80a25828c480 atabus5         atath
   0   106 3   1   240 80a25828c040 atabus4         atath
   0   105 3   0

Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread BERTRAND Joël
Taylor R Campbell a écrit :
>> Date: Fri, 6 Jan 2023 15:55:37 +0100
>> From: BERTRAND Joël 
>>
>>  /etc/rc.d/altqd onestop doesn't stop altqd. top shows that altqd
>> remains on CPU (and takes 100% of a CPU).
>>
>> gdb -p 1342 (altq) returns that altqd stalls in qop_clear() function (I
>> don't have altqd sources on this system).
> 
> Can you get a kernel stack trace with crash(8) on the running system
> when altqd(8) is taking 100% CPU?  If the pid is 1342, you use the
> command `bt/t 0t1342' to get it:

I have tried, but without usable result.

netbsd-test1# ps auwx | grep altq
root 1779 98.5 0.2 20356 1433 ? Os 4:39PM 1:06.26 /usr/sbin/altqd

> # crash
> crash> bt/t 0t1342

netbsd-test1# crash
...
crash> bt/t 0t1779
crash: kvm_read(0x7f7fff7738e8, 8): kvm_read: Bad address
trace: pid 1779 lid 1779
crash>

> While here, can you also provide output of:
> 
> crash> ps
> crash> ps/w

I cannot copy these lines (it's a virtual machine). I'll try to access
this virtual machine through a serial line.

Regards,

JKB


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread BERTRAND Joël
Brian Buhrow a écrit :
>   Hello Joel.  I'm not sure this is a new problem.  I've seen similar
> behavior on NetBSD-5.2.  It seems to happen on systems where there is a
> good deal of traffic traversing the network at the time the stop is
> requested.

Yes, I have seen this issue for a very long time.


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread Taylor R Campbell
> Date: Fri, 6 Jan 2023 15:55:37 +0100
> From: BERTRAND Joël 
> 
>   /etc/rc.d/altqd onestop doesn't stop altqd. top shows that altqd
> remains on CPU (and takes 100% of a CPU).
> 
> gdb -p 1342 (altq) returns that altqd stalls in qop_clear() function (I
> don't have altqd sources on this system).

Can you get a kernel stack trace with crash(8) on the running system
when altqd(8) is taking 100% CPU?  If the pid is 1342, you use the
command `bt/t 0t1342' to get it:

# crash
crash> bt/t 0t1342

While here, can you also provide output of:

crash> ps
crash> ps/w

And once you have that, can you file a PR with the output and the altq
configuration you quoted?


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread Brian Buhrow
Hello Joel.  I'm not sure this is a new problem.  I've seen similar
behavior on NetBSD-5.2.  It seems to happen on systems where there is a
good deal of traffic traversing the network at the time the stop is
requested.
-thanks
-Brian