Re: Status of 8.99.12

2018-02-11 Thread Paul Goyette

On Mon, 12 Feb 2018, Paul Goyette wrote:


2. Whenever I try to shut down the system, I get a networking-related
  panic.  The following is manually transcribed:

trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
  cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack
  0x80090a7e0c20
kernel: protection fault trap, code = 0
stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq
  360(%rax),%rdi
traceback:
ip_setmoptions + 0x237
ip_rtloutput + 0x218
udp_ctloutput + 0x82
udp_ctloutput_wrapper + 0x2c
sosetopt + 0x67
sys_setsockopt + 0x91
syscall + 0x1ed (syscall #105)


This appears to be fixed by a patch provided by ozaki-r@.


3. After getting the above, as soon as I type a single character as
  command input to ddb(4), I get a LOCKDEBUG panic.  I didn't yet
  transcribe the 40+ lines of output, but the backtrace clearly
  includes a couple entries from the xhci (USB-3) driver.


Here's the console output from the LOCKDEBUG panic - all transcribed by 
hand, but hopefully without too many typos!


Mutex error: mutex_vector_enter,523: spin lock held
lock address: 0xe410e9d1d9a0   type: spin
initialized:  0x802bac06
shared holds:  0   exclusive:  1
shares wanted: 0   exclusive:  0
current CPU:  11   last held: 11
curlwp:   0xe41fc09ad2c0   last held: 0xe41fc09ad2c0
last locked*: 0x802b81de   unlocked:  0x80291179
owner field:  0x00010600   wait/spin:0/1
panic: LOCKDEBUG: Mutex error: mutex_vector_enter,523: spin lock held

And the backtrace is

vpanic+0x140
snprintf
lockdebug_more
mutex_enter+0x69d
xhci_device_intr_start+0x125
usbd_start_next+0x65
xhci_soft_intr+0x49b
xhci_poll+0x37
ukbd_cngetc+0x19
cngetc+0x34
db_readline+0x65
db_read_line+0x15
db_command_loop+0x84
db_trap+0xe3
kbd_trap+0xe2
trap (number 4)

followed by the original backtrace (see above, starting with 
ip_setmoptions).
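One plausible reading of this LOCKDEBUG report (an assumption, not confirmed in the thread) is that a sleepable (adaptive) mutex was requested while a spin mutex was still held on the same CPU: the held lock is reported as "type: spin", and the backtrace reaches mutex_enter from the polled ddb keyboard path (xhci_poll, xhci_soft_intr, xhci_device_intr_start). The single-CPU toy model below is a hypothetical sketch of that ordering rule only. The names toy_lock, toy_lock_enter and toy_lock_exit are invented for illustration, and this is not the kernel's lockdebug code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical single-CPU toy model of one lock-debugging rule:
 * "do not acquire a sleepable (adaptive) lock while holding a spin lock".
 * This is NOT NetBSD's lockdebug implementation.
 */
enum toy_lock_type { TOY_SPIN, TOY_ADAPTIVE };

struct toy_lock {
    const char *name;
    enum toy_lock_type type;
    bool held;
};

/* Number of spin locks currently held on the simulated CPU. */
static int toy_spin_held;

/*
 * Acquire lk.  Instead of panicking, report the violation on stderr and
 * return false so the behaviour can be checked.
 */
static bool toy_lock_enter(struct toy_lock *lk)
{
    if (lk->type == TOY_ADAPTIVE && toy_spin_held > 0) {
        fprintf(stderr, "toy lockdebug: %s: spin lock held\n", lk->name);
        return false;
    }
    lk->held = true;
    if (lk->type == TOY_SPIN)
        toy_spin_held++;
    return true;
}

/* Release lk and keep the per-CPU spin-lock count consistent. */
static void toy_lock_exit(struct toy_lock *lk)
{
    assert(lk->held);
    lk->held = false;
    if (lk->type == TOY_SPIN) {
        assert(toy_spin_held > 0);
        toy_spin_held--;
    }
}
```

With a spin lock s and an adaptive lock a, calling toy_lock_enter(&s) and then toy_lock_enter(&a) returns false, mirroring the ordering the report complains about; releasing s first lets the second acquisition succeed.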

4. While the system is running, I have noticed that unmounting nullfs
  mounts is very slow.  Using mksandbox (from pkgsrc), I create a
  sandbox with about 22 null mounts.  Creating/mounting is no problem,
  and everything runs as expected.  However, when unmounting these
  nullfs mounts, each one takes between 3 and 6 wall-seconds, during
  which the umount process is running at 100% of one CPU.  Additionally,
  some of these umounts seem to grab the CPU with interrupts disabled,
  resulting in a total stall of the machine for the duration (and, in X,
  cursor movement stalls/gets "jerky").  All the unmounts eventually
  complete successfully.




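+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+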


Re: Status of 8.99.12

2018-02-11 Thread Paul Goyette

On Mon, 12 Feb 2018, Ryota Ozaki wrote:


2. Whenever I try to shut down the system, I get a networking-related
   panic.  The following is manually transcribed:

trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
  cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack
  0x80090a7e0c20
kernel: protection fault trap, code = 0
stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq
  360(%rax),%rdi
traceback:
ip_setmoptions + 0x237
ip_rtloutput + 0x218
udp_ctloutput + 0x82
udp_ctloutput_wrapper + 0x2c
sosetopt + 0x67
sys_setsockopt + 0x91
syscall + 0x1ed (syscall #105)


Is the panic fixed by the following patch?

diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 44d8032f387..2e5e346af91 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt)
     * Give up the multicast address record to which the
     * membership points.
     */
-   IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
+{
+   struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp;
+   IFNET_LOCK(inm_ifp);
    in_delmulti(imo->imo_membership[i]);
-   IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);
+   /* ifp should not leave thanks to solock */
+   IFNET_UNLOCK(inm_ifp);
+}
 
    /*
     * Remove the gap in the membership array.



Yes, it appears to address the problem.  Without this patch, the above 
crash was 100% reproducible (5 out of 5).  With this patch applied (and 
no other changes) I have had 3 consecutive reboots without any problem!


Thanks for the quick turn-around.





Re: Status of 8.99.12

2018-02-11 Thread Ryota Ozaki
On Mon, Feb 12, 2018 at 9:48 AM, Paul Goyette  wrote:
> After an extended period of build breaks, I finally got a new release built
> from sources updated on 2018-02-10 at 04:02:43 UTC
>
> I'm seeing several problems with this release that were not seen with my
> previous installation (from last November).
>
> 1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes
>about 3 times as long as before.  Even after successfully loading
>the image (to get libraries etc. into the file system cache) it takes
>more than three full minutes for the program to initialize.
>
> 2. Whenever I try to shut down the system, I get a networking-related
>panic.  The following is manually transcribed:
>
> trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
>   cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
> curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack
>   0x80090a7e0c20
> kernel: protection fault trap, code = 0
> stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq
>   360(%rax),%rdi
> traceback:
> ip_setmoptions + 0x237
> ip_rtloutput + 0x218
> udp_ctloutput + 0x82
> udp_ctloutput_wrapper + 0x2c
> sosetopt + 0x67
> sys_setsockopt + 0x91
> syscall + 0x1ed (syscall #105)

Is the panic fixed by the following patch?

Thanks,
  ozaki-r


diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 44d8032f387..2e5e346af91 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt)
     * Give up the multicast address record to which the
     * membership points.
     */
-   IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
+{
+   struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp;
+   IFNET_LOCK(inm_ifp);
    in_delmulti(imo->imo_membership[i]);
-   IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);
+   /* ifp should not leave thanks to solock */
+   IFNET_UNLOCK(inm_ifp);
+}
 
    /*
     * Remove the gap in the membership array.

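The patch appears to address a use-after-free: the old code re-read imo->imo_membership[i]->inm_ifp after in_delmulti() had run, and if in_delmulti() dropped the last reference to the membership record, that read touched freed memory, which would fit the protection fault reported in ip_setmoptions. (That in_delmulti() frees the record here is an inference from the patch, not something stated in the thread.) The sketch below isolates the pattern using invented stand-in types and helpers (toy_ifnet, toy_in_multi, toy_delmulti, toy_drop_membership); they are not the real struct ifnet or struct in_multi, and the sketch does not model solock.

```c
#include <assert.h>
#include <stdlib.h>

/* Invented stand-ins for struct ifnet and struct in_multi (NOT the real ones). */
struct toy_ifnet {
    int lock_depth;             /* > 0 while the toy IFNET lock is held */
};

struct toy_in_multi {
    struct toy_ifnet *inm_ifp;  /* interface this membership belongs to */
};

static void toy_ifnet_lock(struct toy_ifnet *ifp)
{
    ifp->lock_depth++;
}

static void toy_ifnet_unlock(struct toy_ifnet *ifp)
{
    assert(ifp->lock_depth > 0);
    ifp->lock_depth--;
}

/* Models in_delmulti() dropping the last reference: the record is freed. */
static void toy_delmulti(struct toy_in_multi *inm)
{
    free(inm);
}

/*
 * Drop membership slot i following the pattern in the patch: copy the
 * interface pointer out of the record BEFORE the record can be freed, and
 * use only that copy afterwards.  Calling toy_ifnet_unlock(imo[i]->inm_ifp)
 * after toy_delmulti() would read freed memory, as the old code did.
 */
static void toy_drop_membership(struct toy_in_multi **imo, size_t i)
{
    struct toy_ifnet *inm_ifp = imo[i]->inm_ifp;

    toy_ifnet_lock(inm_ifp);
    toy_delmulti(imo[i]);
    imo[i] = NULL;              /* the slot no longer refers to valid memory */
    toy_ifnet_unlock(inm_ifp);
}
```

The key design point is that the interface pointer's lifetime is independent of the membership record's; in the real patch, the comment attributes that guarantee to solock.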

daily CVS update output

2018-02-11 Thread NetBSD source update

Updating src tree:
P src/crypto/external/bsd/openssl/dist/crypto/idea/i_skey.c
P src/distrib/sets/lists/comp/mi
P src/distrib/sets/lists/tests/mi
P src/external/gpl3/gcc/dist/gcc/config/i386/i386.c
P src/external/gpl3/gcc/dist/gcc/cp/decl.c
P src/external/gpl3/gcc.old/dist/gcc/config/i386/i386.c
P src/lib/libc/time/ctime.3
P src/lib/libc/time/getdate.3
P src/lib/libc/yp/ypclnt.3
P src/libexec/ld.elf_so/rtld.c
P src/share/man/man4/ahc.4
P src/share/man/man4/bluetooth.4
P src/share/man/man7/ascii.7
P src/share/man/man9/pcmcia.9
P src/share/man/man9/wsbell.9
P src/share/mk/bsd.own.mk
P src/sys/arch/amd64/amd64/db_machdep.c
P src/sys/arch/amd64/amd64/machdep.c
P src/sys/arch/i386/i386/db_machdep.c
P src/sys/arch/x86/conf/files.x86
P src/sys/arch/x86/x86/db_trace.c
U src/sys/arch/x86/x86/svs.c
P src/sys/external/bsd/drm2/dist/drm/nouveau/core/engine/device/nouveau_engine_device_nve0.c

Updating xsrc tree:


Killing core files:



Updating release-6 src tree (netbsd-6):
U doc/CHANGES-6.2
P sys/dist/pf/net/pf.c

Updating release-6 xsrc tree (netbsd-6):



Updating release-7 src tree (netbsd-7):
U doc/CHANGES-7.2
P sys/dist/pf/net/pf.c

Updating release-7 xsrc tree (netbsd-7):



Updating release-8 src tree (netbsd-8):
P distrib/sets/lists/base/shl.mi
P distrib/sets/lists/comp/mi
P distrib/sets/lists/comp/shl.mi
P distrib/sets/lists/debug/mi
P distrib/sets/lists/debug/shl.mi
P distrib/sets/lists/man/mi
P distrib/sets/lists/tests/mi
U doc/CHANGES-8.0
P etc/mtree/NetBSD.dist.tests
P external/gpl2/xcvs/dist/src/rsh-client.c
P share/man/man4/Makefile
P share/man/man4/ipsec.4
U share/man/man4/ipsecif.4
P sys/arch/amd64/conf/ALL
P sys/arch/amd64/conf/GENERIC
P sys/conf/files
P sys/dev/sdmmc/sdmmc_mem.c
P sys/dist/pf/net/pf.c
P sys/external/bsd/drm2/dist/drm/nouveau/core/engine/device/nouveau_engine_device_nve0.c
P sys/net/Makefile
P sys/net/files.net
P sys/net/if.c
P sys/net/if.h
P sys/net/if_gif.c
U sys/net/if_ipsec.c
U sys/net/if_ipsec.h
P sys/net/if_l2tp.c
P sys/net/if_types.h
P sys/netinet/in.c
P sys/netinet/in.h
P sys/netinet/in_gif.c
P sys/netinet/ip_var.h
P sys/netinet6/in6.c
P sys/netinet6/in6.h
P sys/netinet6/in6_gif.c
P sys/netinet6/ip6_var.h
P sys/netipsec/Makefile
P sys/netipsec/files.netipsec
P sys/netipsec/ipsec.h
U sys/netipsec/ipsecif.c
U sys/netipsec/ipsecif.h
P sys/netipsec/key.c
P sys/netipsec/key.h
P sys/rump/dev/lib/libucom/UCOM.ioconf
P sys/rump/net/Makefile.rumpnetcomp
U sys/rump/net/lib/libipsec/IPSEC.ioconf
U sys/rump/net/lib/libipsec/Makefile
U sys/rump/net/lib/libipsec/ipsec_component.c
P tests/net/Makefile
U tests/net/if_ipsec/Makefile
U tests/net/if_ipsec/t_ipsec.sh
P usr.sbin/ypserv/ypserv/ypserv_proc.c

Updating release-8 xsrc tree (netbsd-8):




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  50375996 Feb 12 03:10 ls-lRA.gz


Status of 8.99.12

2018-02-11 Thread Paul Goyette
After an extended period of build breaks, I finally got a new release 
built from sources updated on 2018-02-10 at 04:02:43 UTC


I'm seeing several problems with this release that were not seen with my 
previous installation (from last November).


1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes
   about 3 times as long as before.  Even after successfully loading
   the image (to get libraries etc. into the file system cache) it takes
   more than three full minutes for the program to initialize.

2. Whenever I try to shut down the system, I get a networking-related
   panic.  The following is manually transcribed:

trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
  cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack
  0x80090a7e0c20
kernel: protection fault trap, code = 0
stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq
  360(%rax),%rdi
traceback:
ip_setmoptions + 0x237
ip_rtloutput + 0x218
udp_ctloutput + 0x82
udp_ctloutput_wrapper + 0x2c
sosetopt + 0x67
sys_setsockopt + 0x91
syscall + 0x1ed (syscall #105)

3. After getting the above, as soon as I type a single character as
   command input to ddb(4), I get a LOCKDEBUG panic.  I didn't yet
   transcribe the 40+ lines of output, but the backtrace clearly
   includes a couple entries from the xhci (USB-3) driver.

4. While the system is running, I have noticed that unmounting nullfs
   mounts is very slow.  Using mksandbox (from pkgsrc), I create a
   sandbox with about 22 null mounts.  Creating/mounting is no problem,
   and everything runs as expected.  However, when unmounting these
   nullfs mounts, each one takes between 3 and 6 wall-seconds, during
   which the umount process is running at 100% of one CPU.  Additionally,
   some of these umounts seem to grab the CPU with interrupts disabled,
   resulting in a total stall of the machine for the duration (and, in X,
   cursor movement stalls/gets "jerky").  All the unmounts eventually
   complete successfully.

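One way to put numbers on the nullfs unmount behaviour (wall time versus kernel CPU time per unmount) is to wrap each call with clock_gettime(CLOCK_MONOTONIC) and getrusage(RUSAGE_SELF). The helper below is a hypothetical sketch, not part of mksandbox or umount(8); time_call and struct call_times are invented names. System time close to wall time would be consistent with the report that umount spends the whole 3 to 6 seconds running on one CPU.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Result of timing one call (hypothetical helper type). */
struct call_times {
    double wall_sec;   /* elapsed wall-clock time */
    double sys_sec;    /* system (kernel) CPU time charged to this process */
    int    result;     /* return value of the timed function */
};

static double timeval_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

static double timespec_sec(struct timespec ts)
{
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Call fn(arg) once and report how long it took, both in wall-clock time
 * and in system CPU time of the calling process.
 */
static struct call_times time_call(int (*fn)(const char *), const char *arg)
{
    struct call_times t;
    struct timespec w0, w1;
    struct rusage r0, r1;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &w0);
    getrusage(RUSAGE_SELF, &r0);
    t.result = fn(arg);
    getrusage(RUSAGE_SELF, &r1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    t.wall_sec = timespec_sec(w1) - timespec_sec(w0);
    t.sys_sec = timeval_sec(r1.ru_stime) - timeval_sec(r0.ru_stime);
    return t;
}
```

On NetBSD, unmount(2) takes (const char *dir, int flags), so a one-line hypothetical wrapper such as static int unmount_path(const char *d) { return unmount(d, 0); } (with <sys/param.h> and <sys/mount.h>) could be passed to time_call() for each sandbox mount point, printing wall_sec and sys_sec per mount.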





Automated report: NetBSD-current/i386 test failure

2018-02-11 Thread NetBSD Test Fixture
This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

crypto/libcrypto/t_certs:x509v3
crypto/libcrypto/t_ciphers:evp
crypto/libcrypto/t_ciphers:idea
crypto/libcrypto/t_hashes:sha
crypto/libcrypto/t_libcrypto:lhash

The above tests failed in each of the last 3 test runs, and passed in
at least 27 consecutive runs before that.
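Read literally, the criterion above is: a test case counts as newly failing when each of the last 3 runs failed and the 27 runs immediately before those all passed. The function below is a hypothetical restatement of that wording, not the NetBSD Test Fixture's actual code; is_newly_failing and its parameters are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * history[] holds one test case's results, oldest first (true = passed).
 * Returns true when the most recent fail_runs results are all failures and
 * the pass_runs results immediately before them are all passes.  Earlier
 * history is ignored, matching "passed in at least N runs before that".
 */
static bool is_newly_failing(const bool *history, size_t n,
                             size_t fail_runs, size_t pass_runs)
{
    assert(history != NULL || n == 0);
    if (n < fail_runs + pass_runs)
        return false;

    /* The last fail_runs entries must all be failures. */
    for (size_t i = n - fail_runs; i < n; i++)
        if (history[i])
            return false;

    /* The pass_runs entries just before them must all be passes. */
    for (size_t i = n - fail_runs - pass_runs; i < n - fail_runs; i++)
        if (!history[i])
            return false;

    return true;
}
```

With the thresholds from this report, the call would be is_newly_failing(history, n, 3, 27).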

The following commits were made between the last successful test and
the failed test:

2018.02.09.04.38.24 christos src/share/mk/bsd.own.mk,v 1.1032
2018.02.09.08.03.33 maxv src/sys/netinet/ip_mroute.c,v 1.154
2018.02.09.08.42.26 maxv src/sys/arch/amd64/amd64/vector.S,v 1.58
2018.02.09.08.54.11 maxv src/sys/arch/amd64/amd64/amd64_trap.S,v 1.24
2018.02.09.08.58.01 maxv src/sys/arch/x86/x86/fpu.c,v 1.28
2018.02.09.09.07.13 maxv src/sys/uvm/uvm_bio.c,v 1.92
2018.02.09.09.36.42 maxv src/sys/arch/amd64/amd64/db_interface.c,v 1.28
2018.02.09.09.36.42 maxv src/sys/arch/i386/i386/db_interface.c,v 1.77
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/Makefile,v 1.8
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aes-586.S,v 1.7
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aesni-x86.S,v 1.8
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bf-586.S,v 1.4
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn-586.S,v 1.9
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cast-586.S,v 1.5
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cmll-x86.S,v 1.5
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/co-586.S,v 1.7
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/crypt586.S,v 1.4
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/des-586.S,v 1.6
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/ghash-x86.S,v 1.6
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/md5-586.S,v 1.7
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rc4-586.S,v 1.6
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rc5-586.S,v 1.4
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rmd-586.S,v 1.7
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha1-586.S,v 1.7
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha256-586.S,v 1.6
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha512-586.S,v 1.6
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/uplink-x86.S,v 1.5
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/vpaes-x86.S,v 1.5
2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/x86cpuid.S,v 1.14
2018.02.09.13.35.45 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn.inc,v 1.3
2018.02.09.13.35.45 christos src/crypto/external/bsd/openssl/lib/libcrypto/bn.inc,v 1.5
2018.02.09.13.37.16 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bf-686.S,v 1.4
2018.02.09.13.37.16 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/x86.S,v 1.7
2018.02.09.14.06.17 maxv src/sys/netinet/tcp_input.c,v 1.375
2018.02.09.15.24.35 tsutsui src/sys/arch/atari/pci/pci_machdep.c,v 1.56
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/Makefile,v 1.9
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aes-586.S,v 1.8
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aesni-x86.S,v 1.9
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn.inc,v 1.4
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cast-586.S,v 1.6
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/chacha-x86.S,v 1.2
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cmll-x86.S,v 1.6
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/crypt586.S,v 1.5
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/e_padlock-x86.S,v 1.2
2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/ec.inc,v 1.1
2018.02.09.16.06.59 christos 

Re: Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system

2018-02-11 Thread Michael van Elst
kar...@netbsd.org (Frank Kardel) writes:

>Lockups - the Ryzen machine is difficult, as most of the time I cannot 
>get into the debugger, so I cannot decide whether it is a hard 
>(CPU/system) lockup or just a software bug.

>I have seen tstile-related lockups on another 8.99.9 machine that ceases 
>network operation at that point, and processes pile up on tstiles when 
>accessing the network. So at least one locking issue seems to be there.

That could be independent of the CPU... after all, there are lots of
changes in the network stack.

I've been running continuous bulk builds on that machine for a week now,
with no lockups or crashes.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system

2018-02-11 Thread Frank Kardel

On 02/06/18 13:16, m...@netbsd.org wrote:

upon further reading it's probably not related but it bugs me that this
patch/similar hack is still not in.


Yes - I have been running with XSAVEOPT disabled since mlelstv sent that 
observation. I still have


Lockups - the Ryzen machine is difficult, as most of the time I cannot 
get into the debugger, so I cannot decide whether it is a hard 
(CPU/system) lockup or just a software bug.


I have seen tstile-related lockups on another 8.99.9 machine that ceases 
network operation at that point, and processes pile up on tstiles when 
accessing the network. So at least one locking issue seems to be there.


Frank