Re: 9.99.104: panic in tcp_shutdown_wrapper
On Sun, Oct 30, 2022 at 5:12 PM J. Hannken-Illjes wrote: > > > On 30. Oct 2022, at 06:52, Michael van Elst wrote: > > > > ozak...@netbsd.org (Ryota Ozaki) writes: > > > >> I've committed a possible fix. Could you try it? > > > >> Thanks, > >> ozaki-r > > > > > > I just got a NULL pointer dereference in tcp_ctloutput where > > the previous check for inp == NULL is also missing. > > > > [ 24837.756043] fp c0016794db70 tcp_ctloutput() at c02ec4b4 > > netbsd:tcp_ctloutput+0x94 > > [ 24837.756043] fp c0016794dcc0 tcp_ctloutput_wrapper() at > > c02d2680 netbsd:tcp_ctloutput_wrapper+-0x31150 > > [ 24837.756043] fp c0016794dcf0 sosetopt() at c0603cbc > > netbsd:sosetopt+0x78 > > [ 24837.756043] fp c0016794ddb0 sys_setsockopt() at c060b0fc > > netbsd:sys_setsockopt+0x7c > > [ 24837.766041] fp c0016794de20 syscall() at c00b30fc > > netbsd:syscall+0x19c > > > > That's: > > > > int > > tcp_ctloutput(int op, struct socket *so, struct sockopt *sopt) > > { > > ... > > s = splsoftnet(); > >inp = sotoinpcb(so); > > ... > >} > >tp = intotcpcb(inp); <- > > > >switch (op) { > > ... and Syzcaller (https://syzkaller.appspot.com/netbsd) has a > bunch of new tcp related crashes starting ~2 days before ... It seems that all of the failures stem from the missing NULL checks. So they should be fixed now. ozaki-r
Re: ECONNREFUSED no longer works
On Mon, Oct 31, 2022 at 6:12 AM Michael van Elst wrote: > > t...@netbsd.org (Tobias Nygren) writes: > > >$ nc -n -v 127.0.0.1 1234 > ># hangs forever in connect(2) instead of exiting w/ connection refused. > > The logic in tcp_drop() got reversed: > > @@ -1042,17 +1017,12 @@ tcp_newtcpcb(int family, void *aux) > struct tcpcb * > tcp_drop(struct tcpcb *tp, int errno) > { > - struct socket *so = NULL; > + struct socket *so; > > - KASSERT(!(tp->t_inpcb && tp->t_in6pcb)); > + KASSERT(tp->t_inpcb != NULL); > > - if (tp->t_inpcb) > - so = tp->t_inpcb->inp_socket; > -#ifdef INET6 > - if (tp->t_in6pcb) > - so = tp->t_in6pcb->in6p_socket; > -#endif > - if (!so) > + so = tp->t_inpcb->inp_socket; > + if (so != NULL)<- > return NULL; > > if (TCPS_HAVERCVDSYN(tp->t_state)) { > Thank you for pointing this out. I've committed a fix. ozaki-r
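The reversed test in tcp_drop() can be shown in isolation. Below is a minimal userland sketch. It uses heavily simplified, hypothetical stand-ins for struct tcpcb, struct inpcb and struct socket, not the real kernel definitions. The early return must fire only when no socket is attached. With the regressed `so != NULL` test, every live connection returned early, so the error (e.g. ECONNREFUSED) never reached the socket, and connect(2) hung.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct socket { int so_error; };
struct inpcb  { struct socket *inp_socket; };
struct tcpcb  { struct inpcb *t_inpcb; int t_state; };

/*
 * Sketch of the guard at the top of tcp_drop(). Only the NULL test is
 * modelled; the real function also sends a RST in some states and then
 * closes the connection.
 */
static struct tcpcb *
tcp_drop_sketch(struct tcpcb *tp, int error)
{
	struct socket *so;

	assert(tp->t_inpcb != NULL);	/* mirrors the KASSERT in the diff */

	so = tp->t_inpcb->inp_socket;
	if (so == NULL)			/* correct test: no socket, nothing to report */
		return NULL;

	so->so_error = error;		/* this is what connect(2) eventually sees */
	return tp;
}
```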
Re: 9.99.104: panic in tcp_shutdown_wrapper
On Sun, Oct 30, 2022 at 2:52 PM Michael van Elst wrote: > > ozak...@netbsd.org (Ryota Ozaki) writes: > > >I've committed a possible fix. Could you try it? > > >Thanks, > > ozaki-r > > > I just got a NULL pointer dereference in tcp_ctloutput where > the previous check for inp == NULL is also missing. > > [ 24837.756043] fp c0016794db70 tcp_ctloutput() at c02ec4b4 > netbsd:tcp_ctloutput+0x94 > [ 24837.756043] fp c0016794dcc0 tcp_ctloutput_wrapper() at > c02d2680 netbsd:tcp_ctloutput_wrapper+-0x31150 > [ 24837.756043] fp c0016794dcf0 sosetopt() at c0603cbc > netbsd:sosetopt+0x78 > [ 24837.756043] fp c0016794ddb0 sys_setsockopt() at c060b0fc > netbsd:sys_setsockopt+0x7c > [ 24837.766041] fp c0016794de20 syscall() at c00b30fc > netbsd:syscall+0x19c > > That's: > > int > tcp_ctloutput(int op, struct socket *so, struct sockopt *sopt) > { > ... > s = splsoftnet(); > inp = sotoinpcb(so); > ... > } > tp = intotcpcb(inp); <- > > switch (op) { > Thank you for the report. I've fixed the panic too. ozaki-r
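The crash at tcp_ctloutput+0x94 comes from dereferencing the result of sotoinpcb() without checking it. Below is a minimal sketch of the kind of guard that went missing, using hypothetical stand-in types and macros in place of the real sotoinpcb()/intotcpcb(). Returning ECONNRESET when no PCB is attached is an assumption about what the pre-merge code did. It is not taken from the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures carry far more state. */
struct inpcb  { void *inp_ppcb; };
struct socket { struct inpcb *so_pcb; };

/* Stand-ins for sotoinpcb()/intotcpcb(): plain field accesses. */
#define sotoinpcb_sketch(so)	((so)->so_pcb)
#define intotcpcb_sketch(inp)	((inp)->inp_ppcb)

static int
tcp_ctloutput_sketch(struct socket *so, void **tpp)
{
	struct inpcb *inp = sotoinpcb_sketch(so);

	/* A socket whose PCB has already been detached has so_pcb == NULL. */
	if (inp == NULL)
		return ECONNRESET;	/* assumed error; bail out before any deref */

	*tpp = intotcpcb_sketch(inp);
	return 0;
}
```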
Re: 9.99.104: panic in tcp_shutdown_wrapper
Hi, I've committed a possible fix. Could you try it? Thanks, ozaki-r On Sun, Oct 30, 2022 at 12:17 AM Thomas Klausner wrote: > > Hi! > > A couple hours later, my shell was in an NFS mounted directory (probably idle > for some time) and I tried tab-completing an entry, and it panicked again. > Same location as below. > > Hand copied: > tcp_shutdown_wrapper+0x20 > nfs_disconnect+0x69 > nfs_reconnect+0x1a > nfs_request+0x7fb > nfs_access+0x1ed > VOP_ACCESS+0x61 > nfs_lookup+052f > VOP_LOOKUP+0x8a > lookup_once+0x1a6 > namei_tryemulroot+0xb00 > namei+0x29 > vn_open+0x133 > do_open+0xc3 > do_sys_openat+0x74 > sys_open+0x24 > syscall+0x196 > > Thomas > > > On 29.10.2022, at 11:53, Thomas Klausner wrote: > > > > Hi! > > > > I’ve upgraded from 9.99.100 (stable) to 9.99.104 this morning (kernel + > > user land, but packages still the old ones built on 9.99.100 in case it > > matters). > > A couple hours later I started transmission-gtk and the machine immediately > > panicked. > > > > Hand copied: > > > > uvm_fault(0xf8b04ab6d8f0, 0x0, 1) -> e > > Fatal page fault in supervisor mode > > Trap type 6 code 0 rip 0x80b06b82 cs 0x8 rflags 0x10246 cr2 0x38 > > ilevel 0 rsp 0xfc62191caaaf0 > > Curlwp 0xff8b08ac6d040 pid 6904.22757 lowest kstack 0xfc62191ca62c0 > > Kernel: page fault trap, code = 0 > > Stopped in pid 6904.22757 (transmission-gtk) at > > netbsd:tcp_shutdown_wrapper+0x20 > > : movq 38(%rax), %r14 > > tcp_shutdown_wrapper() at netbsd:tcp_shutdown_wrapper:0x20 > > nfs_disconnect() at netbsd:nfs_disconnect+0x69 > > nfs_reconnect() at netbsd:nfs_reconnect+0x1a > > nfs_request() at netbsd:nfs_request+0x7fb > > nfs_statvfs() at netbsd:nfs_statvfs+0x173 > > VFS_STATVFS() at netbsd:VFS_STATVFS+0x22 > > dostatvfs() at netbsd:dostatvfs+0x132 > > do_sys_getvfsstat() at netbsd:do_sys_getvfsstat+0x9f > > sys___getvfsstat90() at netbsd:sys___getvfsstat90+0x2b > > syscall() at netbsd:syscall+0x196 > > > > I have nfs mounted some shares from a Synology station. > > > > Ideas? 
Perhaps the pcb merge changes from this week? > > Thomas >
Re: Any TCP VTW users?
On Fri, Sep 16, 2022 at 10:21 AM Simon Burge wrote: > > Ryota Ozaki wrote: > > > Hi, > > > > Are there any users of TCP Vestigial Time-Wait (VTW)? > > The feature is disabled by default and we need to explicitly > > enable via sysctl to use it. > > > > I just want to know if we should still maintain it. > > I wouldn't be unhappy if it just disappeared. It's totally > undocumented, and will also cause a panic on archs with larger cache > line size because it does some really funky incorrect math that I > stared at for a while then gave up on. > > erlite# sysctl -w net.inet.tcp.vtw.enable=1 > 15293.7939168] panic: kernel diagnostic assertion "n <= FATP_MAX / 2" failed: > file "../../../../netinet/tcp_vtw.c", line 218 Oh, that's bad... Thank you for telling me. ozaki-r
Re: Any TCP VTW users?
On Fri, Sep 16, 2022 at 2:03 AM Brad Spencer wrote: > > Ryota Ozaki writes: > > > Hi, > > > > Are there any users of TCP Vestigial Time-Wait (VTW)? > > The feature is disabled by default and we need to explicitly > > enable via sysctl to use it. > > > > I just want to know if we should still maintain it. > > > > ozaki-r > > > I do use it on one system, but it is not likely to be critical to > anything. The system in question creates a whole ton of short lived > connections and I think I was trying to get them to expire quicker then > normal. Thank you for the report! Just curious. Does it improve performance? (or reduce CPU/memory usage?) ozaki-r
Re: Any TCP VTW users?
On Thu, Sep 15, 2022 at 9:33 PM Andy Ruhl wrote: > > On Thu, Sep 15, 2022 at 12:34 AM Ryota Ozaki wrote: > > > > Hi, > > > > Are there any users of TCP Vestigial Time-Wait (VTW)? > > The feature is disabled by default and we need to explicitly > > enable via sysctl to use it. > > > > I just want to know if we should still maintain it. > > > > ozaki-r > > I wasn't even aware of it. I read the comments in > sys/netinet/tcp_vtw.c. Seems useful for systems that handle a lot of > sockets. Pretty neat. > > Is there some reason why this is obsolete or something? > > Andy When we do the MP-ification of TCP, VTW will require extra effort; the code does not look MP-ification friendly. If nobody uses the feature, we can defer its MP-ification or ignore it completely. That is why I asked. ozaki-r
Any TCP VTW users?
Hi, Are there any users of TCP Vestigial Time-Wait (VTW)? The feature is disabled by default and must be explicitly enabled via sysctl to use it. I just want to know whether we should still maintain it. ozaki-r
Re: State of NET_MPSAFE in -9 and/or -current?
On Thu, Mar 10, 2022 at 6:51 PM Michael van Elst wrote: > > ozak...@netbsd.org (Ryota Ozaki) writes: > > >Note that as you can see from the list above, Layer 4 protocols > >including TCP and UDP are > >not MP-safe yet. > > Is anyone working on it ? No one, AFAIK. ozaki-r
Re: State of NET_MPSAFE in -9 and/or -current?
On Thu, Mar 10, 2022 at 3:19 AM Brian Buhrow wrote: > > hello. I'm wondering if someone could comment on the state of using > NET_MPSAFE kernels? > Is it ready for production use yet? > -thanks > -Brian > It depends on your machine(s) and usage. doc/TODO.smpnet[1] lists the MP-safe components. If you need to use components in the "Unprotected ones" (sub)section, we can clearly say NET_MPSAFE kernels are not ready for production. Otherwise, you may be able to use NET_MPSAFE kernels in production. In fact, we have used NET_MPSAFE kernels based on NetBSD 8 in production as routers for a couple of years. [1] http://cvsweb.netbsd.org/bsdweb.cgi/src/doc/TODO.smpnet?rev=1.45=text/x-cvsweb-markup_with_tag=MAIN Note that, as you can see from the list above, Layer 4 protocols including TCP and UDP are not MP-safe yet. So NET_MPSAFE kernels probably won't perform well when used as servers or clients. ozaki-r
Re: File sharing over virtio-9p
On Fri, Oct 25, 2019 at 11:19 PM Mouse wrote: > > > [W]hich of the following is more readable to the user: > > > $ ls foo > > ls: foo: No such file or directory > > > or > > > $ ls foo > > ls: stat(foo): No such file or directory > > It depends entirely on the user. > > As I recently wrote on a non-NetBSD mailing list, there is no such > thing as a good or bad user interface; there is only a good or bad user > interfaces for a particular user (or class of sufficiently-similar > users). > > I've lost track of the number of times I've had to resort to a > sledgehammer such as ktrace to find out what's really going wrong > because an error message doesn't report enough information. I've had similar experiences with KASSERT: if a KASSERT fails because of memory corruption, I want to know not only that it failed but also the values used in the KASSERT. Anyway, thank you for the suggestions. I committed the patches, changing the error message for open while keeping the one for setsockopt. It may be good to have guidelines on writing error messages somewhere. ozaki-r
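NetBSD's KASSERTMSG() already lets a caller attach a formatted message to an assertion. The userland sketch below shows the "print the operands, not just the expression" idea from the message above. CHECK_LE is a hypothetical macro, not a kernel API. It prints and evaluates to false instead of panicking, so it can be exercised safely. Note that it evaluates its arguments twice, so they must be free of side effects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical assertion helper: on failure, report the expression text
 * *and* the runtime values of both operands, then evaluate to false.
 * Arguments are evaluated twice, so avoid side effects in them.
 */
#define CHECK_LE(a, b)							\
	((long long)(a) <= (long long)(b) ? true :			\
	 (fprintf(stderr, "%s:%d: assertion \"%s <= %s\" failed: %lld > %lld\n", \
	     __FILE__, __LINE__, #a, #b, (long long)(a), (long long)(b)), false))
```

For example, `CHECK_LE(n, FATP_MAX / 2)` would report the actual value of `n` alongside the failed expression, which is exactly the information a bare "assertion failed" message lacks.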
Re: File sharing over virtio-9p
On Fri, Oct 25, 2019 at 3:04 PM Michael van Elst wrote: > > ozaki.ry...@gmail.com (Ryota Ozaki) writes: > > >> @@ -72,7 +74,7 @@ serverconnect(const char *addr, unsigned short port) > >> + err(1, "setsockopt(SO_NOSIGPIPE)"); > >> +err(1, "open(%s)", path); > > >I prefer more informative messages. Why do you want to trim them? > > > Usually the error gives enough context, e.g. SO_NOSIGPIPE is a socket option > and telling that it's setsockopt failing is redundant and printing > an input file name is enough when the error identifies the operation > or the specific operation doesn't matter. > > But there is no rule for this, in particular when embedding filenames > where multiple operations are possible. Many people seem to prefer even > more verbose phrases like "Cannot open `%s'". Our code base has lots > of variants. I think I was influenced by ping6 or something (though it's just one of the variants). > > I personally would prefer error messages without special characters > so that you can grep them easily. :) Indeed. One annoying kind of message is one where a phrase is split across two (or more) lines to stay within the 80-character limit. That's quite anti-grep :-/ ozaki-r
Re: File sharing over virtio-9p
On Fri, Oct 25, 2019 at 2:38 PM Valery Ushakov wrote: > > On Fri, Oct 25, 2019 at 12:56:43 +0900, Ryota Ozaki wrote: > > > > @@ -72,7 +74,7 @@ serverconnect(const char *addr, unsigned short port) > > > [...] > > > + err(1, "setsockopt(SO_NOSIGPIPE)"); > > > > > > I'd just trim it down to "SO_NOSIGPIPE". > > > > > > +err(1, "open(%s)", path); > > > > > > Ditto. Just make it "%s". > > > > I prefer more informative messages. Why do you want to trim them? > > Consider that from the user perspective. As a developer it's tempting > to dump the implementation details, but which of the following is more > readable to the user: > > $ ls foo > ls: foo: No such file or directory > > or > > $ ls foo > ls: stat(foo): No such file or directory Hm, the example makes sense to me (so I'll fix the one for open), but it doesn't for setsockopt:

mount_9p: SO_NOSIGPIPE: Cannot allocate memory

or

mount_9p: setsockopt(SO_NOSIGPIPE): Cannot allocate memory

I think the latter is more readable and understandable to users. ozaki-r
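err(3) prints "progname: message: strerror(errno)". The two styles under discussion therefore differ only in the middle field. The sketch below builds that line into a buffer so the variants can be compared side by side. format_err is a hypothetical helper, not part of libc. The exact strerror() text depends on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build an err(3)-style line
 * "prog: context: strerror(errnum)" into buf. err(3) itself writes
 * straight to stderr; a buffer just makes the styles easy to compare.
 */
static int
format_err(char *buf, size_t len, const char *prog, const char *context,
    int errnum)
{
	return snprintf(buf, len, "%s: %s: %s", prog, context, strerror(errnum));
}
```

With `context` set to "SO_NOSIGPIPE" the result matches the trimmed style; with "setsockopt(SO_NOSIGPIPE)" it matches the more verbose one. For a plain file name, err(1, "%s", path) gives the `ls: foo: ...` shape.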
Re: File sharing over virtio-9p
On Tue, May 21, 2019 at 1:39 PM Ryota Ozaki wrote: > > Hi, > > The following two patches enables a NetBSD guest running > on a Linux KVM to share files with its host over virtio-9p. > > https://www.netbsd.org/~ozaki-r/vio9p.diff > https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff > > The first patch exports a 9p endpoint of virtio-9p via a character > device (e.g., /dev/vio9p0) and the second one extends mount_9p > (puffs) to talk with a 9p server inside qemu via the character device. > > [Usage] > > Export a directory of a host via virtio-9p. The below XML is part of > a libvirt domain configuration. It exports "/var/lib/libvirt/export" > directory with a tag "test". If it is the first entry, it will have > minor number 0 and /dev/vio9p0 is assigned to it typically. > > > > >function='0x0'/> > > > A boot log of vio9p looks like this: > > virtio0 at pci0 dev 3 function 0 > virtio0: Virtio 9P Transport Device (rev. 0x00) > vio9p0 at virtio0: Features: 0x1000 > virtio0: allocated 12288 byte for virtqueue 0 for vio9p, size 128 > virtio0: using 4096 byte (256 entries) indirect descriptors > vio9p0: tagged as test > virtio0: interrupting at ioapic0 pin 11 > > A NetBSD guest can mount the exported directory with mount_9p. > > mount_9p -cu /dev/vio9p0 /mnt/9p > > -c tells mount_9p to interpret the first argument as a character > device file to talk with. > > > Have fun, > ozaki-r I've prepared complete patches ready to commit: https://www.netbsd.org/~ozaki-r/tweak-MAKEDEV.diff https://www.netbsd.org/~ozaki-r/vio9p.diff https://www.netbsd.org/~ozaki-r/vio9p-configs.diff https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff Any comments? I would like to commit them in several days if there is no objection. Regards, ozaki-r
Re: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
On Fri, May 24, 2019 at 11:45 PM Paul Ripke wrote: > > On Fri, May 24, 2019 at 12:15:58PM +0900, Ryota Ozaki wrote: > > On Thu, May 23, 2019 at 10:00 PM Paul Ripke wrote: > > > > > > Old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been > > > running fine with an old kernel built from netbsd-8: > > > > > > NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018 > > > > > > Booting a kernel from netbsd-8 from the last few days: > > > > > > NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019 > > > > > > panics either during or shortly after boot, with the following console > > > log: > > > > > > --- > > > Starting sshd. > > > Mutex error: mutex_vector_enter,552: locking against myself > > > > > > lock address : 0xf04aafc0 > > > current cpu : 0 > > > current lwp : 0xf0604680 > > > owner field : 0xf0604680 wait/spin:0/0 > > > > > > panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: > > > lock 0xf04aafc0 cpu 0 lwp 0xf0604680 > > > cpu0: Begin traceback... 
> > > 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at > > > netbsd:panic+0x20 > > > panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) > > > at netbsd:lockdebug_abort+0x9c > > > lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, > > > 0xf0002000) at netbsd:mutex_enter+0x1cc > > > mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, > > > 0xf0604680) at netbsd:sosend+0x44 > > > sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at > > > netbsd:nfs_send+0x90 > > > nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) > > > at netbsd:nfs_request+0x2f4 > > > nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at > > > netbsd:nfs_readrpc+0x1dc > > > nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, > > > 0xf04f6a40) at netbsd:nfs_doio+0x6bc > > > nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at > > > netbsd:VOP_STRATEGY+0x3c > > > VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, > > > 0xf0744000) at netbsd:sw_reg_start.part.0+0x20 > > > sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x10, > > > 0xf06d8cb8) at netbsd:swstrategy+0x3fc > > > swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 0xf07c0fc0, > > > 0xf0518008) at netbsd:bdev_strategy+0x50 > > > bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at > > > netbsd:spec_strategy+0x88 > > > spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at > > > netbsd:VOP_STRATEGY+0x3c > > > VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, > > > 0xf04fb000) at netbsd:uvm_swap_io+0x10c > > > uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x10, 0x10, 0xf05b6480) at > > > netbsd:uvm_swap_get+0x3c > > > uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at > > > netbsd:uvmfault_anonget+0x2c4 > > > uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, > > > 0xf044a530) at 
netbsd:uvm_fault_internal+0xbbc > > > uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at > > > netbsd:mem_access_fault4m+0x514 > > > mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) > > > at netbsd:memfault_sun4m+0xe8 > > > memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, > > > 0x2) at netbsd:copyout+0x28 > > > copyout(0x0, 0xf3454d88, 0xedb05000, 0xe400, 0x0, 0xf0644e60) at > > > netbsd:rt_walktree_visitor+0xc > > > rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xe400, 0x0, > > > 0x0) at netbsd:rn_walktree+0xbc > > > rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, > > > 0xf05a4870) at netbsd:rtbl_walktree+0x30 > > > rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) > > > at netbsd:sysctl_rtable+0x114 > > > sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at > > > netbsd:sysctl_dispatch+0x94 > > > sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at > > > netbsd:sys___sysctl+0xc4 > > > sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xe404, 0x1b54, > > > 0xe400) a
Re: Panic on netbsd 8.1 sparc nfsroot: sosend: locking against myself
On Thu, May 23, 2019 at 10:00 PM Paul Ripke wrote: > > Old Sun sparc 5, 32MiB RAM, netbooted with nfs root & swap. Has been > running fine with an old kernel built from netbsd-8: > > NetBSD 8.0_STABLE (GENERIC) #0: Wed Sep 26 17:47:02 AEST 2018 > > Booting a kernel from netbsd-8 from the last few days: > > NetBSD 8.1_RC1 (ORAC) #5: Thu May 23 21:24:22 AEST 2019 > > panics either during or shortly after boot, with the following console > log: > > --- > Starting sshd. > Mutex error: mutex_vector_enter,552: locking against myself > > lock address : 0xf04aafc0 > current cpu : 0 > current lwp : 0xf0604680 > owner field : 0xf0604680 wait/spin:0/0 > > panic: lock error: Mutex: mutex_vector_enter,552: locking against myself: > lock 0xf04aafc0 cpu 0 lwp 0xf0604680 > cpu0: Begin traceback... > 0x0(0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, 0xf0349648, 0x104) at > netbsd:panic+0x20 > panic(0xf02cbb88, 0xf02c89f8, 0xf02a1d08, 0x228, 0xf02c89c0, 0xf04aafc0) at > netbsd:lockdebug_abort+0x9c > lockdebug_abort(0xf02a1d08, 0x228, 0xf04aafc0, 0xf0329950, 0xf02c89c0, > 0xf0002000) at netbsd:mutex_enter+0x1cc > mutex_enter(0xf04aafc0, 0x13, 0xf032993c, 0xf0349400, 0xf0604680, 0xf0604680) > at netbsd:sosend+0x44 > sosend(0xf05a22a0, 0xf060a020, 0x0, 0xf04aafc0, 0x700, 0x0) at > netbsd:nfs_send+0x90 > nfs_send(0xf05a22a0, 0xf060a000, 0xf0791e00, 0xf052f1f8, 0xf0604680, 0x0) at > netbsd:nfs_request+0x2f4 > nfs_request(0xf052f1f8, 0xf04f6a00, 0x2c, 0xf0342764, 0x0, 0x700) at > netbsd:nfs_readrpc+0x1dc > nfs_readrpc(0xf06d8cb8, 0xf34544c8, 0x1000, 0x1000, 0xf06d61b0, 0xf04f6a40) > at netbsd:nfs_doio+0x6bc > nfs_doio(0xf07c9020, 0x1, 0xf07c9020, 0x0, 0xf073ee00, 0xf06d8cb8) at > netbsd:VOP_STRATEGY+0x3c > VOP_STRATEGY(0xf06d8cb8, 0xf07c9020, 0x0, 0xf04ad468, 0xf34545d8, 0xf0744000) > at netbsd:sw_reg_start.part.0+0x20 > sw_reg_start.part.0(0xf0518008, 0xf07c9020, 0x1, 0xf07c9020, 0x10, > 0xf06d8cb8) at netbsd:swstrategy+0x3fc > swstrategy(0xf05b6480, 0x1000, 0xf19af000, 0x1000, 
0xf07c0fc0, 0xf0518008) at > netbsd:bdev_strategy+0x50 > bdev_strategy(0xf05b6480, 0x0, 0xf032993c, 0x0, 0xf0604680, 0x0) at > netbsd:spec_strategy+0x88 > spec_strategy(0x0, 0x1c, 0x400, 0x0, 0xf0539d48, 0xf05b6480) at > netbsd:VOP_STRATEGY+0x3c > VOP_STRATEGY(0xf0539d48, 0xf05b6480, 0xf0342ecc, 0xf0330de8, 0xf0029538, > 0xf04fb000) at netbsd:uvm_swap_io+0x10c > uvm_swap_io(0xf345488c, 0xe90, 0x1, 0x10, 0x10, 0xf05b6480) at > netbsd:uvm_swap_get+0x3c > uvm_swap_get(0x5, 0x1d2, 0x2, 0x0, 0x10, 0xf0342ecc) at > netbsd:uvmfault_anonget+0x2c4 > uvmfault_anonget(0xf3454944, 0xf060d758, 0xf05fa630, 0x1, 0xf0342ecc, > 0xf044a530) at netbsd:uvm_fault_internal+0xbbc > uvm_fault_internal(0xedb0a000, 0x1, 0x20, 0x0, 0xf3454944, 0xf05fa630) at > netbsd:mem_access_fault4m+0x514 > mem_access_fault4m(0x9, 0x3a6, 0xedb05000, 0xf3454b08, 0x40, 0xf0604680) at > netbsd:memfault_sun4m+0xe8 > memfault_sun4m(0xf04de400, 0xedb05000, 0xf8, 0xf3453000, 0x1000404, 0x2) > at netbsd:copyout+0x28 > copyout(0x0, 0xf3454d88, 0xedb05000, 0xe400, 0x0, 0xf0644e60) at > netbsd:rt_walktree_visitor+0xc > rt_walktree_visitor(0xf0644e60, 0xf3454d10, 0xedb05000, 0xe400, 0x0, 0x0) > at netbsd:rn_walktree+0xbc > rn_walktree(0xf04d8e70, 0xf02593b8, 0xf3454d10, 0x0, 0xf0644950, 0xf05a4870) > at netbsd:rtbl_walktree+0x30 > rtbl_walktree(0x0, 0xf0259dd8, 0xf3454d88, 0xf0349400, 0xf0604680, 0x0) at > netbsd:sysctl_rtable+0x114 > sysctl_rtable(0xf0259dd8, 0x18, 0xedb05000, 0xf3454e94, 0x16, 0x18) at > netbsd:sysctl_dispatch+0x94 > sysctl_dispatch(0xf3454e98, 0x6, 0xedb05000, 0xf3454e94, 0x0, 0x0) at > netbsd:sys___sysctl+0xc4 > sys___sysctl(0xf0604680, 0xf3454f30, 0xf3454f28, 0xe404, 0x1b54, > 0xe400) at netbsd:syscall+0x248 > syscall(0xcca, 0xf3454fb0, 0xede028d0, 0xca, 0x4e, 0xf0604680) at > netbsd:memfault_sun4m+0x3f4 > cpu0: End traceback... 
> Frame pointer is at 0xf3453f20 > Call traceback: > pc = 0xf0024fec args = (0xf02be550, 0x0, 0xffe2, 0xf02aca38, 0xf01dcfc8, > 0xf0002000) fp = 0xf3453f90 > pc = 0xf01dd358 args = (0x104, 0x0, 0xf02cbb88, 0xf0002000, 0xf0321000, > 0xf0344c00) fp = 0xf3453ff8 > pc = 0xf01dd3e4 args = (0xf02cbb88, 0xf3454108, 0xf0348800, 0xf0349400, > 0xf0349648, 0x104) fp = 0xf3454058 > rebooting Could you try the patch below? Thanks, ozaki-r
---
diff --git a/sys/net/rtsock.c b/sys/net/rtsock.c
index 399b2049130..4f17e716e29 100644
--- a/sys/net/rtsock.c
+++ b/sys/net/rtsock.c
@@ -1873,7 +1873,7 @@ again:
 	w.w_needed = 0 - w.w_given;
 	w.w_where = where;
 
-	SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();
+	KERNEL_LOCK_UNLESS_NET_MPSAFE();
 	s = splsoftnet();
 
 	switch (w.w_op) {
@@ -1932,7 +1932,7 @@ again:
 		break;
 	}
 	splx(s);
-	SOFTNET_KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
+	KERNEL_UNLOCK_UNLESS_NET_MPSAFE();
 
 	/* check to see if we couldn't allocate memory with
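One reading of the traceback: sysctl_rtable() takes softnet_lock through SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE(). copyout() then faults on a swapped-out page. The swap-in goes over NFS, and sosend() presumably tries to take the same lock again, which produces "locking against myself". The patch stops taking softnet_lock on that path. The userland analogue below is only a sketch: an error-checking pthread mutex reports EDEADLK when its owner re-locks it, roughly as LOCKDEBUG catches the kernel case. The lock names in the comments are illustrative mappings, not kernel code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Acquire a non-recursive, error-checking mutex twice from the same
 * thread. POSIX requires the second pthread_mutex_lock() to fail with
 * EDEADLK instead of hanging. Returns that second result.
 */
static int
relock_demo(void)
{
	pthread_mutex_t m;
	pthread_mutexattr_t attr;
	int first, second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	first = pthread_mutex_lock(&m);		/* cf. softnet_lock around the route walk */
	second = pthread_mutex_lock(&m);	/* cf. sosend() during NFS swap-in */

	pthread_mutex_unlock(&m);		/* we still own it exactly once */
	pthread_mutex_destroy(&m);
	return first == 0 ? second : -1;
}
```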
File sharing over virtio-9p
Hi,

The following two patches enable a NetBSD guest running on a Linux KVM to share files with its host over virtio-9p.

https://www.netbsd.org/~ozaki-r/vio9p.diff
https://www.netbsd.org/~ozaki-r/mount_9p-cdev.diff

The first patch exports a 9p endpoint of virtio-9p via a character device (e.g., /dev/vio9p0), and the second one extends mount_9p (puffs) to talk with a 9p server inside qemu via the character device.

[Usage]

Export a directory of the host via virtio-9p. The below XML is part of a libvirt domain configuration. It exports the "/var/lib/libvirt/export" directory with the tag "test". If it is the first entry, it will typically get minor number 0 and be assigned /dev/vio9p0.

A boot log of vio9p looks like this:

virtio0 at pci0 dev 3 function 0
virtio0: Virtio 9P Transport Device (rev. 0x00)
vio9p0 at virtio0: Features: 0x1000
virtio0: allocated 12288 byte for virtqueue 0 for vio9p, size 128
virtio0: using 4096 byte (256 entries) indirect descriptors
vio9p0: tagged as test
virtio0: interrupting at ioapic0 pin 11

A NetBSD guest can mount the exported directory with mount_9p:

mount_9p -cu /dev/vio9p0 /mnt/9p

-c tells mount_9p to interpret the first argument as a character device file to talk with.

Have fun,
ozaki-r
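The character device carries 9P messages between mount_9p and the server in qemu. This is an assumption about the transport, based only on the description above. The sketch below shows what the first such message looks like on the wire: a 9P2000 Tversion. It follows the published 9P2000 layout: size[4] type[1] tag[2] msize[4] version[s], little-endian, with NOTAG (0xFFFF) as the tag. The version string mount_9p actually negotiates is not stated here, and p9_tversion is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define P9_TVERSION	100	/* 9P2000 message type for Tversion */
#define P9_NOTAG	0xFFFFu	/* tag used by Tversion */

static void
put_le16(uint8_t *p, uint16_t v)
{
	p[0] = v & 0xff;
	p[1] = v >> 8;
}

static void
put_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = v >> 24;
}

/* Encode a Tversion into buf; returns its length, or 0 if it won't fit. */
static size_t
p9_tversion(uint8_t *buf, size_t buflen, uint32_t msize, const char *version)
{
	size_t vlen = strlen(version);
	size_t total = 4 + 1 + 2 + 4 + 2 + vlen;

	if (total > buflen || vlen > 0xFFFF)
		return 0;
	put_le32(buf, (uint32_t)total);		/* size[4]: counts itself too */
	buf[4] = P9_TVERSION;			/* type[1] */
	put_le16(buf + 5, P9_NOTAG);		/* tag[2] */
	put_le32(buf + 7, msize);		/* msize[4]: max message size */
	put_le16(buf + 11, (uint16_t)vlen);	/* version[s]: length prefix... */
	memcpy(buf + 13, version, vlen);	/* ...then the bytes, no NUL */
	return total;
}
```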
Re: iwm driver leads to kernel crash
On Mon, Apr 1, 2019 at 6:53 AM wrote: > > Hi, > > would further dmesg outputs from the last 10 or so kernel crashes > still be useful? Yes, and if you have crashdumps, could you please provide detailed information from them? (see https://wiki.netbsd.org/panic/ for the instructions). > This still keeps happening (workaround so far is to use ethernet). > > Or maybe I'm looking at the wrong kind of bug and there is something > to track / being worked on already? I don't think so. Thanks, ozaki-r
Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes
On Mon, Nov 5, 2018 at 12:38 PM Ryota Ozaki wrote: > (snip) > > I can reproduce the panic easily by the small program: > > // start-- > #include <sys/socket.h> > #include <netinet/in.h> > #include <err.h> > > int > main(void) > { > char buf[64]; > struct sockaddr_storage ss = {0}; > int s, e; > > ss.ss_family = AF_INET6; > ss.ss_len = sizeof(struct sockaddr_in6); Oops. sin6_addr and sin6_port (of ss cast to sockaddr_in6) must not be zero, so they should be set to some non-zero value such as 1. ozaki-r > s = socket(AF_INET6, SOCK_DGRAM, 0); > e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len); > if (e == -1) > warn("sendto"); > ss.ss_len = sizeof(ss); > e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len); > if (e == -1) > warn("sendto"); > } > // --end
Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes
On Tue, Nov 6, 2018 at 10:41 AM Paul Ripke wrote: > > On Mon, Nov 05, 2018 at 05:28:23PM +0900, Ryota Ozaki wrote: > > On Mon, Nov 5, 2018 at 4:40 PM Michael van Elst wrote: > > > > > > ozak...@netbsd.org (Ryota Ozaki) writes: > > > > > > >diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c > > > >index ee4fc6fdfb3..a4a74c8009e 100644 > > > >--- a/sys/netinet6/udp6_usrreq.c > > > >+++ b/sys/netinet6/udp6_usrreq.c > > > >@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct > > > >mbuf *m, > > > > > > >if (addr6) { > > > >sin6 = addr6; > > > >+ if (sin6->sin6_len != sizeof(*sin6)) { > > > >+ error = EINVAL; > > > >+ goto release; > > > >+ } > > > >if (sin6->sin6_family != AF_INET6) { > > > >error = EAFNOSUPPORT; > > > >goto release; > > > >} > > > >+ if (sin6->sin6_port == 0) { > > > >+ error = EADDRNOTAVAIL; > > > >+ goto release; > > > >+ } > > > > > > The port validation is already done a few lines below, > > > > Thanks, that's right. > > > > > but the comment when using the port is a bit strange: > > > > > > fport = sin6->sin6_port; /* allow 0 port */ > > > > > > Apparently that comment (and the port check) already > > > existed when the initial version was imported. > > > > Well... I think the comment is just a leftover to be removed :-/ > > > > ozaki-r > > Thanks! Patched into netbsd-8, running with it now. I do wonder > which process was responsible for doing the op. It's been too long > since I've tried grokking gdb on kvm cores... Thank you for testing! I hope the patch fixes the panic you encountered. Anyway I'll commit and pull up the fix soon because it certainly fixes a panic. ozaki-r
Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes
On Mon, Nov 5, 2018 at 4:40 PM Michael van Elst wrote: > > ozak...@netbsd.org (Ryota Ozaki) writes: > > >diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c > >index ee4fc6fdfb3..a4a74c8009e 100644 > >--- a/sys/netinet6/udp6_usrreq.c > >+++ b/sys/netinet6/udp6_usrreq.c > >@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct mbuf *m, > > >if (addr6) { > >sin6 = addr6; > >+ if (sin6->sin6_len != sizeof(*sin6)) { > >+ error = EINVAL; > >+ goto release; > >+ } > >if (sin6->sin6_family != AF_INET6) { > >error = EAFNOSUPPORT; > >goto release; > >} > >+ if (sin6->sin6_port == 0) { > >+ error = EADDRNOTAVAIL; > >+ goto release; > >+ } > > The port validation is already done a few lines below, Thanks, that's right. > but the comment when using the port is a bit strange: > > fport = sin6->sin6_port; /* allow 0 port */ > > Apparently that comment (and the port check) already > existed when the initial version was imported. Well... I think the comment is just a leftover to be removed :-/ ozaki-r
Re: netbsd-8: panic: sockaddr_copy: source too long, 28 < 128 bytes
On Mon, Nov 5, 2018 at 10:38 AM Paul Ripke wrote: > > I'm running netbsd-8 synced as of ~2018-10-30, and am continuing to > get occasional panics, about once a week on the prior kernel, and > another just now. Is this familiar to anybody? Core on hand, pointers > on what to look at appreciated. > > NetBSD slave 8.0_STABLE NetBSD 8.0_STABLE (SLAVE) #1: Tue Oct 30 07:40:01 > AEDT 2018 > stix@slave:/home/netbsd/netbsd-8/obj.amd64/home/netbsd/netbsd-8/src/sys/arch/amd64/compile/SLAVE > amd64 > > Nov 5 12:09:59 slave /netbsd: panic: sockaddr_copy: source too long, 28 < > 128 bytes > Nov 5 12:09:59 slave /netbsd: cpu0: Begin traceback... > Nov 5 12:09:59 slave /netbsd: vpanic() at netbsd:vpanic+0x15d > Nov 5 12:09:59 slave /netbsd: snprintf() at netbsd:snprintf > Nov 5 12:09:59 slave /netbsd: sockaddr_copy() at netbsd:sockaddr_copy+0x7f > Nov 5 12:09:59 slave /netbsd: rtcache_setdst() at netbsd:rtcache_setdst+0x5d > Nov 5 12:09:59 slave /netbsd: rtcache_lookup2() at > netbsd:rtcache_lookup2+0x5e > Nov 5 12:09:59 slave /netbsd: in6_selectroute() at > netbsd:in6_selectroute+0x15a > Nov 5 12:09:59 slave /netbsd: in6_selectsrc() at netbsd:in6_selectsrc+0x100 > Nov 5 12:09:59 slave /netbsd: udp6_output() at netbsd:udp6_output+0x246 > Nov 5 12:09:59 slave /netbsd: udp6_send_wrapper() at > netbsd:udp6_send_wrapper+0x51 > Nov 5 12:09:59 slave /netbsd: sosend() at netbsd:sosend+0x77f > Nov 5 12:09:59 slave /netbsd: do_sys_sendmsg_so() at > netbsd:do_sys_sendmsg_so+0x26d > Nov 5 12:09:59 slave /netbsd: do_sys_sendmsg() at netbsd:do_sys_sendmsg+0x85 > Nov 5 12:09:59 slave /netbsd: sys_sendto() at netbsd:sys_sendto+0x5c > Nov 5 12:09:59 slave /netbsd: syscall() at netbsd:syscall+0x1ec > Nov 5 12:09:59 slave /netbsd: --- syscall (number 133) --- > Nov 5 12:09:59 slave /netbsd: 7db02a8eea4a: > Nov 5 12:09:59 slave /netbsd: cpu0: End traceback... 
> > (gdb) p *src > $3 = { > sa_len = 128 '\200', > sa_family = 24 '\030', > sa_data = "\254\005\000\000\000\000 \001A\320\000\n^\207" > } > > -- > Paul Ripke > "Great minds discuss ideas, average minds discuss events, small minds > discuss people." > -- Disputed: Often attributed to Eleanor Roosevelt. 1948.

I can easily reproduce the panic with this small program:

// start--
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>

int
main(void)
{
	char buf[64];
	struct sockaddr_storage ss = {0};
	int s, e;

	ss.ss_family = AF_INET6;
	ss.ss_len = sizeof(struct sockaddr_in6);
	s = socket(AF_INET6, SOCK_DGRAM, 0);
	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
	if (e == -1)
		warn("sendto");
	ss.ss_len = sizeof(ss);
	e = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&ss, ss.ss_len);
	if (e == -1)
		warn("sendto");
}
// --end

It seems that, on UDP/IPv6, the sockaddr passed to sendto(2) isn't validated properly and is handed to the lower layers as is, which triggers the panic. The length check of the sockaddr used to be performed implicitly in udp6_output, but the check was accidentally removed between NetBSD 7 and 8 (*).

(*) http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/sys/netinet6/Attic/udp6_output.c.diff?r1=1.48=1.49=h

So the following patch fixes the issue. (There may be better fixes.)

Thanks,
ozaki-r

diff --git a/sys/netinet6/udp6_usrreq.c b/sys/netinet6/udp6_usrreq.c
index ee4fc6fdfb3..a4a74c8009e 100644
--- a/sys/netinet6/udp6_usrreq.c
+++ b/sys/netinet6/udp6_usrreq.c
@@ -668,10 +668,18 @@ udp6_output(struct in6pcb * const in6p, struct mbuf *m,
 
 	if (addr6) {
 		sin6 = addr6;
+		if (sin6->sin6_len != sizeof(*sin6)) {
+			error = EINVAL;
+			goto release;
+		}
 		if (sin6->sin6_family != AF_INET6) {
 			error = EAFNOSUPPORT;
 			goto release;
 		}
+		if (sin6->sin6_port == 0) {
+			error = EADDRNOTAVAIL;
+			goto release;
+		}
 
 		/* protect *sin6 from overwrites */
 		tmp = *sin6;
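The panic message "source too long, 28 < 128 bytes" means a sockaddr claiming sa_len = 128 (sizeof(struct sockaddr_storage)) was copied into a 28-byte sockaddr_in6 slot. The sketch below models the early length check from the patch, together with the copy that fails without it. struct sa6_model is a hypothetical, locally defined BSD-style layout with a length byte, so the sketch also builds where sockaddr_in6 has no sin6_len. As noted in the follow-up, the port check duplicates one done later in udp6_output().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MODEL_AF_INET6	24	/* AF_INET6 on NetBSD, as in the gdb dump */

/* Hypothetical 28-byte BSD-style IPv6 sockaddr with a leading length byte. */
struct sa6_model {
	uint8_t  len;
	uint8_t  family;
	uint16_t port;
	uint32_t flowinfo;
	uint8_t  addr[16];
	uint32_t scope_id;
};

/* Mirrors the checks the patch adds at the top of udp6_output(). */
static int
udp6_validate_dst(const struct sa6_model *sin6)
{
	if (sin6->len != sizeof(*sin6))
		return EINVAL;		/* e.g. ss_len = sizeof(sockaddr_storage) */
	if (sin6->family != MODEL_AF_INET6)
		return EAFNOSUPPORT;
	if (sin6->port == 0)
		return EADDRNOTAVAIL;	/* redundant with a later check */
	return 0;
}

/* Model of the condition sockaddr_copy() panics on: src longer than dst. */
static int
sockaddr_copy_model(void *dst, size_t dstlen, const void *src, size_t srclen)
{
	if (srclen > dstlen)
		return -1;		/* the kernel panics here instead */
	memcpy(dst, src, srclen);
	return 0;
}
```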
Re: Running out of buffers?
On Fri, Apr 27, 2018 at 1:18 PM Roy Marples wrote: > On 27/04/2018 04:09, Paul Goyette wrote: > > I've got lots of memory, so I don't understand what buffers are not > > available. Ever since upgrading to my current system (sources dated > > 2018-03-20 11:25:00 UTC), I've been seeing these messages at random > > intervals: > > > > Apr 23 05:51:33 speedy ntpd[526]: routing socket reports: No buffer > > space available > This may come as some suprise, but the only change is that the error is > now logged. Previously it was silenty discarded. > No-one has yet weighed in on how this should be resolved. > > I never saw them with a previous kernel (from March 3rd), so it > > would seem that something changed between the 3rd and 20th. > > > > Is anyone else seeing similar? > > > > Any clues on what changed? > > > > The situation doesn't seem fatal (at least, not yet), but I'd like > > to mitigate the condition before it gets worse. :) > Ideas welcome! > The only one stop solution I can think of is increasing the the default > buffer size, but this might adversley affect small memory systems. > > Thanks in advance for any suggestions. > Looking forward to hearimg some! One option would be to add a new socket option and report the error only when it is set, which avoids surprising applications that aren't aware of it. ozaki-r
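The opt-in idea can be sketched as a small model of a receive buffer. This is a hypothetical illustration, not NetBSD code: the structure, the `report_overflow` flag standing in for the proposed socket option, and both functions are invented here. A reader that never sets the flag never sees the error, while a reader that opts in gets ENOBUFS once after a drop and can then resynchronise (for example by re-reading the routing table).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of a routing socket receive buffer with an
 * opt-in overflow report. None of these names exist in NetBSD.
 */
struct model_rtsock {
	size_t	queued;			/* messages currently buffered */
	size_t	limit;			/* buffer capacity in messages */
	bool	report_overflow;	/* stands in for the proposed option */
	bool	overflowed;		/* a message was dropped since last read */
};

/* Kernel side: enqueue a message, dropping it when the buffer is full. */
static void
model_deliver(struct model_rtsock *so)
{
	if (so->queued >= so->limit) {
		so->overflowed = true;
		return;
	}
	so->queued++;
}

/*
 * Reader side: returns 0 after consuming one message, EAGAIN if the
 * buffer is empty, or ENOBUFS once after a drop, but only if the
 * reader opted in. Unaware readers just lose the dropped message.
 */
static int
model_read(struct model_rtsock *so)
{
	if (so->overflowed) {
		so->overflowed = false;
		if (so->report_overflow)
			return ENOBUFS;
	}
	if (so->queued == 0)
		return EAGAIN;
	so->queued--;
	return 0;
}
```

Reporting the overflow before the remaining buffered messages is one possible design choice; it tells an aware reader as early as possible that its view is already stale.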
Re: netbsd-8 hang on tstile
On Thu, Mar 8, 2018 at 5:04 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote: > On Thu, Mar 08, 2018 at 11:04:00AM +0900, Ryota Ozaki wrote: >> On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote: >> > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote: >> >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote: >> >> > >[...] >> >> > > This is reproductible, restarting my automatic test script hangs the >> >> > > same >> >> > > way. This i plain ffs, no wapbl. >> >> > > >> >> > > Any idea ? >> >> > >> >> > pserialize_perform can get stuck if any softints get stuck. Can you >> >> > check >> >> > if such softints exist? >> >> >> >> I'll look the next time I can see this. >> > >> > I had a console log of a previous hang. No softint appears to be waiting >> > in the ps/a output >> >> Thanks. Hm, does ps/a show softints (say softnet/0)? >> I use just ps for the purpose. > > I used plain ps too. Not sure why I added this /a in my mail (probably > related to tr/a :) Okay, thanks :) So my concern probably proved unfounded. ozaki-r
Re: netbsd-8 hang on tstile
On Wed, Mar 7, 2018 at 6:38 PM, Manuel Bouyer <bou...@antioche.eu.org> wrote: > On Wed, Mar 07, 2018 at 09:40:45AM +0100, Manuel Bouyer wrote: >> On Wed, Mar 07, 2018 at 04:49:07PM +0900, Ryota Ozaki wrote: >> > >[...] >> > > This is reproductible, restarting my automatic test script hangs the same >> > > way. This i plain ffs, no wapbl. >> > > >> > > Any idea ? >> > >> > pserialize_perform can get stuck if any softints get stuck. Can you check >> > if such softints exist? >> >> I'll look the next time I can see this. > > I had a console log of a previous hang. No softint appears to be waiting > in the ps/a output Thanks. Hm, does ps/a show softints (e.g. softnet/0)? I just use plain ps for that purpose. ozaki-r
Re: netbsd-8 hang on tstile
On Wed, Mar 7, 2018 at 7:33 AM, Manuel Bouyer wrote: > Hello > on an up-to-date netbsd-8 Xen3 i386PAE kernel I see hangs on tstile. > Hung processes shows the same pattern, they sleep in fstrans_start(): > sleepq_block(0,0,c0596900,c0639b6c,c534e802,40,c03dbcfe,75,c5356340,c534e800) > at netbsd:sleepq_block+0xe6 > turnstile_block(c59847f0,1,c078b6c0,c0639b6c,de8b9b70,c59847f0,c5db6020,c59a2008,0,c59a2008) > at netbsd:turnstile_block+0x29d > mutex_vector_enter(c078b6c0,c0cb8fe0,c055570c,de8b9b84,c05349d7,c575ed04,1,c59a2008,de8b9ba4,c0493ff8) > at netbsd:mutex_vector_enter+0x28c > fstrans_start(c59a2008,0,de8b9ba8,10,c055cf2c,c575ed04,20002,20002,c575ed04,0) > at netbsd:fstrans_start+0x3b6 > VOP_LOCK(c575ed04,20002,22000,0,c03dc2e7,c0746e1a,de8b9d0c,20002,c575ed04,de8b9c88) > at netbsd:VOP_LOCK+0x48 > vn_lock(c575ed04,20002,4,0,de8b9c14,c03af32d,de8b9edc,3,0,5) at > netbsd:vn_lock+0x7f > namei_tryemulroot(0,c012a96a,c532c240,de8b9ce8,c041f909,c532c2b4,de8b9d0c,de8b9d34,0,0) > at netbsd:namei_tryemulroot+0x14f > namei(de8b9d0c,1,3,de8b9d10,c042266a,0,c532c240,de8b9d20,c04211d6,0) at > netbsd:namei+0x27 > check_exec(c5db6020,de8b9dc8,c5f558a8,de8b9da4,9cf21000,de8b9dac,c0114f1a,bad054d0,de8b9ddc,bf7fef58) > at netbsd:check_exec+0x40 > execve_loadvm(bf7feab8,c03c95a0,de8b9dc8,c5a6f000,c6460c00,c700b008,404,0,0,0) > at netbsd:execve_loadvm+0x233 > execve1(c5db6020,bf7fef58,bad054d0,bf7feab8,c03c95a0,de8b9f9c,c0113572,c5db6020, > de8b9f68,de8b9f60) at netbsd:execve1+0x3c > sys_execve(c5db6020,de8b9f68,de8b9f60,c63d8290,0,c0636ddc,de8b9f68,0,0,bf7fef58) > at netbsd:sys_execve+0x31 > syscall() at netbsd:syscall+0x82 > > I guess the culprit is: > 2650 1 3 2 0 c5f92a80 python2.7 psrlz > (an anita process, actually) > sleepq_block(1,0,c059af17,c0639f80,c0640804,c5356340,c5358a80,c063f93c,6406c2,c0 > 59af17) at netbsd:sleepq_block+0x1cd > kpause(c059af17,0,1,0,c5448590,c5cec008,de5e7b5c,c048355a,c5330a28,de5e7b6c) > at > n > etbsd:kpause+0xf2 > 
pserialize_perform(c5330a28,de5e7b6c,c0484a6f,c638e8c0,0,c055cf2c,c5cec008,504,0,de5e7b7c) > at netbsd:pserialize_perform+0x10a > fstrans_setstate(c5cec008,0,fffe,c0115914,c5cec008,c5cec008,de5e7b94,c047a2c0,c5cec008,2) > at netbsd:fstrans_setstate+0x3a > genfs_suspendctl(c5cec008,2,de5e7bb0,de5e7bd4,de5e7bb0,c0483fc4,c5cec008,2,de5e7bd4,de5e7bd4) > at netbsd:genfs_suspendctl+0x3a > VFS_SUSPENDCTL(c5cec008,2,de5e7bd4,de5e7bd4,504,de5e7be4,c0486fd6,c5cec008,504,0) > at netbsd:VFS_SUSPENDCTL+0x20 > vfs_resume(c5cec008,504,0,de5e7bd4,c011599f,4,c5cec008,c5e978e4,c5e978e4,c5f92a80) > at netbsd:vfs_resume+0x74 > vrevoke(c5e978e4,de5e7c14,c049370a,de5e7c04,0,0,c055d160,c5e978e4,1,0) at > netbsd:vrevoke+0x96 > genfs_revoke(de5e7c04,0,0,c055d160,c5e978e4,1,0,de5e7cc4,c044dea9,c5e978e4) > at netbsd:genfs_revoke+0x1a > VOP_REVOKE(c5e978e4,1,c5353f00,504,0,74,c5e978e4,0,190,) at > netbsd:VOP_REVOKE+0x4a > pty_grant_slave(c5f92a80,504,0,c5cec008,0,10,c055cf2c,c5c271bc,20002,20002) > at netbsd:pty_grant_slave+0xc9 > ptmioctl(a501,0,48087446,c6922008,3,c5f92a80,c5f92a80,48087446,c055c260,3) at > netbsd:ptmioctl+0xdd > cdev_ioctl(a501,0,48087446,c6922008,3,c5f92a80,a501,c5c271bc,c5c271bc,c5fd22c0) > at netbsd:cdev_ioctl+0xd0 > spec_ioctl(de5e7da0,c0115914,c5df9bc0,c055d1f0,c5c271bc,48087446,c6922008,3,c5eac9c0,48087446) > at netbsd:spec_ioctl+0x90 > VOP_IOCTL(c5c271bc,48087446,c6922008,3,c5eac9c0,c5e74380,c6fbe790,fffe,c0115914,0) > at netbsd:VOP_IOCTL+0x3e > vn_ioctl(c5fd22c0,48087446,c6922008,c5f8eb74,c5fd22c0,,,0,0,c6922008) > at netbsd:vn_ioctl+0x9f > sys_ioctl(c5f92a80,de5e7f68,de5e7f60,c63d8788,0,c0636d78,de5e7f68,0,0,7) at > netbsd:sys_ioctl+0x10a > syscall() at netbsd:syscall+0x82 > > This is reproductible, restarting my automatic test script hangs the same > way. This i plain ffs, no wapbl. > > Any idea ? pserialize_perform can get stuck if any softints get stuck. Can you check if such softints exist? ozaki-r
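To make the point about stuck softints concrete, here is a deliberately simplified, single-threaded model. It assumes (this is not taken from the pserialize(9) implementation) that a CPU only reports a quiescent state when its softint work gets to run; every name is hypothetical. A writer waiting for all CPUs never finishes if one CPU's softint is stuck, which matches the python2.7 thread sleeping in `kpause()` under `pserialize_perform()` in the trace above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NCPU 4

/* Hypothetical per-CPU state for the model. */
struct model_cpu {
	bool softint_stuck;	/* this CPU's softint never gets to run */
	bool passed_quiescent;	/* observed by the writer in this grace period */
};

/* Let each CPU run once; only CPUs whose softint runs pass a quiescent point. */
static void
model_tick(struct model_cpu *cpus)
{
	for (size_t i = 0; i < MODEL_NCPU; i++)
		if (!cpus[i].softint_stuck)
			cpus[i].passed_quiescent = true;
}

/*
 * Writer side: wait until every CPU has passed a quiescent point.
 * Returns the number of ticks waited, or -1 if still waiting after
 * max_ticks (the real function would keep sleeping instead).
 */
static int
model_pserialize_perform(struct model_cpu *cpus, int max_ticks)
{
	for (size_t i = 0; i < MODEL_NCPU; i++)
		cpus[i].passed_quiescent = false;

	for (int t = 1; t <= max_ticks; t++) {
		bool all = true;

		model_tick(cpus);
		for (size_t i = 0; i < MODEL_NCPU; i++)
			if (!cpus[i].passed_quiescent)
				all = false;
		if (all)
			return t;
	}
	return -1;
}
```

This is why checking `ps` for softint threads (such as softnet/0) that are blocked is a useful first step.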
Re: Status of 8.99.12
On Mon, Feb 12, 2018 at 9:48 AM, Paul Goyette wrote: > After an extended period of build breaks, I finally got a new release built > from sources updated on 2018-02-10 at 04:02:43 UTC > > I'm seeing several problems with this release that were not seen with my > previous installation (from last November). > > 1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes >about 3 times as long as before. Even after successfully loading >the image (to get libraries etc into the file system cache) it take >more than three full minutes for the program to initialize. > > 2. Whenever I try to shutdown the system, I get a networking-related >panic. The following is manually transcribed: > > trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282 > cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80 > curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack > 0x80090a7e0c20 > kernel: protection fault trap, code = 0 > stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq > 360(%rax),%rdi > traceback: > ip_setmoptions + 0x237 > ip_rtloutput + 0x218 > udp_ctloutput + 0x82 > udp_ctloutput_wrapper + 0x2c > sosetopt + 0x67 > sys_setsockopt + 0x91 > syscall + 0x1ed (syscall #105) Is the panic fixed by the following patch? Thanks, ozaki-r diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c index 44d8032f387..2e5e346af91 100644 --- a/sys/netinet/ip_output.c +++ b/sys/netinet/ip_output.c @@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt) * Give up the multicast address record to which the * membership points. */ - IFNET_LOCK(imo->imo_membership[i]->inm_ifp); +{ + struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp; + IFNET_LOCK(inm_ifp); in_delmulti(imo->imo_membership[i]); - IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp); + /* ifp should not leave thanks to solock */ + IFNET_UNLOCK(inm_ifp); +} /* * Remove the gap in the membership array.
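The patch follows a common pattern: read the pointer needed for the unlock before calling something that may free the record it lives in. Below is a minimal model with hypothetical stand-in types (not the real `struct ifnet` or `struct in_multi`), assuming, as the patch implies, that `in_delmulti()` can release the membership record. The interface itself is assumed to stay alive, which the patch comment attributes to solock.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct ifnet: only tracks lock state. */
struct model_ifnet {
	int	lock_depth;	/* >0 while the "IFNET_LOCK" is held */
};

/* Hypothetical stand-in for struct in_multi. */
struct model_inm {
	struct model_ifnet *inm_ifp;
};

static void model_ifnet_lock(struct model_ifnet *ifp)   { ifp->lock_depth++; }
static void model_ifnet_unlock(struct model_ifnet *ifp) { ifp->lock_depth--; }

/* Like in_delmulti() in this model: frees the record it is given. */
static void
model_delmulti(struct model_inm *inm)
{
	free(inm);
}

/*
 * Pattern from the patch: read inm_ifp once, before the record can be
 * freed, and use the saved pointer for the unlock. Re-reading
 * inm->inm_ifp after model_delmulti() would be a use-after-free.
 */
static void
model_drop_membership(struct model_inm *inm)
{
	struct model_ifnet *ifp = inm->inm_ifp;

	model_ifnet_lock(ifp);
	model_delmulti(inm);
	model_ifnet_unlock(ifp);
}
```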
Re: 8.99.12 panic [Re: two 8.99.9 panics]
On Thu, Jan 11, 2018 at 6:01 PM, Thomas Klausner wrote: > On Wed, Jan 10, 2018 at 01:40:52PM -0500, Christos Zoulas wrote: >> On Jan 10, 7:32pm, t...@giga.or.at (Thomas Klausner) wrote: >> -- Subject: Re: 8.99.12 panic [Re: two 8.99.9 panics] >> >> | On Wed, Jan 10, 2018 at 12:36:24PM -0500, Christos Zoulas wrote: >> | > On Jan 10, 3:45pm, t...@giga.or.at (Thomas Klausner) wrote: >> | > -- Subject: 8.99.12 panic [Re: two 8.99.9 panics] >> | > >> | > | I was told to try 8.99.12 (head as of ~20 minutes ago) and I quickly >> | > | saw the second panic again. >> | > >> | > CVS update. >> | >> | I used your version, not ozaki-r's, rebooted and went away. >> | 20 minutes later: >> >> Ozaki's version is probably going to work better :-) > > The machine survived half a day including a bulk build, so "yes". Thanks! (I'll send a pullup request for the fix to netbsd-8.) ozaki-r
Re: 8.99.12 panic [Re: two 8.99.9 panics]
On Wed, Jan 10, 2018 at 11:45 PM, Thomas Klausner wrote: > I was told to try 8.99.12 (head as of ~20 minutes ago) and I quickly > saw the second panic again. > > Jan 10 15:42:23 yt /netbsd: panic: prevented null pointer dereference at > 0x360 (SMAP) > Jan 10 15:42:23 yt /netbsd: cpu8: Begin traceback... > Jan 10 15:42:23 yt /netbsd: vpanic() at netbsd:vpanic+0x140 > Jan 10 15:42:23 yt /netbsd: snprintf() at netbsd:snprintf > Jan 10 15:42:23 yt /netbsd: trap() at netbsd:trap+0xc15 > Jan 10 15:42:23 yt /netbsd: --- trap (number 6) --- > Jan 10 15:42:23 yt /netbsd: ip_setmoptions() at netbsd:ip_setmoptions+0x1ff > Jan 10 15:42:23 yt /netbsd: ip_ctloutput() at netbsd:ip_ctloutput+0x260 > Jan 10 15:42:23 yt /netbsd: udp_ctloutput() at netbsd:udp_ctloutput+0x82 > Jan 10 15:42:23 yt /netbsd: udp_ctloutput_wrapper() at > netbsd:udp_ctloutput_wrapper+0x2c > Jan 10 15:42:23 yt /netbsd: sosetopt() at netbsd:sosetopt+0x67 > Jan 10 15:42:23 yt /netbsd: sys_setsockopt() at netbsd:sys_setsockopt+0x91 > Jan 10 15:42:23 yt /netbsd: syscall() at netbsd:syscall+0x1d8 > Jan 10 15:42:23 yt /netbsd: --- syscall (number 105) --- > Jan 10 15:42:23 yt /netbsd: 765ca20cde5a: > Jan 10 15:42:23 yt /netbsd: cpu8: End traceback... > > Reduced workload: just syncthing and mercurial self tests. > Thomas Does the following patch help you? Thanks, ozaki-r diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c index d643aef71c8..4e70cc7ad94 100644 --- a/sys/netinet/ip_output.c +++ b/sys/netinet/ip_output.c @@ -1905,9 +1905,9 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt) * Give up the multicast address record to which the * membership points. */ - IFNET_LOCK(ifp); + IFNET_LOCK(imo->imo_membership[i]->inm_ifp); in_delmulti(imo->imo_membership[i]); - IFNET_UNLOCK(ifp); + IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp); /* * Remove the gap in the membership array.
Re: Strange new test failures
On Sun, Jan 7, 2018 at 6:29 AM, Martin Husemann wrote: > We should have 0 unexpected failures, but the latest run got: > > Failed test cases: > lib/librumphijack/t_config:fdoff, lib/librumphijack/t_tcpip:http, > lib/librumphijack/t_tcpip:nfs, lib/librumphijack/t_tcpip:ssh, > lib/librumphijack/t_vfs:cpcopy, lib/librumphijack/t_vfs:mv_x, > lib/librumphijack/t_vfs:paxcopy, > net/net/t_forwarding:ipforwarding_fastforward_v4, > net/net/t_forwarding:ipforwarding_fastforward_v6, > net/net/t_forwarding:ipforwarding_fragment_v4, > net/net/t_forwarding:ipforwarding_misc, net/net/t_mtudisc6:mtudisc6_basic, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_aesxcbcmac, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacmd5, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacripemd160, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha1, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha256, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha384, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_hmacsha512, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_keyedmd5, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_keyedsha1, > net/ipsec/t_ipsec_tunnel_odd:ipsec_tunnel_v6v4_ah_null, > net/npf/t_npf:npf_state, fs/nfs/t_rquotad:get_nfs_be_1_both, > fs/nfs/t_rquotad:get_nfs_be_1_group, fs/nfs/t_rquotad:get_nfs_be_1_user, > fs/nfs/t_rquotad:get_nfs_le_1_both, fs/nfs/t_rquotad:get_nfs_le_1_group, > fs/nfs/t_rquotad:get_nfs_le_1_user > > > See http://www.netbsd.org/~martin/sparc64-atf for details. > > Anyone recognize their latest changes as potential culprit? My fixes for the mbuf use-after-free should fix the ipsec failures, but I don't know about the other failures. You may need to do a clean build of the rump libraries (I had to). ozaki-r
Re: Automated report: NetBSD-current/i386 build failure
On Thu, Dec 28, 2017 at 4:39 PM, NetBSD Test Fixture wrote: > This is an automatically generated notice of a NetBSD-current/i386 > build failure. > > The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host, > using sources from CVS date 2017.12.28.06.13.50. > > An extract from the build.sh output follows: > > --- side.o --- > # compile diff/side.o > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc > -O2 -std=gnu99-Werror -fPIE -DLOCALEDIR=\"/usr/share/locale\" > -DHAVE_CONFIG_H > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/../include > > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/lib > --sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir -c > /tmp/bracket/build/2017.12.28.06.13.50-i386/src/external/gpl2/diffutils/dist/src/side.c > --- dependall-gettext --- > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfmerge -t -g -L > VERSION -o msgexec msgexec.o > --- dependall-msgfmt --- > dependall ===> external/gpl2/gettext/bin/msgfmt > --- dependall-usr.bin --- > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfconvert -g -L > VERSION lstIsEmpty.o > --- lstLast.o --- > # compile make/lstLast.o > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc > -O2 -fPIE-std=gnu99-Wall -Wstrict-prototypes -Wmissing-prototypes > -Wpointer-arith -Wno-sign-compare -Wsystem-headers -Wno-traditional > -Wa,--fatal-warnings -Wreturn-type -Wswitch -Wshadow -Wcast-qual > -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare > -Wold-style-definition -Wsign-compare -Wformat=2 -Wno-format-zero-length > -Werror-DUSE_META > --sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir -DMAKE_NATIVE > -DUSE_EMALLOC -c > /tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.bin/make/lst.lib/lstLast.c > --- dependall-usr.sbin --- > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/nbctfconvert -g -L > VERSION msdosfs_vnops.o > --- 
msdosfs_unicode.o --- > # compile makefs/msdosfs_unicode.o > /tmp/bracket/build/2017.12.28.06.13.50-i386/tools/bin/i486--netbsdelf-gcc > -O2 -fPIE-std=gnu99-Wall -Wstrict-prototypes -Wmissing-prototypes > -Wpointer-arith -Wno-sign-compare -Wsystem-headers -Wno-traditional > -Wa,--fatal-warnings -Wreturn-type -Wswitch -Wshadow -Wcast-qual > -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare > -Wold-style-definition -Wsign-compare -Wformat=2 -Wno-format-zero-length > -Werror--sysroot=/tmp/bracket/build/2017.12.28.06.13.50-i386/destdir > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.sbin/makefs > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/mknod > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/usr.sbin/mtree -DMAKEFS > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/cd9660 > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/ufs/chfs -DV7FS_EI > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/v7fs > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_v7fs > -I/tmp/bracket/build/2017.12. 
> 28.06.13.50-i386/src/sbin/fsck -DMSDOS_EI > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/msdosfs > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_msdos > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/udf > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/newfs_udf > -I/tmp/bracket/build/2017.12.28.06.13.50-i386/src/sbin/fsck -D_KERNTYPES -c > > /tmp/bracket/build/2017.12.28.06.13.50-i386/src/sys/fs/msdosfs/msdosfs_unicode.c > --- dependall-tests --- > --- dependall-rump --- > cc1: all warnings being treated as errors > *** [workqueue.o] Error code 1 > nbmake[8]: stopped in > /tmp/bracket/build/2017.12.28.06.13.50-i386/src/tests/rump/kernspace > 1 error > > The following commits were made between the last successful build and > the failed build: > > 2017.12.28.04.36.15 ozaki-r src/tests/rump/kernspace/workqueue.c,v 1.2 > 2017.12.28.04.38.02 ozaki-r src/tests/rump/kernspace/workqueue.c,v 1.3 > 2017.12.28.05.43.42 msaitoh src/sys/dev/pci/xhci_pci.c,v 1.11 > 2017.12.28.06.10.01 msaitoh src/sys/dev/pci/ixgbe/ixgbe.c,v 1.119 > 2017.12.28.06.13.50 msaitoh src/sys/dev/pci/if_wm.c,v 1.550 Fixed. ozaki-r
Re: Automated report: NetBSD-current/i386 test failure
On Wed, Dec 13, 2017 at 9:27 PM, Andreas Gustafsson wrote: > NetBSD Test Fixture wrote: >> The newly failing test cases are: >> >> fs/vfs/t_mtime_otrunc:puffs_otrunc_mtime_update >> net/route/t_change:route_change_ifp >> net/route/t_change:route_change_ifp_ifa >> >> The above tests failed in each of the last 3 test runs, and passed in >> at least 27 consecutive runs before that. >> >> The following commits were made between the last successful test and >> the failed test: >> >> 2017.12.11.02.33.17 knakahara src/sys/kern/subr_psref.c,v 1.8 >> 2017.12.11.02.33.17 knakahara src/sys/sys/psref.h,v 1.3 >> 2017.12.11.03.25.45 ozaki-r src/sys/net/if.c,v 1.412 >> 2017.12.11.03.25.45 ozaki-r src/sys/net/if.h,v 1.252 >> 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_ifaddr.c,v 1.3 >> 2017.12.11.03.25.46 ozaki-r src/sys/net/npf/npf_os.c,v 1.9 >> 2017.12.11.03.29.20 ozaki-r src/sys/net/if.c,v 1.413 >> 2017.12.11.03.29.20 ozaki-r src/sys/net/if.h,v 1.253 >> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_bridge.c,v 1.145 >> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_spppsubr.c,v 1.177 >> 2017.12.11.03.29.20 ozaki-r src/sys/net/if_vlan.c,v 1.119 >> 2017.12.11.03.29.20 ozaki-r >> src/sys/rump/net/lib/libnetinet/netinet_component.c,v 1.10 > > Also, the tests now leave two rump_server processes looping in the > background: > > UID PID PPIDCPU PRI NI VSZ RSS WCHAN STAT TTYTIME > COMMAND > 0 44331 310129 27 0 74696 4024 - Rsl ? 194:32.70 > rump_server -lrumpdev -lrumpnet -lrumpnet_net -lrumpnet_netin > 0 203561 666580 27 0 73884 4024 - Rsl ? 216:32.72 > rump_server -lrumpdev -lrumpnet -lrumpnet_net -lrumpnet_netin > > This is slowing down the test VMs enough to make some of the test runs > time out. > -- > Andreas Gustafsson, g...@gson.org Fixed in -current. The failures were caused by a bug in rtsock.c that called psref_release on an ifa twice, not by the change to psref. It seems that switching from LIST to SLIST exposed the bug.
LIST tolerated the bug (LIST_REMOVE can be called twice on an item without errors as long as the list isn't modified between the removals), while SLIST doesn't. ozaki-r
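The LIST/SLIST difference can be reproduced in userland with the `<sys/queue.h>` macros. This sketch assumes the traditional BSD macro definitions (as shipped by NetBSD and glibc) with queue debugging disabled: calling `LIST_REMOVE` twice on the same element rewrites the same two pointers to the same values, so the list stays intact as long as nothing changed in between, whereas `SLIST_REMOVE` searches for the element and walks off the end of the list if it is already gone. `slist_remove_if_present` is a hypothetical guard for illustration only; the real problem was the double psref_release, which a guard like this would merely hide.

```c
#include <stddef.h>
#include <sys/queue.h>

struct item {
	int id;
	LIST_ENTRY(item) lentry;	/* doubly linked: has a back pointer */
	SLIST_ENTRY(item) sentry;	/* singly linked: removal must search */
};

LIST_HEAD(item_list, item);
SLIST_HEAD(item_slist, item);

/* Copy the ids on a LIST into out[]; returns how many were written. */
static size_t
list_ids(struct item_list *head, int *out, size_t max)
{
	struct item *it;
	size_t n = 0;

	LIST_FOREACH(it, head, lentry)
		if (n < max)
			out[n++] = it->id;
	return n;
}

/*
 * SLIST_REMOVE dereferences NULL if elm is no longer on the list,
 * so a second removal has to be guarded. Hypothetical helper: remove
 * elm only if it is still a member; returns 1 if it was removed.
 */
static int
slist_remove_if_present(struct item_slist *head, struct item *elm)
{
	struct item *it;

	SLIST_FOREACH(it, head, sentry) {
		if (it == elm) {
			SLIST_REMOVE(head, elm, item, sentry);
			return 1;
		}
	}
	return 0;
}
```

The LIST case only survives because the removed element still holds valid `le_next`/`le_prev` pointers; if a neighbour is removed in between, the second `LIST_REMOVE` would write stale pointers back into the list.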
Re: panic with bpf-using tool
On Tue, Dec 12, 2017 at 1:03 AM, Thomas Klausner <t...@giga.or.at> wrote: > On Thu, Dec 07, 2017 at 06:57:43PM +0900, Ryota Ozaki wrote: >> On Thu, Dec 7, 2017 at 6:54 PM, Thomas Klausner <t...@giga.or.at> wrote: >> > I just started net/trafshow for fun and shortly afterwards the machine >> > paniced (NetBSD 8.99.7/amd64): >> > >> > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7 >> > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7 >> > vpanic() at netbsd:vpanic+0x140 >> > snprintf() at netbsd:snprintf >> > lockdebug_abort() at netbsd:lockdebug_abort+0x6e >> > mutex_vector_exit() at netbsd:mutex_vector_exit+0xe4 >> > callout_halt() at netbsd:callout_halt+0xe6 >> > bpf_read() at netbsd:bpf_read+0x199 >> > dofileread() at netbsd:dofileread+0x8f >> > sys_read() at netbsd:sys_read+0x5f >> > syscall() at netbsd:syscall+0x1d8 >> > --- syscall (number 3) --- >> > 7bb169e3e1fa: >> > cpu9: End traceback... >> > >> > Any ideas? >> >> Oops. Could you try the below patch? >> >> Thanks, >> ozaki-r >> >> diff --git a/sys/net/bpf.c b/sys/net/bpf.c >> index c4bd8306042..64c9d4900bd 100644 >> --- a/sys/net/bpf.c >> +++ b/sys/net/bpf.c >> @@ -662,7 +662,7 @@ bpf_read(struct file *fp, off_t *offp, struct uio *uio, >> >> mutex_enter(d->bd_mtx); >> if (d->bd_state == BPF_WAITING) >> - callout_halt(&d->bd_callout, d->bd_buf_mtx); >> + callout_halt(&d->bd_callout, d->bd_mtx); >> timed_out = (d->bd_state == BPF_TIMED_OUT); >> d->bd_state = BPF_IDLE; >> mutex_exit(d->bd_mtx); >> > > With this patch applied, trafshow runs for minutes without problems. > > Thank you! > Thomas Good :) I've committed the patch. Thanks! ozaki-r
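The panic comes from the interlock argument of `callout_halt()`: per callout(9), if the callout is still running, the interlock is released while waiting and re-acquired before returning, so it must be a mutex the caller actually holds. `bpf_read()` holds `d->bd_mtx`, not `d->bd_buf_mtx`, so exiting the latter trips LOCKDEBUG in `mutex_vector_exit()`, as in the trace. Below is a single-threaded model with hypothetical names, where returning -1 stands in for the LOCKDEBUG abort.

```c
#include <stdbool.h>

/* Hypothetical model of a kernel mutex that records whether it is held. */
struct model_mutex {
	bool held;
};

static void
model_mutex_enter(struct model_mutex *m)
{
	m->held = true;
}

/* Returns -1 where LOCKDEBUG would abort: releasing a mutex that is not held. */
static int
model_mutex_exit(struct model_mutex *m)
{
	if (!m->held)
		return -1;
	m->held = false;
	return 0;
}

/*
 * Model of callout_halt(c, interlock): if the callout is running, drop
 * the caller's interlock while waiting and take it again afterwards.
 */
static int
model_callout_halt(bool callout_running, struct model_mutex *interlock)
{
	if (!callout_running)
		return 0;
	if (model_mutex_exit(interlock) != 0)
		return -1;
	/* ... wait for the callout handler to finish ... */
	model_mutex_enter(interlock);
	return 0;
}
```

Passing the lock taken by the preceding `mutex_enter()` keeps the enter/exit pairs balanced, which is all the one-line fix changes.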
Fails to build "ALL" kernel of amd64
Hi, the "ALL" kernel of amd64 (and maybe i386 too) has been failing to build for some days (or months?). Unfortunately few people have noticed the failure because http://build.tastylime.net/builders/ , which had been building kernels including ALL on every commit, is now out of service. Of course we should fix the build, but I think we should also build the ALL kernel regularly somehow, for example as part of build.sh release (just build it, not install it). Thoughts? ozaki-r
Re: panic with bpf-using tool
On Thu, Dec 7, 2017 at 6:54 PM, Thomas Klausner wrote: > I just started net/trafshow for fun and shortly afterwards the machine > paniced (NetBSD 8.99.7/amd64): > > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7 > WARNING: SPL NOT LOWERED ON SYSCALL 1 235601568 EXIT 176106b0 7 > vpanic() at netbsd:vpanic+0x140 > snprintf() at netbsd:snprintf > lockdebug_abort() at netbsd:lockdebug_abort+0x6e > mutex_vector_exit() at netbsd:mutex_vector_exit+0xe4 > callout_halt() at netbsd:callout_halt+0xe6 > bpf_read() at netbsd:bpf_read+0x199 > dofileread() at netbsd:dofileread+0x8f > sys_read() at netbsd:sys_read+0x5f > syscall() at netbsd:syscall+0x1d8 > --- syscall (number 3) --- > 7bb169e3e1fa: > cpu9: End traceback... > > Any ideas? Oops. Could you try the below patch? Thanks, ozaki-r diff --git a/sys/net/bpf.c b/sys/net/bpf.c index c4bd8306042..64c9d4900bd 100644 --- a/sys/net/bpf.c +++ b/sys/net/bpf.c @@ -662,7 +662,7 @@ bpf_read(struct file *fp, off_t *offp, struct uio *uio, mutex_enter(d->bd_mtx); if (d->bd_state == BPF_WAITING) - callout_halt(&d->bd_callout, d->bd_buf_mtx); + callout_halt(&d->bd_callout, d->bd_mtx); timed_out = (d->bd_state == BPF_TIMED_OUT); d->bd_state = BPF_IDLE; mutex_exit(d->bd_mtx);
Re: Automated report: NetBSD-current/i386 build failure
On Wed, Sep 27, 2017 at 6:47 PM, NetBSD Test Fixturewrote: > This is an automatically generated notice of a NetBSD-current/i386 > build failure. > > The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host, > using sources from CVS date 2017.09.27.08.14.18. > > An extract from the build.sh output follows: > > --- ieee8023ad_lacp_sm_tx.pico --- > # compile libagr/ieee8023ad_lacp_sm_tx.pico > /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-gcc > -O2 -ffreestanding -fno-strict-aliasing -msoft-float -mno-mmx -mno-sse > -mno-avx -msoft-float -mno-mmx -mno-sse -mno-avx -std=gnu99-Wall > -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare > -Wsystem-headers -Wno-traditional -Wa,--fatal-warnings -Wreturn-type > -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter > -Wno-sign-compare -Werror -Wno-format-zero-length -Wno-pointer-sign -fPIE > -fstack-protector -Wstack-protector --param ssp-buffer-size=1 > --sysroot=/tmp/bracket/build/2017.09.27.08.14.18-i386/destdir -DCOMPAT_50 > -DCOMPAT_60 -DCOMPAT_70 -nostdinc -imacros > /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include/opt/opt_rumpkernel.h > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr > -I. > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../../common/include > - > > I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../include/opt > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../arch > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../.. 
> -DDIAGNOSTIC -DKTRACE -D_FORTIFY_SOURCE=2 -c-fPIC > /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libagr/../../../../net/agr/ieee8023ad_lacp_sm_tx.c > -o ieee8023ad_lacp_sm_tx.pico > --- dependall-libnetipsec --- > --- key.pico --- > cc1: all warnings being treated as errors > --- dependall-libnpf --- > > uild/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../include > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../include/opt > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../../arch > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../.. > -DDIAGNOSTIC -DKTRACE -D_FORTIFY_SOURCE=2 -c -DGPROF -DPROF-pg > /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnpf/../../../../net/npf/npf_tableset.c > -o npf_tableset.po > --- dependall-libnetipsec --- > --- keysock.po --- > /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/nbctfconvert -g -L > VERSION keysock.po > > /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-objcopy > -X keysock.po > --- dependall-libnetbt --- > --- rfcomm_socket.po --- > # compile libnetbt/rfcomm_socket.po > /tmp/bracket/build/2017.09.27.08.14.18-i386/tools/bin/i486--netbsdelf-gcc > -O2 -ffreestanding -fno-strict-aliasing -msoft-float -mno-mmx -mno-sse > -mno-avx -msoft-float -mno-mmx -mno-sse -mno-avx -std=gnu99-Wall > -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-sign-compare > -Wsystem-headers -Wno-traditional -Wa,--fatal-warnings -Wreturn-type > -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter > -Wno-sign-compare -Werror -Wno-format-zero-length -Wno-pointer-sign -fPIE > -fstack-protector -Wstack-protector --param ssp-buffer-size=1 > --sysroot=/tmp/bracket/build/2017.09.27.08.14.18-i386/destdir -DCOMPAT_50 > -DCOMPAT_60 -DCOMPAT_70 -nostdinc -imacros > 
/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include/opt/opt_rumpkernel.h > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt > -I. > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../../common/inc > lude > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../include/opt > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../arch > > -I/tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../.. > -DDIAGNOSTIC -DKTRACE -D_FORTIFY_SOURCE=2 -c -DGPROF -DPROF-pg > /tmp/bracket/build/2017.09.27.08.14.18-i386/src/sys/rump/net/lib/libnetbt/../../../../netbt/rfcomm_socket.c > -o rfcomm_socket.po > --- dependall-libnpf --- > --- npf_conndb.po --- > --- dependall-libnetipsec --- > --- key.pico --- > *** [key.pico] Error code 1 > nbmake[8]: stopped in >
Re: Using NET_MPSAFE
Sorry for the late reply. On Wed, Aug 9, 2017 at 8:53 AM, Brian Buhrow wrote: > hello Ryota San. Thank you for your detailed response. Yes, I'm > interested in using NetBSD as a router. We use NetBSD-5 as router > devices and find it quite reliable, but for higher speed applications, > we're running into the cpu0 takes all interrupts bottleneck. I can't > promise to deliver anything in a timely manner, but I can look at the > agr(4) driver and see if I can MP-ify it. A co-worker of mine is also planning to work on MP-ifying agr in a few months. Of course your help is still welcome :) > Are there any notes in English > describing the basic procedures for MP-ifying a driver? I've done a bit of > it for drivers under NetBSD-5, so it's not completely foreign to me. My presentation at BSDCan 2017 outlines the rough procedure: http://www.netbsd.org/gallery/presentations/ozaki-r/2017_BSDCan/BSDCan2017-ozaki-nakahara.pdf There is no detailed documentation on MP-ification. You'll still need to refer to the source code and change logs of existing MP-safe components such as gif(4) and l2tp(4) to learn how to MP-ify a component. > We also use pf(4) extensively. Unfortunately, npf(4) doesn't have all > the functionality we need to implement the configurations we use. > Consequently, it may be necessary to MP-ify pf(4) as well, as I suspect > that's easier than implementing its functionality in npf(4). Personally I recommend extending npf because it's already MP-safe and better maintained than pf and ipf, but MP-ifying pf is welcome anyway. ozaki-r
Re: Using NET_MPSAFE
Hi, On Sat, Aug 5, 2017 at 3:25 AM, Brian Buhrow wrote: > hello. I'm excited to see the development of the MP-safe network > stack in NetBSD. Now that some progress has been made in that regard and > there are MP-safe drivers and stack components to use, I have some > questions. I'm interested in using options NET_MPSAFE in NetBSD-8.0_BETA > and the eventual netbsd-8 release. Here are my questions. I apologize if > some of them seem obvious, but I don't want to make any assumptions when > trying this new stuff. First of all, the primary target of the work is routers, so if you use NetBSD as a client or server you may not benefit from it. For such users, we're looking for someone to work on MP-safe Layer 4 :) > > 1. If I enable NET_MPSAFE in the kernel, will non-MP-ify'd components work > in that kernel using the kernel lock? In other words, if I enable > NET_MPSAFE and use the wm(4) driver, I'll get MP performance out of the > network stack. However, what if I try to use a non-MP-ify'd component on > that same machine, i.e. agr(4) or pf(4)? It looks to me like things should > work, but traffic through the non-MP-ify'd components will be single > threaded. Is this correct? Nope, unfortunately. Non-MP-safe components need to be protected somehow (probably by adding KERNEL_LOCK to the entrances of the component) if NET_MPSAFE is enabled. That's why NET_MPSAFE is not enabled by default. We're looking for someone to work on these tasks too. Nonetheless, some non-MP-safe components happen to work even if NET_MPSAFE is enabled. For example, CARP isn't MP-safe yet, but it works quite stably with NET_MPSAFE thanks to the big lock for the network stack (softnet_lock). My dogfooding router has been running with it for several months without any issues. FYI: you can check the lists of MP-safe/non-MP-safe components at: https://nxr.netbsd.org/xref/src/doc/TODO.smpnet > > 2.
Am I correct that when NET_MPSAFE is turned on, the network stack is > runing as an LWP inside the kernel? No. > And, am I correct that this means that > even if a particular network component is single-threaded, it's able to > execute on any CPU, thus reducing CPU congestion on CPU0 as happens on the > stock NetBSD kernels? NetBSD doesn't have dedicated threads for network components (except for timers). For transmissions from a userland program, the network stack runs in an LWP of that program. For receptions, the network stack runs in software interrupt contexts. In either case, the big locks (KERNEL_LOCK and softnet_lock) prevent such contexts from running in parallel. The NET_MPSAFE option gets rid of (some of) the big locks, so the network stack can run in parallel on multiple CPUs. NET_MPSAFE doesn't remove the big locks for transmissions from userland, so packet sending doesn't run in parallel. For packet reception and forwarding, NET_MPSAFE removes the big locks and packet processing runs in parallel. If you use an MP-safe network device driver such as wm(4), NET_MPSAFE enables the hardware multi-queue feature and incoming packets are delivered to multiple CPUs. If you use non-MP-safe drivers, all packets are delivered to CPU0 and no packet processing runs in parallel even if NET_MPSAFE is enabled. > > 3. How stable is the NET_MPSAFE stack? Is anyone using it in any sort of > production environment? > the BSDCAN paper I read suggests it's pretty stable, but I'm wondering if > anyone can report their experience. We (IIJ) are working on making the network stack with NET_MPSAFE stable enough for production use. What I can say now is that if you use only MP-safe network components, it should be stable. Regards, ozaki-r
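A rough model of what adding KERNEL_LOCK to the entrances of a non-MP-safe component means: without NET_MPSAFE the whole path already runs under the big lock, and with it a non-MP-safe component only stays serialized if its entry point takes the lock itself. All names are hypothetical, and the counter only models who holds the lock. It is single-threaded and demonstrates no real concurrency, nor the actual KERNEL_LOCK API.

```c
#include <stdbool.h>

static int big_lock_depth;		/* models ownership of the big lock */
static bool legacy_ran_locked;		/* what the legacy code observed */

static void big_lock(void)   { big_lock_depth++; }
static void big_unlock(void) { big_lock_depth--; }

/* A non-MP-safe component: it silently relies on the big lock. */
static void
legacy_input(void)
{
	legacy_ran_locked = (big_lock_depth > 0);
}

/* Entrance of the component, optionally taking the big lock itself. */
static void
legacy_entry(bool take_big_lock)
{
	if (take_big_lock)
		big_lock();
	legacy_input();
	if (take_big_lock)
		big_unlock();
}

/*
 * Receive path. Without NET_MPSAFE the stack already runs under the
 * big lock; with it, that lock is gone and only the entrance wrapper
 * (if one was added) protects the legacy component.
 */
static void
rx_path(bool net_mpsafe, bool entry_takes_lock)
{
	if (!net_mpsafe)
		big_lock();
	legacy_entry(entry_takes_lock);
	if (!net_mpsafe)
		big_unlock();
}
```

The middle case in the usage below, NET_MPSAFE without a wrapper, is the one that leaves a component like agr(4) or pf(4) unprotected.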
Re: panic in wqinput_input
On Thu, May 4, 2017 at 6:31 AM, Thomas Klausner <t...@giga.or.at> wrote: > On Wed, May 03, 2017 at 05:11:12PM +0900, Ryota Ozaki wrote: >> On Wed, May 3, 2017 at 4:16 PM, Thomas Klausner <t...@giga.or.at> wrote: >> > Hi! >> > >> > Last night my 7.99.67/amd64 rebooted after this panic: >> > >> > fatal page fault in supervisor mode >> > trap type 6 code 0x2 rip 0x80a815b6 cs 0x8 rflags 0x10286 cr2 0 >> > ilevel 0x4 rsp 0xfe813a414dc0 >> > curlwp 0xfe882df26420 pid 0.3 lowest kstack 0xfe813a4112c0 >> > panic: trap >> > cpu0: Begin traceback... >> > vpanic() at netbsd:vpanic+0x140 >> > snprintf() at netbsd:snprintf >> > trap() at netbsd:trap+0xc6b >> > --- trap (number 6) --- >> > wqinput_input() at netbsd:wqinput_input+0x43 >> > icmp6_input() at netbsd:icmp6_input+0x17 >> > ip6_input() at netbsd:ip6_input+0x6cb >> > ip6intr() at netbsd:ip6intr+0x71 >> > softint_dispatch() at netbsd:softint_dispatch+0xd3 >> > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe813a414ff0 >> > Xsoftintr() at netbsd:Xsoftintr+0x4f >> > --- interrupt --- >> > 0: >> > cpu0: End traceback... >> > >> > It probably was under high memory load, so it could be an >> > out-of-memory situation. >> > >> > Crash dump is available. >> > Thomas >> >> I think the below diff fixes the panic. >> >> Could you confirm that the variable "work" was NULL by >> inspecting the crash dump? > > Thank you for your reply and patch. I'm sorry for not replying to this mail. > > I'm sorry, it seems something went wrong with the crash dump, it looks > like this: Hmm, I'm not sure why the crash dump is broken. Anyway, my patch should be necessary and is likely to fix the panic, so I committed it. If the same panic happens again, please let me know.
Thanks, ozaki-r > > (gdb) target kvm netbsd.107.core > 0x80219f55 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:674 > 674 dumpsys(); > (gdb) bt > #0 0x80219f55 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:674 > #1 0x809a867c in vpanic (fmt=fmt@entry=0x810d5762 "trap", > ap=ap@entry=0xfe813b390888) at /usr/src/sys/kern/subr_prf.c:342 > #2 0x809a8730 in panic (fmt=fmt@entry=0x810d5762 "trap") at > /usr/src/sys/kern/subr_prf.c:258 > #3 0x8021bb86 in trap (frame=0xfe813b3909c0) at > /usr/src/sys/arch/amd64/amd64/trap.c:297 > #4 0x8020113e in alltraps () > #5 0x8021ba0f in trap (frame=0xfe813b390b90) at > /usr/src/sys/arch/amd64/amd64/trap.c:345 > #6 0x8020113e in alltraps () > #7 0x81dc7033 in ?? () > #8 0xfe81046a in ?? () > #9 0xfe870023 in ?? () > #10 0xfe813b390cf0 in ?? () > #11 0x80445829 in usbd_fill_deviceinfo (dev=0xfe839b9bdab0, > di=0xfe86c3e7b020, usedev=0) at /usr/src/sys/dev/usb/usb_subr.c:1507 > Backtrace stopped: frame did not save the PC > (gdb) > > "thread apply all bt" makes gdb crash, as does "thread 2.1", so I > don't know how to find backtraces for other active threads. > > (gdb) thread apply all bt > > Thread 2.1 (): > /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: > internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != > NULL' failed. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Quit this debugging session? (y or n) y > > This is a bug, please report it. For instructions, see: > <http://www.gnu.org/software/gdb/bugs/>. > > /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: > internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != > NULL' failed. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Create a core file of GDB? 
(y or n) y > Abort (core dumped) > > (gdb) thread 2.1 > [Switching to thread 2.1 ()] > /usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gdbarch.c:4884: > internal-error: gdbarch_addressable_memory_unit_size: Assertion `gdbarch != > NULL' failed. > A problem internal to GDB has been detected, > further debugging may prove unreliable. > Quit this debugging session? (y or n) > ... > > > gdb backtrace for the first case: > > (gdb) bt > #0 0x7f61c731cd8a in _lwp_kill () from /usr/lib
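The committed patch is not shown in this thread, and the broken crash dump never confirmed that "work" was NULL. Assuming the panic came from a work-item allocation in the input path that can fail under memory pressure (as a non-sleeping pool_get(9) call with PR_NOWAIT can), the kind of guard involved might look like this userland sketch; every toy_* name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item queued for deferred packet processing. */
struct toy_work {
	int pkt_id;
};

/* Simulated memory pressure and a drop counter. */
bool toy_mem_exhausted;
unsigned long toy_dropped;

/* Non-sleeping allocator: returns NULL instead of waiting for memory. */
struct toy_work *toy_alloc_nowait(void)
{
	if (toy_mem_exhausted)
		return NULL;
	return malloc(sizeof(struct toy_work));
}

/* Returns true if the packet was queued, false if it was dropped. */
bool toy_enqueue_input(int pkt_id, struct toy_work **out)
{
	struct toy_work *work = toy_alloc_nowait();

	if (work == NULL) {
		/* Drop the packet rather than dereference a NULL work item. */
		toy_dropped++;
		return false;
	}
	work->pkt_id = pkt_id;
	*out = work;
	return true;
}
```

Dropping a packet under memory exhaustion is normally acceptable in a network input path, whereas dereferencing the failed allocation takes the whole machine down.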
Re: re(4) trouble
On Wed, Mar 8, 2017 at 6:42 PM, Patrick Welche wrote: > On Fri, Mar 03, 2017 at 05:31:30PM +, Patrick Welche wrote: >> Netbooted a new box with this morning's -current/amd64, so its >> network interface is successfully configured. ftping the sets >> failed. I just typed dhcpcd to check, and: > ... >> mounted the disks, tried ftp again - 100% packet loss. Tried dhcpcd -k, and >> it didn't return - hang. >> >> re1 at pci4 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. >> 0x07) >> re1: interrupting at msi4 vec 0 >> re1: Ethernet address 00:01:2e:67:bc:68 >> re1: using 256 tx descriptors >> rgephy1 at re1 phy 7: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 5 >> rgephy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, >> 1000baseT-FDX, auto >> >> is the interface involved... > > Guessing fixed by ozaki-r > > http://mail-index.netbsd.org/current-users/2017/03/05/msg031261.html Oh, I'm sorry for not responding to the report. I agree with your guess. Thank you for the report, ozaki-r
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
On Wed, Mar 8, 2017 at 12:32 AM, Tom Ivar Helbekkmo wrote: > Tom Ivar Helbekkmo writes: > >> It seems rather unwilling to crash again. It's possible, I guess, >> that the last one was actually due to the corrupted file system, which >> I fixed. I'll keep trying, anyway. > > It seems I didn't entirely fix it, and that was, indeed, it. I still > haven't been able to fix it completely, but with the troublesome inode > sitting as an empty file in /lost+found, things seem stable. I guess > I'll have to boot from CD, and dump, newfs and restore the file system. > (I used fsdb to fake a link count of 1 on the inode, so that fsck put it > in lost+found. Just clearing it made the problem come back next boot.) > > Heads up to those others who've experienced fs crashes, in other words. The ffs issue has been fixed (PR kern/52045). > > However, with the fixes committed by ozaki-r, it seems the networking > troubles with re devices have been solved, which is very nice. :) Good :) ozaki-r
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
On Tue, Mar 7, 2017 at 3:05 AM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> wrote: > Ryota Ozaki <ozak...@netbsd.org> writes: >> <t...@hamartun.priv.no> wrote: >>> No idea, I'm afraid. I'll just have to provoke another one. > > It seems rather unwilling to crash again. It's possible, I guess, that > the last one was actually due to the corrupted file system, which I > fixed. I'll keep trying, anyway. Thanks. I also encountered the ffs panic. It seems to happen on bootup after a kernel panic has left a filesystem unclean. It's related to neither NFS nor re(4). I guess it's caused by a recent ffs/vfs change. > >> BTW sysctl -w ddb.onpanic=1 may help to avoid losing the first crash. > > I've been wondering about that: will it do the right thing if the system > crashes while I'm using X on the console? I assume it has to switch to > a text-based virtual console, and then run the debugger on that? Oh, I have no idea how that works with X because I don't use X daily. I guess you need to switch to a text console beforehand to see DDB on a kernel panic. ozaki-r
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
On Mon, Mar 6, 2017 at 12:24 PM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> wrote: > Ryota Ozaki <ozak...@netbsd.org> writes: > >> Hmm. Where did the first crash happen? In re(4) or NFS or ffs? > > No idea, I'm afraid. I'll just have to provoke another one. NP, thanks. BTW sysctl -w ddb.onpanic=1 may help to avoid losing the first crash. ozaki-r
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
On Mon, Mar 6, 2017 at 4:31 AM, Tom Ivar Helbekkmo wrote: > Tom Ivar Helbekkmo writes: > >> Have done so, and have been building stuff for two or three hours with >> no accidents. Looking good so far, in other words. > > It crashed just now, though. Unfortunately, it crashed again during > boot, and ended up overwriting the old core dump before saving it. :( Hmm. Where did the first crash happen? In re(4) or NFS or ffs? ozaki-r
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
On Sun, Mar 5, 2017 at 8:18 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: > Hi Tom > > Thank you for the reports. > > > On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> > wrote: >> I updated again yesterday, and it seems at least one stability issue >> has been introduced since 7.99.59, which I was running before this. >> >> The first crash came when I was trying to shut down to single user after >> booting the new kernel with the existing userland. I *think* it was >> triggered by the kernel missing the correct module directory; I caught a >> glimpse of it trying to access a module to connect to the console, and I >> later discovered that my ttys file had console enabled instead of ttyE0: >> >> panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() >> || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file >> "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, >> but preemption is enabled and the caller is not in a softint or CPU-bound LWP >> cpu1: Begin traceback... >> vpanic() at netbsd:vpanic+0x140 >> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in >> psref_release() at netbsd:psref_release+0xf8 >> ip_setmoptions() at netbsd:ip_setmoptions+0x269 >> ip_ctloutput() at netbsd:ip_ctloutput+0x1ee >> rip_ctloutput() at netbsd:rip_ctloutput+0xee >> rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c >> sosetopt() at netbsd:sosetopt+0x67 >> sys_setsockopt() at netbsd:sys_setsockopt+0x91 >> syscall() at netbsd:syscall+0x1d8 >> --- syscall (number 105) --- >> 7eb0dacdb16a: >> cpu1: End traceback... > > I fixed the panic in -current. > >> >> Then it crashed during boot, seemingly related to fsck: >> >> panic: ffs_sync: rofs mod, fs=/ >> cpu0: Begin traceback... >> vpanic() at netbsd:vpanic+0x140 >> snprintf() at netbsd:snprintf >> ffs_sync() at netbsd:ffs_sync+0x26b >> VFS_SYNC() at netbsd:VFS_SYNC+0x1c >> sched_sync() at netbsd:sched_sync+0x27b >> cpu0: End traceback... 
>> >> Anyway, I installed the complete updated userland on the machine, and >> started updating a bunch of packages from source, with all disk activity >> over NFS over UDP over IPv6. After about three hours: >> >> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file >> "/usr/src/sys/dev/ic/rtl8169.c", line 1380 >> cpu0: Begin traceback... >> vpanic() at netbsd:vpanic+0x140 >> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in >> re_txeof() at netbsd:re_txeof+0x250 >> re_intr() at netbsd:re_intr+0x11b >> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d >> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee >> --- interrupt --- >> x86_mwait() at netbsd:x86_mwait+0xd >> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb >> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6 >> idle_loop() at netbsd:idle_loop+0x18c >> cpu0: End traceback... >> uvm_fault(0xfe80cbca48c0, 0x0, 2) -> e >> fatal page fault in supervisor mode >> trap type 6 code 2 rip 8095500b cs 8 rflags 10282 cr2 84 ilevel 8 >> rsp fe8040afea80 >> curlwp 0xfe804dedaa20 pid 20873.1 lowest kstack 0xfe8040afb2c0 >> >> Once more, it crashed during boot, just like after the first crash: >> >> panic: ffs_sync: rofs mod, fs=/ >> cpu1: Begin traceback... >> vpanic() at netbsd:vpanic+0x140 >> snprintf() at netbsd:snprintf >> ffs_sync() at netbsd:ffs_sync+0x26b >> VFS_SYNC() at netbsd:VFS_SYNC+0x1c >> sched_sync() at netbsd:sched_sync+0x27b >> cpu1: End traceback... >> >> I tried to continue building packages over NFS, but this happened again: >> >> panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file >> "/usr/src/sys/dev/ic/rtl8169.c", line 1380 >> cpu0: Begin traceback... 
>> vpanic() at netbsd:vpanic+0x140 >> ch_voltag_convert_in() at netbsd:ch_voltag_convert_in >> re_txeof() at netbsd:re_txeof+0x250 >> re_intr() at netbsd:re_intr+0x11b >> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d >> Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee >> --- interrupt --- >> x86_mwait() at netbsd:x86_mwait+0xd >> acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb >> acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6 >> idle_loop() at netbsd:idle_loop+0x18c >> cpu0: End traceback... >> >> This is when I p
Re: A few crashes with yesterday's amd64-current -- IPv6 related?
Hi Tom Thank you for the reports. On Sun, Mar 5, 2017 at 6:59 PM, Tom Ivar Helbekkmo wrote: > I updated again yesterday, and it seems at least one stability issue > has been introduced since 7.99.59, which I was running before this. > > The first crash came when I was trying to shut down to single user after > booting the new kernel with the existing userland. I *think* it was > triggered by the kernel missing the correct module directory; I caught a > glimpse of it trying to access a module to connect to the console, and I > later discovered that my ttys file had console enabled instead of ttyE0: > > panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() > || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file > "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, > but preemption is enabled and the caller is not in a softint or CPU-bound LWP > cpu1: Begin traceback... > vpanic() at netbsd:vpanic+0x140 > ch_voltag_convert_in() at netbsd:ch_voltag_convert_in > psref_release() at netbsd:psref_release+0xf8 > ip_setmoptions() at netbsd:ip_setmoptions+0x269 > ip_ctloutput() at netbsd:ip_ctloutput+0x1ee > rip_ctloutput() at netbsd:rip_ctloutput+0xee > rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c > sosetopt() at netbsd:sosetopt+0x67 > sys_setsockopt() at netbsd:sys_setsockopt+0x91 > syscall() at netbsd:syscall+0x1d8 > --- syscall (number 105) --- > 7eb0dacdb16a: > cpu1: End traceback... I fixed the panic in -current. > > Then it crashed during boot, seemingly related to fsck: > > panic: ffs_sync: rofs mod, fs=/ > cpu0: Begin traceback... > vpanic() at netbsd:vpanic+0x140 > snprintf() at netbsd:snprintf > ffs_sync() at netbsd:ffs_sync+0x26b > VFS_SYNC() at netbsd:VFS_SYNC+0x1c > sched_sync() at netbsd:sched_sync+0x27b > cpu0: End traceback...
> > Anyway, I installed the complete updated userland on the machine, and > started updating a bunch of packages from source, with all disk activity > over NFS over UDP over IPv6. After about three hours: > > panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file > "/usr/src/sys/dev/ic/rtl8169.c", line 1380 > cpu0: Begin traceback... > vpanic() at netbsd:vpanic+0x140 > ch_voltag_convert_in() at netbsd:ch_voltag_convert_in > re_txeof() at netbsd:re_txeof+0x250 > re_intr() at netbsd:re_intr+0x11b > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d > Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee > --- interrupt --- > x86_mwait() at netbsd:x86_mwait+0xd > acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb > acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6 > idle_loop() at netbsd:idle_loop+0x18c > cpu0: End traceback... > uvm_fault(0xfe80cbca48c0, 0x0, 2) -> e > fatal page fault in supervisor mode > trap type 6 code 2 rip 8095500b cs 8 rflags 10282 cr2 84 ilevel 8 rsp > fe8040afea80 > curlwp 0xfe804dedaa20 pid 20873.1 lowest kstack 0xfe8040afb2c0 > > Once more, it crashed during boot, just like after the first crash: > > panic: ffs_sync: rofs mod, fs=/ > cpu1: Begin traceback... > vpanic() at netbsd:vpanic+0x140 > snprintf() at netbsd:snprintf > ffs_sync() at netbsd:ffs_sync+0x26b > VFS_SYNC() at netbsd:VFS_SYNC+0x1c > sched_sync() at netbsd:sched_sync+0x27b > cpu1: End traceback... > > I tried to continue building packages over NFS, but this happened again: > > panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file > "/usr/src/sys/dev/ic/rtl8169.c", line 1380 > cpu0: Begin traceback... 
> vpanic() at netbsd:vpanic+0x140 > ch_voltag_convert_in() at netbsd:ch_voltag_convert_in > re_txeof() at netbsd:re_txeof+0x250 > re_intr() at netbsd:re_intr+0x11b > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d > Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee > --- interrupt --- > x86_mwait() at netbsd:x86_mwait+0xd > acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb > acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6 > idle_loop() at netbsd:idle_loop+0x18c > cpu0: End traceback... > > This is when I pointed WRKOBJDIR to a local scratch directory in > /etc/mk.conf, thus reducing the amount of network traffic severely. > It's now building happily. :) > > I've noticed quite a few IPv6 changes, lately. Might these mbuf related > assertions have something to do with that? I rather suspect the change in rtl8169.c,v 1.149, which applied the deferred if_start mechanism to re(4). Could you apply the following patch and try again? If the patch doesn't help, could you revert rtl8169.c,v 1.149 and try again? Thanks, ozaki-r

diff --git a/sys/net/if.c b/sys/net/if.c
index 482bcbe..61c1b50 100644
--- a/sys/net/if.c
+++ b/sys/net/if.c
@@ -1008,8 +1008,11 @@ if_deferred_start_softint(void *arg)
 static void
 if_deferred_start_common(struct ifnet *ifp)
 {
+	int s;
 
+	s = splnet();
 	if_start_lock(ifp);
+	splx(s);
 }
 
 static inline bool
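The patch wraps if_start_lock() in splnet()/splx(), presumably so that the re(4) hardware interrupt (and with it re_txeof()) cannot preempt the deferred start on the same CPU while the Tx queue is being updated. A toy userland model of the save, raise and restore pattern; the IPL values and all toy_* names are hypothetical and do not match the kernel's:

```c
#include <assert.h>

/* Hypothetical interrupt priority levels, ordered low to high. */
enum toy_ipl { TOY_IPL_NONE = 0, TOY_IPL_SOFTNET = 1, TOY_IPL_NET = 2 };

int toy_cur_ipl = TOY_IPL_NONE;
int toy_start_calls;

/* Raise to at least TOY_IPL_NET and return the previous level. */
int toy_splnet(void)
{
	int old = toy_cur_ipl;

	if (toy_cur_ipl < TOY_IPL_NET)
		toy_cur_ipl = TOY_IPL_NET;
	return old;
}

/* Restore a level previously returned by toy_splnet(). */
void toy_splx(int s)
{
	toy_cur_ipl = s;
}

/* A toy NIC interrupt can only be taken below TOY_IPL_NET. */
int toy_net_intr_deliverable(void)
{
	return toy_cur_ipl < TOY_IPL_NET;
}

/* Same shape as the patched if_deferred_start_common(). */
void toy_deferred_start_common(void)
{
	int s = toy_splnet();

	assert(!toy_net_intr_deliverable());
	toy_start_calls++;	/* stands in for if_start_lock(ifp) */
	toy_splx(s);
}
```

Saving and restoring the previous level, rather than dropping straight to zero, keeps the pattern correct when the caller (here a softint) was already running at a raised level.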
Re: re(4) bpf-related assert
On Sun, Feb 19, 2017 at 9:17 PM, wrote: > It holds up fine (although I'm not confident I know > how to end up in that code). > Thanks! Thank you for testing! That code path is executed when there is a certain amount of Tx/Rx traffic on the NIC. ozaki-r
Re: re(4) bpf-related assert
Hi, On Fri, Feb 17, 2017 at 7:40 PM, wrote: > Hi, > > I'm using: > re0 at pci8 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet (rev. > 0x06) > > while using -current I got: > System panicked: kernel diagnostic assertion "!cpu_intr_p()" failed: file > "/usr/src/sys/net/bpf.c", line 1577 > > _KERNEL_OPT_NARCNET() at 0 > ?() at 80001d563000 > vpanic() at vpanic+0x149 > ch_voltag_convert_in() at ch_voltag_convert_in > _bpf_mtap() at _bpf_mtap+0x48f > re_start() at re_start+0x3d8 > re_intr() at re_intr+0x176 > intr_biglock_wrapper() at intr_biglock_wrapper+0x1d > Xintr_ioapic_edge18() at Xintr_ioapic_edge18+0xee > --- interrupt --- > Xspllower() at Xspllower+0xe > callout_softclock() at callout_softclock+0x41c > softint_dispatch() at softint_dispatch+0xda > > how should bpf_mtap callers be adjusted in this case? > > thanks. Sorry, I forgot to make re(4) use the deferred if_start mechanism. Could you try the following patch? ozaki-r

diff --git a/sys/dev/ic/rtl8169.c b/sys/dev/ic/rtl8169.c
index 691afa4..d262af1 100644
--- a/sys/dev/ic/rtl8169.c
+++ b/sys/dev/ic/rtl8169.c
@@ -869,6 +869,7 @@ re_attach(struct rtk_softc *sc)
 	 * Call MI attach routine.
 	 */
 	if_attach(ifp);
+	if_deferred_start_init(ifp, NULL);
 	ether_ifattach(ifp, eaddr);
 
 	rnd_attach_source(&sc->rnd_source, device_xname(sc->sc_dev),
@@ -1496,8 +1497,8 @@ re_intr(void *arg)
 		}
 	}
 
-	if (handled && !IFQ_IS_EMPTY(&ifp->if_snd))
-		re_start(ifp);
+	if (handled)
+		if_schedule_deferred_start(ifp);
 
 	rnd_add_uint32(&sc->rnd_source, status);
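The assertion fires because re_intr() calls re_start(), and through it the bpf tap, in hardware interrupt context. With the patch, the interrupt handler only schedules a start, and the start routine runs later from a softint set up by if_deferred_start_init(). A toy userland model of that hand-off; all toy_* names are hypothetical and the "contexts" are just flags:

```c
#include <assert.h>
#include <stdbool.h>

bool toy_in_hw_intr;	/* are we "in" hardware interrupt context? */
bool toy_start_pending;	/* has a deferred start been requested? */
int toy_start_runs;

/* Stand-in for re_start(): its bpf tap must not run in hard interrupt context. */
void toy_start(void)
{
	assert(!toy_in_hw_intr);	/* mirrors the !cpu_intr_p() check */
	toy_start_runs++;
}

/* Stand-in for if_schedule_deferred_start(): just record the request. */
void toy_schedule_deferred_start(void)
{
	toy_start_pending = true;
}

/* Stand-in for the patched re_intr(): schedule, don't call toy_start(). */
void toy_intr(bool handled)
{
	toy_in_hw_intr = true;
	if (handled)
		toy_schedule_deferred_start();
	toy_in_hw_intr = false;
}

/* Stand-in for the softint that later performs the deferred start. */
void toy_softint(void)
{
	if (toy_start_pending) {
		toy_start_pending = false;
		toy_start();
	}
}
```

Several interrupts arriving before the softint runs collapse into a single start, which is usually what a driver wants anyway.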
Re: OpenVPN causes fresh -current to crash
On Tue, Jan 24, 2017 at 12:53 AM, Tom Ivar Helbekkmo <t...@hamartun.priv.no> wrote: > Ryota Ozaki <ozak...@netbsd.org> writes: > >> The latest pfil.c (v1.34) should fix the panic. Could you try it? > > I'll give it a go tonight, and report back. Thanks. > > Meanwhile, do you think this ongoing MPSAFE work may have some unwanted > consequences for NFS? There's a problem that's been around for at least > a couple of months, but that I only discovered the other day -- I was > running with kernels from late October then, and the problem I observed > is still there after upgrading. I'm not sure. I don't know much about NFS, how it works, or how it interacts with the network stack. > > Reading NFS file systems is no problem, which is why I didn't notice it > before, but writing hangs. Here's an example: I started compiling a C > source file directly to an executable on an NFS mounted file system > (server and client both amd64 running fresh -current). The compile pass > is fine, but when the ld end of the pipeline wants to write the > executable, it hangs. So I try to do a 'df' in another terminal, and it > hangs. Finally, I simply attempt to make 'ls -l [target executable]' > show me if it's written anything yet, and that hangs, too: after an > attempt to write has hung the communication up, reads no longer work, > either: > > UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME > COMMAND >0 22179 22678 0 124 0 333445136 netio D+ pts/170:00.01 > ld [...] > 501 21370 21006 516 85 089521144 nfsrcv I+ pts/180:00.00 > df > 501 21710 1 0 127 089641116 tstile Dpts/20- 0:00.00 > /bin/ls [...] > > Once I have something with "tstile" in the "WCHAN" column, I know that > I can't just reboot the machine: it's going to take a hard reset. Can you get into DDB? If so, you can find out where the processes are hanging:
  db> ps      # gives you the LWP addresses of ld and ls
  db> bt/a    # gives you their stack traces
I guess ps will also show some other LWPs stuck on tstile, for example softnet/N.
Stack traces of such LWPs would explain how the hang happens, or at least give hints for the investigation. > > Oh, and it's the client that hangs; the server seems to be just fine, > and a reboot of the client makes NFS reads behave normally again. On > the server, the output file got created, but is zero bytes. The error > logged on the client when it gets stuck is this console output: > > nfs send error 64 for barsoom:/usr/local > > ...and then the normal "nfs server not responding" messages in syslog > after that, of course. I tried an NFS client running -current with an NFS server running netbsd-7, but writing didn't hang (I compiled a C program and ran cp -r /etc/ /mnt/nfs). The hang may depend on the NIC. Which NIC do you use? Also, could you tell me the NFS options used on the client and the server? ozaki-r
Re: OpenVPN causes fresh -current to crash
On Sun, Jan 22, 2017 at 8:05 PM, Tom Ivar Helbekkmo wrote: > Martin Husemann writes: > >> Could you try backing out this change and see if it helps? >> >> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html > > That did the trick. I've rebooted a few times, now, and the system > comes up as it should, with no incident, every time. Thanks! :) The latest pfil.c (v1.34) should fix the panic. Could you try it? Thanks, ozaki-r
Re: strange observations on network configuration (ifconfig)
Hi, Thank you for testing! I'll commit the patch unless further investigation turns up a problem. ozaki-r On Tue, Dec 13, 2016 at 2:29 AM, Frank Kardel <kar...@netbsd.org> wrote: > Hi, > > thanks for that. > > It looks fine to me now. > > Frank > > > On 12/12/16 14:13, Ryota Ozaki wrote: >> >> On Mon, Dec 12, 2016 at 5:22 PM, Frank Kardel <kar...@netbsd.org> wrote: >>> >>> Hi, >>> >>> I did try that efore an verified it again. Now routed attempts to install >>> a >>> local route >>> for the lo0 interface and fill the log with the EEXIST messages. >>> >>> That's why I went for LLDATA in order to avoid to analyse routed's inner >>> workings completely. >> >> RTF_LLDATA is introduced for userland programs to get L2 routes >> (ARP/NDP caches), so we don't use it for this case. >> >>> Maybe we need a different test for ignoring kernel routing messages. >> >> Yes. I'll investigate more though, prepared another patch: >>http://www.netbsd.org/~ozaki-r/fix-routed2.diff >> >> It mimics checks to ignore RTF_STATIC routes. It works for me. >> >> Thanks, >>ozaki-r > >
Re: strange observations on network configuration (ifconfig)
On Mon, Dec 12, 2016 at 5:22 PM, Frank Kardel wrote: > Hi, > > I did try that efore an verified it again. Now routed attempts to install a > local route > for the lo0 interface and fill the log with the EEXIST messages. > > That's why I went for LLDATA in order to avoid to analyse routed's inner > workings completely. RTF_LLDATA was introduced for userland programs to get L2 routes (ARP/NDP caches), so we don't use it for this case. > Maybe we need a different test for ignoring kernel routing messages. Yes. I'll investigate further, but in the meantime I've prepared another patch: http://www.netbsd.org/~ozaki-r/fix-routed2.diff It mimics the checks that ignore RTF_STATIC routes. It works for me. Thanks, ozaki-r
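The fix-routed2.diff patch is not reproduced in this thread. Assuming it works like routed's handling of RTF_STATIC routes, by recognizing kernel-owned local routes from their flags and leaving them alone instead of deleting them, the decision could be sketched as follows. TOY_RTF_STATIC and TOY_RTF_LOCAL are hypothetical placeholders, not the real RTF_* values from <net/route.h>:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only. */
#define TOY_RTF_STATIC	0x1u	/* route added manually by the administrator */
#define TOY_RTF_LOCAL	0x2u	/* kernel-installed route to a local address */

/*
 * Should the routing daemon treat this kernel route as its own (and so
 * be free to change or delete it)? Static and local routes are skipped,
 * so the loopback host route for a new address is never removed.
 */
bool toy_routed_should_manage(unsigned int rt_flags)
{
	if (rt_flags & TOY_RTF_STATIC)
		return false;
	if (rt_flags & TOY_RTF_LOCAL)
		return false;
	return true;
}
```

Keying the decision on a flag the kernel sets, rather than on the gateway looking like a link-layer address, keeps the daemon independent of how local routes happen to be represented.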
Re: strange observations on network configuration (ifconfig)
Hi, Thank you for the investigation. On Sun, Dec 11, 2016 at 9:08 PM, Frank Kardel wrote: > Hi ! > > Reverting that change (1.24->1.25) and using RTF_LLDATA instead of > RTF_LLINFO seems to solve the problem. > Is this correct or am I overlooking something? Local routes aren't actually link-layer routes; RTF_LLDATA remains set on them for backward compatibility, IIRC. So if, as you said, the old routed works on a new kernel, I think it is better to fix routed as I proposed in my earlier mail. Could you try the patch? http://www.netbsd.org/~ozaki-r/fix-routed.diff Thanks, ozaki-r
Re: strange observations on network configuration (ifconfig)
Hi, Thank you for the report. On Mon, Dec 5, 2016 at 11:47 PM, Frank Kardel wrote: > Hi ! > > when trying out a -current from 20161127 (7.99.42) I see issues with routed. > > On configuration of an interface address A.B.C.D/m the local network address > A.B.C.D is correctly entered with a loopback host route for the local > address > in the routing table. > Also the network route via the interface is correctly entered in the table. > > As soon as routed detects the new interface it seems to miss the loopback > host route for the local address and consequently decides to remove the > loopback host route from the kernel routing table, > > route monitor output: > got message of size 160 on Mon Dec 5 15:10:49 2016 > RTM_CHANGE: Change Metrics, Flags or Gateway: len 160, pid 25290, seq 1, > errno 0, flags: > locks: none inits: > sockaddrs: > default 10.200.1.1 0.0.0.0 > got message of size 96 on Mon Dec 5 15:10:52 2016 > RTM_ONEWADDR: address being added to iface: len 96, pid 2, seq 0, errno 528, > flags: > locks: inits: none > got message of size 104 on Mon Dec 5 15:10:52 2016 > RTM_NEWADDR: address being added to iface: len 104, metric 0, flags: > > sockaddrs: > 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 default > ### new address (tentative) > got message of size 160 on Mon Dec 5 15:10:52 2016 > RTM_ADD: Add Route: len 160, pid 4878, seq 0, errno 0, flags: > > locks: none inits: none > sockaddrs: > A.B.C.38 link#2 > ### local address loopback link > got message of size 208 on Mon Dec 5 15:10:52 2016 > RTM_ADD: Add Route: len 208, pid 4878, seq 0, errno 0, flags: > > locks: none inits: none > sockaddrs: > A.B.C.32 link#2 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 > ### net route via interface > got message of size 160 on Mon Dec 5 15:10:52 2016 > RTM_DELETE: Delete Route: len 160, pid 25290, seq 2, errno 0, flags: > > locks: none inits: none > sockaddrs: > A.B.C.38 link#2 > ### routed deletes local address loopback link > got message of size 88 on Mon Dec 5 15:10:57
2016 > RTM_ONEWADDR: address being added to iface: len 88, pid 2, seq 0, errno 520, > flags: > locks: inits: > > got message of size 96 on Mon Dec 5 15:10:57 2016 > RTM_NEWADDR: address being added to iface: len 96, metric 0, flags: > > sockaddrs: > 255.255.255.248 00:1b:21:aa:9b:7c A.B.C.38 A.B.C.39 > ### address finally valid > > [BTW: routed/table.c contains an out of date RTM_* number to string table - > fixed in output below] > > Trace from routed: > Tracing actions started > Tracing packets started > Tracing packet contents started > Tracing kernel changes started > Add interface lo0 127.0.0.1 -->127.0.0.1/32 > RCVBUF=61440 > Add interface wm1 10.200.1.2 -->10.200.1.0/24 > turn on RIP > Add10.200.1.0/24 -->10.200.1.2 metric=0 wm1 > Add127.0.0.1/32-->127.0.0.1metric=0 lo0 > ### initial interface state > Send mcast RIPv2 REQUEST to 224.0.0.9.520 via wm1 > QUERY > -- 15:10:46 -- > Recv RIPv2 REQUEST from 10.200.1.2.520 via wm1 > QUERY > discard our own RIP request > -- 15:10:46 -- > Recv RIPv2 RESPONSE from 10.200.1.1.520 via wm1 > 0.0.0.0metric=9 > 10.0.0.0 metric=1 > 10.0.0.128/32 metric=2 > ... > Add0.0.0.0 -->10.200.1.1 metric=9 wm1 15:10:46 > Add10.0.0.0-->10.200.1.1 metric=1 wm1 15:10:46 > Add10.0.0.128/32 -->10.200.1.1 metric=2 wm1 15:10:46 > ... > ### received routing information > > -- 15:10:47 -- > Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0 > -- 15:10:48 -- > write kernel RTM_CHANGE 0.0.0.0 -->10.200.1.1 metric=9 > flags=0x2 > -- 15:10:50 -- > Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0 > -- 15:10:50 -- > ignore RTM_ONEWADDR without dst > ### old routing messages are not properly skipped? 
> > Add interface wm0 A.B.C.38 -->A.B.C.32/29 > AddA.B.C.32/29 -->A.B.C.38 metric=0 wm0 > ### new interface due to ifconfig wm0 A.B.C.D/29 > > note RTM_NEWADDR with flags 0x100 for unknown interface index #180 > ### RTM_NEWADDR not properly handled/skipped > > RTM_ADD from pid 4878: A.B.C.38/32 --> A.B.C.38 > RTM_ADD from pid 4878: A.B.C.32/29 --> A.B.C.38 > -- 15:10:51 -- > write kernel RTM_DELETE A.B.C.38/32 -->A.B.C.38metric=0 flags=0 > ### routed does not seem to consider the A.B.C.38/32 -->A.B.C.38 (if=lo0, > gw=link#2) as being valid > > -- 15:10:53 -- > Send multicast Router Solic. from 10.200.1.2 to 224.0.0.2 via wm1 value=0 > -- 15:10:53 -- > ignore RTM_ONEWADDR without dst > note RTM_NEWADDR with flags 0x101 for unknown interface
Re: Why so many packet filters?
On Mon, Aug 15, 2016 at 4:13 PM, Joerg Sonnenberger <jo...@bec.de> wrote: > On Mon, Aug 15, 2016 at 01:51:38PM +0900, Ryota Ozaki wrote: >> BTW should we mark pf and ipf deprecated in netbsd-8 as they aren't >> well maintained nowadays? > > While they are not maintained, PF works quite well for the feature set > our version has. There are still quite a few issues with NPF, primarily > documentation issues, but also some functional ones. It seems a bit > premature to deprecate IPF and PF in the current situation. That said, > they should certainly be considered legacy functionality. Hmm, I thought npf was mature. I think it would be better to keep a TODO list for npf somewhere, to clarify what we need to do to make it mature enough. ozaki-r
Re: Why so many packet filters?
On Mon, Aug 15, 2016 at 12:10 PM, Paul Goyette wrote: > Taking a quick look, it seems that we have at least four (maybe five) > different packet filters available. > > pf > ipf > bpf (and bpfjit) > npf > > Is there a concise description of each, and when to use one vs the > other? (I'm not so familiar with filters, so please correct me if I'm wrong.) First of all, bpf (bpfjit) is different from the others. bpf sniffs raw packets on rx/tx in network device drivers (grep bpf_mtap) and also allows sending raw packets directly via ifp->if_output (e.g., ether_output). It doesn't provide the pass/block filters that the others provide. bpfjit is just an optimization option of bpf, so we don't need to treat it separately. pf, ipf and npf provide pass/block functionality (and more) at hook points (grep pfil_run_hooks) in the network stack via pfil(9), which is how, say, firewalling and NAT/NAPT are implemented. They provide similar functionality, but unfortunately their features aren't compatible, and one cannot easily be replaced by another, IIUC. (Someone else could explain the differences in detail.) npf is newer than the others and is designed for multi-core systems. So basically we recommend npf to anyone who wants to start using one of them. BTW, should we mark pf and ipf as deprecated in netbsd-8, as they aren't well maintained nowadays? ozaki-r
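To make the pass/block model concrete, here is a rough userland sketch of a pfil(9)-style hook chain: each registered hook may pass a packet or block it, and the first block stops processing. The types and function names are hypothetical, and this is not the real pfil(9) API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy packet: just enough state for the example. */
struct toy_pkt {
	unsigned int dst_port;
};

/* A hook returns 0 to pass the packet along, nonzero to block it. */
typedef int (*toy_hook_fn)(struct toy_pkt *);

/* Run hooks in registration order; the first nonzero result blocks. */
int toy_run_hooks(toy_hook_fn *hooks, size_t nhooks, struct toy_pkt *p)
{
	for (size_t i = 0; i < nhooks; i++) {
		int error = hooks[i](p);

		if (error != 0)
			return error;	/* dropped; later hooks never see it */
	}
	return 0;			/* every hook passed it */
}

/* Example hooks standing in for rule sets loaded by npf, pf or ipf. */
int toy_block_telnet(struct toy_pkt *p)
{
	return p->dst_port == 23 ? 1 : 0;
}

int toy_pass_all(struct toy_pkt *p)
{
	(void)p;
	return 0;
}
```

bpf, by contrast, would sit beside such a chain as a passive tap: it copies packets for observers and never returns a verdict.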
HEADS UP: netstat needs to be updated
Hello -current users, Running netstat -ia (built before 2016-07-14) on a 7.99.35 (or newer) kernel causes a problem (the same as PR kern/51325). You can solve the problem by updating netstat from the latest source code (or by getting a recent binary from http://nyftp.netbsd.org/). Technical details: netstat -ia used kvm(3) to get address information from the kernel by reading lists of struct in_ifaddr/in6_ifaddr, which embed struct ifaddr. Since 2016-07-14, netstat -ia has used sysctl instead of kvm(3) to get the information. Thanks to that change, the kernel could change struct ifaddr (as needed for the MP-safe network stack work) at 7.99.35, which broke the old netstat -ia. The breakage is expected: we may not keep backward compatibility for kvm(3) consumers if doing so would require dirty workarounds and/or extra overhead in the kernel. Regards, ozaki-r
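One reason a sysctl-based reader survives such struct changes is that the records it receives are self-describing: routing-style messages carry their own length (for example the ifam_msglen field of struct ifa_msghdr), so a reader can skip fields it does not know about, whereas a kvm(3) reader hard-codes the kernel's in-memory layout. A userland sketch of that walking pattern, using a hypothetical record header rather than the real <net/if.h> structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical self-describing record header. */
struct toy_msghdr {
	uint16_t msglen;	/* total length of this record in bytes */
	uint16_t index;		/* interface index */
};

/*
 * Count records in a buffer using only the header and the length field,
 * so records that a newer producer extended with extra fields are still
 * stepped over correctly.
 */
size_t toy_count_records(const unsigned char *buf, size_t len)
{
	size_t n = 0, off = 0;

	while (off + sizeof(struct toy_msghdr) <= len) {
		struct toy_msghdr h;

		memcpy(&h, buf + off, sizeof h);	/* avoid unaligned access */
		if (h.msglen < sizeof h || h.msglen > len - off)
			break;				/* malformed or truncated */
		n++;
		off += h.msglen;
	}
	return n;
}
```

A reader built against the shorter layout still counts a longer record correctly, because it advances by msglen instead of by its own idea of the record size.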
Re: kernel panic
On Sun, Jun 19, 2016 at 9:23 PM, Michael van Elst wrote: > brad.har...@gmail.com (bch) writes: > >>kernel (adjusted from GENNERIC to allow dtrace support) from latest src >>panics: > >>(transcription): > >>reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m, >>struct ieee80211_node *) == NULL)" failed: file >>"/usr/src/sys/80211/ieee80211_output.c", line 1347 > > > That assertion seems to be bogus. It checks a field in an mbuf > that was just allocated in ieee80211_getmgtframe using m_getcl > and that may contain random data in the ctx pointer. Indeed. > > Another similar assertion in the same file is #ifdef __FreeBSD__. > > Looking at the current FreeBSD code, it still abuses the rcvif > pointer for local data. But there are no such assertions, which > would be bogus in FreeBSD either. Thanks. I think we can safely remove the assertion(s). (I'm not sure why the assertion had never failed before. I guess my changes broke some implicit zeroing of rcvif somewhere.) ozaki-r
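Michael's point, that a freshly allocated mbuf is not guaranteed to have a zeroed context field, can be shown with a user-space sketch of a recycling allocator. struct buf, buf_get and buf_put are hypothetical; the sketch only assumes, as a simplification of pool-style allocation, that an object taken from a free list keeps whatever its previous user stored in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer with a context pointer, standing in for an mbuf. */
struct buf {
    void *ctx;
    struct buf *next_free;
};

static struct buf *free_list;

/* Allocate from the free list first; like a pool cache, this does not
 * clear fields left behind by the previous user. */
static struct buf *buf_get(void)
{
    struct buf *b = free_list;
    if (b != NULL) {
        free_list = b->next_free;
        return b;              /* b->ctx still holds the old value */
    }
    return calloc(1, sizeof(*b)); /* only a brand-new buffer is zeroed */
}

/* Return a buffer to the free list without clearing it. */
static void buf_put(struct buf *b)
{
    b->next_free = free_list;
    free_list = b;
}
```

The first buffer comes from calloc() and is zeroed, but once it has been recycled, buf_get() returns it with the old ctx value, so asserting ctx == NULL right after allocation can fire even though nothing is actually wrong.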
Re: Test failure for amd64 -current
On Fri, Jun 17, 2016 at 11:32 AM, Paul Goyette wrote: > I just noticed that the automated test bed for amd64 -current has been > crashing during one of the test cases: > > sbin/sysctl/t_perm (439/654): 8 test cases > sysctl_ddb: [3.708810s] Passed. > sysctl_hw: [32.423393s] Passed. > sysctl_kern: [39.811971s] Passed. > sysctl_machdep: [8.952583s] Passed. > sysctl_net: uvm_fault(0xfe8007a9b2e8, 0x0, 1) -> e > fatal page fault in supervisor mode > trap type 6 code 0 rip 80862581 cs 8 rflags 286 cr2 398 ilevel 0 rsp > fe80079facf0 > curlwp 0xfe80033376c0 pid 22351.1 lowest kstack 0xfe80079f72c0 > panic: trap > cpu0: Begin traceback... > vpanic() at netbsd:vpanic+0x140 > snprintf() at netbsd:snprintf > trap() at netbsd:trap+0xc4b > --- trap (number 6) --- > psref_release() at netbsd:psref_release+0x23 > if_sdl_sysctl() at netbsd:if_sdl_sysctl+0xc1 > sysctl_dispatch() at netbsd:sysctl_dispatch+0xc1 > sys___sysctl() at netbsd:sys___sysctl+0xd8 > syscall() at netbsd:syscall+0x15b > --- syscall (number 202) --- > 78e62090bfaa: > cpu0: End traceback... > > > Is anyone looking into this? riastradh@ has already fixed it (sys/net/if.c 1.341). ozaki-r
Re: kernel panic
On Thu, Jun 16, 2016 at 3:04 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: > On Thu, Jun 16, 2016 at 1:56 PM, bch <brad.har...@gmail.com> wrote: >> >> On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" <k-nakah...@iij.ad.jp> wrote: >>> >>> Hi, >>> >>> On 2016/06/16 8:15, bch wrote: >>> > I am now at 1.414, and it seems stable. >>> >>> Thank you for your checking and reporting. >> >> My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some >> luck macro-ization, rejigging)? > > Not related. > >> Can anybody point me to the commits that >> apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)? > > For iwm: > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h > > Commit 1.164 broke iwm (and I guess all other wifi drivers) > and commit 1.165 fixed it. > > For wm: > http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c > > Commit 1.413 broke wm and commit 1.414 fixed it. > >> >>> If it seems there is still >>> problems, please tell us. >> >> Will do. I'd like to have the commit(s) identified and >> re-witness/characterize the issue. Otherwise, things currently seem stable. >> Thanks. > > [Timeline] > > - Jun 10 13:31:45: mbuf.h r1.164 > - Jun 11 ??:??:??: you encountered the first panic > - Jun 12 10:14:12: mbuf.h r1.165 oops > - Jun 14 09:07:22: if_wm.c r1.164 ^^ r1.413 > - Jun 14 ??:??:??: you encountered the second panic > - Jun 14 17:09:20: if_wm.c r1.165 ^^ r1.414 > - Jun 16 ??:??:??: you are here > > And I noticed that I forgot to bump the kernel version; my mbuf.h > change required it. (I already bumped.) If you run a kernel between > my mbuf.h change and the bump with network device driver modules > of 7.99.30, something bad will happen. (I guess the issues you saw > aren't related to this though.) > > Thanks, > ozaki-r
Re: kernel panic
On Thu, Jun 16, 2016 at 1:56 PM, bch wrote: > > On Jun 15, 2016 9:29 PM, "Kengo NAKAHARA" wrote: >> >> Hi, >> >> On 2016/06/16 8:15, bch wrote: >> > I am now at 1.414, and it seems stable. >> >> Thank you for your checking and reporting. > > My pleasure. Question, were my wm(4) and iwm(4) faults related (maybe some > luck macro-ization, rejigging)? Not related. > Can anybody point me to the commits that > apparently fixed these interfaces (did if_wm.c fix iwm(4) too??)? For iwm: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/sys/mbuf.h Commit 1.164 broke iwm (and I guess all other wifi drivers) and commit 1.165 fixed it. For wm: http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_wm.c Commit 1.413 broke wm and commit 1.414 fixed it. > >> If it seems there is still >> problems, please tell us. > > Will do. I'd like to have the commit(s) identified and > re-witness/characterize the issue. Otherwise, things currently seem stable. > Thanks. [Timeline] - Jun 10 13:31:45: mbuf.h r1.164 - Jun 11 ??:??:??: you encountered the first panic - Jun 12 10:14:12: mbuf.h r1.165 - Jun 14 09:07:22: if_wm.c r1.164 - Jun 14 ??:??:??: you encountered the second panic - Jun 14 17:09:20: if_wm.c r1.165 - Jun 16 ??:??:??: you are here And I noticed that I forgot to bump the kernel version; my mbuf.h change required it. (I have already bumped it.) If you run a kernel built between my mbuf.h change and the bump with network device driver modules of 7.99.30, something bad will happen. (I guess the issues you saw aren't related to this, though.) Thanks, ozaki-r
Re: kernel panic
Hi, On Sat, Jun 11, 2016 at 3:58 AM, bch wrote: > kernel (adjusted from GENNERIC to allow dtrace support) from latest src > panics: > > (transcription): > > reboot after panic: panic: kernel diagnostic assertion "M_GETCTX(m, > struct ieee80211_node *) == NULL)" failed: file > "/usr/src/sys/80211/ieee80211_output.c", line 1347 Could you show me a backtrace? Also, please let me know the latest version (date) of the kernel that worked for you. ozaki-r
Re: Crash during rc bootup (amd64) with new networking stuff
On Sat, Apr 16, 2016 at 10:49 AM, Geoff Wing <g...@pobox.com> wrote: > On Friday 2016-04-15 16:48 +1000, Ryota Ozaki output: > :> :panic: kernel "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", > line 1220 > :> :It also deletes and adds in a static arp address: > :> : "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx" > :> Taking out the static arp commands and it boots up OK. > :Thanks. I could reproduce the panic on my machine with the latest kernel. > : > :A quick fix is like this: > [...] > :Does this patch help you? > :If so, I'll commit it (with tweaks maybe) after more validations. > > Great, boots up OK. Thanks. I committed the fix and a regression test for the panic. ozaki-r
Re: Crash during rc bootup (amd64) with new networking stuff
2016/04/15 4:34 "Geoff Wing" <g...@pobox.com>: > > On Friday 2016-04-15 16:20 +1000, Ryota Ozaki output: > :> panic: kernel "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line 1220 > :The source code of your kernel looks a bit old: > : http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1220 > : > :You can see the version of the file by: > : $ ident /netbsd |grep if_arp.c > : $NetBSD: if_arp.c,v 1.205 2016/04/07 03:22:15 christos Exp $ > :And this is the latest version. > > Mine says: > $NetBSD: if_arp.c,v 1.206 2016/04/13 00:47:01 ozaki-r Exp $ Heh. Oh mine IS old! Sorry for confusing you. > > I'll try the patch in the other post in a couple of hours. Thanks! ozaki-r
Re: Crash during rc bootup (amd64) with new networking stuff
On Fri, Apr 15, 2016 at 3:10 PM, Geoff Wing wrote: > On Friday 2016-04-15 13:20 +1000, Geoff Wing output: > :panic: kernel "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", > line 1220 > :It also deletes and adds in a static arp address: > : "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx" > > Taking out the static arp commands and it boots up OK. Thanks. I could reproduce the panic on my machine with the latest kernel. A quick fix is like this:

--- a/sys/netinet/if_arp.c
+++ b/sys/netinet/if_arp.c
@@ -1223,10 +1223,11 @@ in_arpinput(struct mbuf *m)
 	KASSERT(sizeof(la->ll_addr) >= ifp->if_addrlen);
 	(void)memcpy(&la->ll_addr, ar_sha(ah), ifp->if_addrlen);
 	la->la_flags |= LLE_VALID;
-	la->la_expire = time_uptime + arpt_keep;
+	if ((la->la_flags & LLE_STATIC) == 0) {
+		la->la_expire = time_uptime + arpt_keep;
+		arp_settimer(la, arpt_keep);
+	}
 	la->la_asked = 0;
-	KASSERT((la->la_flags & LLE_STATIC) == 0);
-	arp_settimer(la, arpt_keep);
 	/* rt->rt_flags &= ~RTF_REJECT; */

Does this patch help you? If so, I'll commit it (maybe with some tweaks) after more validation. Thanks, ozaki-r
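The logic of the patch can be modelled in user space as below. struct lle_model, lle_update and the flag values are hypothetical stand-ins for struct llentry and in_arpinput(); the sketch only mirrors the changed lines: a static entry is still marked valid, but no expiry time or timer is set for it, instead of asserting that the entry is not static.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define LLE_VALID  0x1  /* hypothetical flag values for this model */
#define LLE_STATIC 0x2

struct lle_model {
    int flags;
    time_t expire;     /* 0 means "never expires" in this model */
    bool timer_armed;  /* stands in for arp_settimer() having been called */
    int asked;
};

/* Model of the patched update path: only dynamic entries get an
 * expiry time and a timer; static entries are left untimed. */
static void lle_update(struct lle_model *la, time_t now, time_t keep)
{
    la->flags |= LLE_VALID;
    if ((la->flags & LLE_STATIC) == 0) {
        la->expire = now + keep;
        la->timer_armed = true;
    }
    la->asked = 0;
}
```

With now = 100 and keep = 1200, a dynamic entry gets expire == 1300 and an armed timer, while a static entry (as created by "arp -s") keeps expire == 0 and no timer.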
Re: Crash during rc bootup (amd64) with new networking stuff
On Fri, Apr 15, 2016 at 12:20 PM, Geoff Wing wrote: > Hi, > with the new networking setup, I'm getting a crash using clean amd64 > build (GENERIC kernel) during rc script processing. > > After getting past netstart.local, I'll get > > interface address is missing from cache = 0x0 in delete > arp: writing to routing socket: No such file or directory > Building databases: > Starting syslogd. > Starting named. > Setting date via ntp. > then > > panic: kernel "(la->la_flags & LLE_STATIC) == 0 failed: .. if_arp", line > 1220 The source code of your kernel looks a bit old: http://nxr.netbsd.org/xref/src/sys/netinet/if_arp.c#1220 You can check the version of the file with: $ ident /netbsd |grep if_arp.c $NetBSD: if_arp.c,v 1.205 2016/04/07 03:22:15 christos Exp $ And this is the latest version. Can you rebuild a kernel with the latest snapshot of -current and try it again (if the source code is actually old)? Thanks, ozaki-r > > My netstart.local adds static routes and blackhole routes. > It also deletes and adds in a static arp address: > "arp -d 1.2.3.4; arp -s 1.2.3.4 xx:xx:xx:xx:xx:xx" > > It's a bit hard to diagnose on my computer, but I can try if others > cannot reproduce. > > Regards, > Geoff
BuildBot failures
Hi, BuildBot (http://build.tastylime.net/waterfall) seems to have been stalled for several weeks because of a cvs update failure. Do we need to remove the local files and do a clean checkout? Regards, ozaki-r
Re: ATF test failures
On Thu, Mar 10, 2016 at 4:02 PM, Martin Husemann <mar...@duskware.de> wrote: > On Thu, Mar 10, 2016 at 03:05:36PM +0900, Ryota Ozaki wrote: >> We're seeing many test failures (> 1000) on >> amd64 and i386, and installation failures on >> sparc. > > We should back out the gcc change that causes the x86 failures. Could someone please do it (or fix the problem)? :) > > Sparc bootblocks are broken, everything else works fine - I am looking > at it. Thanks! ozaki-r
ATF test failures
Hi, http://releng.netbsd.org/test-results.html We're seeing many test failures (> 1000) on amd64 and i386, and installation failures on sparc. What is happening there? Does anyone have ideas on how to fix them? Thanks, ozaki-r
Re: recurring panics
On Wed, Mar 9, 2016 at 10:59 PM, Thomas Klausner <w...@netbsd.org> wrote: > On Wed, Mar 09, 2016 at 10:22:56PM +0900, Ryota Ozaki wrote: >> On Wed, Mar 9, 2016 at 8:45 PM, Thomas Klausner <t...@giga.or.at> wrote: >> > Hi! >> > >> > I have had this kind of reboot about 5 times in the last couple of days: >> > >> > Mar 8 16:26:14 yt savecore: reboot after panic: panic: kernel diagnostic >> > assertion "l->l_nopreempt == 0" failed: file >> > "/archive/foreign/src/sys/sys/userret.h", line 116 WARNING: SPL NOT >> > LOWERED ON SYSCALL 16384 24 EXIT ef20f930 6 WARNING: SPL NOT LOWERED ON >> > TRAP EXIT 6 0 >> > >> > For some of them, I even have crash dumps: >> > >> > (gdb) target kvm netbsd.94.core >> > 0x801195a5 in cpu_reboot (howto=howto@entry=260, >> > bootstr=bootstr@entry=0x0) at >> > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 >> > 671 dumpsys(); >> > (gdb) bt >> > #0 0x801195a5 in cpu_reboot (howto=howto@entry=260, >> > bootstr=bootstr@entry=0x0) at >> > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 >> > #1 0x8081c704 in vpanic (fmt=0x80e13840 "kernel >> > %sassertion \"%s\" failed: file \"%s\", line %d ", >> > ap=ap@entry=0xfe8154d51e58) >> > at /archive/foreign/src/sys/kern/subr_prf.c:342 >> > #2 0x80b3edb3 in kern_assert (fmt=fmt@entry=0x80e13840 >> > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") >> > at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51 >> > #3 0x8013d0e7 in mi_userret (l=0xfe84092b3460) at >> > /archive/foreign/src/sys/sys/userret.h:116 >> > #4 userret (l=0xfe84092b3460) at ./machine/userret.h:82 >> > #5 syscall (frame=0xfe8154d51f00) at >> > /archive/foreign/src/sys/arch/x86/x86/syscall.c:184 >> > #6 0x80100661 in Xsyscall () >> > (gdb) fr 3 >> > #3 0x8013d0e7 in mi_userret (l=0xfe84092b3460) at >> > /archive/foreign/src/sys/sys/userret.h:116 >> > 116 KASSERT(l->l_nopreempt == 0); >> > (gdb) >> > >> > I upgraded from a Jan 28 kernel to a March 3 kernel, and I think it >> > only started afterwards. 
>> > >> > Any ideas? >> > Thomas >> >> I saw similar panics and my commit at March 7 (if.c,v 1.326) fixed them. >> So updating your kernel may solve the problem. > > This one? Yes. > > Index: src/sys/net/if.c > diff -u src/sys/net/if.c:1.325 src/sys/net/if.c:1.326 > --- src/sys/net/if.c:1.325 Fri Feb 19 20:05:43 2016 > +++ src/sys/net/if.c Mon Mar 7 01:41:55 2016 > @@ -770,6 +770,7 @@ > ifq = percpu_getref(ipq->ipq_ifqs); > if (IF_QFULL(ifq)) { > IF_DROP(ifq); > + percpu_putref(ipq->ipq_ifqs); > m_freem(m); > goto out; > } > > What would trigger that case? percpu_putref calls kpreempt_enable, which decrements l_nopreempt. If we forget to call it, l_nopreempt stays elevated and other places that check it can be affected. In your case, the KASSERT in mi_userret fired. ozaki-r
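The balancing rule behind this fix can be sketched in C. The nopreempt counter and the model_getref/model_putref/userret_check helpers are hypothetical simplifications of l_nopreempt, percpu_getref/percpu_putref (which disable and re-enable kernel preemption) and the check in mi_userret; the point is that every exit path, including the early "queue full" drop, must release what it took.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-LWP preemption-disable counter (cf. l_nopreempt). */
static int nopreempt;

static void model_getref(void) { nopreempt++; } /* cf. kpreempt_disable() */
static void model_putref(void) { nopreempt--; } /* cf. kpreempt_enable()  */

/* Enqueue into a per-CPU queue; every exit path drops the reference. */
static bool enqueue(int qlen, int qmax)
{
    model_getref();
    if (qlen >= qmax) {
        /* the fix: release before the early "drop packet" exit */
        model_putref();
        return false;
    }
    /* ... enqueue the packet here ... */
    model_putref();
    return true;
}

/* What the return-to-user path checks (cf. KASSERT(l->l_nopreempt == 0)). */
static void userret_check(void)
{
    assert(nopreempt == 0);
}
```

Without the model_putref() call on the "queue full" branch, a single dropped packet would leave nopreempt at 1, and userret_check() would fail on the next return to user space, which matches the panic in this thread.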
Re: recurring panics
On Wed, Mar 9, 2016 at 8:45 PM, Thomas Klausner wrote: > Hi! > > I have had this kind of reboot about 5 times in the last couple of days: > > Mar 8 16:26:14 yt savecore: reboot after panic: panic: kernel diagnostic > assertion "l->l_nopreempt == 0" failed: file > "/archive/foreign/src/sys/sys/userret.h", line 116 WARNING: SPL NOT LOWERED > ON SYSCALL 16384 24 EXIT ef20f930 6 WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0 > > For some of them, I even have crash dumps: > > (gdb) target kvm netbsd.94.core > 0x801195a5 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 > 671 dumpsys(); > (gdb) bt > #0 0x801195a5 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 > #1 0x8081c704 in vpanic (fmt=0x80e13840 "kernel %sassertion > \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfe8154d51e58) > at /archive/foreign/src/sys/kern/subr_prf.c:342 > #2 0x80b3edb3 in kern_assert (fmt=fmt@entry=0x80e13840 > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") > at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51 > #3 0x8013d0e7 in mi_userret (l=0xfe84092b3460) at > /archive/foreign/src/sys/sys/userret.h:116 > #4 userret (l=0xfe84092b3460) at ./machine/userret.h:82 > #5 syscall (frame=0xfe8154d51f00) at > /archive/foreign/src/sys/arch/x86/x86/syscall.c:184 > #6 0x80100661 in Xsyscall () > (gdb) fr 3 > #3 0x8013d0e7 in mi_userret (l=0xfe84092b3460) at > /archive/foreign/src/sys/sys/userret.h:116 > 116 KASSERT(l->l_nopreempt == 0); > (gdb) > > I upgraded from a Jan 28 kernel to a March 3 kernel, and I think it > only started afterwards. > > Any ideas? > Thomas I saw similar panics, and my commit on March 7 (if.c,v 1.326) fixed them, so updating your kernel may solve the problem. ozaki-r
Re: Disk full on anita/amd64?
On Tue, Feb 2, 2016 at 4:28 PM, Andreas Gustafsson <g...@gson.org> wrote: > Ryota Ozaki wrote: >> I've noticed increase of test failures on anita/amd64: >> http://releng.netbsd.org/b5reports/amd64/commits-2016.01.html#2016.01.30.03.38.39 >> >> A test failure indicates disk full of the system: >> http://releng.netbsd.org/b5reports/amd64/build/2016.01.30.03.38.39/test.html#atf_atf-c_detail_fs_test_mkdtemp_err >> >> A sanity check of ATF also indicates that: >> df-pre-test Filesystem 1K-blocks UsedAvail %Cap Mounted on >> df-pre-test /dev/wd0a 862463 810271 9069 98% / >> >> I guess it happens due to dtrace stuffs added to the base >> recently and keeping debugging symbols of kernel modules >> for dtrace (this is the just previous commit where test >> failures increased). > > I just incrased the disk size in the configuration of the amd64 tests > on babylon5, and will also increase anita's default disk size in the > next release (assuming the increase in disk use is permanent). Thanks! ozaki-r
Disk full on anita/amd64?
Hi, I've noticed an increase in test failures on anita/amd64: http://releng.netbsd.org/b5reports/amd64/commits-2016.01.html#2016.01.30.03.38.39 One test failure indicates that the system's disk is full: http://releng.netbsd.org/b5reports/amd64/build/2016.01.30.03.38.39/test.html#atf_atf-c_detail_fs_test_mkdtemp_err A sanity check of ATF also indicates that: df-pre-test Filesystem 1K-blocks Used Avail %Cap Mounted on df-pre-test /dev/wd0a 862463 810271 9069 98% / I guess it is due to the dtrace components recently added to the base system, which keep the debugging symbols of kernel modules for dtrace (that change is the commit just before the test failures increased). I think we should increase the size of the qemu disk image. Regards, ozaki-r
Re: IPv6 init order and RPI
On Mon, Nov 9, 2015 at 5:24 AM, Joerg Sonnenberger wrote: > Hi all, > while trying to update my first generation RPI, I hit a kernel panic > during first boot. Looking a bit further, for unknown reasons the > address setup sometimes fails with usmsc. When it does, the ip6_input > case can trigger the timeout handling on the incompletely set up > address. The attached patch seems to fix that problem at least. > Comments? LGTM; in6m (in6m_timer_ch) should be initialized before being published via ia6_multiaddrs. Thanks, ozaki-r
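The rule in this review, fully initialize an object before publishing it where another code path (here, the timeout handling) can find it, can be illustrated with a single-threaded C sketch. struct mc_entry, mc_add and run_timers are hypothetical names, not kernel ones, and the sketch ignores the locking or memory barriers that a real multi-CPU kernel would also need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical multicast entry with a timer callback (cf. in6m_timer_ch). */
struct mc_entry {
    void (*timer_fn)(struct mc_entry *);
    int fired;
    struct mc_entry *next;
};

static struct mc_entry *mc_list;   /* shared list visible to the timer path */

static void mc_timeout(struct mc_entry *e) { e->fired++; }

/* Timer path: walk every published entry; count entries found with an
 * uninitialized callback (in the kernel, such an entry would crash). */
static int run_timers(void)
{
    int bad = 0;
    for (struct mc_entry *e = mc_list; e != NULL; e = e->next) {
        if (e->timer_fn == NULL)
            bad++;
        else
            e->timer_fn(e);
    }
    return bad;
}

/* Correct order: initialize every field first, then link into the list. */
static void mc_add(struct mc_entry *e)
{
    e->timer_fn = mc_timeout;
    e->fired = 0;
    e->next = mc_list;
    mc_list = e;   /* publication happens last */
}
```

Because mc_add() only links the entry after setting its callback, a timer pass that runs at any point after publication finds a usable entry; reversing the order would expose a window in which run_timers() sees a NULL callback.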
Re: panic in arptimer
On Mon, Oct 19, 2015 at 6:33 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: > Hi, > > I've reproduced the panic on my machine and I'm investing > the problem. A possible fix has been committed. Could you try the latest kernel? (A kernel binary will be built in several hours.) Tip: with sysctl -w net.inet.arp.keep=30, you don't need to wait 1200 seconds. > > Thank you for the report, > ozaki-r > > On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi <t.hash...@gmail.com> wrote: >> Hello, >> >> Kernel panics in arptimer after detaching network interface. >> See dmesg below please. >> It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp. >> I think this problem looks like kern/50186. BTW, I think this problem is different from PR kern/50186. Regards, ozaki-r >> >> How-To-Repeat: >> 1. Boot kernel into single user mode with "boot netbsd -s". >> 2. sysctl -w net.inet6.ip6.auto_linklocal=0 >> 3. ifconfig interface . >> 4. Send one ping to other host. >> 5. Detach the interface with "drvctl -d". >> 6. Wait about 1200 seconds (actually 1200 sec after ping is sent). >> >> >> I saw following panic after detaching "re0".
>> >> fatal page fault in supervisor mode >> trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2 >> 873c7368 ilevel 2 rsp fe80dabd1f08 >> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 >> kernel: page fault trap, code=0 >> Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4: movq >> 360(%r15),%rdi >> db{1}> bt >> arptimer() at netbsd:arptimer+0xc4 >> callout_softclock() at netbsd:callout_softclock+0x1d0 >> softint_dispatch() at netbsd:softint_dispatch+0xd3 >> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 >> Xsoftintr() at netbsd:Xsoftintr+0x4f >> --- interrupt --- >> 0: >> db{1}> show reg >> ds 1ee8 >> es 4040 >> fs 600 >> gs 3d5e >> rdi fe81078c8a00 >> rsi f800 >> rbp fe80dabd1f38 >> rbx fe81078c8908 >> rdx 0 >> rcx 7 >> rax fe81071a30c4 >> r8 fe8107184040 >> r9 7d >> r10 fe811eb244f4 >> r11 246 >> r12 fe81078c8a00 >> r13 fe81078c89b0 >> r14 0 >> r15 873c7008 >> rip 808cbf2farptimer+0xc4 >> cs 8 >> rflags 10246 >> rsp fe80dabd1f08 >> ss 10 >> netbsd:arptimer+0xc4: movq360(%r15),%rdi >> db{1}> >> >> >> I met folloging panic after "drvctl -d axe0". 
>> >> fatal page fault in supervisor mode >> trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2 rsp >> fe80dabd1f00 >> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 >> kernel: page fault trap, code=0 >> Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d: lock cmpxchgq >> %rcx,0(% >> rdi) >> db{1}> bt >> rw_enter() at netbsd:rw_enter+0x2d >> callout_softclock() at netbsd:callout_softclock+0x1d0 >> softint_dispatch() at netbsd:softint_dispatch+0xd3 >> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 >> Xsoftintr() at netbsd:Xsoftintr+0x4f >> --- interrupt --- >> 0: >> db{1}> show reg >> ds 1ee8 >> es 0 >> fs fc00 >> gs 3d5e >> rdi 0 >> rsi 1 >> rbp fe80dabd1f38 >> rbx fe811dbc8188 >> rdx 0 >> rcx fe81071a30c4 >> rax 0 >> r8 fe8107184040 >> r9 7d >> r10 fe811eb244f4 >> r11 246 >> r12 fe811dbc8280 >> r13 fe811dbc8230 >> r14 0 >> r15 fe811de0b010 >> rip 8011bcddrw_enter+0x2d >> cs 8 >> rflags 10282 >> rsp fe80dabd1f00 >> ss 10 >> netbsd:rw_enter+0x2d: lock cmpxchgq %rcx,0(%rdi) >> db{1}> >> >> >> -- >> t-hash
Re: panic in arptimer
On Tue, Oct 20, 2015 at 7:31 PM, Takahiro Hayashi <t.hash...@gmail.com> wrote: > Hello, > > On 2015/10/20 16:59, Ryota Ozaki wrote: >> >> On Mon, Oct 19, 2015 at 6:33 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: >>> >>> Hi, >>> >>> I've reproduced the panic on my machine and I'm investing >>> the problem. >> >> >> A possible fix has been committed. Could you try a latest kernel? >> (a kernel binary will be built in several hours.) > > > I have updated my local tree and confirmed the problem is fixed. Good to hear. Thank you for testing. ozaki-r > Thank you for working on this prob. > >> tips: by doing sysctl -w net.inet.arp.keep=30, you don't need to >> wait for 1200 seconds. > > > Thank you for nice tip. > > >>> >>> Thank you for the report, >>>ozaki-r >>> >>> On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi <t.hash...@gmail.com> >>> wrote: >>>> >>>> Hello, >>>> >>>> Kernel panics in arptimer after detaching network interface. >>>> See dmesg below please. >>>> It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp. >>>> I think this problem looks like kern/50186. >> >> >> BTW I think this problem is different from PR kern/50186. >> >> Regards, >>ozaki-r >> >>>> >>>> How-To-Repeat: >>>> 1. Boot kernel into single user mode with "boot netbsd -s". >>>> 2. sysctl -w net.inet6.ip6.auto_linklocal=0 >>>> 3. ifconfig interface . >>>> 4. Send one ping to other host. >>>> 5. Detach the interface with "drvctl -d". >>>> 6. Wait about 1200 seconds (actually 1200 sec after ping is sent). >>>> >>>> >>>> I saw following panic after detaching "re0". 
>>>> >>>> fatal page fault in supervisor mode >>>> trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2 >>>> 873c7368 ilevel 2 rsp fe80dabd1f08 >>>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 >>>> kernel: page fault trap, code=0 >>>> Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4: movq >>>> 360(%r15),%rdi >>>> db{1}> bt >>>> arptimer() at netbsd:arptimer+0xc4 >>>> callout_softclock() at netbsd:callout_softclock+0x1d0 >>>> softint_dispatch() at netbsd:softint_dispatch+0xd3 >>>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 >>>> Xsoftintr() at netbsd:Xsoftintr+0x4f >>>> --- interrupt --- >>>> 0: >>>> db{1}> show reg >>>> ds 1ee8 >>>> es 4040 >>>> fs 600 >>>> gs 3d5e >>>> rdi fe81078c8a00 >>>> rsi f800 >>>> rbp fe80dabd1f38 >>>> rbx fe81078c8908 >>>> rdx 0 >>>> rcx 7 >>>> rax fe81071a30c4 >>>> r8 fe8107184040 >>>> r9 7d >>>> r10 fe811eb244f4 >>>> r11 246 >>>> r12 fe81078c8a00 >>>> r13 fe81078c89b0 >>>> r14 0 >>>> r15 873c7008 >>>> rip 808cbf2farptimer+0xc4 >>>> cs 8 >>>> rflags 10246 >>>> rsp fe80dabd1f08 >>>> ss 10 >>>> netbsd:arptimer+0xc4: movq360(%r15),%rdi >>>> db{1}> >>>> >>>> >>>> I met folloging panic after "drvctl -d axe0". 
>>>> >>>> fatal page fault in supervisor mode >>>> trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2 >>>> rsp >>>> fe80dabd1f00 >>>> curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 >>>> kernel: page fault trap, code=0 >>>> Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d: lock cmpxchgq >>>> %rcx,0(% >>>> rdi) >>>> db{1}> bt >>>> rw_enter() at netbsd:rw_enter+0x2d >>>> callout_softclock() at netbsd:callout_softclock+0x1d0 >>>> softint_dispatch() at netbsd:softint_dispatch+0xd3 >>>> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 >>>> Xsoftintr() at netbsd:Xsoftintr+0x4f >>>> --- interrupt --- >>>> 0: >>>> db{1}> show reg >>>> ds 1ee8 >>>> es 0 >>>> fs fc00 >>>> gs 3d5e >>>> rdi 0 >>>> rsi 1 >>>> rbp fe80dabd1f38 >>>> rbx fe811dbc8188 >>>> rdx 0 >>>> rcx fe81071a30c4 >>>> rax 0 >>>> r8 fe8107184040 >>>> r9 7d >>>> r10 fe811eb244f4 >>>> r11 246 >>>> r12 fe811dbc8280 >>>> r13 fe811dbc8230 >>>> r14 0 >>>> r15 fe811de0b010 >>>> rip 8011bcddrw_enter+0x2d >>>> cs 8 >>>> rflags 10282 >>>> rsp fe80dabd1f00 >>>> ss 10 >>>> netbsd:rw_enter+0x2d: lock cmpxchgq %rcx,0(%rdi) >>>> db{1}> >>>> >>>> >>>> -- >>>> t-hash >> >> > > -- > t-hash
Re: panic in arptimer
Hi, I've reproduced the panic on my machine and I'm investigating the problem. Thank you for the report, ozaki-r On Mon, Oct 19, 2015 at 4:10 PM, Takahiro Hayashi wrote: > Hello, > > Kernel panics in arptimer after detaching network interface. > See dmesg below please. > It happened on NetBSD/amd64 on GENERIC.201510182130Z from nyftp. > I think this problem looks like kern/50186. > > How-To-Repeat: > 1. Boot kernel into single user mode with "boot netbsd -s". > 2. sysctl -w net.inet6.ip6.auto_linklocal=0 > 3. ifconfig interface . > 4. Send one ping to other host. > 5. Detach the interface with "drvctl -d". > 6. Wait about 1200 seconds (actually 1200 sec after ping is sent). > > > I saw following panic after detaching "re0". > > fatal page fault in supervisor mode > trap type 6 code 0 rip 808cbf2f cs 8 rflags 10246 cr2 > 873c7368 ilevel 2 rsp fe80dabd1f08 > curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 > kernel: page fault trap, code=0 > Stopped in pid 0.22 (system) at netbsd:arptimer+0xc4: movq > 360(%r15),%rdi > db{1}> bt > arptimer() at netbsd:arptimer+0xc4 > callout_softclock() at netbsd:callout_softclock+0x1d0 > softint_dispatch() at netbsd:softint_dispatch+0xd3 > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 > Xsoftintr() at netbsd:Xsoftintr+0x4f > --- interrupt --- > 0: > db{1}> show reg > ds 1ee8 > es 4040 > fs 600 > gs 3d5e > rdi fe81078c8a00 > rsi f800 > rbp fe80dabd1f38 > rbx fe81078c8908 > rdx 0 > rcx 7 > rax fe81071a30c4 > r8 fe8107184040 > r9 7d > r10 fe811eb244f4 > r11 246 > r12 fe81078c8a00 > r13 fe81078c89b0 > r14 0 > r15 873c7008 > rip 808cbf2f arptimer+0xc4 > cs 8 > rflags 10246 > rsp fe80dabd1f08 > ss 10 > netbsd:arptimer+0xc4: movq 360(%r15),%rdi > db{1}> > > > I met folloging panic after "drvctl -d axe0".
> > fatal page fault in supervisor mode > trap type 6 code 2 rip 8011bcdd cs 8 rflags 10282 cr2 0 ilevel 2 rsp > fe80dabd1f00 > curlwp 0xfe81071a30c0 pid 0.22 lowest kstack 0xfe80dabce2c0 > kernel: page fault trap, code=0 > Stopped in pid 0.22 (system) at netbsd:rw_enter+0x2d: lock cmpxchgq > %rcx,0(% > rdi) > db{1}> bt > rw_enter() at netbsd:rw_enter+0x2d > callout_softclock() at netbsd:callout_softclock+0x1d0 > softint_dispatch() at netbsd:softint_dispatch+0xd3 > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe80dabd1ff0 > Xsoftintr() at netbsd:Xsoftintr+0x4f > --- interrupt --- > 0: > db{1}> show reg > ds 1ee8 > es 0 > fs fc00 > gs 3d5e > rdi 0 > rsi 1 > rbp fe80dabd1f38 > rbx fe811dbc8188 > rdx 0 > rcx fe81071a30c4 > rax 0 > r8 fe8107184040 > r9 7d > r10 fe811eb244f4 > r11 246 > r12 fe811dbc8280 > r13 fe811dbc8230 > r14 0 > r15 fe811de0b010 > rip 8011bcddrw_enter+0x2d > cs 8 > rflags 10282 > rsp fe80dabd1f00 > ss 10 > netbsd:rw_enter+0x2d: lock cmpxchgq %rcx,0(%rdi) > db{1}> > > > -- > t-hash
Re: panic in if_arp.c
Hi, roy@ and I have been working on the issue and now have a fix for it. The fix will be committed by roy@ (or me) soon. I'm sorry for the inconvenience, ozaki-r On Tue, Oct 13, 2015 at 4:04 PM, Thomas Klausner wrote: > Hi! > > I've upgraded my kernel from a version of end of September (28th or > 30th, not quite sure) to one from yesterday, Oct 12. > > Since then I've had two panics, both in: > > panic: kernel diagnostic assertion "rw_write_held(&(la)->lle_lock)" failed: > file "...sys/netinet/if_arp.c", line 931 > > The second was this morning. The machine had been idle after the > previous panic, I started vncserver, and in that filezilla and > transmission, and it immediately paniced. > > The backtrace is: > > (gdb) target kvm netbsd.88.core > 0x80119755 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 > 671 dumpsys(); > (gdb) bt > #0 0x80119755 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) at > /archive/foreign/src/sys/arch/amd64/amd64/machdep.c:671 > #1 0x80811da4 in vpanic (fmt=0x80d38b08 "kernel %sassertion > \"%s\" failed: file \"%s\", line %d ", ap=ap@entry=0xfe813bf7b6f8) > at /archive/foreign/src/sys/kern/subr_prf.c:342 > #2 0x80a86303 in kern_assert (fmt=fmt@entry=0x80d38b08 > "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") > at /archive/foreign/src/sys/lib/libkern/kern_assert.c:51 > #3 0x808cf369 in arpresolve (ifp=ifp@entry=0x80003b2ec058, > rt=rt@entry=0xfe8826534118, m=0xfe88174e0e00, > dst=dst@entry=0xfe8827d2d8b0, > desten=desten@entry=0xfe813bf7b7c2 > "��\377\377\377\377\030AS&�\376\377\377r\253\202\245\347\063\"�\030AS&�\376\377\377��\322'�\376\377\377\030AS&�\376\377\377\060�\367;�\376\377\377�6T�\377\377\377\377") > at /archive/foreign/src/sys/netinet/if_arp.c:931 > #4 0x8089c80c in ether_output (ifp0=0x80003b2ec058, > m0=, dst=0xfe8827d2d8b0, rt=0xfe8826534118) > at /archive/foreign/src/sys/net/if_ethersubr.c:245 > #5 0x805436b3 in
klock_if_output (rt=0xfe8826534118, > dst=0xfe8827d2d8b0, m=0xfe88174e0e00, ifp=0x80003b2ec058) > at /archive/foreign/src/sys/netinet/ip_output.c:189 > #6 ip_hresolv_output (ifp0=, m=0xfe88174e0e00, > dst=dst@entry=0xfe8827d2d8b0, rt00=rt00@entry=0xfe8826534118) > at /archive/foreign/src/sys/netinet/ip_output.c:314 > #7 0x80544e1e in ip_output (m0=m0@entry=0xfe88174e0e00) at > /archive/foreign/src/sys/netinet/ip_output.c:749 > #8 0x8054f323 in tcp_output (tp=0xfe882578c000) at > /archive/foreign/src/sys/netinet/tcp_output.c:1627 > #9 0x8055863b in tcp_shutdown (so=0xfe882582f498) at > /archive/foreign/src/sys/netinet/tcp_usrreq.c:931 > #10 tcp_shutdown_wrapper (a=0xfe882582f498) at > /archive/foreign/src/sys/netinet/tcp_usrreq.c:2471 > #11 0x8071484b in nfs_disconnect (nmp=0xfe882554e008) at > /archive/foreign/src/sys/nfs/nfs_socket.c:410 > #12 0x807156d1 in nfs_reconnect (rep=rep@entry=0xfe880d6e92d8) at > /archive/foreign/src/sys/nfs/nfs_socket.c:355 > #13 0x806fe473 in nfs_receive (l=0xfe880aab3ba0, > mp=0xfe813bf7bc28, aname=0xfe813bf7bc30, rep=0xfe880d6e92d8) > at /archive/foreign/src/sys/nfs/nfs_clntsocket.c:278 > #14 nfs_reply (lwp=0xfe880aab3ba0, myrep=0xfe880d6e92d8) at > /archive/foreign/src/sys/nfs/nfs_clntsocket.c:352 > #15 nfs_request (np=np@entry=0xfe8825d92e38, mrest=0xfe8826c9e800, > procnum=procnum@entry=18, lwp=lwp@entry=0xfe880aab3ba0, > cred=cred@entry=0xfe880d70d180, > mrp=mrp@entry=0xfe813bf7bd50, mdp=mdp@entry=0xfe813bf7bd58, > dposp=dposp@entry=0xfe813bf7bd40, rexmitp=rexmitp@entry=0x0) > at /archive/foreign/src/sys/nfs/nfs_clntsocket.c:688 > #16 0x8071d74a in nfs_statvfs (mp=0xfe8825e03008, > sbp=0xfe880d472008) at /archive/foreign/src/sys/nfs/nfs_vfsops.c:202 > #17 0x80859948 in VFS_STATVFS (mp=mp@entry=0xfe8825e03008, > a=a@entry=0xfe880d472008) at /archive/foreign/src/sys/kern/vfs_subr.c:1339 > #18 0x8085d123 in dostatvfs (mp=mp@entry=0xfe8825e03008, > sp=sp@entry=0xfe880d472008, l=l@entry=0xfe880aab3ba0, > flags=flags@entry=1, 
root=root@entry=0) > at /archive/foreign/src/sys/kern/vfs_syscalls.c:1101 > #19 0x8085d4cf in do_sys_getvfsstat (l=0xfe880aab3ba0, > sfsp=0x7f7ff5d41290, bufsize=, flags=1, > copyfn=0x80113c40 , > entry_sz=entry_sz@entry=2256, retval=0xfe813bf7beb8) at > /archive/foreign/src/sys/kern/vfs_syscalls.c:1254 > #20 0x8085d60b in sys_getvfsstat (l=, uap= out>, retval=) at > /archive/foreign/src/sys/kern/vfs_syscalls.c:1306 > #21 0x8013cdbe in sy_call (rval=0xfe813bf7beb8, > uap=0xfe813bf7bf00, l=0xfe880aab3ba0,
cd sys/rump; $TOOLDIR/bin/nbmake-$MACHINE doesn't work
Hi,

I sometimes build only the rump-related binaries with
"cd sys/rump; $TOOLDIR/bin/nbmake-$MACHINE". That used to work fine, but
for the past week or two it has been failing for me with the following
errors:

cd sys/rump; ~/git/seil6/work.tools/bin/nbmake-amd64 -j9 && ~/git/seil6/work.tools/bin/nbmake-amd64 -j9 install; cd -
all ===> include
all ===> librump
all ===> dev
all ===> fs
all ===> kern
all ===> net
all ===> share
all ===> dev/lib
all ===> include/rump
all ===> librump/rumpkern
all ===> librump/rumpdev
all ===> kern/lib
all ===> librump/rumpnet
all ===> net/lib
all ===> share/man
nbmake: "/home/ozaki-r/git/netbsd-src/sys/rump/include/rump/Makefile" line 7: Malformed conditional ((${MKRUMP} != "no"))
nbmake: Fatal errors encountered -- cannot continue
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include/rump
--- all-rump ---
*** [all-rump] Error code 1
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include
1 error
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/include
--- all-include ---
*** [all-include] Error code 2
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump
nbmake: "/home/ozaki-r/git/netbsd-src/sys/rump/kern/lib/../Makefile.rumpkerncomp" line 8: Malformed conditional (${MKSLJIT} != "no")
nbmake: "/home/ozaki-r/git/netbsd-src/sys/rump/net/lib/../Makefile.rumpnetcomp" line 9: Malformed conditional (${MKSLJIT} != "no")
nbmake: Fatal errors encountered -- cannot continue
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/kern/lib
A failure has been detected in another branch of the parallel make
nbmake: stopped in /home/ozaki-r/git/netbsd-src/sys/rump/share/man

After the build failure, some symlinks have been created in the source
tree:

sys/rump/librump/rumpdev/amd64
sys/rump/librump/rumpdev/i386
sys/rump/librump/rumpdev/machine
sys/rump/librump/rumpdev/x86
sys/rump/librump/rumpkern/amd64
sys/rump/librump/rumpkern/i386
sys/rump/librump/rumpkern/machine
sys/rump/librump/rumpkern/x86
sys/rump/librump/rumpnet/amd64
sys/rump/librump/rumpnet/i386
sys/rump/librump/rumpnet/machine
sys/rump/librump/rumpnet/x86

Something is wrong here, because I run build.sh with -O work.amd64, so
nothing should be created in the source tree. Does anyone know how to
fix it?

Thanks,
ozaki-r
Re: arplookup: unable to enter address - host is not on local network
On Fri, Sep 18, 2015 at 5:53 AM, Benjamin Lorenz <b...@pocketservices.de> wrote:
> On 17 Sep 2015, at 03:34, Ryota Ozaki <ozak...@netbsd.org> wrote:
>
>> We discussed a similar issue before and introduced a sysctl to suppress
>> the messages in -current:
>> http://mail-index.netbsd.org/tech-kern/2014/11/13/msg017981.html
>>
>> So I think we can pull up the sysctl to netbsd-6 and netbsd-7
>> if you want. (It may take a while though.)
>
> I consider this to be a great idea, at least for netbsd-7. I have my own
> patch which basically comments out the log in all instances, so I am fine.
> But your solution would help other users as well.

Okay. I'll request pull-ups of the commit to -7 and -6.

ozaki-r
Re: arplookup: unable to enter address - host is not on local network
Hi,

On Mon, Sep 14, 2015 at 4:49 PM, Benjamin Lorenz wrote:
>
> Talking about
> http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/sys/netinet/if_arp.c: I
> have a hoster (www.vultr.com) with some strange(?) network setup. All
> kernels (6.1 and 7.0 RC) complain with plenty of log messages highlighted
> below. They screw up dmesg output so I had to comment out the log statement
> and compile my own kernel.
>
> Question: Is this setup something vultr should change, or is our debug
> logging a little bit too pedantic/paranoid so we should consider removing
> it?

We discussed a similar issue before and introduced a sysctl to suppress
the messages in -current:
http://mail-index.netbsd.org/tech-kern/2014/11/13/msg017981.html

So I think we can pull up the sysctl to netbsd-6 and netbsd-7
if you want. (It may take a while though.)

ozaki-r

>
> Thanks for insight,
> Benjamin
>
>	if (create) {
>		if (rt->rt_flags & RTF_GATEWAY) {
>			if (log_unknown_network)
>				why = "host is not on local network";
>		} else if ((rt->rt_flags & RTF_LLINFO) == 0) {
>			ARP_STATINC(ARP_STAT_ALLOCFAIL);
>			why = "could not allocate llinfo";
>		} else
>			why = "gateway route is not ours";
>		if (why) {
>			log(LOG_DEBUG, "arplookup: unable to enter address"
>			    " for %s@%s on %s (%s)\n", in_fmtaddr(*addr),
>			    lla_snprintf(ar_sha(ah), ah->ar_hln),
>			    (ifp) ? ifp->if_xname : "null", why);
>		}
>		if ((rt->rt_flags & RTF_CLONED) != 0) {
>			rtrequest(RTM_DELETE, rt_getkey(rt),
>			    rt->rt_gateway, rt_mask(rt), rt->rt_flags, NULL);
>		}
>	}
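The quoted excerpt decides which reason, if any, gets logged, and only the "host is not on local network" case is gated by log_unknown_network. A minimal userland sketch of just that decision, assuming locally defined stand-ins for RTF_GATEWAY and RTF_LLINFO (classic 4.4BSD-style values, not taken from this thread) and a plain global standing in for the kernel variable behind the sysctl; the statistics counter and the RTF_CLONED route deletion are left out:

```c
#include <stddef.h>

/*
 * Assumed 4.4BSD-style flag values, defined locally so the sketch
 * builds as plain userland C; they are not taken from this thread.
 */
#define SKETCH_RTF_GATEWAY 0x002
#define SKETCH_RTF_LLINFO  0x400

/* Hypothetical stand-in for the kernel variable behind the sysctl. */
static int log_unknown_network = 0;

/*
 * Same branch structure as the quoted arplookup() excerpt: returns
 * the reason that would be logged, or NULL if nothing is logged.
 */
static const char *
arplookup_reason(int rt_flags)
{
	const char *why = NULL;

	if (rt_flags & SKETCH_RTF_GATEWAY) {
		/* Only this reason is controlled by the sysctl. */
		if (log_unknown_network)
			why = "host is not on local network";
	} else if ((rt_flags & SKETCH_RTF_LLINFO) == 0) {
		why = "could not allocate llinfo";
	} else
		why = "gateway route is not ours";
	return why;
}
```

With log_unknown_network cleared, a gateway route yields NULL and nothing is logged, which is the effect the sysctl is meant to provide; the other two reasons are still logged regardless of the setting.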
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
On Fri, Sep 4, 2015 at 10:28 AM, Jun Ebihara wrote:
> From: Jun Ebihara
> Subject: Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
> Date: Thu, 03 Sep 2015 15:41:28 +0900 (JST)
>
> I'll re-updating my kernel and try.
>>> The fix is already committed, so you can try it by cvs update.
>>> If the panic still happens, could you please try a kernel with
>>> DEBUG? It will provide a more useful backtrace.
>> Thanx.
>> Now I make cvs update and make GENERIC_DEBUG and waiting for the panic will
>> come.
>
> After over 10 hours,my system still work without headached panic!
> Many thanx!

Good :) Thank you for your help!

ozaki-r
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
On Wed, Sep 2, 2015 at 4:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote: > On Wed, 2 Sep 2015, Ryota Ozaki wrote: > >> On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: >>> >>> Hi Paul and Jun, >>> >>> Thank you for your reporting! >>> >>> Now I can reproduce the issue quickly using openvpn. >>> So I would provide a fix soon (hopefully). >> >> >> Oops. The tested kernel was built at 8/24. A kernel built today >> doesn't reproduce the issue... > > > Hmmm. > > I will update my sources and check with a up-to-the-minute kernel. > > I should be able to provide results within the next 60 to 90 minutes. Thank you very much for your kind support. ozaki-r
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
Hi,

Thank you for rechecking.

I found that you're right and that I messed up. I've been using a kernel
with a different config from GENERIC, one tuned for KVM to reduce build
time. With the latest GENERIC kernel I can now reproduce the issue again,
and I've started debugging it for real.

Thank you so much!
ozaki-r

On Wed, Sep 2, 2015 at 5:06 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>
>> On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi Paul and Jun,
>>>
>>> Thank you for your reporting!
>>>
>>> Now I can reproduce the issue quickly using openvpn.
>>> So I would provide a fix soon (hopefully).
>>
>>
>> Oops. The tested kernel was built at 8/24. A kernel built today
>> doesn't reproduce the issue...
>
>
> Hmmm, I don't know what kernel you have from Aug 24. But a kernel that was
> built from up-to-date sources less than one hour ago (and with no subsequent
> CVS commits!) still fails.
>
> This kernel identifies itself as
>
> %uname -a
> NetBSD pokey.whooppee.com 7.99.21 NetBSD 7.99.21 (GENERIC) #0: Wed Sep 2
> 15:49:03 PHT 2015
> p...@pokey.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/GENERIC
> amd64
>
> And I have attached the Xterm log from my gdb session (after running
>
> tr -d '\r'
>
> to remove trailing ^M characters!)
>
> -
> | Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
> | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
> | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org |
> -
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
Hi Paul and Jun, Thank you for your reporting! Now I can reproduce the issue quickly using openvpn. So I would provide a fix soon (hopefully). ozaki-r On Tue, Sep 1, 2015 at 8:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote: > On Mon, 31 Aug 2015, Ryota Ozaki wrote: > >> Hi, >> >> I've committed a fix for rt_refcnt. Could you try again >> with -current? (though I'm not sure the fix is related to >> the issue...) > > > The fix is not helping on my situation, either! > > Here is the latest info, based on a kernel from today's sources... > > (gdb) target kvm netbsd.0.core > 0x8069eda5 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) > at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671 > 671 dumpsys(); > #0 0x8069eda5 in cpu_reboot (howto=howto@entry=260, > bootstr=bootstr@entry=0x0) > at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671 > #1 0x808e4cc4 in vpanic ( > fmt=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file \"%s\", > line %d ", ap=ap@entry=0xfe810f5a48a8) > at /build/netbsd-local/src/sys/kern/subr_prf.c:342 > #2 0x80a7d763 in kern_assert ( > fmt=fmt@entry=0x80d2d8b8 "kernel %sassertion \"%s\" failed: file > \"%s\", line %d ") at > /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51 > #3 0x80a8421a in rtcache_invariants (ro=0xfe821c830060) > at /build/netbsd-local/src/sys/net/route.h:441 > #4 0x8082b2a5 in rtcache_invariants (ro=0xfe821c830060) > at /build/netbsd-local/src/sys/net/route.h:441 > #5 rtcache_getdst (ro=0xfe821c830060) > at /build/netbsd-local/src/sys/net/route.h:467 > #6 rtcache_lookup2 (ro=0xfe821c830060, > dst=dst@entry=0xfe810f5a495c, > clone=clone@entry=1, hitp=hitp@entry=0xfe810f5a4958) > at /build/netbsd-local/src/sys/net/route.c:1493 > #7 0x805070f7 in rtcache_lookup1 (clone=1, dst=0xfe810f5a495c, > ro=) at /build/netbsd-local/src/sys/net/route.h:449 > #8 selectroute (dstsock=dstsock@entry=0xfe810f5a4bc4, > opts=opts@entry=0xfe81f7af67d0, mopts=, > ro=ro@entry=0x0, 
retifp=retifp@entry=0xfe810f5a4a10, > retrt=retrt@entry=0xfe810f5a4a18, clone=1, > norouteok=norouteok@entry=1) > at /build/netbsd-local/src/sys/netinet6/in6_src.c:665 > #9 0x8050723a in in6_selectif (retifp=0xfe810f5a4a10, ro=0x0, > mopts=, opts=0xfe81f7af67d0, > dstsock=0xfe810f5a4bc4) > at /build/netbsd-local/src/sys/netinet6/in6_src.c:724 > #10 in6_selectsrc (dstsock=dstsock@entry=0xfe810f5a4bc4, > opts=opts@entry=0xfe81f7af67d0, mopts=, > ro=ro@entry=0xfe821c830060, laddr=laddr@entry=0xfe821c8300a0, > ifpp=ifpp@entry=0xfe810f5a4ae0, > errorp=errorp@entry=0xfe810f5a4adc) > at /build/netbsd-local/src/sys/netinet6/in6_src.c:204 > #11 0x80800869 in rip6_output (m=m@entry=0xfe821c249400, > so=so@entry=0xfe8219325db0, > dstsock=dstsock@entry=0xfe810f5a4bc4, > control=control@entry=0x0) > at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:447 > #12 0x80800e08 in rip6_send (l=, control=0x0, > nam=0xfe81effcd638, m=0xfe821c249400, so=0xfe8219325db0) > at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:893 > #13 rip6_send_wrapper (a=0xfe8219325db0, b=0xfe821c249400, > c=0xfe81effcd638, d=0x0, e=) > at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:966 > #14 0x80999251 in sosend (so=0xfe8219325db0, > addr=0xfe81effcd638, uio=0xfe810f5a4d10, top=0xfe821c249400, > control=0x0, flags=, l=0xfe81f455c4c0) > at /build/netbsd-local/src/sys/kern/uipc_socket.c:1064 > #15 0x809a0785 in do_sys_sendmsg_so (l=l@entry=0xfe81f455c4c0, > s=s@entry=4, so=, fp=0xfe81f1ac6380, > mp=mp@entry=0xfe810f5a4e58, flags=flags@entry=0, > retsize=retsize@entry=0xfe810f5a4eb8) > at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:622 > #16 0x809a0ad2 in do_sys_sendmsg (l=l@entry=0xfe81f455c4c0, s=4, > mp=mp@entry=0xfe810f5a4e58, flags=0, > retsize=retsize@entry=0xfe810f5a4eb8) > at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:672 > #17 0x809a0b9b in sys_sendmsg (l=0xfe81f455c4c0, > uap=0xfe810f5a4f00, retval=0xfe810f5a4eb8) > at /build/netbsd-local/src/sys/kern/uipc_syscalls.c:528 > #18 0x80901f6c 
in sy_call (rval=0xfe810f5a4eb8, > uap=0xfe810f5a4f00, l=0xfe81f455c4c0, > sy=0x810ef240 <sysent+672>) > at /bu
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
On Wed, Sep 2, 2015 at 3:16 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi Paul and Jun,
>
> Thank you for your reporting!
>
> Now I can reproduce the issue quickly using openvpn.
> So I would provide a fix soon (hopefully).

Oops. The kernel I tested was built on 8/24. A kernel built today
doesn't reproduce the issue...

ozaki-r

>
> ozaki-r
>
> On Tue, Sep 1, 2015 at 8:14 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
>> On Mon, 31 Aug 2015, Ryota Ozaki wrote:
>>
>>> Hi,
>>>
>>> I've committed a fix for rt_refcnt. Could you try again
>>> with -current? (though I'm not sure the fix is related to
>>> the issue...)
>>
>>
>> The fix is not helping on my situation, either!
>>
>> Here is the latest info, based on a kernel from today's sources...
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
On Wed, Sep 2, 2015 at 9:11 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Wed, 2 Sep 2015, Ryota Ozaki wrote:
>
>> On Wed, Sep 2, 2015 at 5:45 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
>>>
>>> Hi,
>>>
>>> Thank you for rechecking.
>>>
>>> I found you're right and I mess up. I've been using a kernel
>>> with a different config from GENERIC, which is tuned for
>>> KVM to reduce build time. So I'm now able to reproduce the
>>> issue again with a latest GENERIC kernel. I've started
>>> debugging really.
>>
>>
>> I found nd6_lookup is broken. Here is a patch to fix the issue:
>> http://www.netbsd.org/~ozaki-r/fix-ipv6-refcnt.diff
>> It works for me. I hope it works for you too.
>>
>> The change looks big because it is based on my local change that
>> cleans up complicated nd6_lookup though, the point is to replace
>> rtfree with RTFREE_IF_NEEDED.
>
>
> This seems to work for me, too! Yay!
>
> (I still have other routing issues with my vpn tunnel, but at least the
> machine no longer crashes.)

Hmm, feel free to report those issues. Anyway, thanks for your help!

ozaki-r
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
On Wed, Sep 2, 2015 at 5:45 PM, Ryota Ozaki <ozak...@netbsd.org> wrote:
> Hi,
>
> Thank you for rechecking.
>
> I found you're right and I mess up. I've been using a kernel
> with a different config from GENERIC, which is tuned for
> KVM to reduce build time. So I'm now able to reproduce the
> issue again with a latest GENERIC kernel. I've started
> debugging really.

I found that nd6_lookup is broken. Here is a patch to fix the issue:
http://www.netbsd.org/~ozaki-r/fix-ipv6-refcnt.diff
It works for me. I hope it works for you too.

The change looks big because it is based on a local change of mine that
cleans up the complicated nd6_lookup, but the essential point is to
replace rtfree with RTFREE_IF_NEEDED.

Thanks,
ozaki-r

>
> Thank you so much!
> ozaki-r
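The assertion in the subject line, "ro->_ro_rt == NULL || ro->_ro_rt->rt_refcnt > 0", says that a route cached in a struct route must still be referenced. The thread does not show how RTFREE_IF_NEEDED is defined, so the following is not the patch. It is a minimal userland model, with hypothetical simplified structures and helper names, of the general rule such a fix restores: a code path may only release a reference it actually acquired; otherwise the count drains to zero while a cache still points at the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct rtentry and
 * struct route; the real kernel structures carry far more state. */
struct rtentry_sketch {
	int rt_refcnt;
};

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* cached route; owns one reference */
};

/* The invariant from the panic message. */
static bool
rtcache_ok(const struct route_sketch *ro)
{
	return ro->ro_rt == NULL || ro->ro_rt->rt_refcnt > 0;
}

static void
rt_ref(struct rtentry_sketch *rt)
{
	rt->rt_refcnt++;
}

static void
rt_unref(struct rtentry_sketch *rt)
{
	/* Same precondition that rtfree() asserts in the panics in this thread. */
	assert(rt->rt_refcnt > 0);
	rt->rt_refcnt--;
}

/*
 * Hypothetical end-of-lookup helper: release the route only if this
 * code path took a reference.  Releasing unconditionally would drain
 * a reference that belongs to someone else, such as a route cache.
 */
static void
lookup_done(struct rtentry_sketch *rt, bool took_ref)
{
	if (rt != NULL && took_ref)
		rt_unref(rt);
}
```

A borrowed pointer passed with took_ref == false leaves the cache's reference intact; a single unbalanced decrement anywhere else is enough to make rtcache_ok() fail, which is the state the KASSERT catches.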
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
Hi, I've committed a fix for rt_refcnt. Could you try again with -current? (though I'm not sure the fix is related to the issue...) Thanks, ozaki-r On Fri, Aug 28, 2015 at 3:55 PM, Ryota Ozaki <ozak...@netbsd.org> wrote: > Hi, > > Thank you for sending backtraces. I'm investigating now... > > ozaki-r > > > On Fri, Aug 28, 2015 at 1:46 PM, Paul Goyette <p...@vps1.whooppee.com> wrote: >> On Fri, 28 Aug 2015, Paul Goyette wrote: >> >>> On Fri, 28 Aug 2015, Paul Goyette wrote: >>> >>>> On Fri, 28 Aug 2015, Jun Ebihara wrote: >>>> >>>>> On: i386 kernel from nyftp. >>>>> 7.99.21 NetBSD 7.99.21 (GENERIC.201508271450Z) #0: Thu Aug 27 17:23:37 >>>>> UTC 2015 >>>>> bui...@b47.netbsd.org:/home/builds/ab/HEAD/i386/201508271450Z-obj/home/source/ab/HEAD/src/sys/arch/i386/compile/GENERIC >>>>> i386 >>>>> >>>>> sometimes panic around route.h. >>>>> >>>>> savecore: reboot after panic: panic: kernel diagnostic assertion >>>>> "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed: file >>>>> "/home/source/ab/HEAD/src/sys/net/route.h", line 433 >>>>> >>>> >>>> I had one of these crashes yesterday. It happened when I was stopping a >>>> pkgsrc/net/openvpn tunnel via "/etc/rc.d/openvpn onestop" >>>> >>>> >>>> I don't bring my tunnel up very often, so I'm not sure if it is >>>> reproducible... >>> >>> >>> I don't know if this is the same crash as yesterday, but I just got >>> another one! >>> >>> I started openvpn, then stopped it. No crash. so I repeated this a >>> couple more times. Still no crash. >>> >>> Then I brought the tunnel up, and did a 'ping6 ftp.netbsd.org' and it >>> crashed almost immediately. >> >> >> Here's the backtrace again, this time with symbol table loaded! 
>> >> (gdb) bt >> #0 0x802f5ae5 in cpu_reboot (howto=howto@entry=260, >> bootstr=bootstr@entry=0x0) >> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671 >> #1 0x803742d4 in vpanic ( >> fmt=0x804b6e38 "kernel %sassertion \"%s\" failed: file \"%s\", >> line %d ", ap=ap@entry=0xfe817d50eb70) >> at /build/netbsd-local/src/sys/kern/subr_prf.c:340 >> #2 0x80467583 in kern_assert ( >> fmt=fmt@entry=0x804b6e38 "kernel %sassertion \"%s\" failed: file >> \"%s\", line %d ") >> at /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51 >> #3 0x8033a0e3 in rtfree (rt=0xfe815325aab0) >> at /build/netbsd-local/src/sys/net/route.c:417 >> #4 0x8033a5fd in rtcache_clear (ro=ro@entry=0xfe821db78060) >> at /build/netbsd-local/src/sys/net/route.c:1473 >> #5 0x8033a681 in rtcache_free (ro=ro@entry=0xfe821db78060) >> at /build/netbsd-local/src/sys/net/route.c:1518 >> #6 0x8020a65a in in6_pcbdetach (in6p=0xfe821db78000) >> at /build/netbsd-local/src/sys/netinet6/in6_pcb.c:618 >> #7 0x80332bec in rip6_detach (so=) >> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:644 >> #8 0x80332ccc in rip6_detach_wrapper (a=0xfe817c792498) >> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:964 >> #9 0x803cea10 in soclose (so=0xfe817c792498) >> at /build/netbsd-local/src/sys/kern/uipc_socket.c:762 >> #10 0x803860a1 in soo_close (fp=0xfe81ef653980) >> at /build/netbsd-local/src/sys/kern/sys_socket.c:255 >> #11 0x802b0de4 in closef (fp=0xfe81ef653980) >> at /build/netbsd-local/src/sys/kern/kern_descrip.c:831 >> #12 0x802b39e3 in fd_free () >> at /build/netbsd-local/src/sys/kern/kern_descrip.c:1561 >> #13 0x802badd5 in exit1 (l=l@entry=0xfe815a2249a0, >> rv=rv@entry=2) >> at /build/netbsd-local/src/sys/kern/kern_exit.c:275 >> #14 0x802db146 in sigexit (l=l@entry=0xfe815a2249a0, >> signo=signo@entry=2) >> at /build/netbsd-local/src/sys/kern/kern_sig.c:2048 >> #15 0x802db46b in postsig (signo=2) >> at /build/netbsd-local/src/sys/kern/kern_sig.c:1848 >> #16 0x802c4999 in lwp_userret 
(l=l@entry=0xfe815a2249a0) >> at /build/netbsd-local/src/sys/kern/kern_lwp.c:1530 >> #17 0x803866b4 in mi_userret (l=0xfe815a2249a0) >> at /build/netbs
Re: "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed
Hi,

Thank you for sending backtraces. I'm investigating now...

ozaki-r

On Fri, Aug 28, 2015 at 1:46 PM, Paul Goyette <p...@vps1.whooppee.com> wrote:
> On Fri, 28 Aug 2015, Paul Goyette wrote:
>
>> On Fri, 28 Aug 2015, Paul Goyette wrote:
>>
>>> On Fri, 28 Aug 2015, Jun Ebihara wrote:
>>>
>>>> On: i386 kernel from nyftp.
>>>> 7.99.21 NetBSD 7.99.21 (GENERIC.201508271450Z) #0: Thu Aug 27 17:23:37 UTC 2015
>>>> bui...@b47.netbsd.org:/home/builds/ab/HEAD/i386/201508271450Z-obj/home/source/ab/HEAD/src/sys/arch/i386/compile/GENERIC i386
>>>>
>>>> sometimes panic around route.h.
>>>>
>>>> savecore: reboot after panic: panic: kernel diagnostic assertion
>>>> "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0" failed: file
>>>> "/home/source/ab/HEAD/src/sys/net/route.h", line 433
>>>
>>> I had one of these crashes yesterday. It happened when I was stopping a
>>> pkgsrc/net/openvpn tunnel via "/etc/rc.d/openvpn onestop"
>>>
>>> I don't bring my tunnel up very often, so I'm not sure if it is
>>> reproducible...
>>
>> I don't know if this is the same crash as yesterday, but I just got
>> another one!
>>
>> I started openvpn, then stopped it. No crash. so I repeated this a
>> couple more times. Still no crash.
>>
>> Then I brought the tunnel up, and did a 'ping6 ftp.netbsd.org' and it
>> crashed almost immediately.
>
> Here's the backtrace again, this time with symbol table loaded!
>
> (gdb) bt
> #0 0x802f5ae5 in cpu_reboot (howto=howto@entry=260,
> bootstr=bootstr@entry=0x0)
> at /build/netbsd-local/src/sys/arch/amd64/amd64/machdep.c:671
> #1 0x803742d4 in vpanic (
> fmt=0x804b6e38 "kernel %sassertion \"%s\" failed: file \"%s\",
> line %d ", ap=ap@entry=0xfe817d50eb70)
> at /build/netbsd-local/src/sys/kern/subr_prf.c:340
> #2 0x80467583 in kern_assert (
> fmt=fmt@entry=0x804b6e38 "kernel %sassertion \"%s\" failed: file
> \"%s\", line %d ")
> at /build/netbsd-local/src/sys/lib/libkern/kern_assert.c:51
> #3 0x8033a0e3 in rtfree (rt=0xfe815325aab0)
> at /build/netbsd-local/src/sys/net/route.c:417
> #4 0x8033a5fd in rtcache_clear (ro=ro@entry=0xfe821db78060)
> at /build/netbsd-local/src/sys/net/route.c:1473
> #5 0x8033a681 in rtcache_free (ro=ro@entry=0xfe821db78060)
> at /build/netbsd-local/src/sys/net/route.c:1518
> #6 0x8020a65a in in6_pcbdetach (in6p=0xfe821db78000)
> at /build/netbsd-local/src/sys/netinet6/in6_pcb.c:618
> #7 0x80332bec in rip6_detach (so=<optimized out>)
> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:644
> #8 0x80332ccc in rip6_detach_wrapper (a=0xfe817c792498)
> at /build/netbsd-local/src/sys/netinet6/raw_ip6.c:964
> #9 0x803cea10 in soclose (so=0xfe817c792498)
> at /build/netbsd-local/src/sys/kern/uipc_socket.c:762
> #10 0x803860a1 in soo_close (fp=0xfe81ef653980)
> at /build/netbsd-local/src/sys/kern/sys_socket.c:255
> #11 0x802b0de4 in closef (fp=0xfe81ef653980)
> at /build/netbsd-local/src/sys/kern/kern_descrip.c:831
> #12 0x802b39e3 in fd_free ()
> at /build/netbsd-local/src/sys/kern/kern_descrip.c:1561
> #13 0x802badd5 in exit1 (l=l@entry=0xfe815a2249a0,
> rv=rv@entry=2)
> at /build/netbsd-local/src/sys/kern/kern_exit.c:275
> #14 0x802db146 in sigexit (l=l@entry=0xfe815a2249a0,
> signo=signo@entry=2)
> at /build/netbsd-local/src/sys/kern/kern_sig.c:2048
> #15 0x802db46b in postsig (signo=2)
> at /build/netbsd-local/src/sys/kern/kern_sig.c:1848
> #16 0x802c4999 in lwp_userret (l=l@entry=0xfe815a2249a0)
> at /build/netbsd-local/src/sys/kern/kern_lwp.c:1530
> #17 0x803866b4 in mi_userret (l=0xfe815a2249a0)
> at /build/netbsd-local/src/sys/sys/userret.h:94
> #18 userret (l=0xfe815a2249a0) at ./machine/userret.h:82
> #19 syscall (frame=0xfe817d50ef00)
> at /build/netbsd-local/src/sys/arch/x86/x86/syscall.c:184
> #20 0x80100691 in Xsyscall ()
> (gdb) frame 3
> #3 0x8033a0e3 in rtfree (rt=0xfe815325aab0)
> at /build/netbsd-local/src/sys/net/route.c:417
> warning: Source file is more recent than executable.
> (gdb) print rt
> $1 = (struct rtentry *) 0xfe815325aab0
> (gdb) print *rt
> $2 = {rt_nodes = {{rn_mklist = 0xfe81f18b9558, rn_p = 0xfe821dbb4bb8,
> rn_b = -1, rn_bmask = 0 '\000', rn_flags = 4 '\004', rn_u = {rn_leaf = {
> rn_Key = 0xfe821ddcc048 "\034\030", rn_Mask = 0xfe810e9882b8 ,
> rn_Dupedkey = 0x0}, rn_node = {
> rn_Off = 501006408, rn_L = 0xfe810e9882b8, rn_R = 0x0}}}, {
> rn_mklist = 0x0, rn_p = 0x0, rn_b = 0, rn_bmask = 0 '\000',
> rn_flags = 0 '\000', rn_u = {rn_leaf = {rn_Key = 0x0, rn_Mask = 0x0,
> rn_Dupedkey = 0x0}, rn_node = {rn_Off = 0, rn_L = 0x0, rn_R = 0x0,
> rt_gateway = 0xfe821e1cdeb8, rt_flags = 2051, rt_refcnt = 0,
> rt_use = 17, rt_ifp = 0xfe8138031810,
> ^^
> rt_ifa = 0xfe81ee677010, rt_ifa_seqno = 0, rt_llinfo = 0x0, rt_rmx = {
> rmx_locks = 0, rmx_mtu = 0,
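The dump shows rt_flags = 2051 next to rt_refcnt = 0. 2051 is 0x803, so exactly three bits are set. A small decoder sketch, assuming the classic 4.4BSD RTF_* bit assignments (these values are not given in the thread and should be checked against the route.h the dumped kernel was built from); under those assumed values the entry's flags read UP, GATEWAY and STATIC:

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed 4.4BSD-style RTF_* bit values; verify them against the
 * <net/route.h> of the kernel that produced the dump.
 */
static const struct {
	unsigned int bit;
	const char *name;
} rtf_names[] = {
	{ 0x0001, "UP" },	{ 0x0002, "GATEWAY" },
	{ 0x0004, "HOST" },	{ 0x0008, "REJECT" },
	{ 0x0010, "DYNAMIC" },	{ 0x0020, "MODIFIED" },
	{ 0x0040, "DONE" },	{ 0x0100, "CLONING" },
	{ 0x0200, "XRESOLVE" },	{ 0x0400, "LLINFO" },
	{ 0x0800, "STATIC" },	{ 0x1000, "BLACKHOLE" },
	{ 0x2000, "CLONED" },	{ 0x4000, "PROTO2" },
	{ 0x8000, "PROTO1" },
};

/* Write the names of the set bits, space-separated, into buf. */
static void
decode_rt_flags(unsigned int flags, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < sizeof(rtf_names) / sizeof(rtf_names[0]); i++) {
		if ((flags & rtf_names[i].bit) == 0)
			continue;
		/* strncat's count excludes the terminating NUL it appends. */
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, rtf_names[i].name, len - strlen(buf) - 1);
	}
}
```

Bits not listed in the table are silently ignored, so an unexpected value such as a bit that was repurposed in a later route.h shows up as a missing name rather than an error.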
Re: -current kernel on KVM with virtio disk fails to boot
On Mon, Aug 24, 2015 at 2:12 PM, Michael van Elst <mlel...@serpens.de> wrote:
> ozak...@netbsd.org (Ryota Ozaki) writes:
>
>> Hi,
>>
>> I got the following panic on bootup. It seems recent IPL_VM => IPL_NONE
>> change in dk_attach causes it.
>
> Yes. Unfortunately the ld drivers differ very much in what context the
> start and iodone routines are called.
>
>> Should we apply it?
>
> It would only help virtio. Either all drivers must be adjusted or a
> common solution must be found.

I thought the panic happened only with the virtio disk driver.
Feel free to apply the patch if we choose the former.

ozaki-r

> --
> --
> Michael van Elst
> Internet: mlel...@serpens.de
> A potential Snark may lurk in every tree.
-current kernel on KVM with virtio disk fails to boot
Hi,

I got the following panic on bootup. It seems the recent IPL_VM => IPL_NONE
change in dk_attach causes it.

Mutex error: lockdebug_wantlock: acquiring sleep lock from interrupt context

lock address : 0xfe800386bce0 type : sleep/adaptive
initialized : 0x801b3a65
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xfe80035771a0 last held: 00
last locked : 0x801b3bfc unlocked*: 0x80489f53
owner field : 00 wait/spin:0/0

Turnstile chain at 0x809bca40.
=> No active turnstile for this lock.

panic: LOCKDEBUG: Mutex error: lockdebug_wantlock: acquiring sleep lock from interrupt context
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 80197c75 cs 8 rflags 246 cr2 0 ilevel 6 rsp fe8003997b00
curlwp 0xfe80035771a0 pid 0.43 lowest kstack 0xfe80039942c0
Stopped in pid 0.43 (system) at netbsd:breakpoint+0x5: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
mutex_enter() at netbsd:mutex_enter+0x43f
dk_done() at netbsd:dk_done+0x62
lddone() at netbsd:lddone+0xf
ld_virtio_vq_done() at netbsd:ld_virtio_vq_done+0x37
virtio_vq_intr() at netbsd:virtio_vq_intr+0x70
virtio_intr() at netbsd:virtio_intr+0x70
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
--- interrupt ---
bus_space_read_4() at netbsd:bus_space_read_4+0xa
lwp_exit_switchaway() at netbsd:lwp_exit_switchaway+0x79
lwp_exit() at netbsd:lwp_exit+0x317
kthread_exit() at netbsd:kthread_exit+0x4e
config_finalize_register() at netbsd:config_finalize_register

We can fix it by running ld_virtio's interrupt handler in softint
context, which the following patch does simply:

diff --git a/sys/dev/pci/ld_virtio.c b/sys/dev/pci/ld_virtio.c
index f578a73..404edfd 100644
--- a/sys/dev/pci/ld_virtio.c
+++ b/sys/dev/pci/ld_virtio.c
@@ -249,7 +249,7 @@ ld_virtio_attach(device_t parent, device_t self, void *aux)
 	vsc->sc_nvqs = 1;
 	vsc->sc_config_change = NULL;
 	vsc->sc_intrhand = virtio_vq_intr;
-	vsc->sc_flags = 0;
+	vsc->sc_flags = VIRTIO_F_PCI_INTR_SOFTINT;
 
 	features = virtio_negotiate_features(vsc,
 	    (VIRTIO_BLK_F_SIZE_MAX |

Should we apply it?

ozaki-r
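The patch sets VIRTIO_F_PCI_INTR_SOFTINT so that the virtqueue completion path, which ends in dk_done() taking an adaptive mutex, runs from a soft interrupt instead of the hard interrupt handler. The shape of that deferral can be sketched in userland C. Everything below (the context enum, the queue, and the function names) is hypothetical and only models the rule LOCKDEBUG enforces, under the assumption, per softint(9) as we understand it, that soft interrupt handlers may take adaptive mutexes while hard interrupt handlers may not:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the contexts involved; this is not NetBSD's
 * softint(9) or virtio API, only the shape of the deferral.
 */
enum ctx { CTX_THREAD, CTX_SOFTINT, CTX_HARDINTR };

static enum ctx current_ctx = CTX_THREAD;

/* LOCKDEBUG-style rule: no sleep (adaptive) locks in hard interrupt
 * context.  Returns false where LOCKDEBUG would panic. */
static bool
sleep_lock_allowed(void)
{
	return current_ctx != CTX_HARDINTR;
}

#define MAXPENDING 16
static int pending[MAXPENDING];	/* ids of completed requests */
static size_t npending;
static bool softint_scheduled;
static int ncompleted;

/* Stand-in for a completion routine, like dk_done(), that takes a
 * sleep lock. */
static bool
complete_request(int id)
{
	(void)id;
	if (!sleep_lock_allowed())
		return false;
	ncompleted++;
	return true;
}

/* Hard interrupt handler: only record the completion and schedule the
 * soft interrupt; no sleep locks are taken here. */
static bool
hard_intr(int id)
{
	bool queued = false;

	current_ctx = CTX_HARDINTR;
	if (npending < MAXPENDING) {
		pending[npending++] = id;
		softint_scheduled = true;
		queued = true;
	}
	current_ctx = CTX_THREAD;
	return queued;
}

/* Soft interrupt handler: drain the queue where sleep locks are OK. */
static bool
soft_intr(void)
{
	bool ok = true;
	size_t i;

	current_ctx = CTX_SOFTINT;
	for (i = 0; i < npending; i++)
		ok = complete_request(pending[i]) && ok;
	npending = 0;
	softint_scheduled = false;
	current_ctx = CTX_THREAD;
	return ok;
}
```

As Michael van Elst points out in his reply, a per-driver flag like this only helps virtio; other ld back ends whose completion path runs in hard interrupt context would need the same kind of deferral, or a common solution in the shared code.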
Re: Kernel panic from network traffic
Hi,

It's probably due to my recent change to refcnt. I'm investigating
that defect.

Thanks,
ozaki-r

On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka <ht...@twofifty.com> wrote:
> Being a moron, I plugged ports of my switch together. The big surprise is
> two ports away is my -current box and it kept panicking. This is all I got
> so far.
>
> Jul 23 21:46:11 mara /netbsd: panic: kernel diagnostic assertion "rt->rt_refcnt > 0" failed: file "/usr/src/sys/net/route.c", line 418
> Jul 23 21:46:11 mara /netbsd: cpu3: Begin traceback...
> Jul 23 21:46:11 mara /netbsd: vpanic() at netbsd:vpanic+0x13c
> Jul 23 21:46:11 mara /netbsd: kern_assert() at netbsd:kern_assert+0x4f
> Jul 23 21:46:11 mara /netbsd: rtfree() at netbsd:rtfree+0xf5
> Jul 23 21:46:11 mara /netbsd: rtcache_clear() at netbsd:rtcache_clear+0x41
> Jul 23 21:46:11 mara /netbsd: rtcache_free() at netbsd:rtcache_free+0xd
> Jul 23 21:46:11 mara /netbsd: in6_pcbdetach() at netbsd:in6_pcbdetach+0xcb
> Jul 23 21:46:11 mara /netbsd: udp6_detach_wrapper() at netbsd:udp6_detach_wrapper+0x3f
> Jul 23 21:46:11 mara /netbsd: soclose() at netbsd:soclose+0x63
> Jul 23 21:46:11 mara /netbsd: soo_close() at netbsd:soo_close+0x16
> Jul 23 21:46:11 mara /netbsd: closef() at netbsd:closef+0x54
> Jul 23 21:46:11 mara /netbsd: fd_close() at netbsd:fd_close+0x19f
> Jul 23 21:46:11 mara /netbsd: sys_close() at netbsd:sys_close+0x20
> Jul 23 21:46:11 mara /netbsd: syscall() at netbsd:syscall+0x9c
> Jul 23 21:46:11 mara /netbsd: --- syscall (number 6) ---
> Jul 23 21:46:11 mara /netbsd: 7f7ff5e5494a:
> Jul 23 21:46:11 mara /netbsd: cpu3: End traceback...
>
> --
> Hisashi T Fujinaka - ht...@twofifty.com
> BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee
Re: Kernel panic from network traffic
Hi,

I just fixed one bug related to refcnt. The fix may silence the panic.
Could you try again with the latest kernel?

Thanks,
ozaki-r

On Fri, Jul 24, 2015 at 3:38 PM, Ryota Ozaki ozak...@netbsd.org wrote:
> On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
>> Hi,
>>
>> It's probably due to my recent change to refcnt. I'm investigating that defect.
>
> Hmm, I cannot reproduce it. Could you tell me the kernel config,
> network setup and the applications running on the box?
>
> [...]
Re: Kernel panic from network traffic
On Fri, Jul 24, 2015 at 3:12 PM, Ryota Ozaki ozak...@netbsd.org wrote:
> Hi,
>
> It's probably due to my recent change to refcnt. I'm investigating that defect.

Hmm, I cannot reproduce it. Could you tell me the kernel config, network setup and the applications running on the box?

Thanks,
ozaki-r

> Thanks,
> ozaki-r
>
> On Fri, Jul 24, 2015 at 2:41 PM, Hisashi T Fujinaka ht...@twofifty.com wrote:
>> Being a moron, I plugged ports of my switch together. [...]
Re: Regression on rtadvd (was: Re: CVS commit: src/usr.sbin/rtadvd)
On Mon, Jun 15, 2015 at 5:58 AM, Timo Buhrmester fstd.l...@gmail.com wrote:
>> Module Name:    src
>> Committed By:   roy
>> Date:           Fri Jun 5 14:15:41 UTC 2015
>>
>> Modified Files:
>>         src/usr.sbin/rtadvd: rtadvd.c
>>
>> Log Message:
>> Set the hoplimit of 255 as specified in RFC 4861 section 4.2 using the
>> IPV6_MULTICAST_HOPS socket option rather than using CMSG when
>> constructing each message.
>
> This commit broke rtadvd for me on i386: the sendmsg() call that is
> supposed to send RAs now fails with "Invalid argument".
>
> Here are two gdb transcripts. This is what happens AFTER the offending commit:
>
> | # gdb -q rtadvd
> | (gdb) break rtadvd.c:1699
> | (gdb) run -df vr1
> | [...]
> | Breakpoint 1, ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1699
> | 1699            i = sendmsg(sock, &sndmhdr, 0);
> | (gdb) bt
> | #0  ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1699
> | #1  0x0804cdbf in ra_timeout (data=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1779
> | #2  0x08052610 in rtadvd_check_timer () at /usr/src.head/usr.sbin/rtadvd/timer.c:130
> | #3  0x080499c0 in main (argc=-1, argv=0xbfbfec64) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:315
> | (gdb) x/28xb &sndmhdr
> | 0x8058b2c <sndmhdr>:     0x40  0x7e  0x05  0x08  0x1c  0x00  0x00  0x00
> | 0x8058b34 <sndmhdr+8>:   0xc4  0x8a  0x05  0x08  0x01  0x00  0x00  0x00
> | 0x8058b3c <sndmhdr+16>:  0xd0  0xa0  0x90  0xbb  0x30  0x00  0x00  0x00
> | 0x8058b44 <sndmhdr+24>:  0x00  0x00  0x00  0x00
> | (gdb) n
> | 1701            if (i < 0 || (size_t)i != rai->ra_datalen) {
> | (gdb) print i
> | $1 = -1
> | (gdb) n
> | [...]
> | rtadvd[3794]: <ra_output> sendmsg on vr1: Invalid argument
>
> The following is what it looked like BEFORE the offending commit:
>
> | # gdb -q rtadvd
> | (gdb) break rtadvd.c:1702
> | (gdb) run -df vr1
> | [...]
> | Breakpoint 1, ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1702
> | 1702            i = sendmsg(sock, &sndmhdr, 0);
> | (gdb) bt
> | #0  ra_output (rai=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1702
> | #1  0x0804cdcf in ra_timeout (data=0xbb91c0e0) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:1782
> | #2  0x08052620 in rtadvd_check_timer () at /usr/src.head/usr.sbin/rtadvd/timer.c:130
> | #3  0x080499c0 in main (argc=-1, argv=0xbfbfec64) at /usr/src.head/usr.sbin/rtadvd/rtadvd.c:315
> | (gdb) x/28xb &sndmhdr
> | 0x8058b0c <sndmhdr>:     0x20  0x7e  0x05  0x08  0x1c  0x00  0x00  0x00
> | 0x8058b14 <sndmhdr+8>:   0xa4  0x8a  0x05  0x08  0x01  0x00  0x00  0x00
> | 0x8058b1c <sndmhdr+16>:  0xd0  0xa0  0x90  0xbb  0x30  0x00  0x00  0x00
> | 0x8058b24 <sndmhdr+24>:  0x00  0x00  0x00  0x00
> | (gdb) n
> | 1704            if (i < 0 || (size_t)i != rai->ra_datalen) {
> | (gdb) print i
> | $2 = 56
>
> The two dumps of `sndmhdr` differ only in the 1st and 9th bytes, which on
> i386 belong to the 1st and 3rd members of struct msghdr:
>
> |	void		*msg_name;	/* optional address */
> and
> |	struct iovec	*msg_iov;	/* scatter/gather array */
>
> I've manually traced the failing code path through the sendmsg system call:
>
> sys/kern/uipc_syscalls.c:620   do_sys_sendmsg_so()
>     ``error = (*so->so_send)(so, sa, auio, NULL, control, flags, l);''
> sys/kern/uipc_socket.c:1602    sosend()
>     ``error = (*so->so_proto->pr_usrreqs->pr_send)(so, top, addr, control, l);''
> sys/netinet6/raw_ip6.c:891     rip6_send()
>     ``error = rip6_output(m, so, dst, control);''
> sys/netinet6/raw_ip6.c:391     rip6_output()
>     ``if ((error = ip6_setpktopts(control, opt, [...]''
> sys/netinet6/ip6_output.c:2705 ip6_setpktopts()
>     ``if (cm->cmsg_len == 0 || cm->cmsg_len > control->m_len)''
>
> The last function is where the EINVAL comes from: cm->cmsg_len is zero in
> the 2nd iteration of the enclosing loop. Unfortunately, I couldn't figure
> out what is supposed to happen there. Any ideas?

It seems that the kernel tries to read a non-existent cmsg, namely the hop-limit cmsg that the commit removed. Can you try the patch below?
It shrinks the cmsg buffer length, since the hop-limit cmsg is no longer sent.

ozaki-r

diff --git a/usr.sbin/rtadvd/rtadvd.c b/usr.sbin/rtadvd/rtadvd.c
index 66885c4..f2016eb 100644
--- a/usr.sbin/rtadvd/rtadvd.c
+++ b/usr.sbin/rtadvd/rtadvd.c
@@ -1503,8 +1503,7 @@ sock_open(void)
 		exit(1);
 	}
 
-	sndcmsgbuflen = CMSG_SPACE(sizeof(struct in6_pktinfo)) +
-		CMSG_SPACE(sizeof(int));
+	sndcmsgbuflen = CMSG_SPACE(sizeof(struct in6_pktinfo));
 	sndcmsgbuf = malloc(sndcmsgbuflen);
 	if (sndcmsgbuf == NULL) {
 		syslog(LOG_ERR, "<%s> malloc: %m", __func__);
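Why the shorter buffer helps: the kernel walks every cmsg header that fits in the control length passed to sendmsg(), so a buffer sized for two cmsgs but filled with one leaves a zeroed header at the end, whose cmsg_len of 0 is rejected with EINVAL. The following is a minimal user-space sketch of that effect, assuming the RFC 3542 CMSG_* macros (glibc only exposes struct in6_pktinfo with _GNU_SOURCE). check_cmsgs() and build_and_check() are hypothetical helpers that only imitate the quoted ip6_setpktopts() length check; they are not kernel code.

```c
#define _GNU_SOURCE	/* glibc hides struct in6_pktinfo without this */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical imitation of the loop in ip6_setpktopts(): reject any
 * header whose cmsg_len is 0 or larger than the control buffer.
 * Returning early also avoids calling CMSG_NXTHDR() on a 0-length header.
 */
static int
check_cmsgs(struct msghdr *mh)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(mh); cm != NULL; cm = CMSG_NXTHDR(mh, cm)) {
		if (cm->cmsg_len == 0 || cm->cmsg_len > mh->msg_controllen)
			return EINVAL;
	}
	return 0;
}

/*
 * Fill only an IPV6_PKTINFO cmsg, as rtadvd does after the commit.
 * 'slack' extra zeroed bytes model the stale CMSG_SPACE(sizeof(int)) term.
 */
static int
build_and_check(size_t slack)
{
	union {			/* union keeps buf aligned for cmsghdr */
		struct cmsghdr hdr;
		unsigned char buf[256];
	} u;
	struct msghdr mh;
	struct cmsghdr *cm;
	size_t buflen = CMSG_SPACE(sizeof(struct in6_pktinfo)) + slack;

	assert(buflen <= sizeof(u.buf));
	memset(&u, 0, sizeof(u));
	memset(&mh, 0, sizeof(mh));
	mh.msg_control = u.buf;
	mh.msg_controllen = buflen;

	cm = CMSG_FIRSTHDR(&mh);
	assert(cm != NULL);
	cm->cmsg_level = IPPROTO_IPV6;
	cm->cmsg_type = IPV6_PKTINFO;
	cm->cmsg_len = CMSG_LEN(sizeof(struct in6_pktinfo));

	return check_cmsgs(&mh);
}
```

build_and_check(CMSG_SPACE(sizeof(int))) corresponds to the old buffer size and yields EINVAL; build_and_check(0) corresponds to the patched size and passes. The general rule is to size msg_controllen as the sum of CMSG_SPACE() for exactly the cmsgs you fill.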
fatal integer divide fault in dk(4)
Hi,

For some days now, I have been getting the following fault with a recent -current kernel on KVM:

  fatal integer divide fault in supervisor mode
  trap type 8 code 0 rip 801b38fa cs 8 rflags 10246 cr2 0 ilevel 0 rsp fe800398cb48
  curlwp 0xfe80035345c0 pid 0.42 lowest kstack 0xfe80039892c0
  kernel: integer divide fault trap, code=0
  Stopped in pid 0.42 (system) at netbsd:dk_strategy+0x41:   divl    %esi,%eax
  db{0}> bt
  dk_strategy() at netbsd:dk_strategy+0x41
  disk_read_sectors() at netbsd:disk_read_sectors+0x3b
  read_sector() at netbsd:read_sector+0x1d
  scan_mbr() at netbsd:scan_mbr+0x32
  readdisklabel() at netbsd:readdisklabel+0x150
  dk_getdisklabel() at netbsd:dk_getdisklabel+0xbf
  dk_open() at netbsd:dk_open+0xf4
  cdev_open() at netbsd:cdev_open+0xb2
  spec_open() at netbsd:spec_open+0x25d
  VOP_OPEN() at netbsd:VOP_OPEN+0x33
  dkwedge_discover() at netbsd:dkwedge_discover+0xb4
  config_interrupts_thread() at netbsd:config_interrupts_thread+0x2c
  db{0}>

The rip points into the code below.

        /*
         * The transfer must be a whole number of blocks and the offset must
         * not be negative.
         */
        if ((bp->b_bcount % secsize) != 0 || bp->b_blkno < 0) {
    801b38f6:   89 c8                   mov    %ecx,%eax
    801b38f8:   31 d2                   xor    %edx,%edx
    801b38fa:   f7 f6                   div    %esi
    801b38fc:   85 d2                   test   %edx,%edx
    801b38fe:   0f 85 8c 00 00 00       jne    801b3990 <dk_strategy+0xd7>
    801b3904:   48 83 7b 48 00          cmpq   $0x0,0x48(%rbx)
    801b3909:   0f 88 81 00 00 00       js     801b3990 <dk_strategy+0xd7>
                biodone(bp);
                return;
        }

I can easily see what happens, but I don't know how to fix it. Can anyone fix it?

Thanks,
ozaki-r
Re: Build error spdmem.
Applied. Thanks!

ozaki-r

On Mon, Apr 20, 2015 at 2:52 PM, henning petersen henning.peter...@t-online.de wrote:
> In spdmemvar.h there is a space after a backslash, and one semicolon is missing.
Re: Revisiting DTrace syscall provider
On Sat, Mar 7, 2015 at 11:34 PM, Christos Zoulas chris...@zoulas.com wrote:
> On Mar 7, 6:11pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
> -- Subject: Re: Revisiting DTrace syscall provider
>
> | I first did so, but I changed back to systrace because FreeBSD named it
> | systrace and named the modules for syscall emulations systrace_linux32
> | and systrace_freebsd32. If we follow them, should we keep the systrace name as is?
>
> Seeing how much confusion systrace is causing, I renamed it again:
>
> dtrace_syscall
> dtrace_syscall_netbsd32
> dtrace_syscall_linux
> ...

Yeah, they're better than dtrace_systrace* :)

BTW, please don't forget to rename the module's internal name as well.
(See http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/external/cddl/osnet/dev/fbt/fbt.c.diff?r1=1.17&r2=1.18&f=u )

> | NP, please do it :)
>
> I am making one more change to make things even simpler than what FreeBSD did and save more space.

Thanks!

ozaki-r

> Thanks,
>
> christos
Re: Revisiting DTrace syscall provider
On Sat, Mar 7, 2015 at 6:10 AM, Christos Zoulas chris...@astron.com wrote:
> In article cakryomg6z8scuvmzqmfk+n09n+1xfniznc4fac59zxoaljo...@mail.gmail.com,
> Ryota Ozaki ozak...@netbsd.org wrote:
>> Anyway, I updated my patches; they're based on the latest -current.
>> Changes since the previous version are:
>> - Remove an unexpected contribution comment from kern_dtrace.c (thanks riastradh!)
>> - Don't unload systrace.kmod while dtrace is in use
>> - Add a "created from" line to *_systrace_args.c
>>
>> http://www.netbsd.org/~ozaki-r/systrace.diff
>> http://www.netbsd.org/~ozaki-r/systrace-full.diff
>> https://github.com/ozaki-r/netbsd-src/commits/dtrace-syscall-provider2
>
> Thanks for working on this... I had done mostly the same, but with a
> few differences:
>
> 1. I named the module dtrace_systrace to match the other dtrace modules.

I first did so, but I changed back to systrace because FreeBSD named it systrace and named the modules for syscall emulations systrace_linux32 and systrace_freebsd32. If we follow them, should we keep the systrace name as is?

> 2. I did not create the args union to deal with signed/unsigned; I chose
>    to mimic what FreeBSD does.

Do you prefer your way? I don't mind, as long as it works. I just did it the way riz did.

> 3. I fixed the entry/exit probe functions to deal with the return value
>    of the syscall, as well as the entry and exit argument descriptions.
> 4. I am not sure if creating systrace_foo files in the emulations should
>    be done right now, since we don't really load/use them. This should be
>    a todo item; perhaps these functions should be part of the emulation
>    structure and loaded with the emulation. But that should be phase II...

Yes, I agree.

> 5. I bumped the kernel version because of the struct sysent changes.

Sure.

> Anyway it works nicely, and I'd like to commit it if you don't mind (or
> if you have any changes).
NP, please do it :)

ozaki-r

> Thanks,
>
> christos
>
> Index: kern/Makefile
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/Makefile,v
> retrieving revision 1.17
> diff -u -u -r1.17 Makefile
> --- kern/Makefile	16 Jan 2014 01:15:34 -	1.17
> +++ kern/Makefile	6 Mar 2015 20:55:27 -
> @@ -11,7 +11,7 @@
>  	@false
>  
>  SYSCALLSRC = makesyscalls.sh syscalls.conf syscalls.master
> -init_sysent.c syscalls.c ../sys/syscall.h ../sys/syscallargs.h: ${SYSCALLSRC}
> +init_sysent.c syscalls.c systrace_args.c ../sys/syscall.h ../sys/syscallargs.h: ${SYSCALLSRC}
>  	${HOST_SH} makesyscalls.sh syscalls.conf syscalls.master
>  
>  VNODEIFSRC = vnode_if.sh vnode_if.src
> Index: kern/makesyscalls.sh
> ===================================================================
> RCS file: /cvsroot/src/sys/kern/makesyscalls.sh,v
> retrieving revision 1.145
> diff -u -u -r1.145 makesyscalls.sh
> --- kern/makesyscalls.sh	24 Jul 2014 11:58:45 -	1.145
> +++ kern/makesyscalls.sh	6 Mar 2015 20:55:28 -
> @@ -61,6 +61,7 @@
>  # source the config file.
>  sys_nosys="sys_nosys"	# default is sys_nosys(), if not specified otherwise
>  maxsysargs=8		# default limit is 8 (32bit) arguments
> +systrace="/dev/null"
>  rumpcalls="/dev/null"
>  rumpcallshdr="/dev/null"
>  rumpsysmap="/dev/null"
> @@ -75,15 +76,17 @@
>  sysnamesbottom="sysnames.bottom"
>  rumptypes="rumphdr.types"
>  rumpprotos="rumphdr.protos"
> +systracetmp="systrace.$$"
> +systraceret="systraceret.$$"
>  
> -trap "rm $sysdcl $sysprotos $sysent $sysnamesbottom $rumpsysent $rumptypes $rumpprotos" 0
> +trap "rm $sysdcl $sysprotos $sysent $sysnamesbottom $rumpsysent $rumptypes $rumpprotos $systracetmp $systraceret" 0
>  
>  # Awk program (must support nawk extensions)
>  # Use awk at Berkeley, nawk or gawk elsewhere.
>  awk=${AWK:-awk}
>  
>  # Does this awk have a toupper function?
> -have_toupper=`$awk 'BEGIN { print toupper("true"); exit; }' 2>/dev/null`
> +have_toupper=$($awk 'BEGIN { print toupper("true"); exit; }' 2>/dev/null)
>  
>  # If this awk does not define toupper then define our own.
>  if [ "$have_toupper" = TRUE ] ; then
> @@ -137,6 +140,9 @@
>  	sysnumhdr = \"$sysnumhdr\"
>  	sysarghdr = \"$sysarghdr\"
>  	sysarghdrextra = \"$sysarghdrextra\"
> +	systrace = \"$systrace\"
> +	systracetmp = \"$systracetmp\"
> +	systraceret = \"$systraceret\"
>  	rumpcalls = \"$rumpcalls\"
>  	rumpcallshdr = \"$rumpcallshdr\"
>  	rumpsysent = \"$rumpsysent\"
> @@ -211,6 +217,10 @@
>  	printf "/* %s */\n\n", tag > rumpcallshdr
>  	printf "/*\n * System call protos in rump namespace.\n *\n" > rumpcallshdr
>  	printf " * DO NOT EDIT-- this file is automatically generated.\n" > rumpcallshdr
> +
> +	printf "/* %s */\n\n", tag > systrace
> +	printf "/*\n * System call argument to DTrace register array converstion.\n *\n" > systrace
> +	printf " * DO NOT EDIT-- this file is automatically generated.\n" > systrace
>  }
>  NR == 1 {
>  	sub(/ $/, "")
> @@ -324,6 +334,17 @@
> \t\t